In information theory, systems are modeled by a transmitter, channel, and receiver. The transmitter produces messages that are sent through the channel. The channel modifies the message in some way. The receiver attempts to infer which message was sent. In this context, entropy (more specifically, ) is the expected value (average) of the information contained in each message. 'Messages' can be modeled by any flow of information.
In a more technical sense, there are reasons (explained below) to define information as the negative of the logarithm of the probability distribution of possible events or messages. The amount of information of every event forms a random variable whose expected value, or average, is the Shannon entropy. Units of entropy are the shannon, nat, or hartley, depending on the base of the logarithm used to define it, though the shannon is commonly referred to as a bit.
The logarithm of the probability distribution is useful as a measure of entropy because it is additive for independent sources. For instance, the entropy of a coin toss is 1 shannon, whereas of m tosses it is m shannons. Generally, you need log2(n) bits to represent a variable that can take one of n values if n is a power of 2. If these values are equally probable, the entropy (in shannons) is equal to the number of bits. Equality between number of bits and shannons holds only while all outcomes are equally probable. If one of the events is more probable than others, observation of that event is less informative. Conversely, rarer events provide more information when observed. Since observation of less probable events occurs more rarely, the net effect is that the entropy (thought of as average information) received from non-uniformly distributed data is less than log2(n). Entropy is zero when one outcome is certain. Shannon entropy quantifies all these considerations exactly when a probability distribution of the source is known. The meaning of the events observed (the meaning of messages) does not matter in the definition of entropy. Entropy only takes into account the probability of observing a specific event, so the information it encapsulates is information about the underlying probability distribution, not the meaning of the events themselves.
Generally, entropy refers to disorder or uncertainty. Shannon entropy was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication". Shannon entropy provides an absolute limit on the best possible average length of lossless encoding or compression of an information source. Rényi entropy generalizes Shannon entropy.