Information entropy is defined as the average amount of information produced by a probabilistic stochastic source of data.
The measure of information entropy associated with each possible data value is the negative logarithm of the probability mass function for the value. Thus, when the data source has a lower-probability value (i.e., when a low-probability event occurs), the event carries more "information" ("surprisal") than when the source data has a higher-probability value. The amount of information conveyed by each event defined in this way becomes a random variable whose expected value is the information entropy. Generally, entropy refers to disorder or uncertainty, and the definition of entropy used in information theory is directly analogous to the definition used in statistical thermodynamics. The concept of information entropy was introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication".
The basic model of a data communication system is composed of three elements, a source of data, a channel, and a receiver, and – as expressed by Shannon, who essentially single-handedly created the field of information theory – the "fundamental problem of communication" is for the receiver to be able to identify what data was generated by the source, based on the signal it receives through the channel. The entropy provides an absolute limit on the shortest possible average length of a lossless compression encoding of the data produced by a source, and if the entropy of the source is less than the channel capacity of the communication channel, the data generated by the source can be reliably communicated to the receiver (at least in theory, possibly neglecting some practical considerations such as the complexity of the system needed to convey the data and the amount of time it may take for the data to be conveyed).