An information source yields symbols from a finite alphabet. A sequence of symbols is commonly referred to as a text. An example of an information source is a die that is thrown repeatedly. This source will generate a text from the alphabet $\{1, 2, 3, 4, 5, 6\}$.
To measure the information carried by one or more symbols from a source, we must first decide what is meant by information. Intuitively, a rarely occurring symbol delivers more information than one seen often. A measure of the information conveyed by the occurrence of a given symbol should therefore grow with the inverse of the probability of that occurrence.
Shannon defined the information from the occurrence of a symbol $s_i$ as
$$I(s_i) = \log \frac{1}{p_i} = -\log p_i,$$
where $p_i$ is the probability that the symbol $s_i$ occurs.
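As a brief worked example (assuming a base-two logarithm; the symbol notation is ours), consider the fair die above, where each outcome has probability $p_i = 1/6$:
$$I(s_i) = -\log_2 \tfrac{1}{6} = \log_2 6 \approx 2.58 \text{ bits}.$$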
In [80], the use of the logarithm is motivated by the requirement that probabilities which multiply should give information that adds, and a supporting example is given.
Going one step further, we may talk about average information in a text, or entropy. Entropy is defined as
$$H = \sum_i p_i \, I(s_i) = -\sum_i p_i \log p_i.$$
If we use a base-two logarithm, the entropy is a theoretical lower limit on the average number of bits per symbol required to encode the stream of symbols.
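As a minimal sketch of this definition (not from the source; the function name and the example distributions are our own choices), the entropy in bits can be computed directly from a list of symbol probabilities:

```python
import math

def entropy(probabilities):
    """Entropy in bits: H = -sum(p * log2(p)) over symbols with p > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair six-sided die: every symbol is equally likely.
fair_die = [1 / 6] * 6
print(entropy(fair_die))   # log2(6) ~= 2.585 bits per symbol

# A biased source: frequent symbols carry less information on average,
# so fewer bits per symbol suffice in principle.
biased = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
print(entropy(biased))     # = 1.9375 bits per symbol
```

The biased source illustrates the lower limit: although both alphabets have six symbols, an optimal code for the biased source needs fewer bits per symbol on average than the fair die requires.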
In the above formulas, each occurrence of a symbol is taken as a statistically independent event. In reality, the probability of the occurrence of one symbol often depends on the occurrence of another. Such sources, so-called Markov sources, require extended formulas for entropy [12]. Markov models and data compression in general are treated thoroughly in [81].
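As an illustration of such an extension (this particular form is not quoted from [12]; it is the standard first-order case), the entropy of a first-order Markov source weights the conditional entropy of each state by the probability of being in that state:
$$H = -\sum_i P_i \sum_j p_{ij} \log p_{ij},$$
where $P_i$ is the stationary probability of state $i$ and $p_{ij}$ is the probability that symbol $j$ follows symbol $i$.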