Byte addressing refers to hardware architectures that support accessing individual bytes of data rather than only larger units called words. Architectures that can address only whole words are called word-addressable. Byte-addressing computers are sometimes called byte machines (in contrast to word machines).
The basic unit of digital storage is the bit, which stores a single 0 or 1.
Many common architectures can address more than 8 bits of data at a time. For example, the Intel 386SX processor can handle 16-bit (two-byte) data, since data is transferred over a 16-bit bus. However, data in memory may be of various lengths.
A machine with a 64-bit architecture might still need to access 8-bit data. It places the byte's address on its address bus and receives the data in the bottom 8 bits of its wider data bus.
Byte-addressable memory refers to architectures in which data can be addressed and accessed in units narrower than the data bus. An eight-bit processor such as the Intel 8008 addresses memory in eight-bit units, but because this is the full width of its bus, it is regarded as word-addressable. The 386SX, which addresses memory in 8-bit units but can fetch and store 16 bits at a time, is termed byte-addressable.
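A short Python sketch can make the byte-on-a-wider-bus case concrete. It is a simplified model, not a description of real bus hardware: memory is modelled as a list of bus-width words, byte lanes are assumed to be numbered little-endian (lane 0 is the low-order byte, as on x86), and the function name `load_byte` is hypothetical. It shows how a byte address splits into a word index and a byte lane, and how the selected byte ends up right-aligned in the low 8 bits.

```python
def load_byte(memory: list[int], byte_addr: int, bus_bytes: int) -> int:
    """Return the byte at byte_addr, right-aligned in the low 8 bits.

    memory:    list of words, each bus_bytes * 8 bits wide
    byte_addr: a byte address, finer-grained than a word index
    bus_bytes: data-bus width in bytes (2 for a 16-bit bus, 8 for 64-bit)
    """
    # Split the byte address into a word index and a byte lane.
    word_index, lane = divmod(byte_addr, bus_bytes)
    # One full-width transfer fetches the whole word.
    word = memory[word_index]
    # Assumed little-endian lanes: shift the selected lane down to bit 0
    # and keep only its 8 bits.
    return (word >> (8 * lane)) & 0xFF


# 16-bit bus: the words 0x2211 and 0x4433 hold bytes 0x11, 0x22, 0x33, 0x44.
mem16 = [0x2211, 0x4433]
print(hex(load_byte(mem16, 1, 2)))  # 0x22

# 64-bit bus: one word holds eight bytes; address 5 selects lane 5.
mem64 = [0x8877665544332211]
print(hex(load_byte(mem64, 5, 8)))  # 0x66
```

The same function covers both bus widths because only the lane count changes; the caller always sees an 8-bit result.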
Bytes have not always meant 8 bits: depending on the platform, byte sizes from 1 to 48 bits have been used. The term "octet" is therefore used to mean exactly 8 bits where the context would otherwise make the byte length ambiguous.
For example, in the 1980s Honeywell mainframes had 36-bit words and were byte-addressable in 9-bit bytes, or "nonets". They used 7- or 8-bit character codes, either of which was stored one to each 9-bit byte, making characters individually addressable.
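The nonet layout can be sketched the same way. In this Python sketch, the names `pack_word` and `get_nonet` are hypothetical, memory is modelled as a list of 36-bit integers, and placing byte 0 in the most significant nine bits of each word is an assumption made for illustration. With four 9-bit bytes per 36-bit word, a byte address splits into a word index and a position from 0 to 3, and any 7- or 8-bit character code fits in one nonet.

```python
# Assumptions for illustration: 36-bit words, four 9-bit bytes per word,
# and byte 0 stored in the most significant nine bits.
WORD_BITS = 36
BYTE_BITS = 9
BYTES_PER_WORD = WORD_BITS // BYTE_BITS  # 4
BYTE_MASK = (1 << BYTE_BITS) - 1         # 0o777: nine one-bits


def pack_word(nonets: list[int]) -> int:
    """Pack four 9-bit values into one 36-bit word, byte 0 leftmost."""
    if len(nonets) != BYTES_PER_WORD:
        raise ValueError("a word holds exactly four nonets")
    word = 0
    for n in nonets:
        if not 0 <= n <= BYTE_MASK:
            raise ValueError(f"{n} does not fit in 9 bits")
        word = (word << BYTE_BITS) | n
    return word


def get_nonet(memory: list[int], byte_addr: int) -> int:
    """Fetch the 9-bit byte at byte_addr from a list of 36-bit words."""
    word_index, pos = divmod(byte_addr, BYTES_PER_WORD)
    shift = WORD_BITS - BYTE_BITS * (pos + 1)  # byte 0 is leftmost
    return (memory[word_index] >> shift) & BYTE_MASK


# Store an 8-character ASCII string, one character per nonet.
memory = [pack_word([ord(c) for c in "HONE"]),
          pack_word([ord(c) for c in "YWEL"])]
print(chr(get_nonet(memory, 5)))  # W
```

On the machines described above this address arithmetic was done by the hardware; the sketch only spells it out so the word/byte relationship is visible.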
To illustrate why byte addressing is useful, consider the IBM 7094, which is word-addressable and has no concept of a byte. It has 36-bit words and stores its six-bit character codes six to a word.
To change the 16th character in a string, the program has to determine that this is the fourth character of the third word in the string, fetch the third word, mask out the old value of the fourth character from the value held in the register, "or" in the new one, and then store back the amended word. This takes at least six machine instructions. These are usually relegated to a subroutine, so every fetch or store of a single character incurs the overhead of calling a subroutine and returning.
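The read-modify-write sequence can be modelled in Python as a rough sketch. This is not 7094 code: memory is a list of 36-bit integers, the helper names `locate`, `load_char` and `store_char` are hypothetical, and storing the first character of each word in the high-order six bits is an assumption. Each step in `store_char` corresponds to one step of the procedure just described (locate, fetch, mask out, "or" in, store back). For example, `locate(16)` gives word index 2 and position 3, i.e. the fourth character of the third word.

```python
# Hypothetical model of a word-addressable machine in the style of the
# IBM 7094: 36-bit words, six 6-bit character codes per word. Placing the
# first character of each word in the high-order six bits is an assumption.
WORD_BITS = 36
CHAR_BITS = 6
CHARS_PER_WORD = WORD_BITS // CHAR_BITS  # 6
CHAR_MASK = (1 << CHAR_BITS) - 1         # 0o77: six one-bits


def locate(n: int) -> tuple[int, int]:
    """Map a 1-based character number to (word index, bit shift)."""
    word_index, pos = divmod(n - 1, CHARS_PER_WORD)
    return word_index, WORD_BITS - CHAR_BITS * (pos + 1)


def load_char(memory: list[int], n: int) -> int:
    """Fetch the n-th character: fetch its word, shift, and mask."""
    word_index, shift = locate(n)
    return (memory[word_index] >> shift) & CHAR_MASK


def store_char(memory: list[int], n: int, code: int) -> None:
    """Replace the n-th character with a read-modify-write of its word."""
    word_index, shift = locate(n)        # which word, which 6-bit slot
    word = memory[word_index]            # fetch the whole word
    word &= ~(CHAR_MASK << shift)        # mask out the old character
    word |= (code & CHAR_MASK) << shift  # "or" in the new one
    memory[word_index] = word            # store the amended word back


memory = [0o777777777777] * 3  # three words, every character 0o77
store_char(memory, 16, 0o00)   # fourth character of the third word
print(oct(memory[2]))          # 0o777777007777
```

Octal makes the layout easy to read: each 6-bit character is exactly two octal digits, so the cleared `00` sits in the fourth two-digit group of the third word. By contrast, a byte-addressable machine performs the same update with a single byte-wide store.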