Filename extension | .tar |
---|---|
Internet media type | application/x-tar |
Uniform Type Identifier (UTI) | public.tar-archive |
Magic number |
u s t a r \0 0 0 at byte offset 257 (absent in pre-POSIX versions) |
Latest release |
various
(various) |
Type of format | File archiver |
Standard | POSIX since POSIX.1, presently in the definition of pax[1] |
Open format? | Yes |
In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from (t)ape (ar)chive, as it was originally developed to write data to sequential I/O devices with no file system of their own. The archive data sets created by tar contain various file system parameters, such as name, time stamps, ownership, file access permissions, and directory organization. The command line utility was first introduced in the seventh edition of unix (v7) in January 1979, replacing the tp program. The file structure to store this information was later standardized in POSIX.1-1988 and later POSIX.1-2001. and became a format supported by most modern file archiving systems.
Many historic tape drives read and write variable-length data blocks, leaving significant wasted space on the tape between blocks (for the tape to physically start and stop moving). Some tape drives (and raw disks) only support fixed-length data blocks. Also, when writing to any medium such as a filesystem or network, it takes less time to write one large block than many small blocks. Therefore, the tar command writes data in blocks of many 512 byte records. The user can specify a blocking factor, which is the number of records per block; the default is 20, producing 10 kilobyte blocks (which was large when UNIX was created, but now seems rather small).
A tar archive consists of a series of file objects, hence the popular term tarball, referencing how a tarball collects objects of all kinds that stick to its surface. Each file object includes any file data, and is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes. The original tar implementation did not care about the contents of the padding bytes, and left the buffer data unaltered, but most modern tar implementations fill the extra space with zeros. The end of an archive is marked by at least two consecutive zero-filled records. (The origin of tar's record size appears to be the 512-byte disk sectors used in the Version 7 Unix file system.) The final block of an archive is padded out to full length with zeros.