In the context of IBM mainframe computers, a data set (the spelling IBM prefers) or dataset is a computer file having a record organization. The term was introduced with OS/360 and is still used by its successors, including the current z/OS. Documentation for these systems has historically preferred this term to file.
A data set is typically stored on a direct access storage device (DASD) or magnetic tape; however, unit record devices such as punched card readers, card punches, and line printers can also provide input/output (I/O) for a data set (file).
Data sets are not unstructured streams of bytes, but rather are organized in various logical record and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with Job Control Language DD statements. Inside a job they are stored in the Data Control Block (DCB), which is a data structure used to access data sets, for example using access methods.
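As a rough illustration of how such parameters appear on a DD statement, the following Python sketch pulls the keyword=value pairs out of a DCB=(...) operand. The DD statement itself, including the data set name MY.SAMPLE.DATA and all sizes, is hypothetical, and the helper parse_dcb is an illustrative name, not part of any IBM tooling. Real JCL would also split such a long statement across continuation lines, which this sketch does not handle.

```python
import re

# Hypothetical DD statement; the data set name and sizes are invented for illustration.
# Real JCL would need continuation lines for a statement this long.
DD_STATEMENT = (
    "//OUTDD DD DSN=MY.SAMPLE.DATA,DISP=(NEW,CATLG),UNIT=SYSDA,"
    "SPACE=(TRK,(10,5)),DCB=(DSORG=PS,RECFM=FB,LRECL=80,BLKSIZE=800)"
)


def parse_dcb(dd_statement: str) -> dict[str, str]:
    """Return the keyword=value pairs found inside DCB=(...), or {} if absent."""
    match = re.search(r"DCB=\(([^()]*)\)", dd_statement)
    if match is None:
        return {}
    pairs = (item.split("=", 1) for item in match.group(1).split(","))
    return {key: value for key, value in pairs}


attributes = parse_dcb(DD_STATEMENT)
print(attributes)
# {'DSORG': 'PS', 'RECFM': 'FB', 'LRECL': '80', 'BLKSIZE': '800'}
```

The values are kept as strings here; a fuller model would convert LRECL and BLKSIZE to integers and validate them against the record format.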
For OS/360, the DCB's DSORG parameter specifies how the data set is organized. It may be physically sequential ("PS"), indexed sequential ("IS"), partitioned ("PO"), or direct access ("DA"). Data sets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated.
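The four organizations and the tape restriction described above can be modeled in a few lines of Python. This is only a sketch of the rule as stated in the text; the names Dsorg and allowed_on_tape are hypothetical and do not correspond to any system interface.

```python
from enum import Enum


class Dsorg(Enum):
    """The OS/360 DSORG codes named in the text, with their meanings."""
    PS = "physically sequential"
    IS = "indexed sequential"
    PO = "partitioned"
    DA = "direct access"


def allowed_on_tape(dsorg: Dsorg) -> bool:
    """Per the text, data sets on tape may only be DSORG=PS."""
    return dsorg is Dsorg.PS


for org in Dsorg:
    placement = "tape or DASD" if allowed_on_tape(org) else "DASD only"
    print(f"DSORG={org.name}: {org.value} ({placement})")
```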
Programmers use various access methods (such as QSAM or VSAM) in programs to read and write data sets. The appropriate access method depends on the data set organization.
Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the data set. This is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter, and RECFM=V specifies variable-length records. When stored on media, V records are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes. With RECFM=FB and RECFM=VB, multiple logical records are grouped together into a single physical block on tape or disk; FB and VB stand for fixed-blocked and variable-blocked, respectively. The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS (fixed-blocked standard) can also be specified, meaning that all blocks except the last one must be exactly BLKSIZE bytes long. RECFM=VBS (variable-blocked spanned) means that a logical record can span two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one.