*** Welcome to piglix ***

Variant Call Format


The Variant Call Format (VCF) specifies the format of a text file used in bioinformatics for storing gene sequence variations. The format has been developed with the advent of large-scale genotyping and DNA sequencing projects, such as the 1000 Genomes Project. Existing formats for genetic data such as General feature format (GFF) stored all of the genetic data, much of which is redundant because it will be shared across the genomes. By using the variant call format only the variations need to be stored along with a reference genome.

The standard is currently in version 4.3, although the 1000 Genomes Project has developed their own specification for structural variations such as duplications, which are not easily accommodated into the existing schema. A set of tools is also available for editing and manipulating the files.

The header begins the file and provides metadata describing the body of the file. Header lines are denoted as starting with #. Special keywords in the header are denoted with ##. Recommended keywords include fileformat, fileDate and reference.

The header contains keywords that optionally semantically and syntactically describe the fields used in the body of the file, notably INFO, FILTER, and FORMAT (see below).

The body of VCF follows the header, and is tab separated into 8 mandatory columns and an unlimited number of optional columns that may be used to record other information about the sample(s). When additional columns are used, the first of optional column is used to describe the format of the data in the columns that follow.

Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):


The VCF specification is no longer maintained by the 1000 Genomes Project. The group leading the management and expansion of the format is the Global Alliance for Genomics and Health Data Working group file format team, http://ga4gh.org/#/fileformats-team


...
Wikipedia

...