A gene family is a set of several similar genes, formed by duplication of a single original gene, and generally with similar biochemical functions. One such family comprises the genes for human hemoglobin subunits; the ten genes lie in two clusters on different chromosomes, called the α-globin and β-globin loci. The two clusters are thought to have arisen from the duplication of a precursor gene approximately 500 million years ago.
Genes are categorized into families based on shared nucleotide or protein sequences. Phylogenetic techniques can be used as a more rigorous test. The positions of exons within the coding sequence can also be used to infer common ancestry. When the sequence of the protein encoded by a gene is known, researchers can apply methods that detect similarities among protein sequences, which provide more information than similarities or differences among DNA sequences.
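As an illustration of grouping by shared sequence, the following minimal sketch computes pairwise identity between protein sequences and links any pair above a cutoff into the same group. It assumes the sequences are already aligned to equal length. The names `percent_identity` and `group_into_families`, the toy sequences, and the 0.6 threshold are all hypothetical choices for this example. Real family assignment relies on proper alignment and phylogenetic methods rather than a fixed identity cutoff.

```python
from itertools import combinations


def percent_identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where both sequences contain a gap character.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)


def group_into_families(seqs, threshold=0.6):
    """Single-linkage grouping: link two genes if identity >= threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


# Made-up toy sequences, not real protein data.
toy = {
    "geneA1": "MVLSPADKTN",
    "geneA2": "MVLSAADKTN",
    "geneA3": "MVHSPADKSN",
    "geneB1": "QWERTYIPCC",
}
families = group_into_families(toy, threshold=0.6)
```

Single-linkage grouping is used here only because it is short to write. It can chain distantly related sequences together, which is one reason phylogenetic techniques serve as the more rigorous test.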
If the genes of a gene family encode proteins, the term protein family is often used in an analogous manner to gene family.
The expansion or contraction of gene families along a specific lineage can be due to chance or can result from natural selection. Distinguishing between these two cases is often difficult in practice. Recent work combines statistical models and algorithmic techniques to detect gene families that are under the effect of natural selection.
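To see why chance alone makes this distinction hard, a neutral birth-death process can be simulated: in each step, every gene copy is independently duplicated or lost with fixed probabilities. The sketch below is an illustrative assumption, not the method used in the work mentioned above. The function names and parameter values are hypothetical.

```python
import random


def simulate_family_size(start, birth, death, steps, rng):
    """Simulate one lineage; each copy may duplicate or be lost per step."""
    size = start
    for _ in range(steps):
        change = 0
        for _ in range(size):
            r = rng.random()
            if r < birth:
                change += 1      # this copy is duplicated
            elif r < birth + death:
                change -= 1      # this copy is lost
        size += change
    return size


def neutral_size_distribution(start, birth, death, steps, replicates, seed=0):
    """Family sizes across independent lineages under chance alone."""
    rng = random.Random(seed)
    return [
        simulate_family_size(start, birth, death, steps, rng)
        for _ in range(replicates)
    ]


# Equal duplication and loss rates: sizes still spread widely by chance.
sizes = neutral_size_distribution(
    start=10, birth=0.05, death=0.05, steps=50, replicates=1000
)
print(min(sizes), max(sizes))
```

Even with equal duplication and loss rates, individual lineages can drift to quite different sizes, so an observed expansion is not by itself evidence of selection. A statistical test would compare the observed sizes against such a neutral expectation.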
The HUGO Gene Nomenclature Committee (HGNC) creates nomenclature schemes using a "stem" (or "root") symbol for members of a gene family, with a hierarchical numbering system to distinguish the individual members. For example, for the peroxiredoxin family, PRDX is the root symbol, and the family members are PRDX1, PRDX2, PRDX3, PRDX4, PRDX5, and PRDX6.
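The root-plus-number scheme can be sketched in code by splitting each symbol into its root and member number. The example assumes the simplest form only, uppercase letters followed by a trailing integer, as in the PRDX symbols. The pattern and the function name `group_by_root` are hypothetical and do not cover more elaborate hierarchical symbols.

```python
import re
from collections import defaultdict

# Assumed simple form: an alphabetic root followed by a member number.
SYMBOL_PATTERN = re.compile(r"^([A-Z]+)(\d+)$")


def group_by_root(symbols):
    """Group gene symbols by root; return (families, unparsed symbols)."""
    families = defaultdict(list)
    unparsed = []
    for symbol in symbols:
        match = SYMBOL_PATTERN.match(symbol)
        if match:
            root, number = match.group(1), int(match.group(2))
            families[root].append(number)
        else:
            unparsed.append(symbol)
    # Sort member numbers so each family lists its members in order.
    return {root: sorted(nums) for root, nums in families.items()}, unparsed


families_by_root, leftover = group_by_root(
    ["PRDX3", "PRDX1", "PRDX6", "PRDX2", "PRDX5", "PRDX4", "HBB"]
)
```

Symbols that do not fit the assumed pattern, such as one without a trailing number, are returned separately rather than guessed at.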