A protein family is a group of evolutionarily related proteins. In many cases a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term protein family should not be confused with family as it is used in taxonomy.
Proteins in a family descend from a common ancestor (see homology) and typically have similar three-dimensional structures and functions, as well as significant sequence similarity. The most important of these is sequence similarity (usually amino acid sequence) since it is the strictest indicator of homology and therefore the clearest indicator of common ancestry. There is a fairly well developed framework for evaluating the significance of similarity among a group of sequences using sequence alignment methods. Proteins that do not share a common ancestor are very unlikely to show statistically significant sequence similarity, making sequence alignment a powerful tool for identifying the members of protein families.
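The idea of testing whether an alignment score is statistically significant can be illustrated with a minimal sketch. It assumes a simple match/mismatch scoring scheme and a shuffle-based empirical p-value; the function names and scoring parameters are hypothetical choices made for illustration. Real tools typically use substitution matrices such as BLOSUM62 and extreme-value statistics rather than this simplified procedure.

```python
import random


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not a standard matrix.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with b[:j]; the first row aligns the empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_p_value(a, b, trials=200, seed=0):
    """Empirical p-value: how often a shuffled copy of b scores at least
    as well against a as the real b does."""
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)  # preserves composition, destroys order
        if global_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)
```

Shuffling keeps the amino acid composition of the sequence while destroying its order, so a low p-value suggests that the observed similarity is unlikely to arise from composition alone. Two unrelated sequences would be expected to score no better than most of their shuffled counterparts.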
Families are sometimes grouped together into larger clades called superfamilies based on structural and mechanistic similarity, even if there is no identifiable sequence homology.
Currently, over 60,000 protein families have been defined, although ambiguity in the definition of protein family leads different researchers to arrive at widely varying numbers.
As with many biological terms, the use of protein family is somewhat context-dependent; it may indicate large groups of proteins with the lowest possible level of detectable sequence similarity, or very narrow groups of proteins with almost identical sequence, function, and three-dimensional structure, or any kind of group in between. To distinguish between these situations, the term protein superfamily is often used for distantly related proteins whose relatedness is not detectable by sequence similarity, but only from shared structural features. Other terms such as protein class, group, clan and sub-family have been coined over the years, but all suffer similar ambiguities of usage. A common usage is that superfamilies (structural homology) contain families (sequence homology) which contain sub-families. Hence a superfamily, such as the PA clan of proteases, has far lower sequence conservation than one of the families it contains, the C04 family. It is unlikely that an exact definition will be agreed upon, so it is up to the reader to discern exactly how these terms are being used in a particular context.