*** Welcome to piglix ***

Gap penalty


The gap penalty is a scoring system used in bioinformatics for aligning a small portion of genetic code, more accurately, fragmented genetic sequence, also termed, reads against a reference genetic sequence (e.g. The Human Genome). The biological process of protein synthesis namely, transcription and or DNA replication can produce errors resulting in mutations in the final nucleic acid sequence. Therefore, in order to make more accurate decisions in aligning reads, mutations are annotated as gaps in the sequence. Gaps are penalised via various Gap Penalty scoring methods. Gaps in a DNA sequence result from either insertions or deletions in the sequence, sometimes referred to as indels. Insertions or deletions occur due to single mutations, unbalanced crossover in meiosis, slipped strand mispairing in the replication process and chromosomal translocation. In alignments gaps are represented as contiguous dashes on a protein/DNA sequence alignment. The scoring that occurs in Gap Penalty allows for the optimisation of sequence alignment in order to obtain the best alignment possible based on the information available. The three main types of gap penalties are constant, linear and affine gap penalty.

The notion of a gap in an alignment is important in many biological applications, since the insertions or deletions comprise an entire sub-sequence and often occur from a single mutational event. Furthermore, single mutational events can create gaps of different sizes. Therefore, when scoring, the gaps need to be scored as a whole when aligning two sequences of DNA. Considering multiple gaps in a sequence as a larger single gap will reduce the assignment of a high cost to the mutations. For instance, two protein sequences may be relatively similar however, may differ at certain intervals as one protein may have a different subunit compared to the other. Representing these differing sub-sequences as gaps will allow us to treat these cases as “good matches” even though there are long consecutive runs with indel operations in the sequence. Therefore, using a good gap penalty model will avoid low scores in alignments and improve the chances of finding a true alignment.


...
Wikipedia

...