End-sequence profiling (ESP), sometimes called paired-end mapping (PEM), is a method based on sequence-tagged connectors. It was developed to facilitate genome sequencing and is used to identify copy number and structural aberrations, such as inversions and translocations, at high resolution.
Briefly, the target genomic DNA is isolated and partially digested with restriction enzymes into large fragments. Following size fractionation, the fragments are cloned into plasmids to construct artificial chromosomes, such as bacterial artificial chromosomes (BACs), which are then sequenced and compared to the reference genome. Differences between the constructed chromosomes and the reference genome, including variations in orientation and length, suggest copy number and structural aberrations.
Before structural aberrations and copy number variation (CNV) in the target genome are analyzed with ESP, the target genome is usually amplified and preserved by constructing artificial chromosomes. The classic construct is the bacterial artificial chromosome (BAC): the target chromosome is randomly digested and the fragments are inserted into plasmids, which are transformed into bacteria and cloned. The inserted fragments are 150–350 kb in size. Another commonly used construct is the fosmid. BACs and fosmids differ in the size of the DNA insert: fosmids can only hold fragments of about 40 kb, which allows more accurate breakpoint determination.
End-sequence profiling (ESP) can be used to detect structural variations such as insertions, deletions, and chromosomal rearrangements. Compared to other methods that look at chromosomal abnormalities, ESP is particularly useful for identifying copy-neutral abnormalities, such as inversions and translocations, that would not be apparent when looking at copy number variation. Both ends of each inserted fragment in the BAC library are sequenced on a sequencing platform. Variations are then detected by mapping the sequenced reads onto a reference genome.
Inversions and translocations are relatively easy to detect because they produce an invalid pair of sequenced ends. For instance, a translocation can be detected if the paired ends map onto different chromosomes of the reference genome. An inversion can be detected by an inconsistent orientation of the reads, where the insert has two plus ends or two minus ends instead of one of each.
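As a minimal sketch of these two rules, the Python snippet below classifies a single mapped end pair. The `EndPair` record, its field names, and the `classify_pair` function are hypothetical names chosen for illustration; the chromosome and strand of each end are assumed to come from an upstream alignment step that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPair:
    """Mapping of the two sequenced ends of one clone (hypothetical record)."""
    chrom_a: str
    strand_a: str  # "+" or "-"
    chrom_b: str
    strand_b: str  # "+" or "-"


def classify_pair(pair: EndPair) -> str:
    """Apply the chromosome and orientation rules to one end pair."""
    # Ends on different reference chromosomes suggest a translocation.
    if pair.chrom_a != pair.chrom_b:
        return "translocation"
    # Two plus ends or two minus ends suggest an inversion.
    if pair.strand_a == pair.strand_b:
        return "inversion"
    # Same chromosome, one plus end and one minus end: orientation is valid.
    return "valid"


if __name__ == "__main__":
    print(classify_pair(EndPair("chr1", "+", "chr7", "-")))  # translocation
    print(classify_pair(EndPair("chr1", "+", "chr1", "+")))  # inversion
    print(classify_pair(EndPair("chr1", "+", "chr1", "-")))  # valid
```

A pair reported as valid here can still be discordant in size, so this check is only the first of the two filters.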
In the case of an insertion or a deletion, the paired ends map to the reference genome in a consistent orientation, but the reads are discordant in apparent size. The apparent size is the distance between the BAC end sequences when mapped onto the reference genome. If a BAC has an insert of length (l), a concordant mapping will show a fragment of size (l) in the reference genome. If the paired ends map closer together than (l), an insertion is suspected in the sampled DNA. An apparent size of (l < µ-3σ) can be used as a cut-off to detect an insertion, where µ is the mean length of the insert and σ is the standard deviation. In the case of a deletion, the paired ends map further apart in the reference genome than expected (l > µ+3σ).
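The size test can be sketched in a few lines of Python. This is an illustration only: `classify_by_size` and its parameters are hypothetical names, the insertion cut-off is µ-3σ as stated above, and the deletion cut-off is assumed to be the symmetric upper bound µ+3σ. The multiplier `k` is exposed as a parameter so that a different threshold can be tried.

```python
def classify_by_size(apparent_size: float, mu: float, sigma: float,
                     k: float = 3.0) -> str:
    """Compare the apparent size of a clone with the expected insert size.

    apparent_size: distance between the mapped ends on the reference genome
    mu, sigma:     mean and standard deviation of the library insert length
    k:             number of standard deviations used as the cut-off
    """
    # Ends closer together than expected: the sample carries extra
    # sequence that the reference lacks, so an insertion is suspected.
    if apparent_size < mu - k * sigma:
        return "insertion"
    # Ends further apart than expected: the sample lacks sequence that
    # the reference has, so a deletion is suspected (assumed upper cut-off).
    if apparent_size > mu + k * sigma:
        return "deletion"
    return "concordant"


if __name__ == "__main__":
    # Illustrative library: mean insert 150 kb, standard deviation 10 kb,
    # giving cut-offs of 120 kb and 180 kb.
    for size in (110_000, 150_000, 190_000):
        print(size, classify_by_size(size, mu=150_000, sigma=10_000))
```

With these illustrative values, a clone whose ends map 110 kb apart is flagged as a possible insertion and one whose ends map 190 kb apart as a possible deletion. In practice µ and σ would be estimated from the concordant pairs of the same library.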