In bioinformatics and computational biology, genome survey sequences (GSS) are nucleotide sequences similar to expressed sequence tags (ESTs); the main difference is that most of them are genomic in origin rather than derived from mRNA.
Genome survey sequences are typically generated and submitted to NCBI by laboratories performing genome sequencing. They are used, among other things, as a framework for mapping and sequencing the genome-sized pieces included in the standard GenBank divisions.
Genome survey sequencing is a new way of mapping genome sequences that does not depend on mRNA. Current genome sequencing approaches are mostly high-throughput shotgun methods, and GSS is often used as the first step of sequencing. Unlike ESTs, GSSs can provide an initial global view of a genome that includes both coding and non-coding DNA as well as repetitive regions. Estimating repetitive sequence content from GSS data plays an important role in the early assessment of a sequencing project, because repeats affect the assessment of sequence coverage, library quality and the construction process. For example, in the survey of the dog genome, GSS data were used to estimate global parameters such as the neutral mutation rate and repeat content.
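The early assessment described above can be illustrated with a minimal sketch. It assumes the standard depth formula c = N × L / G and a Poisson approximation for the fraction of the genome touched by at least one read; the function names, the `repeat_fraction` helper and the per-read repeat flags are hypothetical, chosen only for illustration, and are not taken from any particular survey project.

```python
import math

def survey_coverage(num_reads, mean_read_length, genome_size):
    """Average sequencing depth c = N * L / G."""
    return num_reads * mean_read_length / genome_size

def expected_fraction_covered(coverage):
    """Poisson approximation: fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)

def repeat_fraction(read_is_repeat):
    """Share of survey reads flagged as repetitive (flags are assumed to
    come from some upstream repeat-detection step, not shown here)."""
    flags = list(read_is_repeat)
    return sum(flags) / len(flags) if flags else 0.0

# Toy numbers for illustration only: 1,000 reads of 500 bp on a 1 Mb genome.
depth = survey_coverage(1000, 500, 1_000_000)
covered = expected_fraction_covered(depth)
repeats = repeat_fraction([True, False, False, True])
```

Because survey reads are sampled roughly at random, the share of reads flagged as repetitive serves as a rough estimate of the genome's repeat content even when the depth is well below 1×.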
GSS is also an effective way to characterize, rapidly and on a large scale, the genomes of related species for which few gene sequences or maps are available. Even at low coverage, GSS can generate abundant information about the gene content and putative regulatory elements of the species being compared. Genes can then be compared across related species to identify gene families that are relatively expanded or contracted. Combined with physical clone coverage, GSS lets researchers navigate the genome easily and characterize a specific genomic region by more extensive sequencing.
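One simple way to look for relatively expanded or contracted families is to compare how often survey reads hit each family in two species, after normalizing for the total number of hits. The sketch below is an illustration under stated assumptions rather than an established method: the family names and counts are invented toy values, and the pseudocount and log2 ratio are choices made here for simplicity.

```python
import math

def family_log2_ratios(counts_a, counts_b, pseudocount=1.0):
    """Log2 ratio of normalized family hit frequencies (species A / species B).

    counts_a, counts_b: dicts mapping a gene-family name to the number of
    survey reads assigned to it. A pseudocount avoids division by zero for
    families that were seen in only one species.
    """
    families = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values()) + pseudocount * len(families)
    total_b = sum(counts_b.values()) + pseudocount * len(families)
    ratios = {}
    for fam in families:
        freq_a = (counts_a.get(fam, 0) + pseudocount) / total_a
        freq_b = (counts_b.get(fam, 0) + pseudocount) / total_b
        ratios[fam] = math.log2(freq_a / freq_b)
    return ratios

# Hypothetical read counts per family, for illustration only.
species_a = {"kinase": 30, "olfactory_receptor": 60, "histone": 10}
species_b = {"kinase": 30, "olfactory_receptor": 20, "histone": 50}
ratios = family_log2_ratios(species_a, species_b)
```

A positive value suggests a family that is relatively expanded in species A, a negative value one that is relatively contracted. With low-coverage data such ratios are noisy and would need statistical testing before any conclusion is drawn.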
The limitation of genome survey sequences is that they lack long-range continuity because of their fragmentary nature, which makes it harder to predict gene and marker order. For example, when detecting repetitive sequences in GSS data, it may not be possible to find all the repeats, because a repeat may be longer than the reads and is then difficult to recognize.
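The repeat-detection problem can be shown with a small simulation. This sketch assumes a toy tandem array built from a random 200 bp unit and a naive detector that looks for a k-mer occurring twice within a single read; the sequence, the read positions and the `has_internal_repeat` helper are hypothetical and stand in for real repeat-finding tools.

```python
import random

def has_internal_repeat(read, k=20):
    """Return True if any k-mer occurs more than once within the read."""
    seen = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False

# Toy tandem array: a random 200 bp unit repeated three times (600 bp).
rng = random.Random(0)
unit = "".join(rng.choice("ACGT") for _ in range(200))
tandem_array = unit * 3

short_read = tandem_array[50:150]  # 100 bp, shorter than one repeat unit
long_read = tandem_array[50:350]   # 300 bp, spans more than one unit

short_detected = has_internal_repeat(short_read)
long_detected = has_internal_repeat(long_read)
```

The 100 bp read lies entirely inside the repeat array, yet it shows no internal duplication because it never spans a full 200 bp unit, whereas the 300 bp read does. Recognizing such repeats from short reads therefore depends on comparing reads against each other or against a repeat library.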
The GSS division contains (but is not limited to) the following types of data:
Random “single pass read” genome survey sequences are GSSs generated by single-pass reads obtained by random selection. Single-pass sequencing allows genomic data to be accumulated rapidly, but with lower fidelity and therefore lower accuracy. This category includes RAPD, RFLP, AFLP and so on.