Bacterial genomes are generally smaller and less variable in size among species than the genomes of animals and single-celled eukaryotes. They range in size from about 130 kbp to over 14 Mbp. A study that included, but was not limited to, 478 bacterial genomes concluded that as genome size increases, the number of genes increases at a disproportionately slower rate in eukaryotes than in non-eukaryotes. Thus, the proportion of non-coding DNA rises with genome size more quickly in non-bacteria than in bacteria. This is consistent with the fact that most eukaryotic nuclear DNA is non-coding, while the majority of prokaryotic, viral, and organellar DNA is coding.
Genome sequences are currently available from 50 bacterial phyla and 11 archaeal phyla. Second-generation sequencing has yielded many draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing might eventually yield a complete genome in a few hours. The genome sequences reveal much diversity in bacteria. Analysis of over 2000 Escherichia coli genomes reveals an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Genome sequences show that parasitic bacteria have 500-1200 genes, free-living bacteria have 1500-7500 genes, and archaea have 1500-2700 genes.
A striking discovery by Cole et al. was the massive gene decay seen when the leprosy bacillus (Mycobacterium leprae) is compared with ancestral bacteria. Studies have since shown that several bacteria have smaller genomes than their ancestors did. Over the years, researchers have proposed several theories to explain the general trend of bacterial genome decay and the relatively small size of bacterial genomes. Compelling evidence indicates that the apparent degradation of bacterial genomes is due to a deletional bias.
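The core-genome and total gene-family figures quoted above for E. coli can be read as two set operations over per-genome gene-family lists. The following is a minimal sketch, assuming the strictest definition (a core family is present in every genome; the total, often called the pan-genome, is the union of all families). The function name, strain names, and family labels are hypothetical and serve only as illustration; they are not data from the analysis cited above.

```python
from typing import Dict, Set, Tuple


def core_and_pan_genome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene-family sets for a collection of genomes.

    core: families found in every genome (set intersection)
    pan:  families found in at least one genome (set union)
    """
    if not genomes:
        return set(), set()
    families = list(genomes.values())
    core = set.intersection(*families)
    pan = set.union(*families)
    return core, pan


# Hypothetical toy input: strain names and family labels are made up.
toy = {
    "strain_A": {"famA", "famB", "famC"},
    "strain_B": {"famA", "famB", "famD"},
    "strain_C": {"famA", "famB", "famE", "famF"},
}

core, pan = core_and_pan_genome(toy)
print(len(core), len(pan))  # 2 6
```

As more genomes are added, the core can only shrink or stay the same while the union can only grow or stay the same, which is why a species can have a core of a few thousand families alongside a far larger total.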
As of 2014, over 30,000 sequenced bacterial genomes were publicly available, along with thousands of metagenome projects. Projects such as the Genomic Encyclopedia of Bacteria and Archaea (GEBA) intend to add more genomes.
Single-gene comparison is now being supplanted by more general methods. These methods have produced novel perspectives on genetic relationships that previously could only be estimated.
A significant achievement of the second decade of bacterial genome sequencing was the production of metagenomic data, which covers all DNA present in a sample. Before then, only two metagenomic projects had been published.
Bacteria possess a compact genome architecture that differs from that of eukaryotes in two important ways: genome size correlates strongly with the number of functional genes, and those genes are organized into operons. Bacterial genomes are denser than eukaryotic genomes (especially those of multicellular eukaryotes) mainly because eukaryotic genomes carry large amounts of noncoding DNA in the form of intergenic regions and introns. Notable exceptions include recently formed pathogenic bacteria. This was first described in a study by Cole et al., in which Mycobacterium leprae was found to have a significantly higher proportion of pseudogenes relative to functional genes (~40%) than its free-living ancestors.