If you’ve been following so far, we’ve covered sequencing by pyrophosphate detection and sequencing by reversible terminators, and now we come to sequencing by ligation. Note that the term ‘sequencing by synthesis’ is not used here (although Illumina likes the term and names some of its reagents ‘SBS’ accordingly), because all three methods determine the sequence of bases by synthesizing a complementary strand.
In 2003 George Church’s group at Harvard published a paper describing ‘fluorescent in situ sequencing on polymerase colonies’, later dubbed ‘polony sequencing’, in which, after template preparation via emulsion PCR, a ligase discriminates between perfectly matched fluorescently labeled probes and mismatched ones. In one of the original schemes, a 9-base probe had eight degenerate positions (each of which could be any of the four nucleotides), so its sequence would read ‘NNNNTNNNN’, with ‘T’ being the known base for that fluor.
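To make the degenerate-probe idea concrete, here is a toy Python sketch. It assumes an idealized ligase that joins only a perfectly matched probe, treats `target` as the probe sequence that would pair perfectly with the template at that spot (strand orientation is ignored), and uses made-up fluor labels; the function names are hypothetical and this is not the published protocol.

```python
from itertools import product

BASES = "ACGT"
# Hypothetical fluor labels: one fluor per identity of the query base.
FLUOR = {"A": "fluor-A", "C": "fluor-C", "G": "fluor-G", "T": "fluor-T"}


def probe_pool(query_base, length=9, query_pos=4):
    """Yield every probe with `query_base` fixed at `query_pos` and N (any base) elsewhere.

    With the defaults and query_base='T' this is the 'NNNNTNNNN' pool.
    """
    for flank in product(BASES, repeat=length - 1):
        yield "".join(flank[:query_pos]) + query_base + "".join(flank[query_pos:])


def ligated_fluors(target, query_pos=4):
    """Toy ligase: only a probe identical to `target` (the perfect match) ligates.

    Returns the fluor label(s) of the pool(s) containing that perfect match.
    """
    return [FLUOR[b] for b in BASES
            if target in set(probe_pool(b, len(target), query_pos))]


print(sum(1 for _ in probe_pool("T")))  # 65536 probes in one fluor's pool (4 ** 8)
print(ligated_fluors("ACGTTACGA"))      # ['fluor-T']: the fifth (query) base is T
```

Only the pool whose fixed base matches the query position lights up, so the fluor reports one base while the eight Ns simply let the probe anneal anywhere. Each fluor’s pool already holds 4^8 = 65,536 distinct sequences.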
The SOLiD and 5500 approach is similar, but improves accuracy by interrogating two nucleotides at a time instead of one, still surrounded by degenerate bases like this: ‘ATNNNNN’, where ‘AT’ is the known dibase and the following Ns are degenerate bases. Because only four colors are used for the sixteen possible two-base combinations (AA, AT, AC, AG, TA, TT, TC, TG, etc.), each fluor color represents four different dibases. Hence the need for a ‘color space’ decoding scheme, which is essentially a method of sequencing by overlap: each base falls within two adjacent dibases, so each color is interpreted in light of the base before it.
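The two-base encoding can be sketched in a few lines of Python. This assumes the commonly described SOLiD color code, in which A, C, G and T are treated as the 2-bit numbers 0 to 3 and a dibase’s color is the XOR of its two bases; colors are numbered 0 to 3 here rather than named by dye, and the function names are illustrative. Decoding needs one known starting base (in practice the last base of the adaptor/primer), after which each color plus the preceding base fixes the next base, which is the ‘overlap’ in action.

```python
# Assumed two-base code: bases as 2-bit numbers, color = XOR of the pair.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_BASE = "ACGT"


def color(b1, b2):
    """Color (0-3) reported for the dibase b1b2."""
    return BASE_BITS[b1] ^ BASE_BITS[b2]


def encode(seq):
    """One color per overlapping dibase: positions (0,1), (1,2), (2,3), ..."""
    return [color(a, b) for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Walk the overlap: each color plus the previous base fixes the next base."""
    bases = [first_base]
    for c in colors:
        bases.append(BITS_BASE[BASE_BITS[bases[-1]] ^ c])
    return "".join(bases)


# Group the sixteen dibases by color: each color covers exactly four of them.
by_color = {c: [] for c in range(4)}
for a in BITS_BASE:
    for b in BITS_BASE:
        by_color[color(a, b)].append(a + b)
for c, dibases in by_color.items():
    print(c, dibases)
# 0 ['AA', 'CC', 'GG', 'TT']
# 1 ['AC', 'CA', 'GT', 'TG']
# 2 ['AG', 'CT', 'GA', 'TC']
# 3 ['AT', 'CG', 'GC', 'TA']

colors = encode("TACGGA")
print(colors)               # [3, 1, 3, 0, 2]
print(decode("T", colors))  # TACGGA
```

A single color on its own is ambiguous (four dibases each), but chaining colors from a known first base resolves every base.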
This overlap means multiple applications of primers and sequencing with complex mixtures of many probes (on the order of 1028 due to the math involved). At the current read length of 75 bases, SOLiD and 5500 use five rounds of primer application, each followed by sequencing by synthesis via ligation. Another aspect of the chemistry is that ligation proceeds in the opposite DNA orientation (3′ to 5′) to a polymerase (5′ to 3′), because the ligase is more efficient in that orientation. For a paired-end reverse read, the ligase currently has to work 5′ to 3′, at lower efficiency, so reverse reads on SOLiD and 5500 are limited to 35 bases for now.
So in a given round of ligation, a probe is ligated onto a primed strand on a bead amplified by emulsion PCR, and then imaged as in the other two methods. After imaging (which, remember, is a time- and informatics-intensive step), the fluor is cleaved off at a conveniently placed chemically labile bond within the probe, and the cleaved end is prepared for another round of ligation. Life Technologies currently uses fifteen ligation cycles, and when you multiply fifteen ligation cycles by five different primers (offset as N-1, then N-2, and so on) you get a 75-base read.
Well, it is not quite that straightforward, because each ligation reports the color of a dibase, and successive dibases read from the same primer are spaced apart by three unread (NNN) bases. The figure here should help: each horizontal row is a unique primer, the two dots are the adjacent bases interrogated together and reported as a single color, and the three intervening boxes represent the three NNN bases that remain unknown along that particular read. All the colors in the figure represent distinct ligation cycles, where the base discrimination occurs.
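The interrogation layout can be tabulated with a short Python sketch. It assumes a simplified model: each of the five primer rounds is shifted by one base relative to the previous one, each ligation cycle advances the extension by five bases once the fluor end is cleaved (`STEP` is an assumption), and position 0 is the last, already-known base before the insert. Which offset corresponds to which named primer (N, N-1, …) is not modeled, and all names are hypothetical.

```python
N_PRIMERS = 5    # primer rounds, each shifted by one base relative to the previous
N_CYCLES = 15    # ligation cycles per primer round
STEP = 5         # assumed bases added per cycle after the fluor end is cleaved
READ_LEN = N_PRIMERS * N_CYCLES  # 75 colors -> 75-base read


def schedule():
    """Map (primer offset, cycle) to the dibase (pos, pos + 1) it interrogates.

    Position 0 is taken to be the last, already-known base before the insert.
    """
    return {(off, cyc): (STEP * cyc + off, STEP * cyc + off + 1)
            for off in range(N_PRIMERS) for cyc in range(N_CYCLES)}


plan = schedule()

# Count how many independent ligations report each base position.
coverage = {}
for dibase in plan.values():
    for pos in dibase:
        coverage[pos] = coverage.get(pos, 0) + 1

print(len(plan), READ_LEN)                                # 75 75
print(all(coverage[p] == 2 for p in range(1, READ_LEN)))  # True

# A text version of the figure: '*' = base read by that primer, '.' = unread.
for off in range(N_PRIMERS):
    row = ["."] * 21
    for cyc in range(N_CYCLES):
        for pos in plan[(off, cyc)]:
            if pos < len(row):
                row[pos] = "*"
    print(f"offset {off}: {''.join(row)}")
# first row printed: offset 0: **...**...**...**...*
```

Within one primer round, the reported dibases are separated by three unread bases, but across the five shifted rounds every position from 1 to 74 lands in two dibases read by two different primers; only the final base, 75, is covered once.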
Thus, compared with pyrosequencing and reversible terminators, sequencing by ligation is more complex to analyze, which hampered its adoption in the marketplace, but it can deliver much higher accuracy, because each base is interrogated twice, by independent primers in different ligation cycles. One other limitation is de novo sequencing, where the reference sequence is unknown: to call bases, one needs a reference translated into color space to compare the reads against. However, this limitation was addressed in recent years with the development of a sixth primer and a unique probe mixture called E.C.C. (which stands for ‘Exact Call Chemistry’; here’s a link to a LifeTech whitepaper about it). Through some remarkable mathematics, this probe set allows the bases to be deconvoluted and called as discrete bases without a reference, via a sixth primer round of sequencing by ligation (again, with a different probe set from the one used in the first five rounds).
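Why a color-space reference helps, and where the extra accuracy comes from, can be illustrated with a toy comparison. This Python sketch assumes the commonly described XOR two-base code (A, C, G, T as 0 to 3) and uses hypothetical helper names; real color-space aligners are far more involved, so this shows only the core idea: a true single-base change alters two adjacent colors by the same amount, while an isolated color error does not, yet naively decoding that error corrupts every downstream base.

```python
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_BASE = "ACGT"


def encode(seq):
    """Translate bases into colors (assumed XOR two-base code)."""
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Naive base calling: chain each color onto the previous base."""
    bases = [first_base]
    for c in colors:
        bases.append(BITS_BASE[BASE_BITS[bases[-1]] ^ c])
    return "".join(bases)


def color_mismatches(read_colors, ref_colors):
    """Indices where the read's colors differ from the color-space reference."""
    return [i for i, (r, f) in enumerate(zip(read_colors, ref_colors)) if r != f]


def looks_like_snp(read_colors, ref_colors):
    """A real single-base change alters two adjacent colors by the same XOR amount."""
    mm = color_mismatches(read_colors, ref_colors)
    if len(mm) != 2 or mm[1] != mm[0] + 1:
        return False
    i, j = mm
    return (read_colors[i] ^ ref_colors[i]) == (read_colors[j] ^ ref_colors[j])


reference = "TACGGATC"
ref_colors = encode(reference)             # [3, 1, 3, 0, 2, 3, 2]

snp_colors = encode("TACGTATC")            # true G->T change at base index 4
print(color_mismatches(snp_colors, ref_colors), looks_like_snp(snp_colors, ref_colors))
# [3, 4] True

error_colors = list(ref_colors)
error_colors[3] = 1                        # one mis-called color, no real change
print(color_mismatches(error_colors, ref_colors), looks_like_snp(error_colors, ref_colors))
# [3] False
print(decode("T", error_colors))           # TACGTCGA: every base after the error is wrong
```

Comparing in color space lets a lone mismatch be flagged as a measurement error rather than propagated into the base calls, which is exactly what is lost when no reference is available.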
If all this discussion of sequencing methods, enzymes, fluors, probes and imaging sounds complicated, it is, at least in the sense that there are a lot of different reagents and steps. (Reviewing the operating manuals and protocols for these instruments and becoming familiar with them takes a fair amount of time and energy.) The bench-top sequencers (Roche was first with the 454 Junior, followed by Ion Torrent’s PGM and then Illumina’s MiSeq) miniaturize these approaches but, with the exception of the Ion Torrent technology, still rely on optical imaging and very similar informatics pipelines to get the job done. These approaches will remain in use until the next shift in technology occurs, which as of this writing could come in 2013 with the advent of the nanopore. But more about that later.