Last year at Advances in Genome Technology and Biology, NanoString’s Joe Beechem presented their proof-of-concept work around a new single-molecule sequencing technology that had fulfilled a long-lived goal going back many years: sequencing by hybridization. (By the way, if you are interested here’s a 1994 paper by Lee Hood and Rade Drmanac describing this approach via microarray. And here at AGBT, a friend told me he was there at the NIH when Francis Collins discussed this same approach for sequencing BRCA in 1993.)
So a full 24 years later at AGBT 2017, NanoString held a sponsored workshop where progress with Hyb & Seq technology was described in a talk titled “Hyb & Seq technology: a no amp, a no library, single molecule sequencer designed for the clinic”. (NanoString was kind enough to give me a preview, and thus I am able to publish this post right after Joe’s talk.)
For those not familiar with it, here is a video from last year on YouTube explaining the basics of the technology. But first, the main benefits of this approach (compared to more conventional NGS approaches) are as follows:
- <1 hour sample processing time, of which <15 minutes hands-on time, even for clinically relevant samples such as FFPE
- Sample to Answer system, sequencing both RNA and DNA directly, capable of both short and long reads of several tens of kilobases (so far)
- No up-front amplification or enzymology (such as library ligation and PCR amplification)
Sample preparation is as simple as a 30-minute deparaffinization, followed by a simple filtration step, quantitation, and then loading into the instrument.
The sample of DNA is hybridized in-situ to a flow-cell, tethered and captured in specific locations on the slide. A six-base sequencing probe has three additional regions contiguous with the probe, labeled R1, R2 and R3. The R1 region is used to translate the identity of the first two of the six bases; the R2 region translates the second two (bases three and four) of the six bases; and the R3 region translates the last two of the six bases.
Joe pointed out that the features of the existing NanoString detection chemistry probes (details are here) are about 2,000 nm long, which is too large for the kind of kinetics needed for Hyb & Seq to work. The detection molecule, the six-base sequencing probe plus the R1/R2/R3 translation region, is 20 to 50 nm long, or roughly 1% to 2.5% of that size.
The translation of R1, R2 and R3 to six-mer sequences of DNA is not a complex decoding scheme, but rather a direct readout or translation of the first two, second two, and third two bases of the six-base probe as mentioned previously. For reference, the bases of the probe can be called b1, b2, b3, b4, b5 and b6. R1, R2 and R3 are stretches of DNA contiguous with b1-b6, and each given probe is a unique set of 6 bases. (Full size of the sequencing probe is an 82-mer oligonucleotide.) Given four bases of DNA (the A’s, C’s, G’s and T’s), the total number of combinations across the R1/R2/R3 translation regions is 4 to the 6th power, or 4,096 possible 6-mers.
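To make the arithmetic concrete, here is a minimal Python sketch that enumerates every possible six-base probe sequence and splits one into the three dinucleotides that R1, R2 and R3 would report. The function names are mine and purely illustrative; this is not NanoString's software.

```python
from itertools import product

BASES = "ACGT"

def all_sixmers():
    """Enumerate every possible six-base probe sequence (b1..b6)."""
    return ["".join(p) for p in product(BASES, repeat=6)]

def split_into_regions(sixmer):
    """Split a six-mer into the dinucleotides read out by R1, R2 and R3."""
    if len(sixmer) != 6:
        raise ValueError("expected a six-base sequence")
    # R1 -> b1b2, R2 -> b3b4, R3 -> b5b6
    return sixmer[0:2], sixmer[2:4], sixmer[4:6]

sixmers = all_sixmers()
print(len(sixmers))                  # 4 ** 6 possible six-mers
print(split_into_regions("ACGTTG"))  # dinucleotides for R1, R2, R3
```

Each region only ever has to distinguish one of 4 x 4 = 16 dinucleotides, which is what keeps the readout simple.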
The translation probes are sets of 512 six-mers which are simply unique, non-overlapping subsets of the total 4,096 combinations of the six bases that comprise the R1/R2/R3 translation regions. The reason for separate subsets of 512 lies in the complementary nature of DNA that makes sequencing-by-hybridization work in the first place: within the universe of 4,096 total combinations, there are plenty of individual sequences that have complete or partial complementarity to other translation sequences, and these need to be physically separated. (Anyone who has designed a set of PCR primers and run into primer-dimer artifacts has observed this effect first-hand.)
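As a rough illustration of why the full set cannot share one pool, the sketch below flags sequences in a candidate pool that are perfect reverse complements of each other (or of themselves). This is an assumption-laden toy: it checks only perfect complementarity, ignores the partial matches mentioned above, and the pool contents and function names are hypothetical rather than anything NanoString has described.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def conflicting_pairs(pool):
    """Find members of a pool that could hybridize perfectly to each other."""
    members = set(pool)
    conflicts = set()
    for seq in members:
        rc = reverse_complement(seq)
        if rc in members:
            # Sort so that (a, b) and (b, a) collapse into one entry.
            conflicts.add(tuple(sorted((seq, rc))))
    return sorted(conflicts)

# Hypothetical pool: ACGTTG/CAACGT pair up, and GAATTC pairs with itself.
pool = ["ACGTTG", "CAACGT", "GGGAAA", "GAATTC"]
print(conflicting_pairs(pool))
```

A real design would also have to score partial complementarity, which is presumably part of why the probes end up in several separate pools.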
Since 4096/512 = 8, there are eight sets of 512 translation probes. Each of these probes is uniquely identified with a two-dye system, set up in different combinations. Using four different colors in pairs, the total number of unique pairs of colors for the translation scheme is 4 to the 2nd power (four colors in pairs), or 16 total color combinations.
This is a confusing point, admittedly. NanoString (and more specifically, Joe Beechem) is using one of sixteen unique colors to translate two unique bases (say b1/b2 for R1) of DNA which has a total of sixteen combinations. Think about it this way: sixteen colors to determine the sequence of sixteen possible combinations of bases, with 4096 probes with sixteen colors working as an intermediate information layer. The short length of these R1 probes and very short hybridization and imaging times is a secret to why this technology works.
(I’d point out here for the sake of clarity, that this is not any kind of 2-base encoding like in the old SOLiD NGS method, which was sequencing by ligation and more specifically sequencing by adjacency. Rather, this is direct readout of dinucleotides via sixteen color combinations.)
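The direct readout can be sketched in a few lines of Python. The four dye names and the particular assignment of color pairs to dinucleotides below are hypothetical (the real mapping was not shown); the point is only that sixteen ordered color pairs map one-to-one onto sixteen dinucleotides, so three readouts recover a six-mer.

```python
from itertools import product

BASES = "ACGT"
COLORS = ("blue", "green", "yellow", "red")  # hypothetical dye names

# Hypothetical assignment: the i-th ordered color pair <-> the i-th dinucleotide.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COLOR_PAIRS = list(product(COLORS, repeat=2))
COLOR_TO_DINUC = dict(zip(COLOR_PAIRS, DINUCLEOTIDES))
DINUC_TO_COLOR = {dinuc: pair for pair, dinuc in COLOR_TO_DINUC.items()}

def encode(sixmer):
    """Color pairs that R1, R2 and R3 would show for a given six-mer."""
    return [DINUC_TO_COLOR[sixmer[i:i + 2]] for i in (0, 2, 4)]

def decode(readouts):
    """Translate the three color-pair readouts back into b1..b6."""
    return "".join(COLOR_TO_DINUC[pair] for pair in readouts)

readouts = encode("ACGTTG")
print(len(COLOR_PAIRS))  # number of distinct ordered color pairs
print(readouts)
print(decode(readouts))
```

Because each region is decoded independently with a simple lookup, there is no dependence on neighboring calls, unlike the two-base encoding used by SOLiD.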
What you get for this effort
These molecules of DNA fixed onto a flow-cell are sequenced directly, without any library preparation, cluster amplification, or the enzymatic sequencing-by-polymerization (adding sequential bases) of the ‘ensemble sequencing’ paradigm that is the most common method today. Oxford Nanopore has simplified their library preparation significantly, and PacBio has increased throughput and lowered costs with their new Sequel system, and these single-molecule systems have long reads (I understand Oxford has gone up to 118 kb read-lengths).
But this single-molecule approach doesn’t involve engineered membrane-bound biological nanopores (in Oxford’s case) or zero-mode waveguides (as for PacBio), but rather a short oligo and single-molecule dye translation scheme, with fast kinetics and imaging. Thus single molecules of various lengths can be read, and the platform should be able to scale without too much wrestling with the limits of physics, chemistry or biology.
I’ve seen this technology in action; above is a photo of a modified NanoString Sprint instrument sequencing a specific gene (they were showing a targeted run of 120 COSMIC cancer mutations) and showing the sequencing images. On the left side of the monitor are the different features, each in one of sixteen colors; in the middle are the individual b1-b2 bases identified on a feature-by-feature highlight (appears as squares on the monitor); on the right are the registration marks for knowing what image is what during the scanning of the flowcell.
Quality of sequence
While this method is still in development, it is obvious a lot of progress has been made since Joe first announced Hyb & Seq at last year’s AGBT. (GenomeWeb link, premium subscription required.)
The proof-of-concept on the longest read capability has been 33 kb; they are positioning this platform for 10 to 1,000 gene targets (at least in its initial configuration); and first-pass sequencing is Q14, second-pass is Q22; four passes gets you to Q31, and five goes all the way to Q40. For a typical sequencing run, every base will be covered (passed) an average of 5-6 times. Being able to do additional passes in other single-molecule platforms has involved circular consensus or other manipulations, while for sequencing by synthesis (SBS) it is not possible for the original molecule; here it involves simply re-interrogating the R1/R2/R3 regions, which should be simple enough to perform.
One interesting aspect is how this system will handle homopolymers: since you know the targeted region to sequence, you can use a blocker oligonucleotide to effectively skip over these known regions, as well as adjust the composition of the 512 probes in each deck (which Joe called ‘stacking the deck in your favor’).
Reverse calculation of Phred means that Q14 is about a 4% single-pass error rate, which is already better than the 5% error rate of Oxford Nanopore’s latest chemistry and pore biology iteration.
NanoString expects a Hyb & Seq instrument to start early beta testing in 2019. It is certainly well-positioned to get to the ‘sample in, answer out’ goal for a clinical environment, as well as furthering the democratization of next-generation (and third-generation) sequencing.
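For anyone who wants to check the Phred arithmetic, the standard definition is Q = -10 log10(p), where p is the probability of an incorrect base call. The short Python sketch below applies it to the per-pass quality values quoted above; the function names are my own.

```python
import math

def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)

# Quality values quoted for one, two, four and five passes.
for passes, q in [(1, 14), (2, 22), (4, 31), (5, 40)]:
    print(f"{passes} pass(es): Q{q} -> {phred_to_error(q):.4%} error")

# The 5% error rate quoted for comparison, expressed as a Phred score.
print(f"5% error -> Q{error_to_phred(0.05):.1f}")
```

Q14 works out to just under 4%, while a 5% error rate corresponds to roughly Q13, so the two single-pass figures are about one Phred unit apart.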
Update: For another helpful perspective on the Hyb & Seq system, see the DeciBio blog here. Update 2: Small edits and additional figure kindly provided by NanoString