A few attendees of the Advances in Genome Biology and Technology (AGBT) meeting in Marco Island, Florida (February 12–15, 2014) have blogged about a presentation by David Jaffe (Broad Institute) showing the first data the next-generation sequencing community has publicly seen from Oxford Nanopore Technologies. For those not familiar with Oxford Nanopore (or ONT, as I’ll refer to them), it was at AGBT12 that they absolutely stunned the crowd with their announcements of both a GridION™ nanopore sequencing ‘module’ and a MinION™ USB-stick portable DNA sequencer, which got a lot of press. They planned to commercialize ‘by the end of the year’ (that is, 2012), and since I was not ‘in’ the meeting myself (I was supporting the meeting on-site at Marco Island for Life Technologies, just not as a conference attendee), I listened with interest to first-hand accounts from several people that year.
Back in February 2012, I remember speaking with several people at an Ion Torrent customer get-together. (Okay, frankly it is billed as the ‘Ion Lounge’ and it’s held in a nice restaurant / patio next door to the conference center at the Marco Island Resort, but enough about those details.) “If ONT does even half of what they promised yesterday, it’s absolutely going to be a game-changer,” one friend, who buys a lot of NGS equipment from all the vendors, told me. Later that year at ASHG they disappointed those hungry for early access, and the following year at AGBT they had representatives at the meeting and a poster (if I remember correctly) but no presentation, and GenomeWeb ran this note about their lack of presence at last year’s AGBT meeting.
Then, in October 2013, came ASHG in Boston, where ONT had a large marketing presence (a 20-foot booth swamped with visitors for the entire exhibit duration, so much so that I wasn’t able to engage with their representatives), along with the announcement that early-access signups would be ‘accepted soon’. News about their signup process dribbled out until the first users were selected right before #AGBT14. And of course, on the day Dr. Jaffe gave his talk, the invitations to participate in their early access program went out an hour or two beforehand.
Now for the presentation, entitled “Assembly of Bacterial Genomes Using Long Nanopore Reads”, by Dr. Jaffe. For those not familiar with his work, he’s the director of Computational R&D at the Broad Institute, and his background is in genome assembly and the writing of DNA assemblers, the most recent of which is ALLPATHS-LG. Thus he’s pre-eminently qualified, from a genome informatics perspective, to comment on the utility and usability of a data type that hasn’t been seen before (i.e. long reads from a new single-molecule sequencing technology). He can offer a useful functional perspective on this DNA data rather than a simplistic ‘reads and feeds’ comparison, which is what the business types (or those not very familiar with genomics) would perhaps prefer. (Disclaimer: I’m one of those ‘business types’ too, but I do not subscribe to such reductionist thinking with regard to NGS platforms.)
The talk itself described two datasets from DNA the Broad gave to ONT: an E. coli strain and a Scardovia strain. In a short introduction to the technology, he illustrated the need for a ‘ratcheting’ protein to sit on top of the nanopore, so that multiple bases could be interrogated at a time. In previous iterations of the technology, company representatives had talked about reading bases in groups of a different size (the last I heard, it was three), but it appears that for early-access customers (and presumably a launched MinION instrument) this is the enzymology they’ve settled upon.
On to the data: the E. coli reads had a mean length of 5.4 kb and the Scardovia reads 4.9 kb. He showed a histogram of the Scardovia read lengths, a relatively symmetrical distribution tapering off toward a maximum x-axis value of 10 kb, and mentioned that the longest read was 20 kb. He then overlaid the size distribution of the DNA they sent, and it matched the observed read-length distribution very closely.
Some commentary on this point: since the reads are fast (the reported speed can range from 1 to 100 bases/sec; the rate used for this experiment was 25 bases/sec) and library shearing is trivial (if any institution worldwide is expert at DNA fragmentation, the Broad could certainly be among them), this looks like a case of ‘we will show the best of what we can do in this environment’ without laying out too many of the more unpleasant details. And if I were at ONT, responsible for releasing this platform, that’s exactly how I would do it: specify to the Broad that the modal read length is about 5 kb with a minority reaching a maximum of 20 kb, and then deliver data that matched that distribution.
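For a sense of scale, here is a back-of-envelope sketch (a hypothetical helper, not anything from ONT) of how long a single strand occupies a pore at the speeds quoted above. It assumes a constant translocation speed and ignores the time a pore sits empty between strands, so it says nothing about run-level yield.

```python
# Hypothetical back-of-envelope helper: time a single strand spends in a pore,
# assuming a constant translocation speed and no overhead between strands.
# Speeds are those quoted from the talk (1-100 bases/sec range, 25 for this run);
# lengths are the ~5 kb mode and 20 kb maximum read lengths mentioned above.

def seconds_per_read(read_length_bases: int, speed_bases_per_sec: float) -> float:
    """Seconds for one strand of the given length to pass through the pore."""
    if speed_bases_per_sec <= 0:
        raise ValueError("speed must be positive")
    return read_length_bases / speed_bases_per_sec


if __name__ == "__main__":
    for length in (5_000, 20_000):
        for speed in (1, 25, 100):
            t = seconds_per_read(length, speed)
            print(f"{length:>6} bases at {speed:>3} b/s: {t:8.0f} s ({t / 60:6.1f} min)")
```

At the 25 bases/sec used for this experiment, a 5 kb read would take a few minutes per pore and a 20 kb read roughly four times as long.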
Back to the talk: David mentioned DISCOVAR, the tool his group developed, and how ONT reads can help resolve ambiguities in an assembly graph, although they are not yet useful for a ground-up de novo assembly. He pointed out a particular 6-base sequence (remember that, due to the ratcheting enzyme at the top of the pore, discrete signals are determined from each k-mer passing through the pore) where, of six-fold ‘coverage’ in a particular genomic region, five of the reads were missing a ‘T’ base, while one read had the ‘ground truth’ T in that position.
This point deserves dwelling upon: the nanopore is electrically detecting a series of k-mers, and with the ratcheting enzyme used here each is a 6-mer. With 4 DNA bases, there are 4⁶ = 4,096 possible 6-mers that could be passing through the pore at any given point in the sequencing run. A figure not unlike this one, showing different levels of electrical signal (but without any x-axis markings for the relative milliseconds), was shown, along with a series of 6-mers beneath it corresponding to each discrete signal. (The ratcheting enzyme is needed to ‘hold’ the DNA strand in place long enough for the detection circuitry to register a distinct, differentiated current flow, presumably one of 4,096 possible species, each with a unique electrical signal.)
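To make the 6-mer arithmetic concrete, here is a minimal Python sketch that enumerates every possible 6-mer and lists the overlapping 6-mers a strand would present if it advanced one base per ratchet step. The one-base-per-step model and the function names are my own simplifying assumptions for illustration; ONT’s actual signal model and base caller were not described in the talk.

```python
from itertools import product

BASES = "ACGT"
K = 6  # k-mer size attributed to the ratcheting enzyme in the talk


def all_kmers(k: int = K) -> list[str]:
    """Every possible DNA k-mer over A/C/G/T, in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def kmer_steps(seq: str, k: int = K) -> list[str]:
    """Overlapping k-mers seen if the strand advances one base per step
    (a simplifying assumption, not ONT's documented behaviour)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    print(len(all_kmers()))          # 4**6 possible 6-mers
    print(kmer_steps("GATTACAGT"))   # 9 bases give 9 - 6 + 1 = 4 overlapping 6-mers
```

Under this model, each base sits in six consecutive k-mers, which is why the base caller has to infer individual bases from a sequence of overlapping signal levels rather than reading them one at a time.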
When I spoke with David after the talk, he mentioned that the base-calling algorithm was a ‘black box’: there was some complicated analysis going on behind the scenes, and when a mathematician (or a computational biologist, for that matter) tells you something is complicated, I’ll take his word for it. He did, however, mention two possibilities for improvement at the end of his talk: that ONT could mix the types of ratcheting enzymes and/or pores in the 512 nodes of the system (thus evening out any particular bias that one pore type would have versus another), and, purely on the informatics side, improving the base-calling algorithm. This being the first ‘real data’ anyone has seen, these are very early days for this technology.
He also went on to describe the proportion of reads that produced ‘perfect’ sequence data (possible because reference genomes already existed for both strains). Some 84% of the reads contained a perfect 50-base stretch, and this rose to 100% when they looked for 25-base stretches. He didn’t share information on total yield (i.e. what fold-coverage was attained in a given run), so it isn’t clear how many reads a 512-pore MinION could produce, nor what the rate of data capture was, which would give an idea of overall efficiency, yield, and turnaround time for a given experiment. Certainly, once the early-access recipients start to share their experiences and data metrics, a number of these gaps will be filled in.
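For readers wondering how a ‘perfect stretch’ metric might be computed, here is one plausible sketch, assuming each read has already been aligned to the reference and its alignment recorded as an extended CIGAR string (SAM format, where ‘=’ marks matching bases and ‘X’ mismatches). The function names and toy alignments are hypothetical; the Broad’s actual method was not described in the talk.

```python
import re

# One CIGAR operation: a length followed by an op character (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_perfect_run(cigar: str) -> int:
    """Longest stretch of consecutive reference-matching bases ('=' ops).

    Any other op (mismatch, insertion, deletion, clip) resets the run.
    A plain 'M' op does not distinguish matches from mismatches, so it is
    conservatively treated as breaking a run.
    """
    best = run = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "=":
            run += int(length)
            best = max(best, run)
        else:
            run = 0
    return best


def fraction_with_perfect_stretch(cigars: list[str], min_len: int) -> float:
    """Fraction of reads containing at least one error-free stretch >= min_len."""
    if not cigars:
        return 0.0
    hits = sum(longest_perfect_run(c) >= min_len for c in cigars)
    return hits / len(cigars)


# Toy alignments invented for illustration only (not data from the talk).
reads = ["30=1X55=2D10=", "20=1I24=1X40=", "12=1X12=1D12="]

if __name__ == "__main__":
    for k in (25, 50):
        print(k, fraction_with_perfect_stretch(reads, k))
```

Lowering the threshold can only keep or raise the fraction, which matches the direction reported in the talk (84% at 50 bases, 100% at 25 bases).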
I also asked David about library preparation. They sent 5 µg of DNA for each sample, and he mentioned that ONT attached library adapters, but he could only refer to a single distal ‘anchoring’ adapter that interfaced with the synthetic lipid bilayer in which the nanopore resides. (Here is some ONT material on their nanopore-producing expertise, though alas, it was likely in a technical talk that I heard about the need for a synthetic lipid bilayer.) From the diagram in David’s presentation, it appeared that the anchor was not nucleic acid in nature, but rather a protein or perhaps a chemical moiety with particular properties that would co-locate individual DNA library molecules in close proximity to the pore. He told me that he didn’t have any information about the nature of that anchor or what the library preparation process involved; the idea, surmised in some quarters, of putting in unpurified, unmodified DNA and having it sequenced was clearly untrue, at least in this initial iteration.
Here’s a 1996 PNAS paper that Jeffrey Schloss mentioned in his overview of NGS; that it took 18 years to get from there to here illustrates the technical complexity of such an approach. So ONT’s ability to get sequence at all, and sequence that could resolve an assembly ambiguity as David presented in this talk, is a big step forward.
But this is not to overlook its limitations. The sequence quality was not good enough to perform a de novo assembly of a 1.6 Mb genome, and no error metrics were given. Arguably it’s too early to evaluate the system on such a metric, which in itself indicates its relative readiness for commercialization. A commercial system needs minimum performance specifications, which a business would need to adhere to in terms of service and support. There’s no way you can sell a sequencer without a firm ‘this is the minimum performance specification’, as otherwise customers would not know what they were purchasing.
Also, looking at all the presentations and posters using Pacific Biosciences, it was clear to me that they are making nice strides in increasing yields, read lengths, and accuracy. According to one person’s calculations (disclosure: they weren’t mine), the Oxford MinION could give yields per dollar (as in $/Mb) similar to a PacBio RS II, with similar read lengths and without the capital expenditure. (How the error rates of the two platforms compare is yet to be determined.)
To conclude: an interesting talk about a closely watched technology that may point the way to further disruption. A capital cost of $1,000 for the MinION will put NGS within reach of many, many more groups and research environments than at present. (Take a look at this European tour my colleague put together, a PGM on the back of a Mini Cooper Clubman, complete with a great video; but still, a purchase cost of about $65K USD holds many back.) The audience at AGBT heard several ecology- and environment-related talks in which sampling biodiversity in many contexts and remote areas posed unique challenges, and this is where such a technology could be deployed that much more easily.