Oxford Nanopore, based in Oxford, U.K., made a remarkable announcement that surprised many at February’s AGBT meeting in Marco Island. The company announced two single-molecule sequencers, the GridION and the MinION, promising 15-minute runtimes, no sample preparation, and (in the case of the MinION) a disposable USB-stick sequencer for $900. With 50kb readlengths (and 100kb promised) at only a 4% error rate, it appears to be a dream come true for many research challenges that await.
(I mentioned last week what makes single-molecule sequencing so attractive, and in prior posts reviewed Helicos BioSciences’ short-read single-molecule approach. Pacific Biosciences has been covered briefly, but the details of their Zero Mode Waveguide technology will have to await another day.)
So Oxford Nanopore would be the third single-molecule sequencing company to commercialize a sequencing technology. And it would be the first company to do it in the manner originally envisioned way back in 1989 – to use a protein nanopore sitting in a lipid bilayer membrane and detect nucleotide sequences via changes in the electrical current across that membrane. It is nothing short of magic, and it is difficult in this day of high-technology advances (whether the development of a computer in your pocket known as a smartphone, or the social media craze that has spawned a renaissance of startup activity, or a private company launching a reusable vehicle that can dock with the International Space Station) to realize that there are some technical feats that represent true barriers.
These barriers are not insignificant and the technological challenges are not trivial. In reducing a laboratory experiment to a manufacture-able process (otherwise known as going from Research phase to Pilot phase to Production phase to Internal Testing phase to External [beta-] Testing phase to Launch phase to Sustaining phase) there is an almost infinite variety of places to fail. And all along this tortuous process decisions get made about what to tell the market – Oxford was able to land a ‘surprise’ £31M additional funding round recently (bringing its total funding to some £101.3M, and observers indicate its total value has doubled since last year to a whopping $2B). So informing the market of what a company is working on, as it pushes through product development, is a careful balancing act, one that certainly places an enormous amount of pressure on the developers.
And a new sequencing platform is incredibly complex. A ‘raw signal’ can be one of a multitude of types of data, depending on the technology used. On a light-based technology (454, Solexa, SOLiD, Helicos, PacBio) a camera takes an image, and a raw signal is a set of pixels in a given location on a given cycle with a given color filter. These pixels are brighter than the background, and indicate a level of intensity that can be extracted to come up with a number. Multiply that by trillions. To explain: on a HiSeq 2000, a single run is 600 billion bases of sequence data. Each base could have been one of four colors, so four images were taken for a given base, yielding four discrete numerical values of which one was far above the rest; that one represents one of the four bases A/T/G/C. So 600 billion x 4 = 2.4 trillion pixel-groups were generated per run.
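The arithmetic above can be written out as a short Python sketch. The constants are simply the figures quoted in the text; the function name is made up for illustration and does not come from any instrument software.

```python
# Back-of-the-envelope count using the figures quoted in the text.
BASES_PER_RUN = 600_000_000_000  # 600 billion bases per HiSeq 2000 run (from the text)
CHANNELS_PER_BASE = 4            # one intensity value per color channel (A/T/G/C)


def intensity_values_per_run(bases=BASES_PER_RUN, channels=CHANNELS_PER_BASE):
    """Number of per-base intensity measurements ('pixel-groups') in one run."""
    return bases * channels


total = intensity_values_per_run()
print(f"{total:,} pixel-groups per run")  # prints 2,400,000,000,000 pixel-groups per run
```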
Back to the ‘raw signal’ – the images have their intensities extracted; the intensities are compared to each other numerically, and then bases are called from those comparisons. (These ‘base-callers’ are often tweaked and tuned, as the chemistry is changed and improved.) Then a quality value needs to be assigned to each base, as a function of signal-to-noise. And then the bases (as a string forming an individual ‘read’) are aligned to a reference dataset, and a BAM file is produced. (This Binary Alignment/Map format is a standard format, a by-product of the 1000 Genomes Project.)
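To make the intensity-to-base step concrete, here is a deliberately toy Python sketch. It assumes each cycle yields four background-subtracted intensities in A, C, G, T order, calls the brightest channel, and derives a Phred-style quality (Q = -10 log10 of the error probability) from how close the runner-up channel is. The error model and the function names are illustrative assumptions, not how any vendor’s base-caller actually works.

```python
import math

BASES = "ACGT"  # assumed channel order for this sketch


def call_base(intensities):
    """Call one base from four channel intensities; return (base, quality)."""
    ranked = sorted(zip(intensities, BASES), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    if top <= 0:
        return "N", 0  # no usable signal in any channel
    second = max(second, 0.0)
    # Toy error model: the closer the runner-up is to the winner,
    # the more likely the call is wrong. Floor at 1e-4, i.e. cap at Q40.
    p_error = max(second / (top + second), 1e-4)
    quality = int(round(-10 * math.log10(p_error)))
    return base, quality


def call_read(cycles):
    """Turn a list of per-cycle intensity tuples into a read and its qualities."""
    calls = [call_base(c) for c in cycles]
    read = "".join(base for base, _ in calls)
    quals = [q for _, q in calls]
    return read, quals


read, quals = call_read([(10, 900, 20, 15), (800, 5, 5, 5), (100, 100, 100, 100)])
print(read, quals)
```

In the example, the third cycle has four equal intensities, so its call is essentially a coin toss and receives a very low quality value.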
On top of this informatic wizardry that connects an instrument to an output, software has to be written to control the instrument itself: starting a run, checking for the presence of all the reagents, feedback loops for proper operation, putting the raw data into a data pipeline, stopping a run if necessary. There are hundreds if not thousands of ‘use cases’ where something goes wrong – either from something inherent in the process (e.g. one of the reagents has gone awry), or from user error or outside events (how about a power outage or an earthquake, both of which do happen), or from software problems (what if the controlling software runs on Windows 7, needs an update, and reboots?).
And on top of all this is other software to process the data, for example for further massaging of the raw data (otherwise known as ‘trimming’), where individual reads can be thrown out altogether if the quality of bases along the read is below a certain threshold, or, where bases toward one of the ends have lower quality, the end is trimmed back to eliminate the stretch of low-quality bases.
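A minimal Python sketch of that trimming logic follows. The thresholds (Q20 for both checks) and the function name are assumptions chosen for illustration; real trimming tools offer many more options, such as sliding windows and adapter removal.

```python
def trim_read(bases, quals, min_base_quality=20, min_mean_quality=20):
    """Trim low-quality bases off the end of a read, or discard it entirely.

    Returns (trimmed_bases, trimmed_quals), or None if the read is thrown out.
    Thresholds are illustrative defaults, not a recommendation.
    """
    # Walk back from the end while the last base is below the threshold.
    end = len(bases)
    while end > 0 and quals[end - 1] < min_base_quality:
        end -= 1
    if end == 0:
        return None  # nothing left after trimming

    kept_bases, kept_quals = bases[:end], quals[:end]
    # Throw out the whole read if its overall quality is still too low.
    if sum(kept_quals) / len(kept_quals) < min_mean_quality:
        return None
    return kept_bases, kept_quals


print(trim_read("ACGTAC", [30, 30, 30, 30, 10, 5]))  # tail trimmed back to ACGT
print(trim_read("ACGT", [5, 5, 5, 5]))               # None: read discarded
```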
One large question in my own mind concerns the revolutionary nature of this technology – a single molecule of DNA passing through a single protein macromolecule embedded in a membrane, read out by measuring the electrical current across that membrane – and how such a device will be manufactured. Nothing like this has been done before, in terms of a manufacture-able product. (I would welcome comments on this if I am mistaken!)
And then there’s user-documentation, training of the field application scientists and field service engineers, the writing of service documentation, the training of salespeople, the production of sales literature, the production of presentations for salespeople to use, the list goes on. And for a startup, there’s the infrastructure one often takes for granted – the ordering mechanism of customer service and finance, the web development for e-commerce. Thus the frenetic pace of development in the NGS industry reflects an enormous amount of work by many people over many months, to gain traction in a market where a given instrument is generally assumed to be in active use for about three years, and another generation or type of instrument is expected to take its place, since it is faster / better / cheaper to run (and often all three at once).
Back to Oxford. Now in the middle of 2012 (July as of this writing) there has been no ‘news’ from Oxford, although they promised to launch both products ‘in the latter half of 2012’, and have already selected several ‘early access’ sites. I have personally launched new platforms into the marketplace (back in the day, it was Illumina’s BeadStation genotyping and gene expression system that was the ‘junior’ version of the $1M BeadLab genotyping system for large laboratories), and it is no mean feat to launch a system. A good seven to nine months before commercial launch (i.e. ‘ordinary’ customer availability), you want to get these systems into a few well-chosen laboratories to start using them (the External [beta-] Testing phase referred to above) and point out all the room for improvement, so the engineers can incorporate it into the finished product.
Seven to nine months is a requirement as it takes a lot of time for a beta-test customer to get properly trained and get experiments done on the new system, and get feedback to the development team. At that point the development team has to identify and prioritize what will be done and by when, and make all the needed changes (of course some are much easier to make than others). And the documentation has to reflect these changes, and the training put into place for not only the application scientists to assist customers in running the instrument, but also the service engineers who understand the inner workings to fix it when something goes wrong. (And with time, systems break down, which is why there are service contracts, and service engineers, to put it plainly.)
Oxford Nanopore had many shows and conferences at which it could have made more news – from a bacterial genome finishing meeting in New Mexico in May, to the European Society of Human Genetics meeting in late June. Perhaps the market will need to wait until November’s ASHG meeting in San Francisco. In the meantime, Oxford advertised for a manager for an internal services business, yet no one I know of has heard anything about it. Nor has anyone heard of any beta-testing going on.
At this stage the chances of having a commercial product launched into the marketplace by the end of 2012 are slim. The next ‘phase’ in their development would be the release of a dataset to the research community, along with the acceptance of customer samples into their development lab so that they can get feedback from the ‘real world’ about performance. (Of course this is a slippery slope, as early data in the midst of frantic development can look less-than-pristine.) And this next milestone will often dictate the soonest timeframe in which a commercial product can be available, given the development cycle time required.
Lastly, how does the upcoming Ion Torrent Proton compare to what Oxford promises? I can say today that there is too much uncertainty at this point about what the final performance of ONT will be. PacBio, after all, once promised a 15-minute whole human genome by 2013, which is definitely not going to happen. (The promise was made four years ago, and PacBio has seen rough times lately as a company.) But ONT is promising their system by the end of this year.
So the question remains – how does it compare? ONT has said that the cost/Gb will be ‘comparable to incumbent offerings’ at $25 – 40 / Gb. The Ion Torrent Proton’s second chip (named Proton 2 – yes, it is unfortunate that the chip has such a similar name to the instrument) will get that price / Gb down to around $15 / Gb or perhaps even to $10 / Gb. But it is readlength for which customers will pay a premium, and ONT at 100kb long reads will have that in abundance.
But $10-$15 is less than half of $25 – $40, and with those kinds of economics (remember a human genome is on the order of 90 Gb of data, so the Proton 2 can offer a ‘$1000 Genome’ depending on what output per run it delivers, which is a moving target), it can easily be envisioned that ONT will be a ‘must-have’ yet complementary technology for whole-genome sequencing. And for other research applications (like tag-counting RNA-Seq or ChIP-Seq) ONT will not be cost-effective at all.
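For those who want to see the numbers side by side, here is a quick Python sketch of the per-genome cost implied by the figures above: roughly 90 Gb of data per human genome, $10 to $15 per Gb for the Proton 2 chip, and $25 to $40 per Gb for ONT. These are the projected prices quoted in this post, not measured ones, and the function name is just for illustration.

```python
GENOME_DATA_GB = 90  # data needed for one human genome, per the text

# Projected price ranges in dollars per Gb, as quoted in the text.
PRICE_PER_GB = {
    "Proton 2 chip": (10, 15),
    "ONT": (25, 40),
}


def genome_cost_range(price_range, gigabases=GENOME_DATA_GB):
    """Return the (low, high) cost in dollars of sequencing one genome."""
    low, high = price_range
    return low * gigabases, high * gigabases


for name, price_range in PRICE_PER_GB.items():
    low, high = genome_cost_range(price_range)
    print(f"{name}: ${low:,} to ${high:,} per genome")
# Proton 2 chip: $900 to $1,350 per genome
# ONT: $2,250 to $3,600 per genome
```

On these figures the Proton 2 chip lands around the ‘$1000 Genome’ mark, while ONT at the quoted prices would cost two to three times that for the same amount of data.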
Am I scared of ONT? No. Do I wish them every success for the sake of genomic research? Yes, unequivocally.