In contrast to BioNano Genomics, which is starting commercialization with early-access customers now, with a full commercial launch in spring 2013, OpGen launched the Argus™ Optical Mapping system in the summer of 2010. Its customers use the system for microbial strain mapping, mainly for infectious disease research or for finishing reference bacterial strains.
It was at last year’s ASHG in Montreal that I noticed a poster illustrating a collaboration between OpGen and BGI: mapping the goat genome, assembled de novo from short-read sequence, in order to raise the quality of the reference sequence. (I wrote about OpGen and BioNano Genomics in an earlier post from July 2012 here.) The resulting increase in N50 length is convincing, from 2.3 Mb to 16.9 Mb; the number of contigs decreased from 1236 to 181. (For those interested, here’s a whitepaper on this work from the OpGen website.)
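As a quick sanity check on those figures, here is a minimal Python sketch of the standard N50 definition, plus a bound that follows from it: the contigs from the N50 contig onward are each no longer than the N50 and together hold more than half the assembly, so the total assembly must be smaller than 2 × (number of contigs) × N50. The names `n50` and `max_total_bp` are my own, for illustration. With 181 contigs, an N50 of 16.9 kb would cap the whole assembly at about 6.1 Mb, far smaller than a mammalian genome of roughly 3 Gb, whereas an N50 of 16.9 Mb allows about 6.1 Gb; the reported numbers only make sense in megabases.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length


def max_total_bp(n_contigs, n50_bp):
    """Strict upper bound on total assembly size given a contig count and N50.

    Contigs from the N50 contig onward are each <= N50 long and together
    hold more than half the bases, so total < 2 * n_contigs * N50.
    """
    return 2 * n_contigs * n50_bp


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))                  # 5
    print(f"{max_total_bp(181, 16_900):,}")      # 6,117,800 (about 6.1 Mb)
    print(f"{max_total_bp(181, 16_900_000):,}")  # 6,117,800,000 (about 6.1 Gb)
```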
But what is not indicated anywhere is how many runs, and how much time, it took to generate this data. Given the existing capacity of the system, it must have taken quite a few runs.
The original mapping card could examine about 750 Mb of sequence; for a microbial sample with an assumed genome size of 5 Mb, that is a coverage redundancy of some 150-fold. The new card for the same Argus instrument scales the capacity to 3 Gb; what was not explained to me clearly was the level of redundancy needed for a complex genome (like human) to reach a given level of coverage (i.e. completeness). But even at 20x coverage of the roughly 3 Gb human genome, you would need 20 runs (20 × 3 Gb = 60 Gb), which isn’t practical. This is reminiscent of BioNano Genomics: its existing platform may not quite meet the needs of human genomics researchers, yet it exhibits at a human genetics meeting anyway.
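The coverage arithmetic above is simple enough to write down. This Python sketch reproduces it, assuming a rounded 3 Gb haploid human genome and treating each card’s capacity as a fixed number of mapped bases per run; the function names are illustrative only, not part of any OpGen software.

```python
import math


def coverage_per_run(genome_bp, run_capacity_bp):
    """Fold coverage a single run provides for a genome of the given size."""
    return run_capacity_bp / genome_bp


def runs_needed(genome_bp, target_coverage, run_capacity_bp):
    """Whole runs needed to reach the target coverage (partial runs round up)."""
    return math.ceil(genome_bp * target_coverage / run_capacity_bp)


if __name__ == "__main__":
    MICROBE_BP = 5_000_000        # 5 Mb microbial genome, as in the text
    HUMAN_BP = 3_000_000_000      # ~3 Gb haploid human genome (rounded assumption)
    OLD_CARD_BP = 750_000_000     # original card, ~750 Mb per run
    NEW_CARD_BP = 3_000_000_000   # new card, ~3 Gb per run
    print(coverage_per_run(MICROBE_BP, OLD_CARD_BP))  # 150.0
    print(runs_needed(HUMAN_BP, 20, NEW_CARD_BP))     # 20
    print(runs_needed(HUMAN_BP, 20, OLD_CARD_BP))     # 80
```

On the original 750 Mb card, the same 20x human map would have taken 80 runs.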
Could there be a method for subdividing the human genome in order to map just a region of interest? One would think that there could be a ‘targeted selection’ method similar to whole-exome enrichment. But the problem lies with the sample.
These methods depend on the ability to look at very large single molecules, hundreds of kilobases to over a million base pairs long. (According to this calculator, 100 kb is 33 µm, while 1 Mb is 0.3 mm.) They have to be very long because they depend on detecting relatively rare restriction or other recognition sites on the order of 1000 bases apart, and you need enough information to resolve major landmarks along an entire 6 Gb diploid genome. Thus you have to deal with a different method of sample preparation (as I mentioned before here with BioNano Genomics); OpGen said that they haven’t optimized their sample preparation for human cells, so their existing sample prep takes over 10 million cells (!).
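To put numbers on this, here is a short Python sketch. It converts base pairs to physical length assuming the textbook rise of about 0.34 nm per base pair for B-form DNA (molecules stretched on a mapping surface or in a channel will differ somewhat), and it estimates how often a recognition site occurs assuming random sequence with equal base frequencies, which real genomes only approximate. Under that assumption a 6-bp site turns up about once every 4 kb, so a 100 kb molecule carries only a couple of dozen landmarks. The function names are hypothetical.

```python
RISE_NM_PER_BP = 0.34  # approximate rise per base pair of relaxed B-form DNA (assumption)


def contour_length_um(n_bp):
    """Approximate contour length of a double-stranded DNA molecule in micrometres."""
    return n_bp * RISE_NM_PER_BP / 1000.0  # nm -> µm


def expected_site_spacing_bp(site_len):
    """Mean spacing of a site_len-bp recognition site in random sequence
    with equal base frequencies: one match per 4**site_len positions."""
    return 4 ** site_len


def expected_sites(n_bp, site_len):
    """Expected number of recognition sites on a molecule of n_bp."""
    return n_bp / expected_site_spacing_bp(site_len)


if __name__ == "__main__":
    print(contour_length_um(100_000))            # ~34 µm
    print(contour_length_um(1_000_000))          # ~340 µm, about 0.34 mm
    print(expected_site_spacing_bp(6))           # 4096
    print(round(expected_sites(100_000, 6), 1))  # 24.4
```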
Anyway, because the cells and the DNA have to be handled very gently in situ, you simply cannot do what you normally do in NGS library preparation, where the molecules are 300 to 600 bp long and can be hybridized to biotinylated baits and captured with streptavidin-coated magnetic beads; these very long molecules have to be treated with the utmost care.
It isn’t terribly difficult (the sample preparation takes a few hours), but it is a different process, and one that currently cannot subdivide particular regions. If only human chromosomes could easily be sorted by flow cytometry, collected by size, and then put into this process, that could be a solution. I have been told, however, that this is so technically difficult that only two facilities in the world can do it: one at the Sanger Institute in the UK, the other at the NCI here in Maryland. (And I have been told that the NCI facility is fraught with internal politics as far as access to this technology goes, which is understandable from observations of human nature: wouldn’t you do the same if you were one of only two facilities in the world that could perform a valuable technique?)
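If chromosome sorting were routine, the run count would drop sharply. A rough Python sketch, assuming rounded chromosome lengths (about 249 Mb for chromosome 1, 156 Mb for X and 47 Mb for 21, approximately those of the human reference), the quoted 3 Gb per run, and 20x coverage counted against one copy of the chromosome as in the whole-genome estimate. It deliberately ignores sorting purity and DNA yield, which are exactly the hard parts; `runs_for_chromosome` is a hypothetical helper.

```python
# Rounded chromosome lengths in Mb (approximately those of the human
# reference assembly); illustrative values only.
CHROM_MB = {"chr1": 249, "chrX": 156, "chr21": 47}
RUN_CAPACITY_BP = 3_000_000_000  # ~3 Gb per run, as quoted for the new card


def runs_for_chromosome(name, coverage=20):
    """Whole runs needed to map one sorted chromosome at the given coverage."""
    needed_bp = CHROM_MB[name] * 1_000_000 * coverage
    return -(-needed_bp // RUN_CAPACITY_BP)  # integer ceiling division


if __name__ == "__main__":
    for name in CHROM_MB:
        print(name, runs_for_chromosome(name))  # chr1 2, chrX 2, chr21 1
```

One or two runs per sorted chromosome, versus twenty for the whole genome, is why the sample preparation side looks like the real bottleneck.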
An early (2008) NGS paper on balanced chromosomal rearrangements used this method, sorting chromosomes before sequencing them, back when whole-genome sequencing was very expensive. The same group followed up in 2010 using paired-end WGS on both the Illumina 1G and the SOLiD 2. So chromosome sorting can be used to do useful work, but if the method remains more art than routine procedure (and in particular is not commercialized, for various reasons), it will not gain widespread use.
Another method has been developed that applies hydrodynamic focusing (the principle used in flow cytometry) at the microscale, labeling the DNA with fluorescent PNA (peptide nucleic acid). Because the molecules are in constant flow, however, its resolution is limited, and it was not commercialized.
Using BAC clones was feasible in the BioNano Genomics paper on the MHC region, where the constructs (on the order of 50 kb to 200 kb insert size) already existed, but the approach is simply not practical in terms of scalability. (“Unfeasible” and “unpleasant prospect” are two terms that come to mind.)
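The scalability problem is easy to quantify. This Python sketch counts the minimum number of BAC clones needed to tile a region end to end, using the 50 kb to 200 kb insert sizes mentioned above and a rounded 3 Gb genome. The optional overlap parameter and the 4 Mb example region are illustrative assumptions, and real tiling paths need overlaps and run into gaps and repeats, so actual numbers would be higher; `bacs_to_tile` is a hypothetical helper.

```python
import math


def bacs_to_tile(region_bp, insert_bp, overlap_fraction=0.0):
    """Minimum BAC clones to span a region end to end.

    overlap_fraction is the fraction of each insert shared with its
    neighbour; 0.0 is the idealised no-overlap minimum.
    """
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    step_bp = insert_bp * (1.0 - overlap_fraction)
    return math.ceil(region_bp / step_bp)


if __name__ == "__main__":
    GENOME_BP = 3_000_000_000  # rounded human genome size (assumption)
    for insert in (50_000, 200_000):
        print(insert, bacs_to_tile(GENOME_BP, insert))  # 60000, then 15000
    print(bacs_to_tile(4_000_000, 200_000))  # a ~4 Mb region: 20 clones
```

Tens of clones for a multi-megabase region is manageable; tens of thousands for a whole genome is not.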
OpGen did not indicate publicly whether they were working on a larger-scale instrument, which could solve the multiple-run problem of their existing system. Such an instrument, however, would mean more expensive optics and a much heavier computational footprint: they would need to scale their existing platform some 20x, so that instead of twenty 3 Gb runs they could do a single 60 Gb one.
But since 2010 the resolution of CCD cameras has not gone up 20x; the oft-mentioned Moore’s Law (okay, I’m heavily involved with Ion Torrent™, so I am biased here) would only allow a doubling every 18 to 24 months. Getting to 20x takes more than four doublings (four reach only 16x, five reach 32x); call it four and a half, or 4.5 × 24 months = 9 years of Moore’s Law to get there at the same price point (strictly, log2 20 ≈ 4.3 doublings, so roughly 6.5 to 9 years depending on the doubling period). Whatever vintage of camera the engineering team chose, it is the most expensive component of the system, so to get a 20x system at a similar price (the OpGen system is priced very similarly to the BioNano Genomics Irys™, at about $300K) one would need to wait the better part of a decade.
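The doubling arithmetic can be checked directly: a 20-fold gain takes log2(20) ≈ 4.32 doublings. This Python sketch assumes, as the argument above does, that camera resolution at a fixed price follows a Moore’s Law style doubling every 18 to 24 months; that is a loose analogy rather than an established law for camera sensors.

```python
import math


def doublings_needed(fold_increase):
    """Number of doublings to reach a given fold increase: log2(fold)."""
    return math.log2(fold_increase)


def years_needed(fold_increase, months_per_doubling):
    """Years to reach the fold increase at a fixed doubling period."""
    return doublings_needed(fold_increase) * months_per_doubling / 12


if __name__ == "__main__":
    print(round(doublings_needed(20), 2))  # 4.32
    for months in (18, 24):
        print(months, round(years_needed(20, months), 1))  # 18 -> 6.5, 24 -> 8.6
```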
I’ve been told that the 454 FLX system (launched as the GS20 in 2005) had the largest CCD camera available, at a cost of $50K apiece, which may help explain why the system was so expensive (originally around $450K, though lower now). That system was also not scalable, because the camera could not get any bigger; new chemistries and plate technology improved the read length substantially, but the density only incrementally. Fast-forward to 2012: even if Roche could make a similarly priced system ($450K) with 10x or 20x the resolution (presuming they could get smaller beads and still detect a similar signal), thus raising their sequencing capacity 10x, it still would not catch up to where we are with the HiSeq™s and Proton™s of the world. Moore’s Law alone can’t keep a system competitive.
So, in short, I would not hold my breath for a breakthrough here. Watching BioNano Genomics may be time better spent, as may looking at something else on the sample preparation side of things (tackling the chromosome separation piece, perhaps).