Single molecule mapping OpGen making slow progress


[Photo of an OpGen mapping card]

In contrast to BioNano Genomics, which is starting commercialization with early-access customers now and a full commercial launch in the spring of 2013, OpGen launched the Argus™ Optical Mapping system in the summer of 2010. Their customers use this system for microbial strain mapping, mainly for infectious disease research or for finishing reference bacterial genomes.

It was at last year’s ASHG in Montreal that I noticed a poster illustrating a collaboration between OpGen and BGI, mapping the goat genome assembled de novo from short-read sequence in order to raise the quality of the reference. (I wrote about OpGen and BioNano Genomics in an earlier post from July 2012 here.) The resulting increase in N50 length is convincing, from 2.3 kb to 16.9 kb, and the number of contigs decreased from 1,236 to 181. (For those interested, here’s a whitepaper on this work from the OpGen website.)

But what is not indicated anywhere is how many runs and how long it took to generate this data. Given the existing capacity of the system, it must have taken quite a few runs.

The original mapping card could examine about 750 Mb of sequence; for a typical microbial sample, assuming a genome size of 5 Mb, that is a coverage redundancy of some 150-fold. The new card for the same Argus instrument has scaled the capacity to 3 Gb; what was not explained to me clearly was how much redundancy a complex genome (like human) would need to reach a given level of completeness. But even at 20x coverage of the human genome, you would need to do 20 runs, which isn’t practical. This is reminiscent of BioNano Genomics, where the existing platform may not quite meet the needs of human genomics researchers, yet they exhibit at a human genetics meeting anyway.
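To make that run arithmetic concrete, here is a minimal back-of-the-envelope sketch; the 750 Mb and 3 Gb per-card capacities are the round numbers quoted above, not OpGen specifications, and the genome sizes are my own rounding:

```python
import math

# Back-of-the-envelope MapCard run counts; capacities are the round numbers
# quoted in this post, not OpGen specifications.
NEW_CARD_BP = 3_000_000_000   # ~3 Gb of map data per current card
OLD_CARD_BP =   750_000_000   # ~750 Mb per original card

def runs_needed(genome_bp, coverage, card_bp=NEW_CARD_BP):
    """MapCard runs required to reach a target map coverage (rounded up)."""
    return math.ceil(genome_bp * coverage / card_bp)

# A 5 Mb bacterial genome: a single original card already gave ~150-fold redundancy
print(OLD_CARD_BP / 5_000_000)         # 150.0

# A 3 Gb haploid human genome at 20x needs 60 Gb of map data -> 20 runs
print(runs_needed(3_000_000_000, 20))  # 20
```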

Could there be a method for subdividing the human genome in order to map just a region of interest? One would think that there could be a ‘targeted selection’ method similar to whole-exome enrichment. But the problem is with the sample.

These methods depend on the ability to look at very large single molecules, hundreds of kilobases to over a million base pairs long. (According to this calculator, 100 kb is 33 µm, while 1 Mb is 0.3 mm.) The molecules have to be very long because the methods depend on resolving relatively rare restriction or other recognition sites on the order of 1,000 bases apart, and you need enough information to resolve major landmarks along an entire 6 Gb diploid genome. Thus you have to deal with a different method of sample preparation (as I mentioned before here with BioNano Genomics); OpGen said that they haven’t optimized their sample preparation for human cells, so their existing sample prep takes over 10 million cells (!).
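As a sanity check on those physical lengths, here is a quick sketch assuming the textbook ~0.34 nm rise per base pair of B-form DNA (the linked calculator evidently rounds a little differently, but the numbers land in the same ballpark):

```python
# Approximate contour length of stretched B-form DNA, assuming ~0.34 nm per base pair
NM_PER_BP = 0.34

def contour_length_um(bp):
    """Rough physical length in micrometers for a molecule of `bp` base pairs."""
    return bp * NM_PER_BP / 1000.0

print(contour_length_um(100_000))    # ~34 µm for a 100 kb molecule
print(contour_length_um(1_000_000))  # ~340 µm, i.e. ~0.34 mm for 1 Mb
```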

Because you have to be very gentle with the cells and DNA in situ, you simply cannot do what you normally do with NGS library preparation, where the molecules are 300 – 600 bp in length and can be hybridized to biotinylated baits and pulled down with streptavidin-coated magnetic beads; these very long molecules have to be treated with the utmost care.

It isn’t terribly difficult – the sample preparation takes a few hours – but it is a different process, and one that currently cannot subdivide particular regions. If only human chromosomes could be easily sorted by flow cytometry, collected by size, and then put into this process, that could be a solution. I have been told, however, that this is technically so difficult to perform that there are only two facilities in the world that can do it: one at the Sanger Institute in the UK, the other at the NCI here in Maryland. (And I have been told that the NCI facility is fraught with internal politics as far as access to this technology goes, which is understandable from observations of human nature – wouldn’t you do the same if you were one of only two facilities in the world that could perform a valuable technique?)

An early (2008) paper used chromosome sorting followed by NGS to look at balanced chromosomal rearrangements, back when whole-genome sequencing was very expensive. The same group followed this work up in 2010 with paired-end WGS on both the Illumina 1G and the SOLiD 2. So chromosome sorting can be put to useful work, but if the method remains in the realm of art (and, in particular, is not commercialized for various reasons) it will not gain widespread use.

Another method has been developed based upon hydrodynamic focusing (the principle used in flow cytometry) applied at the microscale, labeling the DNA with fluorescent PNA (peptide nucleic acid). Due to the constant flow of molecules, however, its resolution is limited, and it was not commercialized.

Use of BAC clones, while feasible in the BioNano Genomics paper on the MHC region because the constructs already existed (on the order of 50 kb to 200 kb insert size), is simply not practical in terms of scalability. (“Unfeasible” and “unpleasant prospect” are two terms that come to mind.)

OpGen did not indicate publicly whether they were working on a larger-scale instrument, which could solve the multiple-run issue of their existing system. To do so, however, they would be looking at more expensive optics and a much heavier footprint of computational analysis – they would need to scale their existing platform some 20x, so that instead of twenty 3 Gb runs they could do a single 60 Gb one.

But since 2010 the resolution of CCD cameras has not gone up 20x; the oft-mentioned Moore’s Law (okay, I’m heavily involved with Ion Torrent™ so I am biased here) would only allow a doubling every 18 to 24 months. To get to 20x you would need more than four doublings (four gets you to 16x, five to 32x) – call it four and a half doublings, or 4.5 x 24 months = 9 years of Moore’s Law to get there at the same price point. Regardless of which year’s camera the engineering team designs around, it is the most expensive component of the system, and to get a 20x-scaled system at a similar price (the OpGen system is priced very similarly to the BioNano Genomics Irys™ at about $300K) one would need to wait those 9 years.
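For the record, here is the same estimate without the rounding – a sketch only, treating Moore’s Law as a simple fixed doubling period:

```python
import math

# Years of Moore's-Law doublings needed to scale camera capacity by a given factor
def years_to_scale(factor, months_per_doubling=24):
    """Doublings required to reach `factor`-fold scale-up, expressed in years."""
    doublings = math.log2(factor)
    return doublings * months_per_doubling / 12.0

print(math.log2(20))           # ~4.32 doublings to reach 20x
print(years_to_scale(20, 24))  # ~8.6 years at 24 months per doubling
print(years_to_scale(20, 18))  # ~6.5 years at 18 months per doubling
```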

I’ve been told that the 454 FLX system (launched as the GS20 in 2005) had the largest CCD camera available at the time, at a cost of $50K apiece. This can explain why the system is so expensive (originally around $450K, though lower now). And that system was not scalable, as the camera could not get any bigger; new chemistries and plate technology improved the read length substantially, but the density only incrementally. Fast forward to 2012: even if Roche could make a similarly priced system ($450K) with 10x or 20x the resolution (presuming they could get smaller beads and detect a similar signal), thus upping their sequencing capacity 10x, it still would not catch up to the HiSeq™s and Proton™s of the world. Moore’s Law alone can’t keep a system competitive.

So, in short, I would not hold my breath for a breakthrough here; watching BioNano Genomics may be time better spent, or perhaps looking at something else on the sample-preparation side of things (tackling the chromosome-separation piece, perhaps).

 


About Dale Yuzuki

A sales and marketing professional in the life sciences research-tools area, Dale is currently employed by Olink (https://olink.com) as their Americas Field Marketing Director. For additional biographical information, please see his LinkedIn profile at http://www.linkedin.com/in/daleyuzuki, and find him on Twitter @DaleYuzuki.


2 thoughts on “Single molecule mapping OpGen making slow progress”

  • Trevor Wagner

    Based on your post of November 16th, I am compelled to address some of the misperceptions and inaccuracies in your post about Whole Genome Mapping (WGM) and the Argus System.
    The Argus System generates data at a rate comparable with the highest-throughput NGS platforms. An Argus produces about 1 Gbp of map data per hour, while an Illumina HiSeq 1500 in high-output mode generates about 300 Gbp in 8.5 days, which is about 1.5 Gbp per hour. The Argus uses disposables (MapCards) that each produce about 3 Gbp. The use of smaller disposables is advantageous because high-capacity disposables such as those used on the HiSeq cost more and can’t be easily scaled to smaller genomes. The Argus user can select the number of disposables required for their genome and save both time and money.
     
     Sample prep for WGM is not complex. The Argus does not require highly purified DNA and the Argus disposable is not subject to clogging (unlike the BioNano device). This allows the Argus to use a simple crude lysis and dilution strategy for sample prep. It’s also important to point out that unlike NGS and the BioNano platform, the Argus does not require complicated and costly labeling or library prep. The Argus uses genomic DNA directly from the cells without modification. There is no amplification, no adapters, and no purification steps. This also means that the Argus can be run confidently in a single room without the worry of sample cross contamination.
     
     In your post you discussed the need for ‘bigger’ cameras to improve throughput. Unlike the Roche example that you used, the Argus throughput is not significantly limited by the camera.  Therefore, the comparison and data you present are not exactly relevant to the data collection by the Argus even though they may be relevant for NGS systems. 
     
    Your supposition that OpGen is making “slow progress” is relatively subjective.  In 2010 the Argus was released for de novo assembly of microbial genome maps.  The large genome application, Genome-Builder, was released in 2011.  In 2012, OpGen presented proof of principle data on whole human chromosome mapping and subsequently has produced data on other animals.  Since the release of the Argus, there have been over 80 publications on Whole Genome Mapping and thousands of Whole Genome Maps produced by OpGen and customers.
     
    Comparisons of WGM to NGS only make sense at a somewhat superficial level. WGM provides long-range information that is impossible to get with NGS. WGM allows for a sequence-independent validation of microbial whole-genome assemblies, which is needed given the level of misassembly observed in standard published finished genomes. Furthermore, WGM provides a way to achieve very high resolution, objective strain typing of microorganisms that is similar in concept to what people know (e.g. PFGE) without the data deluge of sequencing everything. Finally, WGM provides the only commercially available, practical way (and in many cases the only way) to detect SVs larger than the few kbp detectable by NGS and smaller than the few Mbp detectable by FISH.
     
    Trevor Wagner
    Senior Manager of Applications
    OpGen, Inc.

    • Dale Yuzuki

      This response (IMHO) is funny, as I thought the audience who reads this blog understands the distinction between a single molecule map versus a whole genome sequence. I do not equate optical maps to WGS, and re-reading my original comments I don’t see where I do so.

      So to make it clear for the record: optical mapping != whole genome sequence.

      Thanks, Trevor, for your thoughts on this, and good point on the data-throughput non-equivalence. Yet as an optical mapping company, OpGen does the scientific customer a great disservice in using terms like ‘Gb throughput’, which you yourself use. You need to come up with a new nomenclature for the amount of fragments mapped, in order to differentiate it from next-gen sequence data. (In my original draft of this post I did make a comment about the confusing jargon, but it did not make the final post.)

      While you state there are no ‘significant limitations’ on the camera, the fact remains that until the system and technology scale to look at human genomes, this will be a niche application and market. And should the promise of single-molecule sequencing with high accuracy be fulfilled in the next few years, the market need for these maps will disappear.

      Respectfully,
      Dale