Although whole-genome versus whole-exome sequencing was discussed previously, the methods for selecting out the exome (also called ‘targeted selection’) were not. The wide array of methods, costs, and effort required makes this a rather complicated affair.
Put simply, there are three basic approaches for selecting regions of the genome as part of the ‘up-front’ sample preparation. The first is PCR (either in small compartments, as with Fluidigm or RainDance, or highly multiplexed, as with AmpliSeq). The second is hybridization and extension (such as Halo Genomics / Agilent or TruSeq by Illumina). The third is hybridization only (SureSelect by Agilent and SeqCap EZ by Roche / NimbleGen; Life Technologies also offers a very similar product called TargetSeq).
For the PCR-based method, both Fluidigm and RainDance compartmentalize the PCR into very small volumes. RainDance is on the order of 30 picoliters (using an emulsion droplet microfluidics cartridge), and Fluidigm is on the order of 30 nanoliters (using a fixed-dimension channel Access Array cartridge). Both approaches require dedicated instrumentation, which is their major drawback. (RainDance’s first system, the single-sample RDT1000, was $225K, and the Fluidigm system was on the order of $80K.) I know of customers who have both systems because of their distinct strengths and economics. Fluidigm suits smaller targets (less than 100kb of sequence to capture) and offers more flexibility to pick and choose individual primer pairs, but it has a fixed format of 48 samples per enrichment run. The RainDance system is used largely for targets from 100kb up to several Mb, but its primer pairs are fixed as a ‘library pool’ that is difficult to supplement or reformat. (Another limitation of the RainDance system is throughput, which they have addressed recently with an 8-library, 96-sample ThunderStorm system.)
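The sizing rule of thumb above can be written down as a small lookup. This is only a sketch in Python: the function name is made up for illustration, and the 5 Mb ceiling standing in for ‘several Mb’ is an assumption, not a vendor specification.

```python
def suggest_pcr_platform(target_bp: int, several_mb_bp: int = 5_000_000) -> str:
    """Map a target size in base pairs to the platform the text associates with it.

    several_mb_bp is an assumed stand-in for 'several Mb'; adjust to taste.
    """
    if target_bp <= 0:
        raise ValueError("target size must be a positive number of base pairs")
    if target_bp < 100_000:
        # Smaller targets, individually chosen primer pairs, 48 samples per run
        return "Fluidigm Access Array"
    if target_bp <= several_mb_bp:
        # Larger targets, primers fixed as a library pool
        return "RainDance"
    return "outside the range discussed for compartmentalized PCR"


# Reaction volumes quoted in the text, expressed in litres
RAINDANCE_DROPLET_L = 30e-12   # ~30 picoliters
FLUIDIGM_CHAMBER_L = 30e-9     # ~30 nanoliters

# How many times larger a Fluidigm reaction is than a RainDance droplet
volume_ratio = FLUIDIGM_CHAMBER_L / RAINDANCE_DROPLET_L

if __name__ == "__main__":
    for size in (50_000, 750_000):
        print(size, "->", suggest_pcr_platform(size))
    print("volume ratio (rounded):", round(volume_ratio))
```

The volume comparison is the more solid part of the sketch: a nanoliter is a thousand picoliters, so the two compartment sizes differ by roughly three orders of magnitude.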
The beauty of PCR is that as a technology it is well-understood, and primer design is straightforward. (Many researchers use Primer3 for their designs.) Another strength of a PCR-based approach is its specificity in the presence of pseudogenes and other paralogous sequences: primers can be placed on the bases that differ between near-identical copies, whereas a hybridization probe by necessity cannot distinguish them (so such regions cannot be designed for capture).
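To give a feel for the kind of check a primer design tool performs, here is a rough Python sketch of two basic primer properties: GC fraction and an approximate melting temperature. It uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a ballpark for short oligos; it is not how Primer3 computes melting temperature. The function names and the example sequence are made up for illustration.

```python
VALID_BASES = set("ACGT")


def _clean(primer: str) -> str:
    """Upper-case the primer and reject anything that is not plain A/C/G/T."""
    seq = primer.upper()
    if not seq or set(seq) - VALID_BASES:
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return seq


def gc_fraction(primer: str) -> float:
    """Fraction of bases that are G or C."""
    seq = _clean(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C by the Wallace rule."""
    seq = _clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    example = "ATGCATGCATGCATGCATGC"  # made-up 20-mer, not a real assay primer
    print(gc_fraction(example), wallace_tm(example))
```

In a real design both primers of a pair would be matched on melting temperature and checked against the genome for off-target binding, which is where dedicated tools earn their keep.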
A third PCR-based method, AmpliSeq, was introduced recently by Life Technologies. Combining efforts from the TaqMan assay development group and Ion Torrent R&D, the company has launched three standard panels (two cancer-specific and one for inherited disease) and a custom offering (human only at present). Single-tube, 3000-plex PCR offers all the benefits of PCR specificity without the need for dedicated equipment, and it is easily automated if need be.
The second method is what I call ‘hybridization and extension’. Halo Genomics was a Swedish firm that used molecular inversion probes to straddle the target region of interest, then a polymerase and a ligase to extend and close a ‘loop’ of DNA. (This is the basic technology behind ParAllele, which Affymetrix acquired for a targeted genotyping offering that struggled to gain market share several years ago.) The target is released from the loop by a set of restriction enzymes and purified away from the loop sequences. Illumina took its GoldenGate assay for multiplexed genotyping (from 96 SNPs per sample up to 1536, and then to over 20K with the DASL FFPE assay) and applied it to target selection. Instead of a loop of DNA, it uses two discrete oligos that hybridize to the region of interest; each oligo has a universal primer site distal to its hybridization site for amplification after the extended target is purified. These hybridization and extension methods do require a bit of hands-on manipulation (hybridization takes time, there are multiple clean-up steps, and in the case of Halo multiple restriction digests are needed). Illumina has scaled up its assay to cover the entire exome (unlike any of the other offerings discussed so far). However, customers have complained that, while inexpensive, its performance is measurably lower than that of SureSelect and NimbleGen.
For the third method, Agilent’s SureSelect and Roche / NimbleGen’s SeqCap EZ, the probes are synthesized (Agilent builds oligonucleotides by inkjet printing, a method they call SurePrint; NimbleGen uses light-directed ‘Maskless Array Synthesis’ to synthesize oligonucleotides on a chip) and labeled with biotin for pulling down the target sequences of interest. One wrinkle with SureSelect is the transcription of the oligo DNA into an RNA probe ‘bait’; DNA/RNA hybrids are expected to bind more tightly, theoretically increasing the efficiency of capture. These hybridization methods take several days to complete and are most often used for whole-exome capture, where the sequencing takes much longer than the target capture portion. (How the Ion Torrent Proton will change this dynamic remains to be seen, however, as fast-turnaround exome sequencing may put pressure on target selection to become quicker as well.)
As mentioned before, whole exome sequencing (WES) has definite advantages, and will retain these advantages even with the advent of the $1000 genome. With Illumina, Agilent, and NimbleGen (and Life Technologies with TargetSeq) already offering exome capture, is there room for additional whole-exome target capture development?
With Agilent’s purchase of Halo Genomics, it is unlikely that the HaloPlex technology will be extended to exomes (unless it could offer a much better cost basis, which is itself unlikely given the cost of long-oligo manufacturing). RainDance Technologies was often asked whether its technology could scale to exomes. While theoretically possible, the up-front oligo synthesis costs are formidable (hundreds of thousands of dollars, if not over a million), and the per-sample costs would be prohibitive given the format of the microfluidic chips and the number of chip runs needed per sample, unless a new platform and chip format were engineered. Fluidigm is unlikely to scale to exomes, as it would require new instrumentation and chips as well. That leaves AmpliSeq, also from Life Technologies, and no one is saying whether this technology can scale to that extent. (The number of amplicons required would be on the order of 250K – 300K.)
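To put the amplicon count in perspective, here is a back-of-the-envelope calculation in Python. It assumes, purely for illustration, that the 3000-plex single-tube ceiling mentioned earlier stays fixed and that an exome design would simply be split across more tubes; nothing in the vendor’s public material says this is how it would be done. The function name is hypothetical.

```python
def pools_needed(amplicons: int, plex_per_tube: int) -> int:
    """Number of separate multiplex PCR tubes needed to cover all amplicons."""
    if amplicons < 0 or plex_per_tube <= 0:
        raise ValueError("amplicons must be >= 0 and plex_per_tube must be > 0")
    # Integer ceiling division: a partly filled tube still counts as a tube
    return -(-amplicons // plex_per_tube)


if __name__ == "__main__":
    PLEX_PER_TUBE = 3000  # single-tube plex level quoted for AmpliSeq in the text
    for amplicon_count in (250_000, 300_000):
        print(amplicon_count, "amplicons ->",
              pools_needed(amplicon_count, PLEX_PER_TUBE), "tubes")
```

Under that assumption an exome would take somewhere between 84 and 100 tubes per sample, which is why the question is whether the plex level itself can rise by a couple of orders of magnitude.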
Yet in the translational research market, where there is a sizable opportunity, there is a lot of activity around small panels of genes analyzed by NGS (thus AmpliSeq and its simple and fast workflow), along with a growing interest in WES (thus TargetSeq, SureSelect or SeqCap EZ).