Targeted RNA Sequencing Approaches

Happiness is getting more of what you want, in RNA-Seq as in other things…

There are several commercial methods for measuring the expression levels of 10s or 100s of genes at high throughput: TaqMan™ assays from Life Technologies / Thermo Fisher Scientific, and competing offerings from Roche, Douglas Scientific, and Fluidigm. The limitation of these technologies, however, is how far a single assay can be multiplexed in a given reaction volume. Regardless of how much the reactions are miniaturized, this limits throughput measured as samples × genes evaluated.

In RNA-Seq, one looks at all of the RNA species of a particular class present in the sample, as defined by the up-front sample preparation. (For example, a miRNA experiment purifies small RNAs, then proceeds to cDNA synthesis and sequencing; alternatively, mature polyA+ RNA can be purified, converted to cDNA, and sequenced.) But what if you want to evaluate only a targeted set of expressed genes via NGS?

There are targeted approaches for RNA that mirror those for targeted selection of DNA: hybridization only, hybridization followed by extension, or PCR-based. (I reviewed these methods here.) For hybridization only, Agilent was the first to launch a product, SureSelect Custom RNA, in 2011 (coinciding with that year's AGBT meeting).

Agilent now has competitors. Illumina has TruSeq RNA™, which is based on a hybridization-extension approach just like its DNA target capture method, and Ion Torrent, using its AmpliSeq™ technology based on highly multiplexed PCR, has produced an AmpliSeq RNA™ product. The choice between them depends on the sequencing platform already chosen (i.e., AmpliSeq RNA libraries don't work on Illumina instruments, and vice versa).

The vendors of equipment-dependent PCR-based enrichment (RainDance ThunderStorm™ and the Fluidigm AccessArray™) could also develop targeted RNA-Seq enrichment on their respective platforms (it is technically feasible, from what I understand of them), but they do not have a commercial product as of early 2014.

There is still the question of throughput, however. While targeted RNA-Seq can easily interrogate 10s to 100s of gene targets, processing and workflow bottlenecks on the NGS side make high sample numbers harder. A fair 'breakpoint' for throughput, measured as samples × genes evaluated, is about 100 samples by 100 genes. Below that number of samples and genes, a straightforward real-time PCR approach makes sense; above it, there are other choices to make (that is, targeted RNA sequencing isn't necessarily a one-size-fits-all answer).
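To make that rule of thumb concrete, here is a minimal Python sketch. It assumes the breakpoint applies to both dimensions (fewer than 100 samples and fewer than 100 genes points to real-time PCR); the cutoffs come from the paragraph above, not from any vendor specification, and suggest_approach() is a hypothetical helper written only for illustration.

```python
# Hypothetical helper encoding the ~100 samples x 100 genes rule of thumb above.
# Assumption: the breakpoint applies to both dimensions at once.

BREAKPOINT_SAMPLES = 100
BREAKPOINT_GENES = 100


def suggest_approach(n_samples: int, n_genes: int) -> str:
    """Return a rough suggestion for an expression-profiling approach."""
    if n_samples <= 0 or n_genes <= 0:
        raise ValueError("sample and gene counts must be positive")
    if n_samples < BREAKPOINT_SAMPLES and n_genes < BREAKPOINT_GENES:
        # Small studies: plain real-time PCR is usually the simplest option.
        return "real-time PCR"
    # Larger studies: weigh high-density qPCR against targeted RNA-Seq.
    return "high-density qPCR or targeted RNA-Seq"


if __name__ == "__main__":
    for samples, genes in [(24, 10), (96, 150), (400, 300)]:
        total = samples * genes  # total sample-by-gene measurements
        print(f"{samples} samples x {genes} genes = {total} data points -> "
              f"{suggest_approach(samples, genes)}")
```

Exactly where the boundary falls (at 100, or somewhat above) is a judgment call; the point is that the total number of sample-by-gene measurements, not either number alone, drives the choice.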

The QuantStudio 12K Flex™, for example, has built-in OpenArray™ capability, which supports sample numbers in the several hundreds and up to 500 or so targets.

But if you don't have access to a QuantStudio 12K Flex, an Ion Torrent PGM™ running AmpliSeq RNA can process 100 samples in 1.5 days for 100 to 500 genes, and an Ion Torrent Proton can multiplex far more samples than that (given its greater sequencing capacity), limited only by the number of barcodes. There are only 96 commercially available DNA barcodes for AmpliSeq (for technical reasons, the AmpliSeq RNA product uses the DNA AmpliSeq barcodes rather than the ones ordinarily used with the Ion Torrent RNA-Seq library kit), but I understand the company has made additional designs available on an individual basis.
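The barcode ceiling is simple arithmetic: if every sample in a run needs its own barcode, the number of runs is the sample count divided by the barcode count, rounded up. A minimal sketch, assuming one unique barcode per sample per run and the 96-barcode set mentioned above; runs_needed() and plan_runs() are hypothetical helpers, not part of any Ion Torrent software.

```python
import math
from typing import List


def runs_needed(n_samples: int, n_barcodes: int = 96) -> int:
    """Minimum number of runs when each sample needs a unique barcode within a run."""
    if n_samples < 0 or n_barcodes <= 0:
        raise ValueError("invalid counts")
    return math.ceil(n_samples / n_barcodes)


def plan_runs(samples: List[str], n_barcodes: int = 96) -> List[List[str]]:
    """Split samples into consecutive batches of at most n_barcodes samples each."""
    return [samples[i:i + n_barcodes] for i in range(0, len(samples), n_barcodes)]


if __name__ == "__main__":
    samples = [f"S{i:03d}" for i in range(1, 101)]  # 100 hypothetical sample IDs
    batches = plan_runs(samples)
    print(runs_needed(len(samples)), [len(b) for b in batches])  # 2 [96, 4]
```

With a larger barcode set, the same 100 samples would fit in one run, which is why a higher-capacity instrument is ultimately barcode-limited rather than read-limited.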

So where does targeted RNA sequencing fit? There are three clear cases where this approach would be useful: FFPE RNA analysis, allele-specific expression or allelic imbalance, and looking at fusion transcripts.

RNA from FFPE (formalin-fixed, paraffin-embedded) tissue is fragmented; here's a nice paper laying out the two prime variables: the quality of the fixed tissue and the embedding conditions. On top of this, particular artifactual base changes can occur (possibly due to deamination of cytosine residues, which converts cytosine to uracil, read as a 'T'). As a result, RNA yields from a limited amount of FFPE tissue are generally low.
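Because deaminated cytosines are read as T, these artifacts show up as C>T changes (or G>A when the call is reported on the opposite strand, e.g. a minus-strand gene in genome coordinates). Here is a minimal sketch of flagging such calls. The (ref, alt, allele_fraction) tuples and the 10% allele-fraction cutoff are illustrative assumptions, based on the common heuristic that damage artifacts tend to appear at low allele fractions while true heterozygous variants sit near 50%.

```python
from typing import Iterable, List, Tuple

# Substitutions consistent with cytosine deamination, on either strand.
DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}


def is_possible_deamination(ref: str, alt: str) -> bool:
    return (ref.upper(), alt.upper()) in DEAMINATION_CHANGES


def flag_ffpe_artifacts(calls: Iterable[Tuple[str, str, float]],
                        max_allele_fraction: float = 0.10) -> List[Tuple[str, str, float]]:
    """Return (ref, alt, allele_fraction) calls that look like low-level deamination."""
    # Hypothetical cutoff: only low-fraction C>T / G>A calls are flagged.
    return [
        (ref, alt, af)
        for ref, alt, af in calls
        if is_possible_deamination(ref, alt) and af <= max_allele_fraction
    ]


if __name__ == "__main__":
    calls = [("C", "T", 0.04), ("G", "A", 0.06), ("A", "G", 0.03), ("C", "T", 0.48)]
    print(flag_ffpe_artifacts(calls))  # [('C', 'T', 0.04), ('G', 'A', 0.06)]
```

A flagged call is only suspicious, not proven artifactual; the C>T call at 48% is left alone because it looks like a genuine heterozygous variant.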

This is where the low input requirement of AmpliSeq DNA translates into a benefit for the AmpliSeq RNA product: it needs only 5 ng of FFPE RNA as input.

The second case, allele-specific expression (or allelic imbalance), arises because RNA-Seq reads are distributed across whichever genes are being transcribed at that moment, so the number of reads covering a given base in a transcript varies widely. It is not uncommon to see 10,000-fold or even 1,000,000-fold differences in expression between genes at the extremes. And with two alleles, some genes express transcripts from both parental alleles equally (a 50/50 ratio), while others express them in a 0/100 ratio or somewhere in between. The extreme case is X-chromosome inactivation in females (although one gene, Xist, is transcribed from the inactive chromosome, keeping that chromosome inactive).

The allele is identified by a coding variant (cSNV), but to get the proportions right there must be enough reads covering that variant to make the ratio call. In other words, if the cutoff is 5 reads, then at a given base there need to be at least 5 reads from each allele. For example, if a given cSNV in the RNA-Seq experiment has 80 reads from one allele and 20 from the other, you have a 4:1 ratio with high confidence. But what if there are only 8 reads of one allele and 2 of the other? (Those two reads supporting the other allele could be sequencing errors.)
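The 80:20 versus 8:2 comparison can be made quantitative. This sketch applies the 5-reads-per-allele cutoff from the example and then, as one common choice (not the only one), an exact two-sided binomial test against a balanced 50/50 expectation. The function names are hypothetical, written for illustration.

```python
from math import comb

MIN_READS_PER_ALLELE = 5  # per-allele cutoff from the example above


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: sum of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allelic_imbalance(ref_reads: int, alt_reads: int):
    """Return (ratio, p_value), or None if either allele is below the read cutoff."""
    if min(ref_reads, alt_reads) < MIN_READS_PER_ALLELE:
        return None  # too few reads to call a ratio
    n = ref_reads + alt_reads
    return ref_reads / alt_reads, binom_two_sided_p(ref_reads, n)


if __name__ == "__main__":
    print(allelic_imbalance(80, 20))  # ratio 4.0 with a very small p-value
    print(allelic_imbalance(8, 2))    # None: only 2 reads support one allele
    print(round(binom_two_sided_p(8, 10), 4))  # 0.1094
```

Even ignoring the cutoff, 8 vs 2 reads gives a p-value of about 0.11 against a 50/50 expectation, so the same 4:1 ratio that is convincing at 100 reads is not distinguishable from balanced expression at 10.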

One way around this would be to sequence the entire transcriptome via RNA-Seq at very high depth, increasing the number of reads at each cSNV to maximize the number of allelic imbalance events detected and improve the precision of each measurement. But at 100M or 200M reads per sample, an experiment can get very expensive.

Targeted RNA sequencing thus gives an economical way to focus on the 10s or 100s of genes of interest for allelic imbalance. If you want to hear a customer at Ohio State University talk about their work as it applies to pharmacogenetics, there's a YouTube talk recorded here.
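A back-of-envelope sketch shows why depth is the cost driver. All numbers below are hypothetical, chosen only for illustration: a minor allele at 10%, the 5-read cutoff from above, roughly 1 read in 1,000,000 covering the cSNV in a whole-transcriptome library, and a 100-amplicon targeted panel with perfectly even coverage (which real panels only approximate). The helpers are hypothetical, and they use expected values only, so real experiments need headroom for sampling noise.

```python
MIN_READS_PER_ALLELE = 5  # per-allele cutoff from the example above


def depth_needed(min_minor_reads: int, minor_allele_pct: int) -> int:
    """Reads needed at the site for the minor allele to reach min_minor_reads on average."""
    return -(-min_minor_reads * 100 // minor_allele_pct)  # integer ceiling division


def reads_per_sample(site_depth: int, one_in_n_reads_at_site: int) -> int:
    """Total reads needed if, on average, one read in N covers the site."""
    return site_depth * one_in_n_reads_at_site


if __name__ == "__main__":
    depth = depth_needed(MIN_READS_PER_ALLELE, minor_allele_pct=10)  # 50 reads at the site
    # Whole transcriptome: assume ~1 read in 1,000,000 covers the cSNV (hypothetical).
    wts = reads_per_sample(depth, 1_000_000)
    # Targeted panel: assume 100 evenly covered amplicons, one spanning the cSNV.
    targeted = reads_per_sample(depth, 100)
    print(f"site depth {depth}: whole transcriptome {wts:,} reads, targeted {targeted:,} reads")
    # site depth 50: whole transcriptome 50,000,000 reads, targeted 5,000 reads
```

Under these assumptions, the targeted design reaches the same per-site depth with about four orders of magnitude fewer reads per sample, which is the economics behind focusing on a panel.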

Third, AmpliSeq RNA provides a convenient method of detecting fusion transcripts from gene rearrangements in cancer. Gene fusions have been entwined with the history of cancer research since the Philadelphia chromosome, the source of the BCR-ABL fusion, was discovered in 1960, and the discovery of gene fusions has naturally accelerated with the advent of RNA-Seq. Just last week, St. Jude in Memphis, TN reported a new fusion-gene driver mutation in childhood glioma. Ion Torrent has a custom design service for AmpliSeq RNA, so if you are an Ion Torrent customer you can take advantage of this capability.
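The core evidence for a fusion transcript is a read that spans the junction, carrying sequence from both partner genes. This is a minimal sketch of counting such reads by exact matching of a junction sequence with a minimum anchor on each side. It is a simplified illustration, not how AmpliSeq RNA analysis software works: a real pipeline would align reads to junction references and tolerate mismatches, and the partner sequences below are made-up toy strings, not real BCR or ABL1 sequence.

```python
from typing import Iterable


def junction_spanning_reads(reads: Iterable[str], five_prime: str, three_prime: str,
                            anchor: int = 8) -> int:
    """Count reads containing >= anchor bases of each partner across the junction."""
    junction = five_prime[-anchor:] + three_prime[:anchor]
    return sum(1 for read in reads if junction in read)


if __name__ == "__main__":
    five_prime = "ATGGCTACGTTAGCCATGCA"   # hypothetical 5' partner exon end
    three_prime = "GGTACCTTGACCGATTACGG"  # hypothetical 3' partner exon start
    reads = [
        "TTAGCCATGCAGGTACCTTGA",   # spans the junction
        "CGTTAGCCATGCAGGTACCTTG",  # spans the junction
        "GCTACGTTAGCCATGCA",       # 5' partner only
        "CCTTGACCGATTACGG",        # 3' partner only
    ]
    print(junction_spanning_reads(reads, five_prime, three_prime))  # 2
```

The anchor length trades sensitivity for specificity: shorter anchors catch more reads but are more likely to match by chance.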

Lastly, there is a recent publication (the first for AmpliSeq RNA) in the Journal of Allergy and Clinical Immunology, "Dissecting childhood asthma with nasal transcriptomics distinguishes subphenotypes of disease" (subscription required). It looked at 100 genes, and the authors note increased sensitivity of targeted RNA sequencing compared to microarrays. (Hmm, I should have added a fourth point above, oh well.)
