Single-cell isoforms from long reads take the stage at the Advances in Genome Biology and Technology (AGBT) conference, along with structural variation and better reference genomes. Underlying all these advances is better long-read technology from Pacific Biosciences and Oxford Nanopore.
Thinking a little further, the overarching theme of this year’s Advances in Genome Biology and Technology conference (#AGBT19) has been the availability and utility of long reads for genomics. It was only at last year’s AGBT that the irrepressible Keith Robison wrote “PromethION: Straining at the Starting Gate”, and Ken McGrath’s talk on the last day of AGBT18 made it appear that routine Oxford Nanopore runs of 60 Gb were around the corner. Keith’s post then had two graphics illustrating 75 Gb and 80 Gb runs.
This year there has been a noticeable undercurrent in many presentations of new kinds of data coming from long-read, single-molecule sequencing. Yesterday there was a pair of talks on Oxford Nanopore sequencing, and the titles speak for themselves. (Alas, I had a business meeting at that time and could not make it to the plenary session.)
- Wilfried Haerty, Earlham Institute. Characterization of splicing diversity and gene fusions through Nanopore sequencing.
- Karen Miga, University of California, Santa Cruz. Generating high-quality reference human genomes using PromethION nanopore sequencing.
Pacific Biosciences had a trio of talks yesterday, of which the titles were as follows.
- Marty Badgett, Pacific Biosciences. The Sequel II System: the next evolution of SMRT sequencing.
- Mike Hunkapiller, Pacific Biosciences. High fidelity long reads for comprehensive genomic analysis.
- Jason Underwood, Pacific Biosciences. Single cell isoform sequencing (scIso-Seq) identifies novel full-length mRNAs and cell type-specific expression.
And the day before, there was this talk.
- Hagen Tilgner, Weill Cornell. Isoform sequencing in tissues, cell types and thousands of cells.
A little more about Pacific Biosciences’ Sequel II and the 8M ZMW flowcells
There is a ton of science presented here at #AGBT19, and no, there is neither the time nor the inclination this morning to go into the details of the aforementioned talks. However, it was of interest that Pacific Biosciences is now rolling out an eight-fold increase in sequencer throughput.
Dr. Tilgner from Weill Cornell shared the following in his talk.
And in Dr. Badgett’s talk, he had this throughput slide:
With Circular Consensus Sequencing (CCS) yielding tens of gigabases of data from 13 kb inserts, promises of 20 kb inserts on the horizon, and Continuous Long Reads (CLR) reaching 62 kb read lengths, this is impressive performance from an almost order-of-magnitude increase in density and throughput.
A few words about isoform discovery
I wasn’t able to take a photo of a slide from one of yesterday’s single-cell isoform talks, which charted several diseases (Duchenne Muscular Dystrophy among them) caused by alternative splice variants. Mutations in the splice junction, in the intron, or even synonymous mutations in the coding region can affect the secondary structure of the pre-mRNA and thus give rise to alternative splice variants.
A friend from RIKEN (Masayoshi Itoh, Poster 516) presented a poster titled “Whole stretch sequencing of various isoforms from TACC2 gene by MinION sequencer”. Using Cap Analysis of Gene Expression (CAGE), a method that characterizes the 5’ end of mRNA to determine the transcriptional start site (TSS), together with MinION sequencing, they were able to look into a new potential biomarker for ovarian cancer. (A brief PubMed search indicates that TACC2 is a tumor suppressor gene.)
Dr. Itoh’s poster illustrated over 800 alternative splice variants and dozens of new TSSs. Think about that: a single gene, and almost countless variants, with unknown function. The more you look, the more you find.
Structural variant discovery and, by association, copy number analysis on display too
One talk I missed was by Mike Schatz.
- Michael Schatz, Johns Hopkins University. “100 genomes in 100 days: The structural variant landscape in tomato genomes”
He kindly made his slides available (PDF here). And he made the point: structural variants are drivers of quantitative variation, playing a major role in phenotypic variation. Using PromethION and some remarkable throughput numbers (the highest-throughput run was 140 Gb, with read lengths in the 15 kb to 25 kb range) he could rightly claim ‘100 genomes in 100 days’, at 12 to 16 samples per week.
With the ability to get 15 kb to 25 kb reads, he used assembly-based analysis (de novo assembly followed by whole genome alignment), and noted that it is “slow, demanding analysis limited by contig length, heterozygous variants are challenging”. The section on validating their first SV association makes for interesting reading on the usefulness of SV detection in plant breeding.
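As a quick sanity check on the ‘100 genomes in 100 days’ claim, here is a minimal back-of-the-envelope sketch using only the 12 to 16 samples per week quoted above. The function name and the rounding up to whole weeks are my own assumptions, not anything from the slides.

```python
import math

def days_to_sequence(n_genomes: int, samples_per_week: int) -> int:
    """Calendar days to finish n_genomes at a fixed weekly rate.

    Assumption (not from the talk): work is counted in whole weeks,
    so a partial final week is rounded up to a full one.
    """
    weeks = math.ceil(n_genomes / samples_per_week)
    return weeks * 7

# The two weekly rates quoted in the post
for rate in (12, 16):
    print(f"{rate} samples/week -> {days_to_sequence(100, rate)} days for 100 genomes")
```

At either rate the sequencing itself fits inside 100 days with room to spare, which presumably leaves time for sample preparation and the slow assembly-based analysis.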
A week ago I wrote up this post about Nabsys 2.0 and their single molecule imaging platform. I neglected a few salient details that make this an important alternative to BioNanoGenomics’ Saphyr platform.
First is the limited throughput for human genome analysis. With only 32 features (nanopores), the Nabsys 2.0 system will need a full two days of analysis time for a human sample. Second is the accuracy. They showed me convincing data that between 300 bases and 1500 bases their system reliably detects structural variants (inversions, deletions, duplications etc.), whereas the BioNano platform was problematic for these shorter lesions. Third is the modest price of the reader. I was told the BioNanoGenomics system is in the hundreds of thousands of dollars, while theirs is less than $30K. Afterwards I realized that if throughput were an issue (i.e. only 2 or 3 human samples a week) you could simply purchase multiple systems without too much trouble.
And finally, their method of getting DNA past their detector is different, something more akin to electrophoresis than to fluidic pressure. One thing that has always bothered me about the BioNano approach is the precision of the measurement: there was some 10% variation in the measured size of individual DNA molecules as they were optically imaged, due to the viscoelasticity of the DNA molecules in solution.
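To put rough numbers on the "just buy more readers" idea, here is a small sketch. It uses only the two figures quoted above (two days per human sample, under $30K per reader). The function names and the operating-days parameter are hypothetical, and consumables, labor and any volume pricing are ignored.

```python
import math

DAYS_PER_HUMAN_SAMPLE = 2       # from the post: "a full two days" per sample
READER_PRICE_CEILING = 30_000   # from the post: reader is "less than $30K"

def readers_needed(samples_per_week: int, operating_days_per_week: int = 5) -> int:
    """Number of readers needed to hit a weekly human-sample target.

    Assumption: each reader only completes whole samples within the week.
    """
    per_reader = operating_days_per_week // DAYS_PER_HUMAN_SAMPLE
    return math.ceil(samples_per_week / per_reader)

def reader_cost_ceiling(n_readers: int) -> int:
    """Upper bound on instrument spend, in dollars (readers only)."""
    return n_readers * READER_PRICE_CEILING

target = 10  # hypothetical weekly target, chosen only for illustration
n = readers_needed(target)
print(f"{n} readers, under ${reader_cost_ceiling(n):,} in instruments")
```

A five-day week gives two samples per reader and a seven-day week gives three, which lines up with the "2 or 3 human samples a week" figure.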
A few concluding thoughts
So far at #AGBT19 (as I write this, today is the final day) these themes of mRNA isoform detection and structural variant discovery resonate with me, along with the popularity, utility and availability of single-molecule, long-read sequencing. While there isn’t much diagnostic application of either of these themes at present, the tools are being built and refined for further discovery, and that is happening right here and now.