Every human being came from a single cell. While that fact may not be obvious in our day-to-day routine, the power of a single cell is evident in the burst of research activity in stem cells and induced pluripotent stem cells (see also Shinya Yamanaka, the 2012 Nobel laureate in Physiology or Medicine, who discovered the reprogramming process). In cancer research, the concept of cancer stem cells has developed into a major effort to identify and characterize the circulating tumor cells ("CTCs") by which metastasis occurs. This was a major topic of discussion at the Spring 2013 AACR meeting in Washington DC, as well as at a recent Next Generation Dx meeting (also in Washington DC). In many other areas of human disease biology, the inherently heterogeneous nature of tissues points to the need to analyze biology at a much finer resolution.
Take for example our immune system. Cell-mediated immunity comprises many different cell types, with T-memory cells retaining an adaptive response to an antigen for the rest of that individual's lifetime, and B-memory cells retaining the humoral response. The cells for any particular antigen response are by definition rare – yet plentiful enough in the milieu of your circulatory system to confer protection against future infection. Treating all circulating peripheral blood lymphocytes (PBLs) as a single sample type, while easy enough to obtain, overlooks the vast variety of cells that comprise it. (I have written about sequencing the immune repertoire here if you haven't seen it already.)
To put this in context, a healthy individual's blood contains roughly 5M to 10M peripheral blood cells per mL. A standard US blood donation is 450 mL (15.2 oz), so using the 5M/mL figure, a single unit holds over 2 billion cells. So if there is some number (100s? 1,000s?) of each individual type of memory cell in that 450 mL unit of blood, whatever that number is, it must be divided by 2 billion to gauge its prevalence.
If a researcher is looking for an expression signal from a blood sample, that signal may be diluted by a factor of 1:1,000,000 or even more. Say they are looking for it in an RNA-Seq experiment, where the starting material is 5 µg of total RNA. At roughly 10 pg of RNA per cell, that 5 µg represents 500,000 cells' worth of expressed total RNA. Purifying PolyA+ RNA from that 5 µg yields about 500 ng of PolyA+ RNA, which is then turned into cDNA and made into an NGS library.
So will that particular signal be found in that RNA-Seq experiment? That is only 500,000 cells' worth of starting material. Going back to the 'whole unit of blood' scenario, 1:500,000 equates to over 4,000 cells among the 2 billion in a unit of blood. At any prevalence lower than that, you run a reasonable risk that the signal in that expensive RNA-Seq experiment is simply lost; there is too much background.
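This back-of-the-envelope math can be laid out explicitly. Here is a quick sketch in Python using the approximate figures above (5M cells/mL, a 450 mL unit, 10 pg of total RNA per cell):

```python
# Dilution arithmetic for a bulk RNA-Seq experiment, using the
# approximate figures from the text above.

cells_per_ml = 5_000_000      # low end of 5M-10M peripheral blood cells per mL
unit_volume_ml = 450          # one US blood donation
cells_per_unit = cells_per_ml * unit_volume_ml
print(f"Cells in one unit: {cells_per_unit:,}")           # 2,250,000,000

total_rna_pg = 5 * 1_000_000  # 5 ug of total RNA input, expressed in pg
rna_per_cell_pg = 10          # ~10 pg of total RNA per cell
cells_sampled = total_rna_pg // rna_per_cell_pg
print(f"Cells' worth of RNA in 5 ug: {cells_sampled:,}")  # 500,000

# A 1-in-500,000 population corresponds to this many cells per unit of blood:
rare_cells_per_unit = cells_per_unit // cells_sampled
print(f"Rare cells per unit at 1:500,000: {rare_cells_per_unit:,}")  # 4,500
```

In other words, any cell type rarer than roughly 4,500 cells per unit of blood contributes less than one cell's worth of RNA, on average, to that 5 µg input.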
Looking at gene expression differences averaged across many cell types is therefore a relatively crude way to search for a signal. And of course PBLs are only one type of tissue.
In the summer of 2012, Fluidigm introduced a novel single-cell sample preparation device called the C1™ Single Cell Auto Prep system, which tackles the problem of how to isolate individual cells and perform biochemistry on them. The system was under development for several years (I have heard about this work since about 2006). Fluidigm's history as a company has been built around microfluidics; its first offering, in 2003, was a protein crystallization system called the Topaz, which the company still sells. Protein crystallization is a bit of a specialty market, in which many parameters need to be laboriously tested in a matrix-type, trial-and-error fashion; the market for higher-throughput protein crystallization sits mainly in the drug discovery / biological characterization field, along with some academic work.
Fluidigm's Integrated Fluidic Circuit (IFC) technology was eventually applied to gene expression, with their EP1 end-point PCR and BioMark real-time instruments, offering formats ranging from 24 to 96 samples and 24 to 96 individual gene targets.
A few years ago (2009), their AccessArray line was developed for target enrichment for NGS, enabling 48 amplicons from 48 samples simultaneously. This solution fit very nicely with the Roche/454 FLX system, as researchers could use tailed primers carrying the 454 library adapters and go straight into template preparation with minimal post-PCR handling.
Before the C1 Single Cell Auto Prep System (which has only been commercially available since the end of 2012), researchers had been using relatively simple and non-scalable methods for studying single cells. For example, this paper in Nature Methods looked at a single-cell transcriptome using SOLiD technology, and used hand-pulled glass pipettes to manipulate the cells. In the diagnostic world, handling single cells is routine, although manual and requiring a fair amount of training and specialized expertise. In particular, karyotyping in cytogenetic laboratory testing routinely works with single cells, as does pre-implantation genetic diagnosis (performed during in-vitro fertilization to check for genetic abnormalities and increase the rate of implantation and developmental success). These two markets, however, do not face an overwhelming number of samples to process, so a manual method serves their needs.
To understand more of biology, however, the need for single-cell genomic analysis (at both the genetic and gene-expression levels) is a real one. From rare circulating tumor cells, to stem cells, to understanding the function of a specific tissue type, a tool like the C1 enables looking at tens or hundreds of individual cell profiles, in order to better understand cellular heterogeneity on a number of levels. As an example, here's a recent Nature Biotechnology article entitled “Single-cell gene expression analysis reveals genetic associations masked in whole-tissue experiments”, where a group at Oxford looked at 92 genes across 1,440 cells from 15 individuals. From the abstract: “We provide evidence that many heritable variations in gene function—such as burst size, burst frequency, cell cycle–specific expression and expression correlation/noise between cells—are masked when expression is averaged over many cells.”
The C1 system processes 96 single cells at a time (>90% capture efficiency with an input of 1,000 cells, according to their specification sheet), and utilizes the Life Technologies / Ambion Cells-to-Ct™ reagent for cDNA reverse transcription, typically followed by Clontech SMARTer cDNA RACE amplification.
Because the system's steps are additive (no intermediate purifications), the minuscule amount of RNA from an individual cell is then amplified; I'm told that the end yield after this process is on the order of 5 ng of cDNA material for RNA-Seq.
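Taking the ~10 pg of total RNA per cell used earlier together with that ~5 ng cDNA yield, the implied mass amplification is easy to check (a rough sketch; both figures are approximations):

```python
# Rough mass amplification implied by the C1 workflow figures:
# ~10 pg of total RNA in a single cell -> ~5 ng of cDNA after amplification.

rna_per_cell_pg = 10       # approximate total RNA content of one cell
cdna_yield_pg = 5 * 1000   # ~5 ng reported end yield, expressed in pg

fold = cdna_yield_pg / rna_per_cell_pg
print(f"~{fold:.0f}x mass amplification")  # ~500x
```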
We have witnessed the ongoing interplay between GWAS (Genome Wide Association Studies) and sequencing of individuals; at the recent Boston Consumer Genetics Conference, Daniel MacArthur (who blogs at Genetic Future at Wired) mentioned in a talk that they are working on 62,000 whole exomes. In a similar fashion you can expect a shift (and interplay) between ‘bulk’, averaged gene expression and the gene expression profiles of individual cells, revealing some surprises along the way.
Not to leave out the genetic component either – as genetic mosaicism is a very real thing.