Looking at sequencing from one perspective, library preparation is straightforward. Sequencing a genome (whether a bacterial genome on the order of 5 million bases or a human genome at 3 billion bases) is a shotgun-based affair: tens of millions to tens of billions of reads overlap many times across the genome, a redundancy known as ‘fold coverage’. (Thus 30x coverage of a human genome would require some 90 billion bases, or 15x coverage of each haploid allele.) With multiple random start points giving, say, 30-fold coverage across the entire genomic sample, one takes a gDNA sample, randomly shears it, attaches synthetic adapters, and off you go, following the manufacturer’s protocol to get sequence data out, whether by Roche / 454, Illumina GAIIx or HiSeq 2000, Life Technologies SOLiD or 5500xl, Pacific BioSciences RS, Illumina MiSeq, Life Technologies Ion Torrent PGM…
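The coverage arithmetic above is easy to sketch in a few lines of Python. This is just back-of-the-envelope illustration (the function names and the 100-base read length are my own assumptions, not from any vendor's specification):

```python
def bases_needed(genome_size, fold_coverage):
    """Total sequenced bases required for a given fold coverage."""
    return genome_size * fold_coverage

def reads_needed(genome_size, fold_coverage, read_length):
    """Approximate number of reads, rounded up (ceiling division)."""
    return -(-bases_needed(genome_size, fold_coverage) // read_length)

HUMAN_GENOME = 3_000_000_000  # ~3 billion bases

print(bases_needed(HUMAN_GENOME, 30))       # 90,000,000,000 bases for 30x
print(reads_needed(HUMAN_GENOME, 30, 100))  # ~900 million 100-base reads
```

In practice the real number of reads runs higher, since duplicates, low-quality reads, and unmappable sequence all eat into the usable total.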
But from another perspective, library preparation is anything but straightforward. Starting from total RNA, say you are interested in microRNA (a somewhat outdated term now; ‘small RNA’ or ‘non-coding RNA’ are better descriptive terms). How does that get converted to cDNA with synthetic adapters? Or DNA from a chromatin immunoprecipitation experiment? Or say you wanted to assay RNA–protein interactions across the genome?
It is this library preparation step that holds such remarkable potential, so that very frequently a publication comes out with a new method of library preparation (and a new acronym to describe that preparation) yielding a new biological insight. With names like PAR-CLIP, FAIRE-Seq, PARE-Seq, the list goes on and on, looking at layers and layers of genomic insights. Thus next-generation sequencing has been described as a technology as revolutionary as PCR, and when PCR was introduced in the mid-1980’s, for the next decade (and longer) there were many creative applications developed. (I recently wrote about the advent of digital PCR, dubbed ‘third-generation’, which was first described in 1999.) Since next-generation sequencing was first introduced in 2005, there is at least another decade of robust applications still to be dreamed up by researchers.
So for library preparation, in its simplest form, one takes a genomic DNA sample and randomly fragments it. (Originally this required a Covaris ultrasonicator to perform the shearing, an expensive $50K+ piece of equipment, but transposase-based and other enzymatic methods can now do just about as good a job of fragmentation as the mechanical ones.) The particular steps from that point vary from vendor to vendor, but involve several enzymatic steps and intervening cleanup steps. (Some have A-tailing with a terminal transferase, others nick translation, etc.) There are usually two rounds of PCR: the first (usually 10-15 cycles) after the initial ligation of adapters to the library insert pieces, the second (on the order of 4-6 cycles) to finish off the adapter construct. (At least one if not both adapters need to carry a complementary sequencing primer, and a barcode sequence of 5 to 10 bases is needed if multiple samples are on the same instrument run.)
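Two small calculations hide in that paragraph: each PCR cycle can at best double the library, and a barcode of n bases can in principle distinguish 4^n samples. A minimal sketch of both (the function names are mine, and real protocols fall well short of perfect doubling and avoid many barcode sequences for error-tolerance reasons):

```python
def max_amplification(cycles):
    """Theoretical upper bound on fold amplification, assuming
    perfect doubling at every PCR cycle."""
    return 2 ** cycles

def barcode_capacity(barcode_length):
    """Distinct barcodes of a given length: 4 choices (A, C, G, T)
    per position."""
    return 4 ** barcode_length

print(max_amplification(10))  # 10 cycles -> up to 1024-fold
print(barcode_capacity(5))    # 5-base barcode -> 1024 possible samples
```

In practice vendors use a curated subset of barcodes chosen so that single-base sequencing errors cannot convert one barcode into another.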
This process sounds time-consuming, and it is (typically it would take an entire day if not a day and a half), so several vendors have come up with various methods of automation. Life Technologies has its Library Builder, Beckman has something called SPRIworks, Caliper has its Sciclone NGS workstation, and Agilent has its SureSelect XT (combined with target enrichment). There are others coming on the market from smaller firms, and these depend upon magnetic beads for selective purification of the library during the cleanup steps, thus reducing the hands-on drudgery of waiting for a centrifuge yet again. Of course, other customers have automated this process themselves on their own 96-well robotic systems.