Table of Contents
- Introduction to Short-Read Sequencing
- Principle of Short-Read Sequencing
- Process of Short-Read Sequencing
- Short-Read Sequencing Examples
- Advantages of Short-Read Sequencing
- Limitations of Short-Read Sequencing
- Applications of Short-Read Sequencing
- Combining Short-Read and Long-Read Sequencing Methods
- References
Introduction to Short-Read Sequencing
- Short-read sequencing is a widely used next-generation sequencing (NGS) method that sequences short DNA fragments, typically between 50 and 300 base pairs in length.
- It is a rapid and cost-effective sequencing method developed after Sanger sequencing.
- Short-read sequencing methods are also referred to as second-generation sequencing methods.
- This approach offers high efficiency and accuracy because shorter DNA fragments are easier to generate, amplify, and sequence.
- Sequencing methods are categorized based on read lengths into two types: short-read sequencing and long-read sequencing.
- Read length refers to the number of base pairs sequenced or the average length of sequencing reads produced.
- Long-read sequencing platforms can sequence much longer strands of DNA, making them well suited to genome assembly and to analyzing complex genomic regions.
- Despite these advantages, long-read sequencing methods have generally been more expensive per base, with higher raw error rates and longer turnaround times than short-read platforms.
- In contrast, short-read sequencing platforms are faster, more cost-effective, and well-suited for targeted sequencing.
Principle of Short-Read Sequencing
- Short-read sequencing determines the nucleotide order of a very large number of short DNA fragments in parallel.
- The process includes fragmentation of DNA into small pieces, attachment of adapters, amplification of templates, and sequencing using different short-read sequencing platforms to determine the nucleotide order of each fragment.
- After sequencing, the resulting data undergoes analysis to extract useful biological information from the genetic data.
- Three main sequencing principles are used by short-read sequencing platforms: sequencing by synthesis, sequencing by ligation, and sequencing by binding.
- Despite the development of many new sequencing technologies, Illumina’s sequencing by synthesis remains the most widely used short-read sequencing method.
- In sequencing by synthesis (SBS), nucleotides are added to the growing DNA strand, and each addition is detected through fluorescence, pH changes, or other sensor-based methods.
- Illumina sequencing and Ion Torrent sequencing use the sequencing by synthesis (SBS) method.
- In sequencing by ligation (SBL), ligase enzymes are used instead of polymerase to identify sequences.
- Short fluorescently tagged oligonucleotides are introduced, and the ligase joins the sequence that matches the template strand, with fluorescent signals used to detect the nucleotide base.
- SOLiD sequencing is an example of sequencing by ligation (SBL) used in short-read sequencing.
- Sequencing by binding (SBB) is a newer method of short-read sequencing involving a two-step DNA replication process.
- In this method, fluorescently tagged nucleotides bind to the template strand but are not incorporated due to a reversible blocker; the signal is recorded, and the nucleotide is washed away.
- Afterward, the blocker is removed, and an unlabeled nucleotide is added to extend the DNA.
- This method separates the binding and incorporation steps, using fluorescently tagged nucleotides for base identification without incorporating them into the DNA strand.
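- This separation helps prevent errors caused by molecular scarring, the chemical remnants that can be left on the growing strand when a dye is cleaved from an incorporated nucleotide.
- PacBio's Onso system is an example of sequencing by binding (SBB).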
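The cycle-by-cycle idea behind sequencing by synthesis can be sketched as a toy model. This is only an illustration under simplifying assumptions: one base is incorporated and detected per cycle, there are no errors, and strand orientation is ignored. The function name `sequence_by_synthesis` is hypothetical and is not part of any vendor software.

```python
# Toy model of sequencing by synthesis (assumption: one error-free base per cycle).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template, cycles):
    """Return the bases added to a strand that is complementary to `template`."""
    read = []
    for position in range(min(cycles, len(template))):
        template_base = template[position]
        incorporated = COMPLEMENT[template_base]  # the matching nucleotide is added
        read.append(incorporated)                 # detection step records which base it was
        # In a real instrument the label/blocker would be removed before the next cycle.
    return "".join(read)

print(sequence_by_synthesis("ATGCCGTA", cycles=5))  # TACGG
```

The number of cycles sets the read length, which is one way to see why these platforms produce short reads of a fixed maximum size.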
Process of Short-Read Sequencing
1. Library Preparation
- The process starts by extracting the DNA or RNA of interest and preparing it into libraries compatible with the sequencing platform.
- For RNA samples, an additional step is required where RNA is converted into complementary DNA (cDNA) through reverse transcription.
- Next, fragmentation is performed using physical, enzymatic, or chemical methods to break the extracted nucleic acids into smaller pieces.
- The fragmented DNA or cDNA is then repaired to form blunt ends, and adapters are attached to the fragments. These adapters enable the sequencing platform to recognize the fragments and may also include barcodes to allow multiple samples to be sequenced together.
- Following this, the fragments are size-selected using bead-based or electrophoresis-based methods to remove fragments outside the target size range, such as adapter dimers, which improves sequencing quality.
- Finally, the size-selected library undergoes PCR amplification to prepare it for sequencing.
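The fragmentation and size-selection steps above can be mimicked in a small simulation. This is a rough sketch, not a model of any particular kit: the fragment-size distribution, the 150 to 250 base window, and the helper names `fragment` and `size_select` are all assumptions chosen for illustration.

```python
import random

def fragment(sequence, mean_size, rng):
    """Cut `sequence` into consecutive pieces averaging roughly `mean_size` bases."""
    fragments, start = [], 0
    while start < len(sequence):
        # Assumed size distribution: normal around mean_size, at least 1 base.
        size = max(1, int(rng.gauss(mean_size, mean_size * 0.3)))
        fragments.append(sequence[start:start + size])
        start += size
    return fragments

def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the target window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(10_000))
pieces = fragment(genome, mean_size=200, rng=rng)
library = size_select(pieces, min_len=150, max_len=250)
print(len(pieces), "fragments,", len(library), "kept after size selection")
```

Narrowing the window discards more material but gives a more uniform library, which is the trade-off size selection controls.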
2. Sequencing
- The two most widely used short-read sequencing methods are Illumina and Ion Torrent, both of which utilize sequencing by synthesis.
- Prior to sequencing, DNA fragments are amplified. Illumina platforms use bridge amplification to create millions of clusters of identical DNA fragments, while Ion Torrent platforms use emulsion PCR, where DNA fragments attach to beads and are amplified within tiny water droplets suspended in oil.
- Once amplification is completed, sequencing begins. In the sequencing by synthesis (SBS) method, nucleotides are sequentially incorporated into the growing DNA strands and detected through sensors that measure fluorescence or pH changes.
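For pH-based detection, one nucleotide type is supplied at a time, so a run of identical bases is incorporated in a single step and the signal grows with the length of the run. The sketch below illustrates that idea with an idealized signal. The repeating flow order `TACG` and the function names are assumptions for illustration; real instruments use their own flow orders and signal processing.

```python
FLOW_ORDER = "TACG"  # assumed repeating order in which nucleotides are supplied

def flowgram(incorporated, n_flows):
    """Ideal signal per flow: how many bases of `incorporated` are added in that flow."""
    signals, pos = [], 0
    for i in range(n_flows):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        run = 0
        while pos < len(incorporated) and incorporated[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
    return signals

def call_bases(signals):
    """Turn (possibly noisy) signals back into bases by rounding each to a run length."""
    return "".join(
        FLOW_ORDER[i % len(FLOW_ORDER)] * round(s) for i, s in enumerate(signals)
    )

print(flowgram("TTACGGG", n_flows=4))   # [2, 1, 1, 3]
print(call_bases([2, 1, 1, 3.4]))       # TTACGGG
print(call_bases([2, 1, 1, 3.6]))       # TTACGGGG (one extra G from a noisy signal)
```

Because a small shift in signal changes the rounded run length, long runs of the same base are the hardest case for this detection scheme.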
3. Data Analysis
- Data analysis is divided into three stages: primary, secondary, and tertiary analysis.
- Primary analysis involves base calling and quality control: raw instrument signals are converted into bases with per-base quality scores and stored in FASTQ format.
- Secondary analysis includes aligning reads to a reference genome and performing variant calling to identify sequence variations.
- Tertiary analysis focuses on annotating and interpreting the variants to understand their biological significance.
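As an illustration of what primary analysis produces, the sketch below reads FASTQ records and decodes their quality strings. It assumes the common four-lines-per-record layout and Phred+33 encoding, where a score Q corresponds to an error probability of 10^(-Q/10). The helper names are hypothetical, and the two records are made-up example data.

```python
import io

def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual  # drop the leading '@'

def phred_scores(quality_string, offset=33):
    """Decode ASCII quality characters into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]

def mean_error_probability(scores):
    """Average per-base error probability implied by the Phred scores."""
    return sum(10 ** (-q / 10) for q in scores) / len(scores)

example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
for read_id, seq, qual in parse_fastq(example):
    scores = phred_scores(qual)
    print(read_id, seq, scores, mean_error_probability(scores))
```

A quality-control step would typically trim or discard reads like `read2`, whose first two bases carry a score of 0.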
Short-Read Sequencing Examples
A few examples of short-read sequencing methods are:
- Illumina Sequencing is the most widely used short-read sequencing platform. It uses the sequencing by synthesis method, where DNA fragments are first amplified on a flow cell through bridge amplification. Fluorescently labeled bases are then added one at a time to the sequencing template, and each nucleotide is detected by its fluorescent signal. Illumina platforms produce high-throughput data with high accuracy.
- Ion Torrent Sequencing employs semiconductor technology for DNA sequencing and also follows the sequencing by synthesis method. DNA fragments are amplified on beads using emulsion PCR, and these beads are placed in microwells on a semiconductor chip. When nucleotides are added during sequencing, protons are released, causing pH changes, which are detected by the chip to identify the nucleotide added. Unlike Illumina, Ion Torrent does not use fluorescence, making it faster and less expensive, but it has limitations in accurately sequencing homopolymer regions.
- SOLiD Sequencing uses a ligation-based approach. DNA libraries are prepared and amplified on beads through emulsion PCR, followed by sequencing through ligation, where fluorescently labeled di-base probes are ligated to the DNA fragments. This method uses a color-space system in which each color encodes a pair of adjacent bases rather than a single base, so every base is interrogated twice, which helps distinguish sequencing errors from true variants. It offers high accuracy and high throughput, though it is less commonly used compared to Illumina and Ion Torrent.
- Onso Sequencing is a short-read sequencing platform developed by PacBio that uses sequencing by binding (SBB) technology. It provides very high accuracy and can detect rare genetic variants that other short-read methods might miss. In this method, the base interrogation and incorporation steps are separated. Sequencing starts from a strand ending in a 3′ reversibly blocked nucleotide. In each cycle, fluorescently labeled nucleotides bind to the DNA and their fluorescence is detected. Afterward, the reversible blocker is removed, and native, unlabeled nucleotides are added for chain extension. This process repeats for each base.
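The di-base color encoding used in sequencing by ligation can be made concrete with a short sketch. It uses the usual two-bit scheme (A=0, C=1, G=2, T=3, with the color given by the XOR of two adjacent bases) and assumes the first base is known. The function names are illustrative only.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode_color_space(sequence):
    """Return the first base plus one color (0-3) per pair of adjacent bases."""
    colors = [
        BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])
    ]
    return sequence[0], colors

def decode_color_space(first_base, colors):
    """Rebuild the base sequence from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)

first, colors = encode_color_space("ATGGC")
print(first, colors)                      # A [3, 1, 0, 3]
print(decode_color_space(first, colors))  # ATGGC
```

Each base contributes to two neighboring colors, so a true single-base variant changes two adjacent colors, while an isolated color change is more likely a measurement error.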
Advantages of Short-Read Sequencing
- Short-read sequencing produces highly accurate results, making it ideal for detecting small changes in DNA and for sequencing degraded DNA samples, which are already fragmented.
- It is a fast method that enables rapid sequencing of both DNA and RNA, helping to save time and reduce costs in projects that require quick results.
- This technique is more affordable compared to traditional and long-read sequencing methods, with a lower cost per base, making it suitable for large-scale projects.
- Short-read sequencing is supported by widely available bioinformatics tools and pipelines, which simplifies data analysis and interpretation.
- It can generate large volumes of genomic data in a short time, making it highly effective for both research and clinical applications.
Limitations of Short-Read Sequencing
- Short-read sequencing cannot sequence long DNA fragments directly; large sequences must be fragmented, amplified, and computationally assembled, which is challenging, particularly in highly repetitive regions.
- The amplification step can introduce errors or sequence biases into the data.
- Short-read sequencing has difficulty accurately sequencing complex genomic regions, such as highly repetitive sequences and regions rich in GC content.
- Detecting large structural variations is difficult with short-read sequencing methods.
- Short-read sequencing can produce uneven coverage, so regions covered by only a few reads yield less reliable base and variant calls.
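Uneven coverage is easy to see by counting how many reads overlap each reference position. The sketch below assumes reads have already been aligned and are given as hypothetical `(start, length)` pairs on a 20-base region. The data are invented for illustration.

```python
def depth_per_position(region_length, alignments):
    """Count overlapping reads at each position; alignments are (0-based start, length)."""
    depth = [0] * region_length
    for start, length in alignments:
        for pos in range(max(start, 0), min(start + length, region_length)):
            depth[pos] += 1
    return depth

def low_coverage_positions(depth, min_depth):
    """Return positions covered by fewer than `min_depth` reads."""
    return [i for i, d in enumerate(depth) if d < min_depth]

alignments = [(0, 5), (2, 5), (3, 5), (12, 5)]  # four 5-base reads
depth = depth_per_position(20, alignments)
print(depth)
print("mean depth:", sum(depth) / len(depth))  # 1.0, i.e. 4 reads x 5 bases / 20 bases
print("uncovered positions:", low_coverage_positions(depth, 1))
```

The mean depth here is 1x, yet several positions have no reads at all while others are covered three times, which is why average coverage alone can hide gaps.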
Applications of Short-Read Sequencing
- Short-read sequencing is used in whole-genome sequencing, which supports the study of genome-wide variation.
- It is useful in whole-exome sequencing to identify variants in the protein-coding regions of genes.
- It is applied in microbiome analysis to study the DNA sequences of microbial communities in environmental or clinical samples, aiding in understanding microbial diversity and their roles in different diseases.
- Short-read sequencing is used in RNA sequencing to study gene expression and understand disease mechanisms.
- Its high accuracy, low cost, and rapid processing make it valuable in clinical diagnostics, for example in cancer studies.
- It is used in targeted sequencing and gene panel sequencing to focus on specific regions of interest, making it particularly useful in personalized medicine.
Combining Short-Read and Long-Read Sequencing Methods
- Both short-read and long-read sequencing methods have their own limitations.
- Long-read sequencing is effective at analyzing repetitive and complex DNA regions but is associated with higher error rates.
- Short-read sequencing produces highly accurate and cost-effective data.
- Combining both methods allows short-read data to correct the errors in long-read data, improving the overall sequencing accuracy.
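The error-correction idea can be sketched as a majority vote. This is a deliberately simplified illustration, not a description of any real polishing tool: it assumes the short reads are already aligned to the long read at known offsets and that the only errors are substitutions (no insertions or deletions). The sequences and the function name `polish_long_read` are made up.

```python
from collections import Counter

def polish_long_read(long_read, short_alignments):
    """Replace each long-read base with the majority base from overlapping short reads."""
    votes = [Counter() for _ in long_read]
    for offset, short_read in short_alignments:
        for i, base in enumerate(short_read):
            if 0 <= offset + i < len(long_read):
                votes[offset + i][base] += 1
    polished = []
    for original, counter in zip(long_read, votes):
        # Keep the original base where no short read provides evidence.
        polished.append(counter.most_common(1)[0][0] if counter else original)
    return "".join(polished)

noisy_long_read = "ACGTTCGATA"  # assumed true sequence: ACGTACGATA (error at index 4)
short_reads = [(0, "ACGTA"), (2, "GTACG"), (4, "ACGAT")]
print(polish_long_read(noisy_long_read, short_reads))  # ACGTACGATA
```

The long read supplies the overall structure, and the short reads supply accurate base calls wherever they overlap it.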
References
- Chauhan, T. (2024, February 9). Understanding Short-Read Sequencing. Retrieved from Genetic Education
- Hu, T., Chitnis, N., Monos, D., & Dinh, A. (2021). An overview of next-generation sequencing technologies. Human Immunology, 82(11), 801–811. https://doi.org/10.1016/j.humimm.2021.02.012
- Mészáros, É. (2024). Comparison Between Short-Read and Long-Read Sequencing. INTEGRA. Retrieved from INTEGRA Biosciences
- Mobley, I. (2024, June 17). Long-Read vs. Short-Read Sequencing – Which One to Choose? Front Line Genomics. Retrieved from Front Line Genomics
- PacBio. (2024, September 30). Introduction to the Onso Sequencing System. Retrieved from PacBio
- PacBio. (2023, August 8). Sequencing 101: Understanding SBB Sequencing Technology. Retrieved from PacBio Blog
- Sambavince. (2021, June 2). Key Considerations in NGS: Coverage, Read Length, and Multiplexing. Retrieved from iRepertoire
- seqWell. (2024, December 11). Choosing Between Short-Read and Long-Read Sequencing Technologies for Research. Retrieved from seqWell
- Zymo Research. (2025, January 5). Short-Read vs. Long-Read Sequencing: A Comparative Guide. Retrieved from Zymo Research
- AAT Bioquest. (n.d.). Advantages of Short-Read Sequencing Technology. Retrieved from AAT Bioquest