Table of Contents
- Introduction of Long-Read Sequencing
- Principle of Long-Read Sequencing
- Types of Long-Read Sequencing
- Process of Long-Read Sequencing
- Long-Read Sequencing Data Types
- Advantages of Long-Read Sequencing
- Limitations of Long-Read Sequencing
- Applications of Long-Read Sequencing
- Long-Read Sequencing vs. Short-Read Sequencing
- References
Introduction of Long-Read Sequencing
- Long-read sequencing (LRS) is a third-generation sequencing approach that reads much longer nucleotide fragments than traditional short-read sequencing techniques.
- It generates genomic data by sequencing DNA or RNA molecules that are typically 10,000 to 100,000 base pairs long, and sometimes longer.
- These long reads can come from native molecules extracted directly from a biological sample, without first being copied.
- The direct analysis of native molecules provides a more accurate representation of the original genomic material.
- This enhances genome assembly accuracy and facilitates the detection of complex genomic regions that short-read sequencing may struggle to identify.
- In sequencing, "reads" refer to the nucleotide sequences determined by a sequencing machine, with their length depending on the sequencing platform.
- Reads are categorized into short reads and long reads, based on their fragment length.
- Traditional short-read sequencing (SRS) involves fragmenting DNA into short pieces, amplifying them, and then sequencing.
- Long-read sequencing, on the other hand, sequences longer DNA fragments directly, typically without requiring amplification.
- The leading long-read technologies are Pacific Biosciences' (PacBio) Single Molecule Real-Time (SMRT) sequencing and Oxford Nanopore Technologies' (ONT) nanopore sequencing.
Principle of Long-Read Sequencing
- Long-read sequencing (LRS) operates by directly reading long DNA or RNA molecules in a single sequencing pass, eliminating the need for extensive fragmentation and amplification.
- Different LRS technologies function based on distinct principles to achieve long-read sequencing.
- SMRT sequencing detects the fluorescence emitted when a DNA polymerase incorporates a labeled nucleotide into a growing DNA strand.
- The polymerase is immobilized at the bottom of a tiny well, and each of the four nucleotides carries a distinct fluorescent label, so every incorporation produces a base-specific signal.
- These fluorescence signals are recorded in real time to determine the sequence of nucleotides in the DNA strand.
- Oxford Nanopore sequencing detects changes in ionic current as a single strand of DNA passes through a nanometer-scale protein pore embedded in a membrane.
- The current at any moment depends on the short stretch of several nucleotides (a k-mer) occupying the pore, so different sequences produce characteristic current patterns.
- These current fluctuations are measured and translated into base calls to determine the DNA sequence.
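Because the pore senses several bases at once, one base change alters several consecutive current levels rather than just one. The toy sketch below shows this effect. It assumes a hypothetical 3-mer pore model with invented current values (the names `build_toy_pore_model` and `simulate_signal` are illustrative, not part of any ONT software). Real pore models use longer k-mers with measured levels, and real signals are noisy and sampled unevenly.

```python
import random
from itertools import product

K = 3  # hypothetical k-mer size; real pore models use longer k-mers

def build_toy_pore_model(k=K, seed=0):
    """Assign an invented mean current level (pA) to every k-mer over ACGT."""
    rng = random.Random(seed)
    return {"".join(kmer): rng.uniform(60.0, 120.0)
            for kmer in product("ACGT", repeat=k)}

def simulate_signal(seq, model, k=K):
    """Return one current level per step as the strand advances one base at a time."""
    return [model[seq[i:i + k]] for i in range(len(seq) - k + 1)]

model = build_toy_pore_model()
ref = simulate_signal("AAAAGAAAA", model)
alt = simulate_signal("AAAACAAAA", model)
# The substitution at position 4 changes every window that contains it (windows 2, 3, 4).
changed = [i for i, (a, b) in enumerate(zip(ref, alt)) if a != b]
print(changed)
```

This is why basecallers decode the signal as overlapping k-mer states instead of assigning each current level to a single base.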
Types of Long-Read Sequencing
True Long-Read Sequencing
- True long-read sequencing reads native DNA or RNA molecules directly, producing one continuous read per molecule without computational reconstruction.
- The two primary true LRS technologies are Oxford Nanopore Sequencing and PacBio Sequencing (SMRT).
- These platforms provide long-read data without requiring short-read assembly or reconstruction.
- In contrast, alternative methods either sequence short fragments and computationally assemble them into longer sequences, or start from long fragments but read them only as collections of short reads.
Synthetic Long-Read Sequencing
- Synthetic long-read sequencing combines short-read sequencing with computational techniques to reconstruct long reads from shorter fragments.
- It achieves long-read reconstruction by linking short-read fragments using barcodes or other molecular strategies.
- Some synthetic long-read methods include linked-read technologies (10X Genomics), proximity ligation, and optical mapping.
- While synthetic long-read sequencing is more cost-effective and improves upon standard short-read sequencing in some areas, it remains limited by its reliance on short reads for assembly.
- Compared to true long-read sequencing, synthetic methods are less effective for detecting structural variants and achieving complete genome assembly.
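The barcode-linking idea can be sketched as follows: short reads that carry the same barcode are assumed to come from the same long molecule, so their alignment positions outline that molecule's extent. The input format and the function name `molecule_spans` are hypothetical, and the sketch assumes one molecule per barcode per chromosome, which real linked-read partitions do not guarantee.

```python
from collections import defaultdict

def molecule_spans(reads, read_len=150):
    """Estimate the span of the long molecule behind each barcode.

    reads: iterable of (barcode, chrom, start) for aligned short reads
    (a hypothetical, simplified input format). Assumes each barcode tags
    a single long molecule per chromosome.
    """
    starts_by_group = defaultdict(list)
    for barcode, chrom, start in reads:
        starts_by_group[(barcode, chrom)].append(start)
    # Span runs from the first read start to the end of the last read.
    return {group: (min(starts), max(starts) + read_len)
            for group, starts in starts_by_group.items()}

reads = [
    ("AAACGT", "chr1", 1_000),
    ("AAACGT", "chr1", 22_500),
    ("AAACGT", "chr1", 40_000),
    ("TTGCAA", "chr2", 500),
]
print(molecule_spans(reads))
```

Only the sparse read positions are known, so the sequence between them is not observed; this is one reason synthetic methods trail true long reads for structural variants.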
Process of Long-Read Sequencing
1. Library Preparation
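- High-molecular-weight DNA is extracted from the target organism; the integrity of the extracted DNA largely determines the read lengths that can be achieved.
- The extracted DNA is fragmented into large pieces, often tens of kilobases long.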
- Sequencing adapters are attached to the fragment ends to enable binding to the sequencing platform.
2. Sequencing
The prepared libraries are loaded onto a sequencing platform designed for long-read sequencing.
SMRT Sequencing:
- The DNA library is loaded onto SMRT Cells, which contain hundreds of thousands to millions of nanoscale wells called zero-mode waveguides (ZMWs).
- A single DNA polymerase is immobilized at the bottom of each ZMW.
- A DNA template binds to the polymerase, which incorporates fluorescently labeled nucleotides as it synthesizes the complementary strand.
- The emitted fluorescence signals are detected in real time and converted into nucleotide sequences.
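Converting fluorescence into bases can be illustrated with a toy pulse caller: each time one of the four dye channels switches on, one incorporation is recorded. This is a minimal sketch assuming a hypothetical trace format of normalized intensities per frame and a fixed threshold; the name `call_pulses` is illustrative. Production SMRT basecalling also models pulse width and inter-pulse timing.

```python
CHANNELS = "ACGT"  # assumed order of the four dye channels

def call_pulses(frames, threshold=0.5):
    """Emit one base per pulse: a channel rising from below to at/above threshold.

    frames: sequence of 4-tuples of normalized intensities (A, C, G, T),
    one tuple per camera frame (a hypothetical, simplified trace format).
    """
    bases = []
    lit = [False] * len(CHANNELS)
    for frame in frames:
        for i, value in enumerate(frame):
            is_lit = value >= threshold
            if is_lit and not lit[i]:  # rising edge marks a new incorporation
                bases.append(CHANNELS[i])
            lit[i] = is_lit
    return "".join(bases)

trace = [
    (0.0, 0.0, 0.0, 0.0),
    (0.9, 0.0, 0.0, 0.0),  # A pulse spanning two frames
    (0.8, 0.0, 0.0, 0.0),
    (0.0, 0.0, 0.0, 0.0),
    (0.0, 0.0, 0.7, 0.0),  # G pulse
    (0.0, 0.0, 0.0, 0.0),
    (0.9, 0.0, 0.0, 0.0),  # a second A, separated by a dark frame
    (0.0, 0.0, 0.0, 0.0),
    (0.0, 0.6, 0.0, 0.0),  # C pulse
]
print(call_pulses(trace))  # AGAC
```

Detecting rising edges rather than lit frames keeps a pulse lasting several frames from being counted twice.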
Nanopore Sequencing:
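- Single DNA strands from the library are threaded through protein nanopores embedded in a membrane, with a motor protein controlling how fast each strand moves.
- As DNA moves through a nanopore, it disrupts the ionic current in a way that depends on the bases currently inside the pore.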
- These ionic fluctuations are detected as electrical signals and processed into DNA sequences.
- This method allows real-time sequencing without amplification.
3. Data Analysis
- The raw signals recorded during sequencing (fluorescence pulses for SMRT, current traces for nanopore) are stored for downstream processing.
- Base calling translates these signals into nucleotide sequences.
- The reads are then mapped to a reference genome or used for de novo assembly to reconstruct new genomes.
- Genetic variants are identified by comparing the aligned reads with the reference sequence.
- Detected variants are annotated to provide insights into their location, function, and biological significance.
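The variant-identification step can be illustrated with a toy pileup caller: count the bases each aligned read reports at every reference position and flag positions where a non-reference base is common enough. This sketch assumes ungapped alignments given as `(start, read_seq)` pairs and arbitrary `min_depth` and `min_af` thresholds; the name `call_snvs` is hypothetical, and real long-read variant callers also handle indels, base qualities, and sequencing-error models.

```python
from collections import Counter

def call_snvs(reference, alignments, min_depth=3, min_af=0.3):
    """Call single-nucleotide variants from ungapped read alignments.

    alignments: list of (start, read_seq) pairs, 0-based, no indels (assumption).
    Returns a list of (position, ref_base, alt_base, allele_fraction).
    """
    pileup = [Counter() for _ in reference]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to decide
        alts = [(n, b) for b, n in counts.items() if b != reference[pos]]
        if not alts:
            continue
        n, alt = max(alts)  # most frequent non-reference base
        if n / depth >= min_af:
            calls.append((pos, reference[pos], alt, n / depth))
    return calls

reference = "ACGTACGT"
alignments = [(0, "ACGTACGT"), (0, "ACGAACGT"), (2, "GAACGT"), (1, "CGTACG")]
print(call_snvs(reference, alignments))  # [(3, 'T', 'A', 0.5)]
```

An allele fraction near 0.5, as here, is what a heterozygous variant would be expected to show; the annotation step would then attach gene and functional context to each call.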
Long-Read Sequencing Data Types
Long-read sequencing technologies generate reads of varying lengths and accuracy.
PacBio sequencing data types:
- Continuous Long Reads (CLR):
- Produced from SMRTbell templates with DNA inserts longer than 30 kb.
- Because the insert is so long, the polymerase typically reads it only once, so each read is a single pass with a relatively high error rate.
- High-Fidelity (HiFi) Reads:
- A highly accurate, more recent data type developed by PacBio.
- Generated by circular consensus sequencing (CCS) of SMRTbell templates of about 10-30 kb.
- The shorter inserts allow the polymerase to circle the template several times, and the consensus of these passes yields reads that are both long and highly accurate (at least 99%, or Q20).
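Why several passes improve accuracy can be shown with a simplified calculation. Assume each pass misreads a given base independently with probability p and the consensus is a simple majority vote; with an odd number of passes n, the consensus can only be wrong if more than half the passes are wrong, so the binomial tail below is an upper bound on the consensus error. The per-pass error of 0.1 and the function names are illustrative assumptions. The actual CCS algorithm uses a probabilistic model, and PacBio errors are mostly insertions and deletions rather than independent substitutions.

```python
import math

def majority_error(p, n):
    """Probability that more than half of n independent passes are wrong at a base.

    For odd n this upper-bounds the error of a majority-vote consensus,
    assuming each pass errs independently with probability p.
    """
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def phred(error):
    """Convert an error probability to a Phred quality score (Q = -10 log10 e)."""
    return -10 * math.log10(error)

# Illustrative per-pass error of 10%; consensus quality rises with each extra pass.
for n in (1, 3, 5, 9):
    e = majority_error(0.1, n)
    print(n, e, round(phred(e), 1))
```

Under these assumptions three passes already cut the error from 0.1 to 0.028, which is the intuition behind trading insert length for extra passes.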
ONT sequencing data types:
- Long Reads:
- The most common type of ONT sequencing reads.
- Typically range from 10-100 kb in length.
- Ultra-Long Reads:
- Specialized ONT reads derived from ultra-high-molecular-weight DNA prepared with gentle extraction methods.
- Can exceed 100 kb in length but have lower throughput than standard long reads.
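Read-length distributions like these are commonly summarized by the N50: the length of the shortest read among the longest reads that together contain at least half of all sequenced bases. The sketch below computes it together with the share of bases in reads above the 100 kb ultra-long cutoff used above. The function names and the example lengths are hypothetical.

```python
def n50(lengths):
    """Length of the shortest read among the longest reads holding >= 50% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

def ultra_long_fraction(lengths, cutoff=100_000):
    """Fraction of all bases that sit in reads longer than the cutoff."""
    total = sum(lengths)
    return sum(l for l in lengths if l > cutoff) / total if total else 0.0

# Hypothetical read lengths (bp) from a single run.
lengths = [8_000, 15_000, 22_000, 40_000, 130_000]
print(n50(lengths))                  # 130000
print(ultra_long_fraction(lengths))  # about 0.60
```

Because N50 weights reads by the bases they contain, a few ultra-long reads can dominate it even when most reads are shorter.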
Advantages of Long-Read Sequencing
- Long-read sequencing improves genome assembly because each read spans more sequence, reducing the ambiguity and errors that arise when short fragments must be pieced together.
- Platforms like Oxford Nanopore enable real-time sequencing, allowing for rapid data generation.
- Accurately sequences repetitive DNA regions and detects large structural variants linked to genetic disorders, both of which are difficult for short-read technologies.
- Some LRS platforms can sequence RNA molecules directly without converting them to complementary DNA, providing a more precise understanding of transcriptomes.
- Can sequence native DNA and RNA without amplification, eliminating amplification bias and preserving base modifications such as DNA methylation.
- Enhances the detection of structural variants, including large deletions, insertions, inversions, and duplications, which are often missed by short-read sequencing.
- Some nanopore sequencers, such as ONT's MinION, are compact and portable, making them usable even in remote locations.
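The repeat-resolution advantage comes down to a simple geometric test: a read places a repeat unambiguously only if it covers the entire repeat plus some unique flanking sequence on both sides. The sketch below encodes that test. The 1,000-base anchor requirement, the example coordinates, and the function names are hypothetical choices for illustration.

```python
def min_spanning_length(repeat_length, min_anchor=1_000):
    """Shortest read that covers a repeat plus min_anchor unique bases on each side."""
    return repeat_length + 2 * min_anchor

def spans_repeat(read_start, read_end, repeat_start, repeat_end, min_anchor=1_000):
    """True if an aligned read [read_start, read_end) covers the repeat
    [repeat_start, repeat_end) with at least min_anchor flanking bases on each side."""
    return (read_start <= repeat_start - min_anchor
            and read_end >= repeat_end + min_anchor)

# A hypothetical 6 kb repeat at positions 10,000-16,000.
print(min_spanning_length(6_000))                    # 8000
print(spans_repeat(9_900, 10_050, 10_000, 16_000))   # 150 bp read: False
print(spans_repeat(5_000, 25_000, 10_000, 16_000))   # 20 kb read: True
```

No 150 bp read can pass this test for a 6 kb repeat, whereas a typical long read passes it easily.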
Limitations of Long-Read Sequencing
- Analyzing long-read sequencing data is complex and requires specialized expertise and resources, making interpretation more challenging.
- Raw single-pass reads have historically had higher per-base error rates than short reads, which can lead to base-calling errors and inaccuracies in data interpretation; these errors stem from single-molecule signal detection rather than from fragment length itself.
- Lower throughput than short-read sequencing: fewer reads are generated per run, limiting the number of samples that can be processed in a given time.
- Preparing high-molecular-weight DNA and sequencing to sufficient depth can take longer, potentially delaying results, which is a concern in clinical settings that require rapid diagnosis.
- Higher cost per base than short-read sequencing, which matters most for large-scale projects.
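Throughput is easiest to compare through sequencing depth (coverage), the total number of sequenced bases divided by the genome size. The sketch below applies that formula. It assumes a round 3 Gb genome and mean read lengths of 15 kb and 150 bp purely for illustration; the function names are hypothetical.

```python
import math

def coverage(n_reads, mean_read_length, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length / genome_size

def reads_needed(target_coverage, mean_read_length, genome_size):
    """Number of reads required to reach a target mean depth, rounded up."""
    return math.ceil(target_coverage * genome_size / mean_read_length)

GENOME = 3_000_000_000  # assumed round genome size (bp)
print(coverage(1_000_000, 15_000, GENOME))  # 5.0
print(reads_needed(30, 15_000, GENOME))     # 6000000 long reads for 30x
print(reads_needed(30, 150, GENOME))        # 600000000 short reads for 30x
```

A run with far fewer reads can therefore still reach useful depth when each read is long, so read counts alone understate long-read output; bases per run and cost per base are the more direct comparison.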
Applications of Long-Read Sequencing
- Genome-wide variant identification, enabling the detection of clinically significant variants such as large insertions and deletions and variants within repetitive regions.
- Targeted sequencing of complex and clinically relevant genomic regions that are difficult to analyze with short-read technologies.
- Studying epigenetic modifications like DNA methylation across large genomic regions without amplification, preserving native DNA content.
- Haplotype phasing, which assigns variants to one of the two copies of a chromosome directly from the reads, without relying on statistical inference or parental sequencing; this helps reveal the genetic basis of diseases and identify specific genetic variations.
- Full-length transcript sequencing, allowing detailed analysis of isoforms, alternative splicing, and gene expression patterns.
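Read-based haplotype phasing works because a single read that covers two heterozygous sites shows which alleles travel together on the same chromosome copy. The toy sketch below phases one pair of sites by counting reads that support "cis" (both alternate alleles on the same copy) versus "trans". The 0/1 allele coding and the name `phase_pair` are assumptions for illustration; real phasing tools solve this jointly across many sites and tolerate sequencing errors.

```python
from collections import Counter

def phase_pair(observations):
    """Decide whether alternate alleles at two heterozygous sites share a haplotype.

    observations: list of (allele_at_site1, allele_at_site2) seen on single reads,
    each allele coded 0 (reference) or 1 (alternate). Other codes are ignored.
    Returns (phase, supporting_reads, informative_reads).
    """
    counts = Counter(observations)
    cis = counts[(0, 0)] + counts[(1, 1)]    # alleles co-occur as ref/ref or alt/alt
    trans = counts[(0, 1)] + counts[(1, 0)]  # alt allele at one site pairs with ref at the other
    total = cis + trans
    return ("cis", cis, total) if cis >= trans else ("trans", trans, total)

# Hypothetical reads spanning both sites; one read disagrees (e.g., a sequencing error).
print(phase_pair([(0, 0), (1, 1), (1, 1), (0, 1)]))  # ('cis', 3, 4)
```

If the two sites lie tens of kilobases apart, only long reads can supply observations like these.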
Long-Read Sequencing vs. Short-Read Sequencing
| Feature | Long-Read Sequencing (LRS) | Short-Read Sequencing (SRS) |
| --- | --- | --- |
| Read Length | Typically 10,000 to 100,000 base pairs, sometimes longer. | Typically 50 to 300 base pairs. |
| Sequencing Time | Can take longer, partly because high-molecular-weight DNA must be prepared. | Often faster, with well-established high-throughput workflows. |
| Genome Assembly | Long reads provide more genomic context, simplifying assembly. | Limited context makes genome assembly more challenging. |
| Structural Variant Detection | Effective at identifying large-scale genomic variations and repetitive sequences. | Less effective at detecting large insertions, deletions, and repetitive regions. |
| Error Rates | Historically higher per-read error rates; PacBio HiFi reads now reach accuracy comparable to short reads. | Generally low error rates, especially with Illumina sequencing. |
| Data Analysis | Requires advanced computational tools and algorithms. | More widely supported, with simpler and faster data processing. |
| Applications | Ideal for whole-genome sequencing, detecting structural variants, and analyzing complex genomes. | Preferred for high-accuracy, high-throughput studies such as gene expression quantification and population genetics. |
| Examples | PacBio Sequencing, Oxford Nanopore Sequencing. | Illumina Sequencing, Ion Torrent Sequencing. |
References
- Amarasinghe, S. L., Su, S., Dong, X., Zappia, L., Ritchie, M. E., & Gouil, Q. (2020). Opportunities and challenges in long-read sequencing data analysis. Genome Biology, 21(1). https://doi.org/10.1186/s13059-020-1935-5
- Chauhan, T. (2024, February 9). Understanding long-read sequencing: principles and applications. Retrieved from https://geneticeducation.co.in/what-is-long-read-sequencing/
- Davis, J. (2023, August 22). Exploring long-read sequencing: A revolutionary approach in genomics. News-Medical. Retrieved on December 22, 2024, from https://www.news-medical.net/life-sciences/What-is-Long-Read-Sequencing.aspx
- Logsdon, G. A., Vollger, M. R., & Eichler, E. E. (2020). Long-read human genome sequencing and its applications. Nature Reviews Genetics, 21(10), 597–614. https://doi.org/10.1038/s41576-020-0236-x
- CD Genomics. (n.d.). Long-read sequencing: Technology, benefits, and applications. Retrieved from https://www.cd-genomics.com/long-read-sequencing.html
- Illumina. (n.d.). Long-read sequencing technology for decoding complex genomes. Retrieved from https://www.illumina.com/science/technology/next-generation-sequencing/long-read-sequencing.html
- Mobley, I. (2024, June 17). Comparative analysis of long-read and short-read sequencing: Key differences and applications. Front Line Genomics. Retrieved from https://frontlinegenomics.com/long-read-sequencing-vs-short-read-sequencing/
- Pollard, M. O., Gurdasani, D., Mentzer, A. J., Porter, T., & Sandhu, M. S. (2018). Long reads: their purpose and place. Human Molecular Genetics, 27(R2), R234–R241. https://doi.org/10.1093/hmg/ddy177
- PacBio. (2023, March 2). Sequencing 101: The role of long-read sequencing in modern genomics. Retrieved from https://www.pacb.com/blog/long-read-sequencing/
- CD Genomics. (2024). A comprehensive guide to long-read sequencing: Methods, benefits, and emerging trends. Retrieved from https://www.cd-genomics.com/resource-complete-overview-long-read-sequencing.html
- Jain, M., Olsen, H. E., Paten, B., & Akeson, M. (2016). The rise of long-read sequencing in genomic research. Nature Biotechnology, 34(10), 1067–1075. https://doi.org/10.1038/nbt.3683