Overview

  • The ability to directly read RNA sequences in a highly parallel manner, where each species’ sequence assembly is compartmentalized.
  • The ability to read RNA sequences de novo (without needing a pre-determined sequence reference).
  • The ability to discern canonical from up to 170 different modified ribonucleotide bases (e.g., pseudouridine, 5-methylcytidine), including those known to inhibit ribonucleotide hydrolysis, irrespective of their clustering.

Regulatory scrutiny of the integrity of RNA therapeutics is expected to increase, and the regulatory framework for it is still being defined. Modified bases in small RNAs and RNA therapeutics are crucial for processing and biological activity through complex feedback networks involving their writing, erasing, and sensing; disruption of these networks results in cellular dysregulation and disease.

For small-RNA drugs, the pattern and quality of intended modifications are foundational to effectiveness and toxicity. Cells also use modified mRNA bases to regulate half-life. Some mRNA therapies, such as vaccines, are intentionally engineered with modified bases at specific positions to ensure persistence, while others are not; in either case, preparations can include RNAs with unintentional base modifications that cause adverse events.

Authorities are signaling interest in standardized approaches to ensure the safety and efficacy of RNA therapeutics, including the detection and quantification of both intended and unintended base modifications. This has prompted calls for more rigorous, modification-aware analytical assays in submissions. Common manufacturing contaminants such as dsRNA or plasmid DNA are easy to detect with existing technologies (e.g., LC-MS, PCR, or chromatography), but discerning and quantifying modified nucleotides requires direct, de novo RNA sequencing, which has not yet been possible. This is the problem we solve at DirectSeq.

Importance of Small RNA Base Modifications

MicroRNAs (miRNAs) regulate cellular homeostasis primarily through gene silencing. Modified ribonucleotide bases (e.g., m6A) are critical for this function, forming binding conformations for proteins involved in writing, erasing, and sensing. Because these pathways are tightly controlled in cells, quality assurance of intended base-modification distributions in small-RNA therapeutics is vital for patient safety.

Danger of Unintended RNA Base Modifications

Both phosphoramidite syntheses and IVT reactions can produce trace isomer contaminants containing unnatural ribonucleotide modifications via chemical degradation (oxidation, deamination, hydrolysis), supplier or lot impurities, and polymerase side-activities. Lot-to-lot variation in these isomers has been associated with serious adverse events—including, in rare cases, death.

Analytical Limitations Today

Common contaminants like dsRNA or plasmid DNA are detectable by LC-MS, PCR, or chromatography, but discerning and quantifying modified nucleotides requires direct, de novo RNA sequencing, which has not been possible with conventional methods.

Safety & Regulatory Implications

Authorities are signaling interest in standardized, modification-aware assays in submissions to ensure the safety and efficacy of RNA therapeutics. Lack of visibility into intended and unintended base modifications creates critical risk and uncertainty for developers and regulators.

A Technology Leap Is Needed

Liquid chromatography-mass spectrometry (LC-MS) is used to control the majority of RNA therapeutic Critical Quality Attributes (CQAs), but sequence identity is normally assessed by RT-PCR, which can neither identify nucleotide base modifications nor quantify the isomers containing them. Modified-base isomer distributions should be added to the CQA list for RNA therapeutics, and these distributions can only be measured through direct RNA sequencing.

Until now, direct RNA sequencing could only obtain sequence from highly purified samples capable of producing complete ladders, and only in a confirmatory manner that required prior knowledge of the sequence. Unable to combine direct, de novo (hypothesis-free) sequencing with quantitative assessment, prior platforms could not deliver the rigorous quality control needed to identify unknown contaminating modified bases.

The RNA therapeutics industry looks forward to novel analytical tools to gain more insight into CQAs such as sequence identity, while also opening doors beyond current limitations to address previously unaddressed facets of CQAs (such as modified base–containing isomers) that have emerged during clinical and post-clinical stages as potential safety concerns.

How DirectSeq Solves This

DirectSeq's Next Generation Mass Spec (DSMX™-Seq) platform is the significant technology leap that patients, regulators, and RNA therapeutics professionals have been waiting for. Instead of inferring RNA identity through mass readout, as with LC-MS, DSMX™-Seq reveals identity directly via de novo sequencing.

DSMX™-Seq delivers single-run, exhaustive, direct sequencing of the RNA species in a sample, including those containing any of the 170+ known base modifications. It sequences and quantifies not just the major RNA species but each minor primary sequence isomer, including those containing one or multiple (even clustered) base modifications, while also delivering the detection readouts of LC-MS (e.g., presence, length, structural variants, cofactors).

This represents a significant advancement beyond the limitations of LC-MS and opens the door for the first time to empirical, hypothesis- and bias-free assessment of RNA identity and quality.

More on our Technology

3D DSMX™-Seq is the first RNA sequencing technology capable of directly sequencing and quantifying every sequence in a mixed RNA sample, including minor isomers (down to 1%) that differ by only a single canonical or modified base. Unlike nanopore technologies, it can read modified bases that occur in clusters, as they commonly do in nature.

How it works

The RNA is hydrolyzed into ladder fragments, which are resolved for the various isoforms by mass, intensity and retention time. Within each layer, short sequencing reads are generated de novo by base-calling nucleotides from mass differences between adjacent ladder fragments. The short reads are then assembled into full-length RNA sequences.
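The base-calling step described above can be sketched in a few lines. This is a minimal illustration and not the DSMX™-Seq implementation: the function names, the starting mass, the 0.01 Da tolerance, and the small mass table (monoisotopic masses of nucleotide residues within an RNA chain) are assumptions chosen for the example, and it treats a single, already-resolved ladder with no noise or missing fragments.

```python
# Hypothetical sketch: call bases from mass differences between adjacent
# ladder fragments. Table values are monoisotopic residue masses (Da) for
# nucleotides inside an RNA chain; the selection of bases is illustrative.
RESIDUE_MASS = {
    "A": 329.0525,
    "C": 305.0413,
    "G": 345.0474,
    "U": 306.0253,
    "m5C": 319.0569,  # 5-methylcytidine: C plus one CH2
    "D": 308.0410,    # dihydrouridine: U plus two hydrogens
}


def call_base(delta, tol=0.01):
    """Return the base whose residue mass is closest to delta, or None
    if nothing in the table lies within tol daltons."""
    best = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - delta))
    return best if abs(RESIDUE_MASS[best] - delta) <= tol else None


def read_ladder(masses, tol=0.01):
    """Sort fragment masses and call one base per adjacent pair.

    The read runs outward from the terminus shared by all fragments.
    Isobaric bases (for example uridine and pseudouridine) cannot be
    separated by mass difference alone; this sketch ignores that case.
    """
    ordered = sorted(masses)
    return [call_base(b - a, tol) for a, b in zip(ordered, ordered[1:])]


def simulate_ladder(seq, start_mass=500.0):
    """Build an idealized ladder for testing: each fragment is one
    residue heavier than the previous one. start_mass is arbitrary."""
    ladder = [start_mass]
    for base in seq:
        ladder.append(ladder[-1] + RESIDUE_MASS[base])
    return ladder


if __name__ == "__main__":
    ladder = simulate_ladder(["G", "m5C", "A", "D", "U"])
    print(read_ladder(ladder))
```

Because each call depends only on the difference between two neighboring fragments, no reference sequence is needed; a mass difference that matches nothing in the table is returned as `None` rather than forced onto the nearest base.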

Major Problems Solved with 3D DSMX™-Seq

  • The ability to directly read RNA sequences in a highly parallel manner, where each species’ sequence assembly is compartmentalized.
  • The ability to read RNA sequences de novo (without needing a pre-determined sequence reference).
  • The ability to discern canonical from up to 170 different modified ribonucleotide bases, including those known to inhibit ribonucleotide hydrolysis—irrespective of their clustering.

Demonstration of our Technology

Recent projects have sequenced tRNA, synthetic oligoribonucleotides, siRNA/miRNA mimics, sgRNA and tsRNA samples, and we are completing the validation work needed to take on mRNA samples next. Our tsRNA work provided crucial granularity for noteworthy research published in Nature Communications, demonstrating the disruptive and transformative nature of our technology.

One of the first demonstrations of DSMX™-Seq showcased its discriminatory sequencing power. Surprisingly, we found that a commercially provided research-grade tRNA sample from a reputable vendor was actually a heterogeneous mixture of five isomers differing in length, canonical ribonucleotide bases, or modified ribonucleotide bases. One of the two canonical base substitution isomers was present at a relative abundance of 0.6 (the other at 0.1), and two different modified base isomers were present at a relative abundance of about 0.2. Later work demonstrated the homogeneous purity of a synthetic 20-nucleotide ribonucleotide oligomer, with the platform's sensitivity and accuracy confirming not just the sequence but also the stoichiometry of synthetic ribonucleotide oligomer mixtures.

One of the most important demonstrations of DSMX™-Seq showed that a 100-nucleotide modified-base sgRNA sample produced by phosphoramidite chemical synthesis was a mixture of the intended major species and impurity isomers arising from phosphoramidite chemistry. One length isomer (a truncation) represented 20.7% of the sample mass, and three unintended modified ribonucleotide-containing isomers represented 5.0%, 4.5% and 0.3% of the sample. This result is meaningful to the RNA therapeutics industry, where the ribonucleotide base integrity and purity of chemically synthesized RNA drugs are generally assumed, and it suggests that such assumptions need to be revisited. We have also demonstrated sequencing of endogenous tsRNAs isolated from mouse liver. A tsRNA-Glu-CTC species was isolated and shown to be present in four distinct modified ribonucleotide base isoforms, each containing a single mC, D or D modified base at positions 6, 19 and 20, respectively.

Regulatory

The next generation of mRNA drugs is already expanding beyond vaccines to additional disease states, filling critical therapeutic gaps. But adverse events linked with particular delivery modalities and RNA contaminants are only now being fully understood, and a regulatory framework is still being defined. Major contaminants like length truncations, dsRNA and plasmid template carryover from manufacturing are relatively easy to detect, but the same is not true of modified nucleotide isomers. It has long been known that even low levels of secondary and tertiary sequence isomerization in RNA therapeutics can lead to innate immune sensing. Mechanistic studies further show that aberrantly modified ribonucleotides can lead to reduced regulatory control of translation efficiency, altered degradation kinetics, and even nuclear translocation and DNA integration, producing outsized biological effects relative to their abundance.

The emergence of unusually high adverse event rates for COVID vaccines was a wake-up call and heightened public concern and scrutiny of RNA therapeutics. The COVID-19 mRNA vaccines contained intentionally engineered ribonucleotide modifications to enhance durability and reduce unintended immune response, but showed unexpectedly long spike protein mRNA persistence (up to several months—far longer than intended), and lot-to-lot variation in contamination has been associated with varying levels of serious adverse events. Similar observations have been made for monogenic replacement RNA therapies.

Product-specific impurities with unnatural half-lives are a concern because their presence is not easily controllable or knowable with current analytical instrumentation, and their effects are not entirely testable in clinical trials of limited patient populations. Modified nucleotide incorporation in RNA therapeutics therefore merits careful analytical control and correlation with clinical safety data. We can expect an enhanced Quality Assurance framework for detecting and preventing their contribution to RNA therapeutics contamination going forward. This will require new standards, new technologies, and procedures to enable more comprehensive assessments of RNA complexity during lot analysis, including:

  • The use of better, more advanced technology that can detect and directly sequence every RNA species present in a sample, de novo, without the need for reference sequences.
  • New RNA standards from NIST constituting mixtures of primary RNA with minor contaminant species of different primary, secondary and tertiary structures, including species that differ by as little as one, and by as many as 10 of the 170 known ribonucleotide base modifications.
  • Standardized, internally validated operating procedures for using this advanced technology including the use of these new NIST RNA standards as controls for each lot analysis.
  • Basic accreditation of laboratories running these standardized procedures including periodic blind challenge proficiency assessments.
  • More comprehensive quality reporting and storage; never again should we have to speculate about why different lots were associated with unexpected adverse events, or wish we had access to these lots later when the events become apparent.

Academic

Unbiased Sequencing of RNA Modifications & The Human RNome Project

RNA molecules encode biological information not only through their sequences but also through over 170 known chemical modifications. These modifications influence RNA structure, stability, translation, and function—and are implicated in over 100 human diseases, including cancer, diabetes, Alzheimer's, and Parkinson's. As such, RNA and its modification profiles are emerging as powerful biomarkers and therapeutic targets.

Yet, our understanding of RNA sequence and modification diversity remains limited. Current sequencing technologies offer partial insights but fail to provide the full spectrum of RNA sequence variants. In fact, we do not know how many unique RNA molecules or sequence variants are present in a sample, and further, we do not know the complete sequence content of each RNA—including the identity and location of every nucleotide (canonical or modified) within a full-length RNA.

To overcome these limitations, we developed NGMS-Seq, a next-generation mass spectrometry-based sequencing platform. Unlike optical or electronic systems such as Illumina or nanopore-based RNA sequencing, NGMS-Seq uses mass spectrometry as a direct readout to comprehensively sequence full-length RNAs and their modifications—without bias or prior knowledge across isolated and bulk RNA samples.

NGMS-Seq enables three unprecedented capabilities:

  • Exhaustive sequencing of every RNA sequence without omission (targeting all RNA molecules).
  • Unbiased sequencing of all RNA modifications (targeting all RNA nucleotides, modified or not).
  • Global profiling of RNA and its modifications in human diseases (targeting all RNA and modification changes).

These capabilities pave the way for the world's first Human RNome Project, a landmark initiative to draft the first complete sequence of all human RNA molecules and their modifications. Just as Sanger sequencing enabled the Human Genome Project (completed in 2003), NGMS-Seq has the potential to deliver the first complete Human RNome, transforming RNA biology, therapeutics, and diagnostics.