Overview
- The ability to directly read RNA sequences in a highly parallel manner, where each species’ sequence assembly is compartmentalized.
- The ability to read RNA sequences de novo (without needing a pre-determined sequence reference).
- The ability to distinguish canonical bases from up to 170 different modified ribonucleotide bases (e.g., pseudouridine, 5-methylcytidine), including those known to inhibit ribonucleotide hydrolysis, irrespective of their clustering.
Regulatory scrutiny of the integrity of RNA therapeutics is expected to increase and the regulatory
framework for this is in the process of being defined. Modified bases in small RNAs and RNA therapeutics
are crucial for processing and biological activity through complex feedback networks involving their
writing, erasing, and sensing—disruption of which results in cellular dysregulation and disease.
For small-RNA drugs, the pattern and quality of intended modifications are foundational to their effectiveness
and toxicity. Modified mRNA bases are also used by cells for the regulation of half-life. Some mRNA therapies
like vaccines are intentionally engineered with modified bases at specific positions to ensure persistence,
while others are not; in either case, preparations can include RNAs with unintentional base modifications
that can cause adverse events.
Authorities are signaling interest in standardized approaches to ensure the safety and efficacy of RNA therapeutics,
including the detection and quantification of both intended and unintended base modifications, prompting calls for
more rigorous, modification-aware analytical assays in submissions. The problem is that common contaminants
associated with the manufacturing process, such as dsRNA or plasmid DNA, are easy to detect with existing technologies
(e.g., LC-MS, PCR, or chromatography), whereas discerning and quantifying modified nucleotides requires direct,
de novo RNA sequencing, which has not been possible until now. This is the problem we solve at DirectSeq.
Importance of Small RNA Base Modifications
MicroRNAs (miRNAs) regulate cellular homeostasis primarily through gene silencing. Modified
ribonucleotide bases (e.g., m6A) are critical for this function, forming binding conformations for proteins
involved in writing, erasing, and sensing. Because these pathways are tightly controlled in cells, quality assurance
of intended base-modification distributions in small-RNA therapeutics is vital for patient safety.
Danger of Unintended RNA Base Modifications
Both phosphoramidite syntheses and in vitro transcription (IVT) reactions can produce trace isomer contaminants containing unnatural
ribonucleotide modifications via chemical degradation (oxidation, deamination, hydrolysis), supplier or lot
impurities, and polymerase side-activities. Lot-to-lot variation in these isomers has been associated with serious
adverse events—including, in rare cases, death.
Analytical Limitations Today
Common contaminants like dsRNA or plasmid DNA are detectable by LC-MS, PCR, or chromatography, but
discerning and quantifying modified nucleotides requires direct, de novo RNA sequencing,
which has not been possible with conventional methods.
Safety & Regulatory Implications
Authorities are signaling interest in standardized, modification-aware assays in submissions to ensure the safety and
efficacy of RNA therapeutics. Lack of visibility into intended and unintended base modifications creates critical risk
and uncertainty for developers and regulators.
A Technology Leap Is Needed
Liquid chromatography-mass spectrometry (LC-MS) is used to control most RNA therapeutic Critical Quality
Attributes (CQAs), but sequence identity is normally assessed by RT-PCR, which cannot
identify nucleotide base modifications or quantify the isomers containing them. Modified base isomer
distributions should be added to the CQA list for RNA therapeutics, and these distributions can
only be measured through direct RNA sequencing.
Until now, the problem with direct RNA sequencing has been that sequence could only be obtained from highly purified
samples capable of producing complete ladders, and only in a confirmatory manner that required prior
knowledge of the sequence. Unable to perform simultaneous, direct, de novo (hypothesis-free) sequencing and
quantitative assessment, prior platforms could not deliver the rigorous quality control needed to identify unknown
contaminating modified bases.
The RNA therapeutics industry looks forward to novel analytical tools to gain more insight into CQAs such as sequence
identity, while also opening doors beyond current limitations to address previously unaddressed facets of CQAs (such as
modified base–containing isomers) that have emerged during clinical and post-clinical stages as potential safety concerns.
How DirectSeq Solves This
DirectSeq's Next Generation Mass Spec (DSMX™-Seq) platform is the significant technology leap patients, regulators
and RNA therapeutics professionals have been waiting for. Instead of inferring RNA identity through mass readout, as
with LC-MS, DSMX™-Seq reveals identity directly via de novo sequencing.
DSMX™-Seq delivers single-run, exhaustive, direct sequencing of RNA species in a sample, including those containing
any of more than 170 base modifications. It allows sequencing and quantification of not just the major RNA species, but
each minor primary sequence isomer, including those containing one or multiple (even clustered) base modifications,
while also delivering detection readouts like LC-MS (e.g., presence, length, structural variants, cofactors).
This represents a significant advancement beyond the limitations of LC-MS and opens the door for the first time to
empirical, hypothesis- and bias-free assessment of RNA identity and quality.
More on our Technology
3D DSMX™-Seq is the first RNA sequencing technology capable of directly sequencing and quantifying every sequence
of a mixed RNA sample, including minor isomers (down to 1%) differing by only a single canonical or modified base.
Unlike nanopore technologies, it can read modified bases that occur in clusters, as they commonly do in nature.
How it works
The RNA is hydrolyzed into ladder fragments, which are separated by isoform according to mass, intensity and
retention time. Within each resolved isoform layer, short sequencing reads are generated de novo by base-calling nucleotides from the mass
differences between adjacent ladder fragments. The short reads are then assembled into full-length RNA sequences.
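The base-calling and assembly steps described above can be illustrated with a minimal Python sketch. It is not the DSMX™-Seq implementation: the function names, the mass tolerance, and the small residue-mass table (approximate monoisotopic values for the four canonical nucleotides plus two modified ones) are assumptions chosen for illustration only, and the ladder is simulated rather than measured.

```python
# Illustrative sketch only; names, tolerance and mass table are assumptions,
# not part of any DirectSeq software.

# Approximate monoisotopic masses (Da) of nucleotide residues within an RNA chain.
RESIDUE_MASS = {
    "A": 329.0525,
    "C": 305.0413,
    "G": 345.0474,
    "U": 306.0253,
    "mC": 319.0569,  # methylcytidine (methyl position not resolved by mass alone)
    "D": 308.0410,   # dihydrouridine
}


def simulate_ladder(sequence, start_mass=18.0106):
    """Build an idealized mass ladder: one fragment mass per added residue."""
    masses = [start_mass]
    for base in sequence:
        masses.append(masses[-1] + RESIDUE_MASS[base])
    return masses


def call_bases(ladder_masses, tol=0.01):
    """Call one base for each pair of adjacent ladder masses.

    The mass difference between neighbours is matched to the closest entry
    in RESIDUE_MASS; differences outside the tolerance are reported as "?".
    """
    masses = sorted(ladder_masses)
    read = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        best = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - delta))
        read.append(best if abs(RESIDUE_MASS[best] - delta) <= tol else "?")
    return read


def merge_reads(left, right, min_overlap=2):
    """Join two reads on their longest suffix/prefix overlap, or return None.

    min_overlap must be at least 1.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


if __name__ == "__main__":
    ladder = simulate_ladder(["G", "mC", "A", "U", "D"])
    print(call_bases(ladder))
    print(merge_reads(["G", "mC", "A", "U"], ["A", "U", "D"]))
```

A real ladder would also need noise handling, charge-state deconvolution and orthogonal evidence for mass-silent modifications (for example, isomers that share a mass with a canonical base), none of which this sketch attempts.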
Major Problems Solved with 3D DSMX™-Seq
- The ability to directly read RNA sequences in a highly parallel manner, where each species’ sequence assembly is compartmentalized.
- The ability to read RNA sequences de novo (without needing a pre-determined sequence reference).
- The ability to distinguish canonical bases from up to 170 different modified ribonucleotide bases, including those known to inhibit ribonucleotide hydrolysis, irrespective of their clustering.
Demonstration of our Technology
Recent projects have sequenced tRNA, synthetic oligoribonucleotides, siRNA/miRNA mimics, sgRNA and tsRNA samples,
and we are completing the validation work needed to take on mRNA samples in the near future. Our tsRNA work
provided crucial granularity for noteworthy research published in Nature Communications, demonstrating the
disruptive and transformative nature of our technology.
One of the first demonstrations of DSMX™-Seq showcased its discriminatory sequencing power. We found that, surprisingly,
a commercially provided research-grade tRNA sample from a reputable vendor was actually a heterogeneous mixture of five
isomers differing in length, canonical ribonucleotide bases, or modified ribonucleotide bases. One of the two canonical base
substitution isomers was present at a relative abundance of 0.6 (the other at 0.1), and two different modified base isomers
were present at a relative abundance of about 0.2. Later work demonstrated the homogeneous purity of a synthetic 20-nt
ribonucleotide oligomer, with sensitivity and accuracy sufficient to confirm not just the sequence but also the stoichiometry of synthetic ribonucleotide
oligomer mixtures.
One of the most important demonstrations of DSMX™-Seq showed that a 100-nucleotide modified-base sgRNA sample produced
by phosphoramidite chemical synthesis was a mixture of the intended major species and impurity
isomers arising from phosphoramidite chemistry. One length isomer (a truncation) represented 20.7% of the sample mass,
and three unintended modified ribonucleotide-containing isomers represented 5.0%, 4.5% and 0.3% of the
sample. This result is meaningful to the RNA therapeutics industry, where the ribonucleotide base integrity and purity of RNA
drugs are generally assumed when they are synthesized chemically, and it suggests that such assumptions need to be revisited. We have
also demonstrated sequencing on endogenous tsRNAs isolated from the liver of mice. A tsRNA-Glu-CTC species was isolated
and shown to be present in four distinct modified ribonucleotide base isoforms, each containing a single mC, D or D modified
base at positions 6, 19 and 20, respectively.
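To put the sgRNA figures quoted above in perspective, the short Python sketch below totals the four reported impurity fractions. It assumes that all four percentages are expressed on the same basis (the text gives the truncation as a share of sample mass and the others as a share of the sample) and that no other species are present, so the remainder is only an upper bound on the intended species. The labels and the function name are illustrative.

```python
# Reported impurity fractions for the sgRNA sample, in percent, as quoted above.
# Assumption: all four values share a common basis.
reported_impurities = {
    "truncated length isomer": 20.7,
    "unintended modified isomer 1": 5.0,
    "unintended modified isomer 2": 4.5,
    "unintended modified isomer 3": 0.3,
}


def impurity_summary(impurities):
    """Return (total impurity %, upper bound on intended species %)."""
    total = round(sum(impurities.values()), 1)
    remainder = round(100.0 - total, 1)
    return total, remainder


if __name__ == "__main__":
    total, remainder = impurity_summary(reported_impurities)
    print(f"Identified impurities: {total}% of the sample")
    print(f"Intended species: at most {remainder}% of the sample")
```

Under those assumptions the identified impurities add up to roughly 30.5% of the sample, leaving at most about 69.5% for the intended species.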
Regulatory
The next generation of mRNA drugs is already expanding beyond vaccines to additional disease states, filling critical
therapeutic gaps. But adverse events linked with particular delivery modalities and RNA contaminants are only now being
fully understood, and a regulatory framework is still being defined. Major contaminants like length truncations, dsRNA and
plasmid template carryover from manufacturing are relatively easy to detect, but the same is not the case with modified
nucleotide isomers. It has long been known that even low levels of secondary and tertiary structural
isomerization in RNA therapeutics can lead to innate immune sensing. Mechanistic studies also show that aberrantly modified ribonucleotides can
lead to reduced regulatory control of translation efficiency, altered degradation kinetics, and even nuclear translocation and
DNA integration, producing outsized biological effects relative to their abundance.
The emergence of unusually high adverse event rates for COVID vaccines was a wake-up call that heightened public concern
about, and scrutiny of, RNA therapeutics. The COVID-19 mRNA vaccines contained intentionally engineered ribonucleotide modifications
to enhance durability and reduce unintended immune response, but showed unexpectedly long spike protein mRNA persistence
(up to several months—far longer than intended), and lot-to-lot variation in contamination has been associated with varying
levels of serious adverse events. Similar observations have been made for monogenic replacement RNA therapies.
Product-specific impurities with unnatural half-lives are a concern because their presence is not easily controllable or knowable
with current analytical instrumentation, and their effects are not entirely testable in clinical trials of limited patient populations.
Modified nucleotide incorporation in RNA therapeutics therefore merits careful analytical control and correlation with clinical safety data.
We can expect an enhanced Quality Assurance framework for detecting and preventing their contribution to RNA therapeutics
contamination going forward. This will require new standards, new technologies, and procedures to enable more comprehensive
assessments of RNA complexity during lot analysis, including:
- The use of better, more advanced technology that can detect and directly sequence every RNA species present in a sample, de novo, without the need for reference sequences.
- New RNA standards from NIST constituting mixtures of primary RNA with minor contaminant species of different primary, secondary and tertiary structures, including species that differ by as little as one, and by as many as 10 of the 170 known ribonucleotide base modifications.
- Standardized, internally validated operating procedures for using this advanced technology including the use of these new NIST RNA standards as controls for each lot analysis.
- Basic accreditation of laboratories running these standardized procedures including periodic blind challenge proficiency assessments.
- More comprehensive quality reporting and storage; never again should we have to speculate about why different lots were associated with unexpected adverse events, or wish we had access to these lots later when the events become apparent.
Academic
Unbiased Sequencing of RNA Modifications & The Human RNome Project
RNA molecules encode biological information not only through their sequences but also through over 170 known chemical
modifications. These modifications influence RNA structure, stability, translation, and function—and are implicated in over
100 human diseases, including cancer, diabetes, Alzheimer's, and Parkinson's. As such, RNA and its modification profiles are
emerging as powerful biomarkers and therapeutic targets.
Yet, our understanding of RNA sequence and modification diversity remains limited. Current sequencing technologies offer partial
insights but fail to provide the full spectrum of RNA sequence variants. In fact, we do not know how many unique RNA molecules or
sequence variants are present in a sample, and further, we do not know the complete sequence content of each RNA—including the
identity and location of every nucleotide (canonical or modified) within a full-length RNA.
To overcome these limitations, we developed NGMS-Seq, a next-generation mass spectrometry-based sequencing platform. Unlike optical
or electronic systems such as Illumina or nanopore-based RNA sequencing, NGMS-Seq uses mass spectrometry as a direct readout to
comprehensively sequence full-length RNAs and their modifications—without bias or prior knowledge across isolated and bulk RNA samples.
NGMS-Seq enables three unprecedented capabilities:
- Exhaustive sequencing of every RNA sequence without omission (targeting all RNA molecules).
- Unbiased sequencing of all RNA modifications (targeting all RNA nucleotides, modified or not).
- Global profiling of RNA and its modifications in human diseases (targeting all RNA and modification changes).
These capabilities pave the way for the world's first Human RNome Project, a landmark initiative to draft the first complete sequence of
all human RNA molecules and their modifications. Just as Sanger sequencing enabled the Human Genome Project (completed in 2003),
NGMS-Seq has the potential to deliver the first complete Human RNome, transforming RNA biology, therapeutics, and diagnostics.