The landscape of antibody–drug conjugates

Antibody–drug conjugates (ADCs) have become a key therapeutic modality in oncology, spurred by superior clinical profiles compared to standard-of-care chemotherapy across multiple indications. Consequently, revenue from approved ADCs and those in phase III development is forecasted to reach $26 billion in 2028 (Supplementary Fig. 1).

Despite the success of ADCs, long-term growth in their application faces two main challenges. First, the few validated payload mechanisms of action (MoAs) restrict addressable indications. Approved ADC payloads cover three cytotoxic MoAs — anti-mitotic, DNA alkylation and topoisomerase 1 inhibition — that typically require tumour-specific overexpression of the target antigen to ensure sufficient and safe payload delivery. As such, these ADCs primarily target established tumour antigens such as HER2, CD20 and BCMA.

Second, non-specific and insufficient payload delivery narrows the therapeutic window of ADCs. Delivery components for approved ADCs typically include cleavable peptide linkers stochastically conjugated through cysteine reduction to monoclonal antibody carriers. Premature payload release, poor tumour penetration, variable drug-to-antibody ratios and aggregation are common issues.

To explore the impact of next-generation ADC technology on these challenges, we investigated innovation in the ADC clinical pipeline across five design levers — target, payload MoA, antibody, linker and conjugation method — and assessed the likelihood for expanding the addressable indications or widening the therapeutic window of ADCs.

Assessment of the clinical pipeline

ADC assets in development were categorized into two types based on the potential to overcome the two main challenges (Supplementary Fig. 2a). Type-1 assets have new targets and/or payload MoAs and have first-in-class potential. Type-2 assets leverage established target/payload MoA combinations with novel delivery components to achieve a best-in-class profile.

Daiichi Sankyo’s patritumab deruxtecan, which targets HER3, is an example of a type-1 asset. Recent data from the phase II HERTHENA-Lung01 trial showed an overall response rate (ORR) of 30% in patients with an EGFR mutation who had previously been treated with an EGFR inhibitor and platinum-based chemotherapy, compared to an estimated real-world ORR of 14% in a similar patient population.

Merck and Kelun Biotech’s SKB264, an example of a type-2 asset, has the same target and payload MoA as the approved ADC sacituzumab govitecan (Trodelvy; Gilead). SKB264’s differentiated 2-methylsulfonyl pyrimidine linker increases stability in circulation compared to sacituzumab govitecan. An ORR of 40%, with 56% of patients having adverse events (AEs) of grade 3 or higher, was reported for a phase II study of SKB264 in patients with pre-treated metastatic triple-negative breast cancer. This compares favourably to the 21% ORR and 74% AE rate reported for sacituzumab govitecan in a similar patient population.

We applied this framework to a database of 168 ADCs in clinical development (Supplementary Fig. 2b). Overall, ~85% of assets address solid tumour indications, with breast and lung cancer the most common (Supplementary Fig. 3). Of the phase III ADCs, ~60% are type-2 assets that leverage established targets and payload MoAs with improved delivery components, which may reflect the recency of the modality’s success and a lower risk tolerance in late-stage development. Greater biological risk is evident in earlier-stage development; ~75% of phase I/II ADCs are type-1 assets with novel combinations of targets and payload mechanisms. Fig. 1 highlights specific components used by assets in development, with the highest concentration in components validated by approved products and a long tail of novel components each used by a single-digit number of assets (Supplementary Fig. 4).

Fig. 1 | Assessment of ADCs in clinical development. Innovation across specific design levers used in clinical assets. The top 10 targets and technologies for each lever are shown. Ab, antibody; ADC, antibody–drug conjugate; IgG, immunoglobulin. See Supplementary information for details and an expanded version.

Assessment of next-gen targets

Biological targets are a major innovation area for ADCs, with 61 unique targets under investigation in the clinic. Overall, ~90% of targets are antigens highly expressed on cancer cells, and ~10% of targets are associated with unique characteristics of the tumour microenvironment. For example, Pyxis Oncology’s PYX-201 targets fibronectin, an extracellular protein highly secreted by cancer-associated fibroblasts. ADCs targeting stromal components may prove effective against tumours with high stromal–tumour ratios such as breast and prostate cancers, and abrogate the evolution of resistance due to the genetic stability of stromal cells.

Assessment of next-gen technology

To gauge the potential impact of ADC innovation, we categorized next-gen technology according to our innovation framework and assessed the profile compared with marketed ADCs (Fig. 2). Here, we highlight a subset of next-gen design levers that may be of interest given their novelty and/or promising preclinical and clinical data.

Fig. 2 | Assessment of next-generation ADC technology. Evaluation of next-gen ADC components on potential to either expand ADC applicability or optimize delivery components. Novel biological targets are not considered a technology and therefore not included in the assessment. ADC, antibody–drug conjugate. See Supplementary information for details and an expanded version.

Next-gen payloads. Small-molecule degraders are a promising payload class given their high specificity, picomolar potency and ability to target a broad set of intracellular proteins associated with cancer. Orum Therapeutics’ ORM-5029 delivers a degrader payload selective for GSPT1, a GTPase overexpressed in multiple cancers including gastric, colorectal and breast cancer. Anticancer activity similar to trastuzumab deruxtecan (Enhertu; Daiichi Sankyo) was reported in breast cancer models in preclinical studies. The upcoming phase I data for ORM-5029 will be the first clinical data for antibody–degrader conjugates.

Next-gen carriers. Engineering antibodies with variable antigen binding affinity can reduce off-tissue toxicities and increase tumour-specific exposure. Strategies include shielding Fab domains with peptide masks susceptible to cleavage by proteases overexpressed in tumours, and engineering antibodies with optimized pH-sensitive binding characteristics. Despite past failures, antibody engineering has potential to expand the therapeutic window and treat patients with lower target expression levels. For example, Mythic Therapeutics’ MYTX-011 is designed with lower antigen affinity at endosomal pH to enhance payload escape from endosomes. Increased internalization, cytotoxic activity and in vivo efficacy against c-Met-expressing tumour models relative to the parent antibody and a clinical c-Met-targeted ADC were reported in preclinical studies.

Next-gen linkers. Emerging linker technology focuses on controlled payload release independent of endogenous enzyme-mediated cleavage. Tagworks’ preclinical ADC TGW101 uses an exogenously administered chemical activator to induce payload release. Superior anti-tumour activity in colorectal and ovarian cancer xenograft models compared to VC-peptide linkers was reported in preclinical studies. Controlled payload release can limit off-tissue toxicity and support further development of assets targeting non-internalizing proteins overexpressed in cancers.

Next-gen conjugation. Incorporation of non-natural amino acids into the antibody carrier facilitates site-specific conjugation through oxime bonds. Ambrx’s ARX788 is a HER2-targeted ADC with a non-cleavable PEG linker attached in a site-specific manner to non-natural amino acids. Phase I data showed anti-tumour activity and improved stability in serum relative to approved HER2-targeted ADCs. Reduction of premature payload release can increase the amount delivered to tumour cells and drive higher response rates.

Conclusion
As ADCs gain traction and new technologies emerge, companies must effectively evaluate emerging platforms, determine investment strategies to maximize expertise and capabilities, and decide whether a first-in-class or best-in-class approach is more attractive. A ‘one-size-fits-all’ technology is unlikely to emerge in the near term. Instead, we anticipate companies will build diverse collections of components to enable ‘plug-and-play’ development tailored to specific targets and indications.

Mechanisms of neurodegenerative diseases

https://ars.els-cdn.com/content/image/1-s2.0-S0092867422015756-gr2_lrg.jpg

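Pathological protein aggregation

Protein aggregation is often used as a basis for diagnosing and classifying disease. Relevant diseases include Alzheimer disease (AD), Parkinson disease (PD), primary tauopathies, frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS). In many neurodegenerative diseases (NDDs), protein aggregation occurs in brain regions and correlates with clinical outcome, suggesting that it may play a pathological role. Genetic analyses have identified many mutations in NDDs, which lead to the accumulation of toxic function or the loss of normal function; in addition, misfolding of prion proteins promotes the pathological process. Relevant biomarkers include Aβ42/Aβ40 and phosphorylated tau. Note, however, that protein aggregation does not always match disease progression perfectly, and many other mechanisms contribute to neurotoxicity. There are also NDDs that are not driven by pathogenic proteins, such as traumatic brain injury (TBI), chronic traumatic encephalopathy (CTE) and stroke.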

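Synaptic and neuronal network dysfunction
The symptoms of NDDs typically reflect the disruption of specific neural networks. Neuronal network function depends on the precise regulation of synaptic function, and synaptic damage and toxicity appear to arise before neuronal damage. Synaptic function is regulated by mitochondrial function and requires precise energy regulation to maintain calcium homeostasis and ion balance. Astrocytes and microglia also play important roles in energy supply, neurotransmitter homeostasis, and synapse elimination and stabilization. Studies have linked several NDDs to synaptic dysfunction; in AD, PD and FTD, for example, synaptic dysfunction is an early event, a finding also confirmed by imaging. Because it involves abnormal neuronal network function, it is associated with several NDD hallmarks at once.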

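Aberrant proteostasis
Maintenance of proteostasis depends on two main cellular mechanisms, the ubiquitin–proteasome system (UPS) and the autophagy–lysosomal pathway (ALP). The accumulation of ubiquitinated protein aggregates in NDDs indicates that proteostasis is disrupted. Mutations in two UPS components, UBQLN2 and VCP, are associated with ALS/FTD, and some of the proteins that aggregate in NDDs also impair UPS function. Similarly, brain-specific inactivation of autophagy leads to neurodegeneration and the accumulation of aggregated proteins. PD-associated genes and proteins play important roles in autophagy and lysosomal function; they can cause ALP and synaptic dysfunction and affect cell-to-cell protein transport and neuronal cell death.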

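Cytoskeletal abnormalities
The neuronal cytoskeleton has three main polymeric structures: tubulin-based microtubules, intermediate filaments and actin-based microfilaments. The three are interconnected but differ in protein composition and diameter. These structures are the basis for building and maintaining neurons and for organizing transport, and they support energy homeostasis and synaptic function. In NDDs the neuronal cytoskeleton is altered, and the capacity to transmit information and exchange materials is lost. This mainly involves mutations in cytoskeleton-related genes (such as intermediate filament genes), mutations in the axonal transport machinery, aberrant phosphorylation and dysregulation of proteins, and abnormal aggregation of neurofilaments and actin. Cytoskeletal disruption also interacts with other NDD hallmarks, especially loss of synaptic maintenance and altered energy metabolism, RNA transport, protein aggregation and autophagy, and neuronal death.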

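Altered energy homeostasis
Defects in energy metabolism are seen in many NDDs. ATP is the key molecule in brain energy metabolism. It is produced from the metabolism of glucose or lactate, two energy substrates that can pass directly from the bloodstream to neurons by way of astrocytes. Previous studies have shown that low ATP availability and mitochondrial dysfunction can both induce neuronal dysfunction and even cell death. Stroke is a good example, in which excitotoxicity and energy depletion together drive disease progression. In addition, genetic defects in enzymes of glycolysis, lipid metabolism and mitochondrial metabolism also cause the nervous system to malfunction. Similar mechanisms include mutations in mitochondrial DNA. Mitochondrial damage in turn affects Ca2+ homeostasis and thereby the many biological processes that depend on it, which ties this hallmark closely to the other NDD hallmarks.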

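DNA and RNA defects
Changes at the DNA level can drive mutations, chromatin rearrangements and other adverse molecular events, while defects in RNA metabolism and homeostasis disrupt biological processes such as RNA transport. The harm caused by DNA damage is especially evident in several rare hereditary diseases, where it weakens the ability to respond to and clear genomic stress. Oxidative DNA damage has also been proposed as a risk factor in AD models and PD-like pathology. In addition, proteins commonly implicated in NDDs take part in various DNA damage responses, and DNA damage is linked to hyperactivation of the DNA damage sensor PARP1 and to mitochondrial dysfunction. Some patients with NDDs show dysregulated RNA processing, mainly altered metabolism and multisystem proteinopathy, which affects RNA homeostasis.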

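Inflammation
Neuroinflammation comprises microgliosis and astrogliosis, and the former is present in all NDDs. Microglia are responsible for sensing, housekeeping and defense functions in the brain. In NDDs, danger signals such as aggregated proteins activate microglia and modulate cytokine and ROS production and interactions with astrocytes, among other processes. Many genes related to microglial function have been identified in AD. Astrocytes, which are closely connected with microglia, are also very important for neuronal health and function, particularly in glutamate homeostasis and the tripartite synapse network. Their close interplay with microglia may also jointly promote NDD progression.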

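Neuronal cell death
Several intrinsic properties make neurons especially vulnerable to cell death. First, as post-mitotic cells they gradually accumulate age-related damage and cannot renew themselves by replication. Second, their energy demands are extremely high, owing to the need to fuel synaptic function and to ROS production. Third, their extended axons and dendrites mean that transport and structural organization must work over long distances. Fourth, they rely on glial cells for maintenance, energy production and defense. All of the NDD hallmarks described above are linked to neuronal damage, which ultimately leads to loss of brain volume and the release of intraneuronal proteins. Two mechanisms of neuronal death are relatively well accepted: intrinsic and extrinsic apoptosis, and necrosis. Other mechanisms, including necroptosis and ferroptosis, also play a part, and there is crosstalk between the different mechanisms. Molecular pathways associated with cell death include opening of the mitochondrial permeability transition pore (mPTP), activation of voltage-dependent Ca2+ channels at presynaptic sites and in growing dendrites, and caspase-independent chromatin condensation. In addition, non-cell-autonomous death also contributes to neuronal loss.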

https://www.sciencedirect.com/science/article/pii/S0092867422015756

Genome editing for cardiomyopathy

Genome editing has progressed rapidly from discovery to clinical development, while preclinical studies continue to refine the approaches. Now, two papers in Nature Medicine showcase the potential of two such strategies — base editing and CRISPR–Cas9 — for prevention of hypertrophic cardiomyopathy (HCM) in mouse models. The studies also highlight different advantages and challenges for each genome-editing strategy.

HCM is caused by mutations in cardiac sarcomeric genes that lead to thickening of the heart muscle and can cause heart failure and sudden cardiac death. A dominant-negative pathogenic variant in the sarcomeric protein β-myosin, c.1208G>A, is a well-studied cause of severe HCM.

In one of the studies, Chai et al. set out to correct the c.1208G>A mutation using a base editor — a fusion protein consisting of a modified Cas9 nickase and a deaminase enzyme that converts one DNA base into another, at a site determined by a guide RNA (gRNA) sequence.

The authors used human induced pluripotent stem cells (iPSCs) to screen various adenine base editors (ABEs) with different editing efficiency and specificity. They opted for an ABE with a narrow editing window. Although this had a relatively low efficiency (34%), the risk of bystander edits — whereby the base editor modifies other adenine residues close to the target adenine — was also low.
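The trade-off between window width and bystander risk can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the 20-nucleotide protospacer is a made-up sequence (not the β-myosin target), and the window boundaries are hypothetical rather than those of the editors screened by Chai et al. The function simply lists adenines, other than the intended target, that fall inside a given editing window.

```python
from typing import List


def bystander_adenines(protospacer: str, target_pos: int,
                       window_start: int, window_end: int) -> List[int]:
    """Return 1-based positions of non-target adenines inside the editing window."""
    seq = protospacer.upper()
    if not (window_start <= target_pos <= window_end):
        raise ValueError("target adenine must lie inside the editing window")
    if seq[target_pos - 1] != "A":
        raise ValueError("target position is not an adenine")
    return [
        pos
        for pos in range(window_start, window_end + 1)
        if seq[pos - 1] == "A" and pos != target_pos
    ]


# Made-up protospacer for illustration only; the target adenine is at position 6.
protospacer = "GCTAGACATGCCTGAAGGTC"

# Hypothetical wide window (positions 4-8) versus narrow window (positions 5-7).
print(bystander_adenines(protospacer, 6, 4, 8))  # [4, 8]
print(bystander_adenines(protospacer, 6, 5, 7))  # []
```

In this toy case the wider window exposes two neighbouring adenines to unintended editing, whereas the narrow window exposes none, which mirrors the reasoning behind accepting a lower efficiency in exchange for fewer bystander edits.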

In cardiomyocytes generated from patient-derived iPSCs heterozygous for the c.1208G>A mutation, treatment with the base editor reduced contractile force generation and ATP consumption back to normal levels. The researchers detected minimal bystander editing with little to no off-target editing at distant DNA sites.

Next, Chai et al. generated a mouse model that carried human sequences encoding the β-myosin pathogenic variant. Owing to the large size of the base editor enzyme, they delivered it over two adeno-associated virus 9 (AAV9) vectors, and used a troponin T promoter to target expression to cardiomyocytes.

Mice received intrathoracic injection of the AAV9-vectored base editor immediately after birth, prior to development of HCM. Compared with untreated mice, at 8–16 weeks ABE-treated mice showed reduced features of HCM, such as ventricular wall thickening, and had similar echocardiographic readouts to wild-type mice.

The investigators detected a 32% editing efficiency of the target pathogenic adenine in cardiomyocytes, with no bystander editing. Low-level editing in off-target tissue such as the liver was detected.

In the other study, Reichart et al. tested both an ABE and a CRISPR–Cas9 nuclease that would respectively correct or silence the β-myosin mutation.

They selected an ABE with high editing efficiency, which corrected the mutation in >70% of left ventricular cardiomyocytes. Treatment of mouse models at 10–13 days of age prevented onset of HCM cardiac morphology and dysfunction for 32 weeks. However, bystander edits occurred at a rate of 3–5%, and a low but significant rate of off-target editing was detected.

Reichart et al. also designed a CRISPR–Cas9 nuclease system to selectively inactivate the c.1208G>A mutation in cardiomyocytes. Here, the gRNA directed Cas9 nuclease to make double-stranded breaks (DSBs) and generate indels in the target gene. The smaller size of Cas9 compared with ABEs enables the therapeutic to be packaged in a single AAV. And although this approach cannot correct a mutation, it is more amenable to application across different mutations.

Intrathoracic injection of the CRISPR–Cas9 prevented HCM onset in mouse models. However, high-dose treatment was associated with impaired cardiac function resulting from editing of the wild-type allele, suggesting a narrow therapeutic window.

Further studies and emerging clinical data will assist researchers in balancing editing efficiency with safety and other parameters to select appropriate genome-editing tools.

1. Chai, A. C. et al. Base editing correction of hypertrophic cardiomyopathy in human cardiomyocytes and humanized mice. Nat. Med. 29, 401–411 (2023).

2. Reichart, D. et al. Efficient in vivo genome editing prevents hypertrophic cardiomyopathy in mice. Nat. Med. 29, 412–421 (2023).

Inhibiting cap snatching

To prime their own transcription, influenza A virus (IAV) and influenza B virus (IBV) use ‘cap snatching’, in which the 5′ end of fully capped host RNA is removed and attached to viral mRNAs. Deficiency in host RNA methyltransferase MTR1 — which mediates a 2′-O-methylation step required for host cap maturation — has previously been shown to enhance antiviral interferon responses to the IAV, reduce its cap-snatching efficacy and impair replication. MTR1 may therefore represent a potential target for anti-influenza drugs. Here, Tsukamoto et al. demonstrate in a human cell line that MTR1 is essential for the initiation of replication of both IAV and IBV. An in silico screen of 5,597 compounds, followed by molecular docking studies using the crystal structure of MTR1, identified the adenosine analogue, tubercidin — a natural product from Streptomyces — as an inhibitor of MTR1. Evaluation of 115 tubercidin-related compounds in antiviral drug assays revealed trifluoromethyl tubercidin (TFMT) to be the most effective non-toxic compound. TFMT inhibited replication of various strains of IAV and IBV in human bronchial epithelial cells, human lung explants ex vivo, and mice, without signs of toxicity. In vitro mechanistic studies confirmed TFMT to inhibit IAV cap snatching, acting synergistically with antivirals baloxavir marboxil and oseltamivir.

Tsukamoto, Y. et al. Inhibition of cellular RNA methyltransferase abrogates influenza virus capping and replication. Science 379, 586–591 (2023)

Peptide barcodes meet drug discovery

The success or failure of small-molecule drug discovery efforts strongly depends on the “hit-finding” approaches that are applied at the inception of the drug discovery program (1). High-throughput screening of compound collections is still the main strategy (2), but several other approaches have shown promise. These include screening virtual libraries using three-dimensional protein structure or ligand information (3), de novo design of ligands (4), screening fragment (very small molecule) libraries (5), screening (cyclic) peptide libraries (6), repurposing existing compounds, and screening DNA-encoded libraries (DELs) (7). On page 939 of this issue, Rössler et al. (8) reveal a new hit-finding method that uses peptide-encoded libraries (PELs), which are similar to DELs.

In PELs, solid-phase peptide and small-molecule syntheses are used to readily generate large libraries of bifunctional molecules that each consist of a peptide tethered to a small molecule through a cleavable linker. After cleavage from the solid phase, these libraries are incubated with an immobilized therapeutic protein of interest for affinity selection. To identify those molecules that bind to the target protein, the peptide is cleaved from the bifunctional molecule and sequenced using mass spectrometry technologies that are normally applied in proteomics research (such as nanoscale liquid chromatography–tandem mass spectrometry). On the basis of the sequence of the peptide, the chemical structure of the small-molecule ligand can be identified. This is because the single amino acids that are used for synthesis of the peptide directly encode the corresponding chemical building blocks that are used to synthesize the small molecules.
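The encoding principle can be sketched in a few lines. The codebook below is hypothetical: the residue letters and building-block names are placeholders chosen for illustration and are not taken from Rössler et al. The sketch only shows how a sequenced peptide tag, read residue by residue, would map back to the ordered building blocks of the small molecule.

```python
from typing import Dict, List

# Hypothetical codebook (assumption for illustration only): each encoding
# amino acid, given by its one-letter code, stands for one building block.
CODEBOOK: Dict[str, str] = {
    "A": "building_block_1",
    "F": "building_block_2",
    "G": "building_block_3",
    "L": "building_block_4",
}


def decode_tag(peptide: str) -> List[str]:
    """Translate a sequenced peptide tag into its ordered building blocks."""
    blocks = []
    for position, residue in enumerate(peptide, start=1):
        if residue not in CODEBOOK:
            raise ValueError(
                f"residue {residue!r} at position {position} is not in the codebook"
            )
        blocks.append(CODEBOOK[residue])
    return blocks


print(decode_tag("GAL"))
```

A real codebook would cover every encoding amino acid used in the library and one entry per synthesis cycle; the lookup itself would stay this simple.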

The advantages of PEL technology over DELs are manifold. Most notably, a PEL supports harsher and more diverse chemical reactions, including metal-catalyzed reactions and reactions that require strong acidic or basic conditions. This breadth enables the synthesis of a wider scope of drug-like molecules. Another advantage is the application of solid-phase synthesis for peptides and small molecules, which allows the use of excess reactants. This, in turn, supports a higher yield and purity of the final small molecules, which is expected to substantially improve the quality of the libraries. Changing the tagging moiety from four DNA bases to a peptide that contains 16 different amino acids enables a higher information capacity. Thus, in theory, even larger libraries of small molecules could be synthesized and encoded. If an eight-digit encoding string is used, then 16 amino acids (a hexadecimal system) can generate about 4.3 billion (16^8) possible codes. By contrast, the four bases of DNA give only 65,536 (4^8) possible codes.
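The capacity comparison is simple exponentiation: an alphabet of k symbols and a tag of n positions gives k^n distinct codes. A minimal sketch follows, in which the function name is ours and the only inputs are the alphabet sizes and tag length stated in the text:

```python
def code_capacity(alphabet_size: int, tag_length: int) -> int:
    """Number of distinct tags of a given length over a given alphabet."""
    return alphabet_size ** tag_length


peptide_codes = code_capacity(16, 8)  # 16 amino acids, eight positions
dna_codes = code_capacity(4, 8)       # 4 DNA bases, eight positions

print(f"peptide tag: {peptide_codes:,} codes")  # peptide tag: 4,294,967,296 codes
print(f"DNA tag:     {dna_codes:,} codes")      # DNA tag:     65,536 codes
print(f"ratio:       {peptide_codes // dna_codes:,}x")  # ratio:       65,536x
```

At equal tag length, each peptide position carries as much information as two DNA positions (16 = 4^2), which is why the ratio itself equals 4^8.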

It is thought that the DNA tag of the DELs could interfere with targets that are per se DNA-binding, such as transcription factors or RNA. By contrast, libraries with peptide tags would potentially be better suited for screening against such targets because the amino acids used for the peptide synthesis are less likely to bind to those targets. To confirm that a hit identified in a DEL screen can actually bind to a target, the hit compound is synthesized without a DNA tag and then tested for its effect on biological activity. This can be tedious because, during DEL library synthesis, not every chemical reaction is successful. Occasionally, reaction byproducts are the biologically active compounds, and it takes several investigations to determine this. The hit resynthesis that stems from a PEL can still be performed by solid-phase synthesis using the same conditions that were used to construct the library. This allows a more rapid synthesis and makes the identification of potential by-products easier.

Some challenges need to be overcome to fully exploit PEL technology. Peptides must be present at a concentration of at least about 10 fM to be detected by mass spectrometry. This limits the size of a PEL because, in contrast to a DNA tag, the peptide tag cannot be amplified. The screens of Rössler et al. were performed at ∼1 nM concentration for each peptide-tagged compound. This means that a 100,000-membered library could be screened at a total library concentration of 100 µM. Because the peptide tags used by Rössler et al. were mostly hydrophobic, there is a certain risk of solubility problems and nonspecific peptide aggregation of the library members, which could interfere with binding to a putative target and lead to screening artifacts.
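The concentration figure follows from multiplying the per-compound concentration by the number of library members. A minimal sketch of that arithmetic, with a function name of our own choosing and concentrations kept in nanomolar until the final unit conversion:

```python
def total_library_concentration_uM(per_compound_nM: float, library_size: int) -> float:
    """Summed concentration of all library members, in micromolar."""
    total_nM = per_compound_nM * library_size
    return total_nM / 1000.0  # 1 µM = 1,000 nM


# ~1 nM per compound and 100,000 members, as in the example in the text
print(total_library_concentration_uM(1.0, 100_000))  # 100.0
```

Because the total scales linearly with library size, a tenfold larger library at the same per-compound concentration would need a tenfold higher total concentration, which is where the solubility and aggregation concerns become pressing.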

The libraries generated by Rössler et al. were screened against the targets human carbonic anhydrase IX, the epigenetic reader bromodomain-containing protein 4 (BRD4), and the E3 ubiquitin ligase mouse double minute 2 homolog (MDM2). In all cases, several hits were identified that could serve as interesting starting points for further improvements of their potency and properties. PELs could potentially be enhanced by exploiting a wealth of already-established solid-phase organic chemistry reactions to generate new druglike molecules in a chemical space that is not accessible by the DEL technology. Such libraries would be of high interest to drug discovery groups for screening against therapeutic protein targets for which no small-molecule ligands are yet known.

REFERENCES AND NOTES
1. D. G. Brown, J. Boström, J. Med. Chem. 61, 9442 (2018).
2. P. S. Dragovich, W. Haap, M. M. Mulvihill, J.-M. Plancher, A. F. Stepan, J. Med. Chem. 65, 3606 (2022).
3. F. Gentile et al., Nat. Protoc. 17, 672 (2022).
4. M. Skalic, J. Jiménez, D. Sabbadin, G. De Fabritiis, J. Chem. Inf. Model. 59, 1205 (2019).
5. D. A. Erlanson, S. W. Fesik, R. E. Hubbard, W. Jahnke, H. Jhoti, Nat. Rev. Drug Discov. 15, 605 (2016).
6. C. Sohrabi, A. Foster, A. Tavassoli, Nat. Rev. Chem. 4, 90 (2020).
7. R. A. Goodnow Jr., C. E. Dumelin, A. D. Keefe, Nat. Rev. Drug Discov. 16, 131 (2017).
8. S. L. Rössler, N. M. Grob, S. L. Buchwald, B. L. Pentelute, Science 379, 939 (2023).

The lysosomal degraders

Novel targeted degrader concepts are appearing almost daily. In September, Genentech researchers unveiled a cancer-targeting strategy that uses bispecific antibodies to induce degradation of cell membrane proteins — the latest addition to the fast-growing family of targeted degraders. The very next day, University of California, San Francisco biochemist James Wells published on bispecific antibodies that link cytokine receptors to extracellular and membrane protein targets. (EpiBiologics has licensed the technology.) These and other degraders take different roads, but the destination is the same: the lysosome.

The prototype degrader is the PROTAC, or proteolysis-targeting chimera, which eliminates target proteins by linking them to an E3 ubiquitin ligase for destruction in the proteasome. But PROTACs can’t degrade many categories of disease target, so academic and industry scientists are extending the degrader concept to a diverse set of new targets by exploiting the cell’s other main recycling center, the lysosome. These bifunctional small molecules have the potential to vastly enlarge the degrader space, but as they are so new, the field is still figuring out how best to select, optimize and deploy them, and how to minimize their risks.

Targeted degraders in general can eliminate targets that are otherwise undruggable, because they need only bind to work, not inhibit, and, unlike small-molecule inhibitors, they remove the target in its entirety. First described in 2001, PROTACs emerged as a viable drug modality in 2015, when Craig Crews’ group at Yale described fully small-molecule catalytic PROTACs with efficacy at nanomolar concentrations. Developing PROTACs now commands the efforts of about nine biotechs and numerous pharmas, with at least a dozen clinical trials in progress. But PROTACs can degrade only intracellular proteins and a subset of membrane proteins with suitable intracellular domains. Many other potential drug targets remain inaccessible, including secreted proteins and many integral membrane and membrane-associated proteins, as well as protein aggregates, lipids and whole organelles.

The new generation of degraders avoids the ubiquitin–proteasome system entirely and instead harnesses cellular pathways for delivering cargo to the lysosome for degradation. LYTACs, or lysosome-targeting chimeras, and ATACs, or ASGPR (asialoglycoprotein receptor)-targeting chimeras, link an extracellular domain of a target molecule to a cell-surface receptor that shuttles it, by endocytosis, to lysosomes. Genentech’s PROTABs, or proteolysis-targeting antibodies, and EpiBiologics’ AbTACs, or antibody-based PROTACs, hijack cell membrane ubiquitin ligases that also direct targets to lysosomes, and EpiBiologics’ KineTACs degrade extracellular and membrane proteins by tethering them to cytokine receptors for cell internalization. Several other degraders exploit natural autophagy pathways, in which specialized vacuoles engulf protein aggregates and complexes, or damaged organelles, for lysosomal delivery.

The field of lysosomal degraders is very young but is quickly growing. Besides LYTACs and the other extracellular protein degraders, at least five different autophagy-based degraders are in development (Table 1). At last count, five start-up companies have emerged from stealth mode, with over $434 million invested. But while PROTACs have validated the degrader concept, lysosomal degraders are still untested in patients, and they face challenges and uncertainties as they advance to the clinic. These include oral delivery of bulky degrader molecules, the risk of off-target degradation, the potential to overload lysosomes, and the possibility that hijacking these pathways will disrupt cellular proteostasis.

Garbage in

LYTACs are the most visible new degrader modality. They can theoretically degrade most extracellular and membrane proteins, which together make up about 40% of the proteome. The first LYTACs originated in the Stanford lab of glycobiologist Carolyn Bertozzi, where Stanford University chemical biologist Steven Banik, then a postdoc, was looking for ways to eliminate proteins that crosslink the glycocalyx (the dense sugar-rich layer surrounding cells), which helps drive cancers. The Bertozzi lab had developed artificial glycoprotein polymers, and Banik planned to attach them to glycan-binding disease-related proteins and remove them, by shedding, from the cell surface. But eventually the duo hit on the idea of harnessing the cell’s natural machinery with PROTAC-like bifunctional molecules that operate in the extracellular space and, in combination with membrane-bound transporter molecules, grab extracellular proteins and shuttle them to the lysosome, where they are enzymatically digested (Fig. 1).

The initial LYTAC paper from the Bertozzi lab reported degradation of several therapeutically important proteins — the extracellular protein ApoE4 and membrane proteins EGFR, CD71 and PD-L1 — by linking them to the mannose 6-phosphate receptor, which internalizes and carries glycoproteins to the lysosome. Degrading membrane proteins was a feat because LYTACs must compete with these proteins’ normal internalization machinery. Banik, Bertozzi and colleagues later showed that degraders that used a second internalizing receptor, ASGPR, also worked, in this case in a tissue-specific manner. (Avilar Therapeutics’ ATACs are similar.)

In 2019 Bertozzi founded Lycia Therapeutics to develop and commercialize LYTACs. A $50 million A round followed, and in August of 2021 Lycia signed a research collaboration and licensing agreement with Eli Lilly. The companies have yet to disclose any targets or disease indications, except for Lilly’s focus on immunology and pain. “We’re looking for areas of biology and targets where antagonism has failed,” says Lycia CSO Steve Staben: “things like difficult-to-drug protein aggregates, immune complexes, some of these recalcitrant and ligand-independent membrane targets.” Autoantibodies are also attractive. Other possible targets include membrane proteins with multiple functions and interacting partners, making it necessary to completely remove the protein. Many receptor tyrosine kinases fall into this category, including cancer drivers EGFR, HER2 and FGFR. “A single degrader approach for one of those may be more efficacious than simple kinase domain inhibition or blocking natural ligand binding,” says Staben.

LYTACs enjoy more design flexibility than PROTACs because they don’t have to be cell permeant. Small molecules, peptides, monovalent antibodies, bispecific antibodies — almost anything works if it can bring the target to the receptor. “It’s important for us to have multiple modalities to target these extracellular proteins,” says Lycia president and CEO Aetna Wun Trombley. “You can imagine a fully small-molecule LYTAC that has quite a different pharmacological profile than a fully biologic LYTAC. And so, depending on the situation, it may be more beneficial to apply one of those modalities versus another.”

LYTACs are also relatively straightforward to design. In contrast, PROTACs must bind both targets in a ternary complex that enables the ligase to transfer ubiquitins to the target protein’s lysine residues. The lysines must be oriented properly for ubiquitin binding, and ubiquitination doesn’t always lead to proteasomal degradation. The rules for these interactions remain largely unknown, so drug discovery is empirical and expensive. LYTACs are more forgiving. “You just need to induce proximity transiently to the internalizing receptor,” says Staben. Once achieved, “there’s no need to maintain that ternary complex formation necessarily, there’s no absolute requirement for the suitable presentation of lysines.”

But unlike PROTACs, first-generation LYTACs do not act catalytically, so they may not be able to clear highly abundant proteins. Lycia isn’t ruling out catalysis in the future. “It’s more an engineering feat at this point,” says Banik. “We need to find something that’s going to let go of cargo in the lysosome while it itself can be recycled back to the cell surface.”

LYTAC toxicity, at this early stage, is unknown, but the general approach seems safe because it employs the cell’s normal endocytic machinery. However, the endosomal–lysosomal system is damaged in many diseases and chronic conditions, including neurodegenerative diseases, and the LYTAC cargo burden could make things worse. For example, impaired cargo degradation in lysosomes can lead to lysosomal membrane permeabilization and cathepsin release into the cytosol, inducing either apoptotic or necrotic cell death. “How a cell turns over its proteins is something that I think disease mechanisms tend to alter,” acknowledges Banik. However, “we have at least done initial profiling work to show that LYTACs themselves don’t really cause lysosomal damage in any way.”

Still, Lycia doesn’t want to overload sick lysosomes. “The different disease indications we’re looking at are not known to be associated with dysregulated lysosomal function,” Staben says. And selecting internalizing receptors with specific tissue expression should avoid any problems if issues do arise. “So it may be possible even in some of these patient populations that have lysosomal function changes, that we can avoid hypothetical on-target mechanism-driven toxicity.”

Avilar CEO Daniel Grau says the company hasn’t seen any safety issues in its preclinical studies, including testing in non-human primates. And, like Lycia, Avilar is staying away from diseases with known lysosomal damage or capacity issues. But degraders, by monopolizing the ASGPR (or any internalizing receptor), could theoretically block normal degradation and cause side effects. Grau cites a knockout study showing no reduction in plasma glycoprotein clearance in mice, suggesting that redundant systems are compensating. “There are a lot of backup pathways to degrade proteins,” says Avilar CSO Effie Tozzo. “Also … we’re definitely not saturating the receptor with our ATACs in our pipeline programs.” So degraders shouldn’t affect normal protein clearance. But, until safety is proven in the clinic, such interference remains a concern.

Both Lycia and Avilar are exploring multiple internalizing receptors, but Avilar has staked out a strong chemistry position with ASGPR. “They’ve disclosed quite a bit in terms of optimizing ligands for that receptor,” says Casma Therapeutics CSO Leon Murphy. “It’s impressive.” Unlike the original Banik–Bertozzi ligands, arrayed as bulky chains on their polysaccharide backbones, Avilar’s bypass the need for multivalency to activate ASGPR. “We can dispense with the requirement of having a multimodal presentation,” says Grau. Smaller size enables oral delivery. (Lycia, says Staben, also now has monovalent receptor ligands.) One limitation: while ASGPR degraders can target any extracellular protein, they can only degrade membrane protein targets expressed on liver hepatocytes, where ASGPR is expressed.

But the mannose 6-phosphate receptor and ASGPR are just two effector molecules. The total number of internalizing receptors and other trafficking proteins may be vast. “[There] could end up being hundreds of proteins that one might be able to think about using as a way to get to lysosomes quickly,” says Banik, whose Stanford lab is working to identify potential protein shuttles.

Another outside-in approach for degrading membrane proteins recently emerged from Wells’s lab: AbTACs. AbTACs are bispecific antibodies that place a transmembrane E3 ligase next to a membrane protein of interest, sending it to the lysosome for degradation. (Some E3 ligases sort proteins for degradation in the lysosome instead of the proteasome.) Genentech’s PROTABs used the same membrane E3 ligases to efficiently degrade several cancer-driving membrane proteins in cells. The Genentech team also identified 38 different membrane E3 ligases, many of them cell type specific, that future degraders might use to shuttle membrane targets to lysosomes.

But AbTACs and PROTABs may not be able to degrade extracellular (secreted) proteins, and, like PROTACs, they have tricky ubiquitination requirements, so proximity alone may not induce degradation. Wells’s KineTACs, or cytokine-targeted chimeras, don’t have those limitations, degrading the extracellular protein VEGF (vascular endothelial growth factor) in addition to membrane proteins. All these degraders expand on LYTACs “by both providing additional degradation platforms and tissue selectivity,” Wells writes in an e-mail. Both AbTACs and KineTACs were licensed to EpiBiologics in October.

Can autophagy deliver?

For intracellular proteins, PROTACs and molecular glues — small-molecule E3 ligase binders that alter that enzyme’s surface to recruit a target protein for degradation — remain the dominant degrader modalities. But their inability to degrade aggregates, non-protein targets and organelles has inspired a set of degraders that instead hijack the autophagy system, which efficiently clears such debris from the cell.

Autophagy, which means “self-eating,” starts when the phagophore, a cup-shaped vesicle, scoops up cell contents. The phagophore closes around its cargo to form the double-membrane autophagosome vesicle, which then fuses with the lysosome and deposits its contents there for digestion. “These are normal pathways that we can just enhance further … with a small-molecule degrader,” says Casma’s Murphy.

Companies see potential for autophagy-based degraders in neurodegenerative diseases, particularly those characterized by mutant protein inclusions (Huntington’s), by intracellular tau aggregates (Alzheimer’s), or by α-synuclein fibrils and damaged mitochondria (Parkinson’s). Targeting mitochondria with a degrader in Parkinson’s or any condition is perilous, but conceivable. “Having a disease-specific marker on a mitochondrion [would] likely allow for a favorable therapeutic window,” says Murphy, who says several groups are looking for such markers. Non-neuronal protein aggregation disorders like amyloidosis are also in the picture. Lipid droplets could be targeted in metabolic disease. Even soluble proteins, which are not typically cleared by autophagy, are potential targets, says Murphy. “The majority of proteins in the cell, at some stage of their life, are associated with complexes,” he says. “The target space is broader than what you might think.”

The autophagy degrader space is so new that the effector molecules, analogous to the E3 ligases that PROTACs employ, are mostly undisclosed by the companies testing them. PAQ Therapeutics, for example, is developing ATTECs, or autophagosome-tethering compounds, using undisclosed autophagy proteins. Boxun Lu, a neuroscientist at Fudan University in Shanghai, found the first ATTECs when he conducted a screen for small-molecule binders to mutant huntingtin protein, which accumulates in the brain in Huntington’s disease, and counter-screened against LC3, a critical protein that recruits cargo to autophagosomes for transport to lysosomes. This screen improbably produced two molecular glues that bound to both. Lu’s group went on to show that these compounds and their more target-specific analogs could reduce mutant huntingtin and reverse disease in cells and in mouse disease models. Lu originally set out to find heterobifunctional molecules, not molecular glues, so “it is an extremely lucky event they were able to identify these molecules,” says Nan Ji, CEO of PAQ Therapeutics, which has licensed the technology.

These compounds are now in PAQ’s portfolio, but finding more single-warhead molecular glues for any given target usually involves a massive screening effort. So the company is developing ATTECs with dual linkers, one to an autophagy effector protein and the other to a target of interest. These are simple to construct. But unlike the similarly two-headed PROTACs, which can employ known and validated binders, autophagy targets are mostly unexplored territory, and there are few if any known small-molecule binders to autophagy-related proteins.

Nan Ji says that PAQ has begun to break that barrier. Besides finding target binders, “we have also been able to identify very potent small-molecule binders to an autophagy protein,” he says. “I can’t really specify which one yet.”

Other companies are also keeping their autophagy proteins of choice under wraps. There are many possibilities, but also pitfalls, because autophagy is more complex than the ubiquitin–proteasome system, with a lot more players. In budding yeast there are at least 41 autophagy-related (Atg) proteins, which are highly conserved in mammals. A single degrader molecule that binds only one must recruit the entire autophagy machinery. Also, several different mechanisms of selective autophagy exist, suggesting that a single degrader platform may not be possible.

Murphy acknowledges the challenge. “It’s not a classical signaling pathway or cascade — it’s just not,” he says. “There are multiple components that sort of feed in at different levels; there’s a lot of cooperativity and parallel requirements.” Success, he says, will come from acquiring knowledge of specific autophagy factors sufficient to assemble the machinery for degradation. “I do believe it is going to be possible, once in possession of that knowledge, to deploy that approach across multiple therapeutic targets,” Murphy says. A universal autophagy degrader platform, he says, is possible. “This concept of self-sufficiency really holds the key.”

One autophagy degrader approach is the AUTAC, or autophagy-targeting chimera. Fifteen years ago, a group that included Hirokazu Arimoto, an organic chemist now at Tohoku University in Japan, discovered S-guanylation, a post-translational modification of proteins. Arimoto later observed that bacterial pathogens accumulate S-guanylated proteins around them and are cleared by autophagy, suggesting a mechanistic connection. To make the first AUTAC degraders, he linked an S-guanylation tag to a protein of interest, driving its degradation by attaching ubiquitin chains joined at their lysine 63 residues, a pattern that typically triggers autophagy. AUTACs degraded not only cytosolic proteins but also dysfunctional mitochondria, improving cellular energy production.

What’s missing from the AUTAC story is an explanation for how the S-guanylation post-translational modification triggers autophagy. “There’s a gap there,” says Murphy. “It’s unclear exactly how that molecule interfaces with the autophagy machinery.” It’s not known whether AUTACs have been licensed for commercial development; Arimoto did not respond to email messages from Nature Biotechnology.

Casma, like PAQ, has not revealed which autophagy protein or proteins it plans to use to effect target degradation. Autotac Bio, which is developing AUTOTACs, or autophagy-targeting chimeras, has disclosed one: p62, which forms a bridge between a protein that’s marked for degradation and the autophagy machinery. p62 and other autophagy receptors are not classical drug targets. “Finding appropriately tractable small molecule binding sites is going to be essential to success,” says Murphy. Using LC3 could be problematic because recent work has revealed non-autophagy functions for this family of proteins, so recruiting them to degrade any given target could impair some normal biology — for example, the transcription of genes for antimicrobial defense.

A risk exists that autophagy-based degraders could interfere with proteostasis by appropriating the autophagy machinery. “In a chronic disease setting, that’s something to really pay attention to,” says Murphy. “Hopefully it’s something we can modulate, because we’re using chemistry and pharmacology, and we can adjust dosing schedules appropriately to minimize that risk.”

Safety aside, will autophagy degrader drug discovery run into the same unpredictability problem as PROTACs? “There are too many unknowns to say we will definitely have an easier path here,” says Murphy. Nan Ji agrees. On the one hand, he says, even though an autophagy degrader, just like a PROTAC, must form a target–effector ternary complex, “there would be no need necessarily for ubiquitination to occur. So in theory, this would save one requirement for degradation to happen.” On the other hand, autophagy degraders may sacrifice some specificity in the process. With PROTACs, “because of the requirement for ubiquitination, the lysine positioning, you could get additional selectivity, degradation selectivity,” Ji says. With ATTECs, “that may not be the case.” Less built-in selectivity could mean more optimization steps for autophagy degraders, or lower potency. PAQ is still working out the discovery process.

Other TACs and TABs will emerge. “There are many paths to the lysosome,” says Banik. PROTACs are almost certain to find a niche in medical practice; molecular glues like the multiple myeloma drug Revlimid (lenalidomide) already have. But whether targeted degradation will become a major drug modality at the level of small-molecule inhibitors and antibodies is much less clear. Even less certain is the ultimate fate of LYTACs, ATACs, AbTACs, KineTACs, PROTABs, AUTACs, ATTECs and AUTOTACs. They’ve barely gotten started, and their biology is mostly unexplored. Companies, just out of stealth mode, are only now beginning to disclose their clinical candidates and preclinical results. The next year will be telling.

Ref:

https://pubmed.ncbi.nlm.nih.gov/29626215/

First demonstration of miRNA-dependent mRNA decay

MicroRNAs (miRNAs) are important regulators of gene expression. Their function and roles were first discovered in the development of the worm Caenorhabditis elegans, and later shown to occur in all multicellular organisms. miRNAs function by guiding effector proteins through the recognition of complementary miRNA sequences in target mRNAs. In most animals, miRNAs form imperfect hybrids with sequences in the 3′ untranslated regions (3′UTRs) of mRNAs; the ‘seed’ region of the miRNA ensures targeting specificity. Nowadays, we know that this interaction leads to the recruitment of a protein complex that represses translation and causes deadenylation and degradation of target mRNAs.
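The seed-based recognition described above can be illustrated with a minimal sketch. It assumes the common convention that the seed spans miRNA nucleotides 2 to 8 and that a candidate site is a perfect reverse-complement match in the 3′UTR; the `TOY_UTR` sequence and the helper names are hypothetical and chosen only for illustration. Real target recognition depends on additional pairing and sequence context that this sketch ignores.

```python
# Illustrative sketch only: TOY_UTR is a made-up sequence, not a real 3'UTR.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the mRNA motif that pairs with miRNA nucleotides start..end (1-based)."""
    seed = mirna[start - 1:end]
    # Pairing is antiparallel, so the target motif is the reverse complement.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based positions in the UTR where the seed-match motif occurs."""
    site = seed_site(mirna)
    last_start = len(utr) - len(site)
    return [i for i in range(last_start + 1) if utr[i:i + len(site)] == site]


LET7 = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7 sequence as commonly reported
TOY_UTR = "AAGCUACCUCAUUUGGACUACCUCAA"  # hypothetical 3'UTR fragment

matches = find_seed_matches(LET7, TOY_UTR)
print(seed_site(LET7), matches)
```

A perfect seed match is only a first filter for candidate sites; it says nothing about whether the target is repressed at the level of translation, mRNA stability or both.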

By contrast, the original model that was proposed to explain the mechanisms underlying miRNA functions postulated that the mRNA remains stable after miRNA binding, and that gene repression occurs only at the level of translation. This conclusion was based on initial findings that miRNAs regulated the levels of target proteins, but not of target mRNAs. This model dominated the field until the publication of a key study by Amy Pasquinelli’s lab in 2005.

Pasquinelli and her team showed that two miRNAs in C. elegans — let-7 and lin-4 — trigger degradation of their imperfectly complementary mRNA targets (lin-41, lin-14 and lin-28). This conclusion was widely accepted thanks to the use of the same experimental system as the one used to establish the initial model and owing to the technical rigour of the work. The authors used northern blotting and reverse transcriptase quantitative PCR (RT–qPCR) to compare levels of miRNA targets across developmental stages, during which miRNA expression changes, and also between miRNA-mutant and wild-type worms. They ruled out transcriptional gene regulation as a mechanism of miRNA function using chromatin immunoprecipitation experiments. Finally, the authors confirmed mRNA degradation by carrying out lacZ reporter experiments with the wild-type 3′UTR of lin-41 or with a mutant 3′UTR, in which the let-7-binding sites were deleted.

This work was followed up by numerous studies showing the degradation of miRNA targets in other organisms and triggered further mechanistic studies of miRNAs. Research in subsequent years provided an understanding of how miRNAs cause mRNA degradation and translation repression and how these two mechanisms contribute to overall miRNA-mediated gene repression.

References

Original article

  • Bagga, S. et al. Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation. Cell 122, 553–563 (2005)

Probing RNA-Binding Proteins in Cancer

RNA-binding proteins (RBPs) are an emerging class of potential cancer therapeutic targets, albeit with myriad complexities that have yet to be untangled. Recent research has implicated RBPs in MYC-driven cancers and in acute myeloid leukemia (AML), two examples in a lengthening list.

“The early genomics era was mainly focused on DNA-level molecular mechanisms—transcription factors and epigenetic modifiers were seen as the drivers of gene expression control,” remarks Hani Goodarzi, PhD, of the University of California, San Francisco. “But some RNA biologists, including Robert Darnell’s group at The Rockefeller University [in New York, NY], were already looking ahead to how RBPs might also modulate gene expression.” Notably, Darnell developed cross-linking immunoprecipitation (CLIP) to pinpoint to which RNA type, and where, a given protein binds.

“Once we realized that transcriptional regulation is only one part of the equation, more researchers began paying attention to what happens after, in the post-transcription space,” Goodarzi adds.

Interest in RBPs picked up, and lately “the floodgates have opened,” notes Gene Yeo, PhD, of the University of California, San Diego, in terms of tools at RNA biologists’ disposal. These include mass spectrometry–based quantitative proteomics and STAMP, a technology his group developed to study RBP–RNA interactions in single cells (Nature Methods 2021;18:507-19). As well, CRISPR–Cas9 screening “has become very important for looking at synthetic lethality and function to better identify RBPs as novel drug targets.”

Occasionally, toolbox components are tweaked and refined; for instance, Yeo’s team came up with “enhanced CLIP”—dubbed eCLIP—which is less technically demanding, with improved specificity and success rates. eCLIP “enabled us to generate large-scale interaction maps of RBPs and their targets,” he says. “It turns out that there are considerably more of these proteins than previously thought; some 10% to 20% of the human genome encodes RBPs” (Nature 2020;583:711-9).

Another novel technique for mapping RBP networks, called HyperTRIBE, was adapted from studying nerve cells in fruit flies by Michael Kharas, PhD, of Memorial Sloan Kettering Cancer Center, also in New York, NY (Nat Commun 2020;11:2026). For Kharas, “the most powerful aspect of RBPs is their ability to really change the cell state, because they influence protein production, dictating how much or how little is made from a given transcript.” Cancer can interfere with this process, he adds, “tipping a delicate balance and thereby altering key decision points for the cell.”

“RBPs are complex molecules whose activities reverberate throughout the cell’s gene expression network,” Goodarzi concurs. Whereas transcription factors such as p53 have long received the spotlight as key regulators that tumor cells frequently hijack for their own purposes, “we’re learning that RBPs are master regulators, too, and also co-opted” in cancer. As well, “in sequencing more cancer genomes, we’re starting to uncover a ton of mutations in RBPs,” he notes, “which has really put them on the map for cancer biologists.”

Toward therapeutics

A recent study from Yeo’s group sheds new light on cancers addicted to MYC, which has long vexed researchers as a therapeutic target. However, probing the post-transcriptional milieu of MYC-driven tumor types may yield workaround strategies down the road (Mol Cell 2021;81:3048–64).

Yeo reported that YTHDF2, an RBP, is a vulnerability in triple-negative breast cancer (TNBC) with hyperactivated MYC. YTHDF2 typically keeps a lid on the number of mRNA transcripts that are translated, earmarking many for degradation to maintain cellular homeostasis. With MYC addiction, transcription and translation levels are aberrantly high, so YTHDF2 “becomes more important than ever” for balance, he explains. “When we inhibited it, that provoked a lot of cellular stress from accumulated unfolded proteins, which then triggered apoptosis” in TNBC cells and tumor xenografts.

“Our findings show how cancer cells exploit the function of specific RBPs, to evade stress-induced death,” Yeo adds. “To us, YTHDF2 is a plausible therapeutic candidate, but of course there are others out there.”

“Others” may include RBMX and RBMXL1, which Kharas and his team have been studying. After initially identifying the RBP Musashi-2 as an important regulator in AML, they began scoping out Musashi-2’s network, landing on RBMX and RBMXL1. Both are overexpressed in AML and necessary for tumor cell survival (Nat Cancer 2021;2:741-57).

“We found that these two RBPs directly promote the transcription of their target, CBX5, itself a regulator of chromatin accessibility in AML cells,” Kharas explains. “Knocking them out reduced CBX5’s mRNA and protein abundance, changing how chromatin is compacted, which stunted cell growth and delayed leukemia development.”

The number of identified RBPs is estimated at 1,500, so deciding which ones to pursue therapeutically “will come down to prevalence,” Goodarzi says. “If an RBP’s mode of regulation is pretty extensive, impacting a broad set of cellular and cancer states, that opens the door for it to be prioritized.”

For instance, SF3b1, a key RNA splicing component, is frequently mutated in patients with myelodysplastic syndromes, which can morph into leukemia. H3B-8800 (H3 Biomedicine), a small molecule that modulates SF3b1’s activity, is one drug being evaluated in the clinic. However, in preliminary data from a phase I trial of 15 patients there were no objective responses (Leukemia 2021 Jun 25 [Epub ahead of print]).

A challenge is that RBPs “have different functions that are wholly context-dependent; the same protein that’s a splicing factor in the nucleus can be a stability factor in the cytoplasm,” Goodarzi says.

Yeo agrees: “You’d need to discern what, exactly, to target—is it an RBP’s RNA recognition function, its ability to recruit other proteins as part of a complex, or something else entirely?” Compounding the complexity, many RBPs have intrinsically disordered regions, “which are structurally unstable and can form aggregates. We don’t yet know if this aspect would make them easier or harder to drug.”

Strategies being explored include not only small-molecule inhibitors, but also antisense oligonucleotides that modulate RBPs at their own transcript level. As well, decoy RNAs conjugated to proteolysis-targeting chimeras (PROTACs) could selectively trap RBPs, routing them toward degradation. This RNA–PROTAC concept has shown utility in vitro, targeting LIN28 and RBFOX1 in cancer cell lines.

“The more we learn about RBPs, and with better technology, the fancier we can get in thinking about how to drug them,” Kharas says. “This is just the beginning.”

“I’d say RNA is having a renaissance moment,” Goodarzi adds. “We understand very little about post-transcriptional control. It’s this vast landscape, and we’ve barely scratched the surface. But the tools keep improving, and excitement in the field is driving participation, so now is a pretty great time to be an RNA biologist.” –Alissa Poh

RNA delivery with a human virus-like particle

Fig. 1: Schematic of the ‘selective endogenous encapsidation for cellular delivery’ (SEND) system. PEG10, cargo RNA and fusogen vectors are transfected into cells. Inside cells, the PEG10 proteins pack the cargo mRNA and assemble into virus-like particles (VLPs) that are secreted to the growth medium in extracellular vesicles. The medium is then collected and the VLPs are isolated by ultracentrifugation. Finally, the target cells are transfected with the VLPs. Portions of this figure were created with BioRender.com.

RNA is emerging as a powerful therapeutic modality in applications ranging from vaccines to protein replacement therapies. Yet in many applications beyond vaccines, a central obstacle to clinical development is the lack of efficient methods to deliver RNA to specific tissues and cells. In a recent paper in Science, Segel et al.1 report a novel RNA delivery strategy that is borrowed from the human genome. The approach uses a protein derived from a human retrovirus with the rare capacity to package its RNA and transport it outside the cell in virus-like particles (VLPs). The authors show that their approach, called ‘selective endogenous encapsidation for cellular delivery’ (SEND), enables delivery of exogenous mRNA cargos, such as Cre and Cas9, into cells in vitro without the use of non-human components. Although this delivery strategy is still in its infancy, as a fully human system it may prove to be a safer alternative to current methods.

Currently, the most widely used RNA delivery method is lipid nanoparticles made from natural and synthetic ionizable amino lipids. Lipid nanoparticles fueled the remarkable success of the SARS-CoV-2 mRNA vaccines, but for other applications they have several shortcomings. These include uncertainty about their safety and efficacy for repeated dosing and for crossing biological barriers to target specific cell types.

Virus sequences incorporated throughout the human genome raise the tantalizing possibility that their natural functions could be harnessed to deliver therapeutic RNA. Retroelements account for about 8% of the human genome2. Although most endogenous retroviral genes have lost their functions, some continue to have roles in human physiology. Several retroelements have been reported to retain some of their ancient functionality, such as binding and transferring mRNA and forming capsids within the cell2.

To find candidate retroelement genes suitable for RNA delivery, Segel et al.1 surveyed conserved endogenous retroelements, focusing on homologs of structural retroviral Gag proteins that contain the core capsid domain. This domain protects the genome of both retrotransposons and retroviruses by forming VLPs, suggesting that proteins that contain it might be able to transfer other RNAs. The authors narrowed down their search to proteins that are conserved between human and mouse and have detectable RNA levels, because such proteins are more likely to have retained some functionality in mammalian cells. They screened their leading hits in bacteria and mammalian cells to determine whether they are secreted in extracellular vesicles as VLPs. The protein most highly enriched in the VLP fraction was mouse (Mus musculus) PEG10, which is also detected at appreciable levels in mouse serum. Moreover, the VLPs formed by the PEG10 protein contained the full-length Peg10 mRNA transcript.

To investigate whether these mouse PEG10 VLPs could incorporate unrelated RNAs, Segel et al.1 flanked a Cre recombinase coding sequence with Peg10 5′ and 3′ untranslated regions (UTRs), and co-transfected the construct together with PEG10 into Neuro2a mouse neuroblastoma cells. They also engineered the VLPs by adding the fusogen vesicular stomatitis virus envelope protein (VSVg) to facilitate cellular delivery. Strikingly, PEG10 VLPs with VSVg were secreted in extracellular vesicles and transferred the Cre mRNA into loxP–GFP cells (Fig. 1). This observation suggested that adding Peg10 UTRs to the mRNA cargo enables the PEG10 VLPs to transfer an mRNA of choice, and that the viral fusogenic protein is required for cell entry. Human PEG10, similarly to the mouse ortholog, could form VLPs and transfer mRNA.

This combination of PEG10, modified mRNA and fusogen forms the SEND system. To make the system fully endogenous, Segel et al.1 evaluated murine and human fusogens that might replace VSVg. They focused on syncytin, an endogenous fusogenic transmembrane protein that evolved from retroviral elements, which has been used to pseudotype lentiviruses for nucleic acid delivery. The authors found that the fusogenic syncytin proteins in mouse, SYNA and SYNB, had a similar expression pattern to mouse PEG10, and that mouse SYNA could successfully replace VSVg in the transfer of Cre mRNA to tail-tip fibroblasts. The human syncytins (ERVW-1 and ERVFRD-1) operate in a similar fashion, which establishes SEND as a fully human system for functional gene transfer, at least in vitro.

To test the modularity of SEND, the authors also used it to deliver the large SpCas9 mRNA and tested its functionality by evaluating gene disruption in Neuro2a mouse neuroblastoma cells constitutively expressing a single-guide RNA (sgRNA) against Kras. The SEND system delivered the Cas9 mRNA cargo and caused a remarkable 60% gene editing in the Kras locus in the recipient cells. However, SEND failed to deliver sgRNA cargo to Cas9-expressing cells. Therefore, the authors combined the sgRNA and Cas9 mRNA to create an all-in-one vector. This vector facilitated 30% Kras gene editing in Neuro2a cells using the mouse SEND system and 40% VEGFA gene editing in HEK293 cells using the human SEND system.

The study by Segel et al.1 is notable as the first example of an endogenous system able to package, secrete and deliver specific mRNAs. Before practical uses can be envisaged, extensive further testing is needed. The SEND system was studied only in vitro, and it must be evaluated in vivo. As previously reported3, mouse PEG10 has multiple roles in the placenta and neuronal development, and it is unknown whether adding external PEG10 protein might affect its native functions. Additional questions concern possible autoimmune responses when an endogenous protein is expressed in a different biological context, as well as biodistribution, toxicity, efficacy and scalability.

Future work should also determine how the SEND system compares to existing mRNA delivery systems, including the lipid nanoparticles used in SARS-CoV-2 vaccines4,5 and many other approaches now in clinical testing6. It will be important to understand whether the system possesses intrinsic cell-type specificity and whether such specificity could be engineered. The next generation of lipid nanoparticles includes targeting strategies that have recently shown cell-type specificity, potent efficacy and safety in various animal models of inflammation, cancer and genetic disorders using mRNA alone or in combination with sgRNA to knockout cancer genes7,8,9,10. Nonetheless, the SEND system could become a safer and even more efficient alternative. After further development, it may have advantages in addressing biological questions, delivering vaccines and treating diseases, with particular relevance to chronic diseases that require lifelong therapies.

References

  1. Segel, M. et al. Science 373, 882–889 (2021).
  2. Feschotte, C. & Gilbert, C. Nat. Rev. Genet. 13, 283–296 (2012).
  3. Ono, R. et al. Nat. Genet. 38, 101–106 (2006).
  4. Baden, L. R. et al. N. Engl. J. Med. 384, 403–416 (2021).
  5. Polack, F. P. et al. N. Engl. J. Med. 383, 2603–2615 (2020).
  6. Rosenblum, D., Gutkin, A., Dammes, N. & Peer, D. Adv. Drug Deliv. Rev. 154-155, 176–186 (2020).
  7. Kedmi, R. et al. Nat. Nanotechnol. 13, 214–219 (2018).
  8. Veiga, N. et al. J. Control. Release 313, 33–41 (2019).
  9. Rosenblum, D. et al. Sci. Adv. 6, eabc9450 (2020).
  10. Dammes, N. et al. Nat. Nanotechnol. 16, 1030–1038 (2021).

Beyond RNAi: RIBOTAC for RNA silencing

Proteolysis-targeting chimeras (PROTACs) are an increasingly established modality to induce protein degradation by bridging the protein target to proteolytic machinery. Ribonuclease-targeting chimeras (RIBOTACs) perform a similar function, bringing RNA target molecules to RNases for degradation. Writing in Science Translational Medicine, a team led by Matthew Disney design a RIBOTAC to degrade the disease-causing RNA in amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). In patient-derived spinal neurons and a mouse model of ALS, their molecule induced degradation of the pathogenic mRNA and reduced the associated pathology.

ALS and FTD are progressive neurodegenerative conditions that result in motor and cognitive impairment. These diseases are usually sporadic; the most commonly associated mutation is a hexanucleotide repeat expansion (HRE) in an intron, usually intron 1, of chromosome 9 open reading frame 72 (C9orf72). This subset of the disease is termed c9ALS/FTD.

The HRE-containing RNA is translated into a protein that can contain one of five different dipeptide repeats (depending on the reading frame of the HRE), often poly(GP) or poly(GA).

The HRE-containing RNA and the resulting dipeptide repeat protein are both thought to promote neuronal death. Both the RNA and the protein form toxic aggregates. The RNA also sequesters gene expression machinery.

The authors reasoned that removing the mRNA itself could be of particular therapeutic value, because it would eliminate both of these toxic species. The HRE mRNA forms a particular 3D structure, and targeting this structure, rather than the primary sequence, could have fewer off-target effects from targeting other, non-pathogenic mRNAs that contain shorter HREs.
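The concern about sequence-based targeting can be made concrete with a small sketch. It counts the longest uninterrupted run of the G4C2 unit in a sequence and shows that a short sequence-level probe would match both a short and a long repeat tract. The two alleles, the probe length and the function name are hypothetical illustrations, not data or methods from the study.

```python
import re

REPEAT = "GGGGCC"  # the G4C2 hexanucleotide unit, written in the DNA alphabet


def longest_repeat_run(seq: str, unit: str = REPEAT) -> int:
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical alleles: identical flanks, different repeat counts.
short_allele = "ATCA" + REPEAT * 3 + "TTAG"
long_allele = "ATCA" + REPEAT * 40 + "TTAG"

# A probe that recognises two repeat units by sequence alone matches both alleles.
probe = REPEAT * 2
probe_hits_both = all(probe in allele for allele in (short_allele, long_allele))

print(longest_repeat_run(short_allele), longest_repeat_run(long_allele), probe_hits_both)
```

Because the primary sequence of the repeat unit is the same in both cases, only a property that depends on repeat length, such as the folded structure described in the text, can separate the two.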

Using structure–activity relationships combined with biophysical and structural analyses, the authors first designed a small molecule dimeric compound that would bind to the 3D RNA structure. Each monomer bound within the internal loops of the RNA hairpins. The dimer bound with a Kd of 4±0.7 nM, and had a long residence time on the target RNA.
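To give a feel for what a Kd of about 4 nM means, the sketch below computes equilibrium occupancy under a deliberately simplified model: single-site 1:1 binding with the compound in large excess over the RNA. That model is an assumption made here for illustration; the dimer described above engages two loops, and the article does not state how occupancy was modelled. The function name and the concentrations are hypothetical.

```python
def fraction_bound(ligand_nm: float, kd_nm: float = 4.0) -> float:
    """Equilibrium fraction of target bound, assuming simple 1:1 binding.

    Assumes free ligand is approximately equal to total ligand (ligand in excess).
    """
    if ligand_nm < 0 or kd_nm <= 0:
        raise ValueError("ligand_nm must be >= 0 and kd_nm must be > 0")
    return ligand_nm / (ligand_nm + kd_nm)


# Hypothetical compound concentrations spanning three orders of magnitude.
for conc in (0.4, 4.0, 40.0, 400.0):
    print(f"{conc:>6.1f} nM -> {fraction_bound(conc):.2f} of target bound")
```

Under this model half of the target is occupied when the compound concentration equals the Kd, and a tenfold higher concentration brings occupancy above 90%.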

In a pull-down assay, the dimer associated with the target RNA dose-dependently in cells, including patient-derived lymphoblastoid cell lines and induced pluripotent stem cells (iPSCs). In multiple cell lines, including those derived from patients, the molecule reduced ribosome loading and translation of RNAs containing the expanded C9orf72 intron 1.

Building on that dimer, they added an RNase L recruiter to turn the molecule into a RIBOTAC. In HEK293T cells, their RIBOTAC inhibited translation and dose-dependently reduced levels of the HRE-containing RNA in a manner that required RNase L.

Their RIBOTAC rescued pathological hallmarks in c9ALS/FTD patient-derived cell lines, too. This molecule reduced the abundance of C9orf72 intron 1 in patient-derived lymphoblastoid cell lines and iPSCs, and reduced levels of the dipeptide repeat protein in iPSCs. No effects on other transcripts were observed.

These same reductions were seen in iPSC-derived spinal neurons (iPSNs), which recapitulate many of the genetic, transcriptional and biochemical signatures of brain tissue from patients with c9ALS/FTD. The RIBOTAC dose-dependently reduced levels of C9orf72 intron 1 and poly(GP), without altering the levels of other transcripts that contain short, non-pathogenic (G4C2) repeats.

Nuclear pore proteins have previously been found to be reduced in patients with c9ALS/FTD. By super-resolution structured illumination microscopy, the authors found the RIBOTAC restored levels of Nup98, a key nuclear pore protein.

The mouse model for c9ALS/FTD contains a C9orf72 bacterial artificial chromosome that expresses 500 r(G4C2) repeats, resulting in foci containing the aberrant RNA or poly(GP) proteins. Treatment of these mice with the RIBOTAC by a single intracerebroventricular injection reduced r(G4C2)-containing mRNA, r(G4C2)-containing foci, and poly(GP)-containing proteins. These effects were observed as early as 1 week after injection, and persisted until at least 6 weeks after treatment (the earliest and latest time points analysed). The RIBOTAC reduced known hallmarks of c9ALS/FTD, including poly(GP) and poly(GA) aggregates, as well as transactivation response DNA-binding protein 43 (TDP-43) inclusions, detected by immunohistochemical analysis.

The authors note that further optimization of the medicinal chemistry and physicochemical properties would be required to translate their RIBOTAC to the clinic. However, this work demonstrates that targeting RNA is feasible, and could be an optimal modality for diseases in which RNA plays a key pathogenic role.

Bush, J. A. et al. Ribonuclease recruitment using a small molecule reduced c9ALS/FTD r(G4C2) repeat expansion in vitro and in vivo ALS models. Sci. Transl Med. 13, eabd5991 (2021)

Gene therapy for pathologic gene expression

Haploinsufficiency arises when one copy of a gene is functionally lost, often through nonsense or frameshift mutations or small chromosomal deletions. The resulting monoallelic expression is not sufficiently compensated for by the intact allele, ultimately leading to decreased expression of the gene product and resulting in pathologic phenotypes (1). What are the therapeutic options for diseases rooted in insufficient gene expression? One viable option is to restore normal gene expression levels by enhancing transcription of the affected genes in a targeted fashion. On page 246 in this issue, Matharu et al. (2) report a CRISPR-based gene-activation approach that can increase the expression of normal endogenous genes in a tissue-specific manner, setting the stage for the development of new gene-regulating therapies for gene dosage–associated diseases.

Among the emerging applications of CRISPR-based gene editing are techniques that use a catalytically inactive Cas9 enzyme (dCas9) fused to a protein domain to modulate transcription (3). These fusion proteins can be recruited by way of guide RNAs (gRNAs) to specific genomic locations, including promoters and cis-regulatory elements such as enhancers, which regulate gene expression. If the recruitment site is transcriptionally competent, the result is activation (CRISPRa) or repression/interference (CRISPRi) of transcription. Although this strategy has been applied in human cell culture and animal models (4, 5), the ultimate task of employing CRISPRa to therapeutically rescue pathologic gene expression has not been fully realized. Matharu et al. use CRISPRa to restore the expression of two haploinsufficient genes, single-minded 1 (Sim1) and melanocortin 4 receptor (Mc4r), to physiological amounts in mouse models of severe early-onset obesity. Haploinsufficiency of either gene causes severe obesity in humans, and previous work in mice established that SIM1 and MC4R control eating behavior through their expression in the hypothalamus (6–8); therefore, a relevant therapeutic intervention would target gene expression specifically in the hypothalamus.
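As a rough illustration of how a gRNA specifies a recruitment site, the sketch below scans a DNA sequence for 20-nucleotide protospacers that are immediately followed by an NGG motif, the protospacer-adjacent motif recognized by Streptococcus pyogenes Cas9. The choice of that Cas9 variant, the forward-strand-only search and the function name are assumptions made for illustration; they are not details taken from the study.

```python
import re


def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> list:
    """Return (start, protospacer, pam) for each forward-strand NGG site.

    A lookahead is used so that overlapping candidate sites are all
    reported. Only the given strand is searched; a fuller search would
    also scan the reverse complement.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

Calling the function on a promoter or enhancer sequence returns candidate sites only; ranking them by predicted activity or specificity is a separate step that this sketch does not attempt.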

Because Sim1 and Mc4r are expressed in multiple tissues, an important first step was to address whether it is feasible to modulate expression in a tissue-specific manner. The authors tested two approaches, focusing initially on Sim1: (i) Target CRISPRa to the promoter of the remaining functional Sim1 gene to enhance expression wherever Sim1 was already active, and (ii) target CRISPRa to a 270-kb distal enhancer that controls Sim1 expression specifically in the hypothalamus (see the figure). Both approaches were employed in transgenic animals expressing the CRISPRa reagents (dCas9 fused to the transcriptional activator VP64), as well as recombinant adeno-associated virus (rAAV)–mediated delivery of CRISPRa directly into the hypothalamus. In all cases, hypothalamic Sim1 expression was restored to wild-type levels and the mice did not become obese, demonstrating robust prevention of a haploinsufficient phenotype by enhancing endogenous gene expression. Interestingly, the authors found that they could manipulate Sim1 expression exclusively in the hypothalamus by targeting the hypothalamic enhancer instead of the Sim1 promoter, indicating that to obtain tissue-specific transcriptional modification, CRISPRa will likely need to be deployed to tissue-specific regulatory elements. Injection of rAAV-based CRISPRa into the hypothalamus of Mc4r haploinsufficient mice similarly prevented obesity, further demonstrating the strength of this approach.

This strategy illustrates what could emerge as an important new approach to treating gene expression disorders and raises the possibility of expanding the scope of CRISPRa and CRISPRi technology to treat diseases that involve pathogenic overexpression of a gene, particularly in cancer. For example, somatic mutations in a subset of pediatric T cell acute lymphoblastic leukemia (T-ALL) result in the formation of a highly active enhancer that drives oncogenic TAL1 gene overexpression (9). Moreover, MYC gene expression in human acute myeloid leukemia (AML) was recently shown to be dependent on a 1.7-megabase distal enhancer element (10). Both studies demonstrated that disrupting these enhancer elements negatively affected cancer cell survival, providing a precedent for developing CRISPRi as a therapeutic approach to inactivate cancer-promoting enhancers. Although transcription factors such as TAL1 and MYC are among the most potent oncoproteins, targeting them with small-molecule inhibitors has proven challenging. The results presented by Matharu et al. suggest that it should be possible to circumvent protein-targeted therapies by quelling oncogene expression at its source: transcription.

A key advancement in the study by Matharu et al. is their use of rAAV to deliver CRISPRa reagents in vivo. For a CRISPR-based therapeutic to be relevant for use in humans, it will likely need to be packaged within a virus and administered intravenously, because most targeted cell types will not be available for ex vivo manipulation and implantation. rAAV is nonpathogenic and displays a high delivery potential, making it a viable option for effectively introducing CRISPR reagents to human cells. CRISPRa and CRISPRi approaches have the added benefit of modulating gene expression without modifying the genome, thereby avoiding potential off-target mutations. Thus, pairing CRISPRa with rAAV to treat a gene expression disorder in vivo is an important step forward in the development of expression-based therapeutics.

Although Matharu et al. demonstrate that CRISPR-based up-regulation of a haploinsufficient gene can prevent obesity, this study also raises the important question of whether a disease phenotype can be reversed. Because the authors administered CRISPRa reagents to mice at 4 weeks of age, before the onset of obesity, they did not address the potential to rescue the phenotype later in life. Many haploinsufficient disorders in humans are likely to be therapeutically actionable only after the disease phenotypes are partially or fully established. Future experiments should test the therapeutic benefit of targeting gene expression with the goal of reversing a haploinsufficient phenotype. Additionally, it is important to recognize that many enhancers are dynamic, meaning that they may act at specific developmental stages and change their tissue specificity with time (11). Fortunately, the authors were able to capitalize on a developmentally stable tissue-specific enhancer, although it is unclear how often this will be the case for targeting enhancers of other haploinsufficient genes.

Naturally occurring and pathogenic gene regulatory DNA elements provide a tailored therapeutic route to targeting gene expression. The results presented by Matharu et al. underscore the importance of identifying and carefully characterizing the enhancers that control gene expression. Large-scale efforts have identified thousands of putative enhancers in hundreds of human cell types. However, cell types representing diverse disease states, particularly from human patients, remain understudied. Knowing the full repertoire of gene regulatory elements and their target genes (12) in these cell types is likely to provide critical insight that can be exploited for CRISPR-based therapeutic approaches to modify gene expression.

REFERENCES AND NOTES

  1. N. Huang et al., PLOS Genet. 6, e1001154 (2010).
  2. N. Matharu et al., Science 363, eaau0629 (2019).
  3. C.-H. Lau, Y. Suh, Transgenic Res. 27, 489 (2018).
  4. M. L. Maeder et al., Nat. Methods 10, 977 (2013).
  5. H. Zhou et al., Nat. Neurosci. 21, 440 (2018).
  6. J. L. Michaud et al., Hum. Mol. Genet. 10, 1465 (2001).
  7. M. J. Krashes et al., Nat. Neurosci. 19, 206 (2016).
  8. C. Vaisse et al., J. Clin. Invest. 106, 253 (2000).
  9. M. R. Mansour et al., Science 346, 1373 (2014).
  10. C. Bahr et al., Nature 553, 515 (2018).
  11. A. S. Nord et al., Cell 155, 1521 (2013).
  12. L. E. Montefiori et al., eLife 7, e35788 (2018).

Challenges in targeting circRNAs

https://www.nature.com/articles/s41392-021-00569-5

To date, circRNA-based therapeutic approaches have only been performed in preclinical studies. There are still many obstacles that need to be overcome in order for the therapeutic potential of these approaches to be achieved. Major limitations with these techniques and potential mitigation strategies are outlined in this section.

Off-target gene silencing

A fundamental concern with RNAi-based strategies is that small RNA molecules such as siRNAs can potentially induce off-target gene silencing via a miRNA-like effect.165 siRNA can target transcripts through partial complementarity, which usually occurs between the 3′ UTR of the transcript and the seed region of the siRNA.166,167 In circRNA knockdown experiments, it is usually verified that the corresponding linear mRNA levels are unaffected. However, off-target effects beyond their linear counterparts are less predictable. Designing siRNA to mitigate off-target effects is an ongoing area of interest for RNAi approaches.127,157 The CRISPR/Cas13 system has demonstrated low mismatch tolerance and could knock down circRNAs with greater specificity than RNAi.35 However, whether or not this approach will be effective in vivo remains to be investigated.
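The miRNA-like mechanism can be sketched as a simple sequence search. The example below takes the seed of an siRNA guide strand, defined here as nucleotides 2 to 8 from the 5′ end (a common convention, assumed for illustration), and reports where a perfectly complementary site occurs in a 3′ UTR. The function name and the sequences used to exercise it are hypothetical, and real off-target prediction tools weigh many more features than a perfect seed match.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_positions(guide_5to3: str, utr_5to3: str) -> list:
    """Return 0-based start positions of seed-complementary sites in a UTR.

    Both sequences are read 5'->3'. DNA-style T is converted to U so
    that either alphabet can be supplied.
    """
    guide = guide_5to3.upper().replace("T", "U")
    utr = utr_5to3.upper().replace("T", "U")
    seed = guide[1:8]  # guide positions 2-8 (1-based)
    # The target site is the reverse complement of the seed, written 5'->3'.
    site = seed.translate(RNA_COMPLEMENT)[::-1]
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions
```

A transcript with one or more hits is only a candidate off-target; whether it is actually silenced would need to be checked experimentally.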

Nonspecific tissue or cell type targeting

Although the majority of circRNAs are expressed in a tissue- or cell type-specific manner, some circRNAs are present in more than one tissue or cell type.25 Common strategies used to target circRNAs may cause adverse effects on off-target tissues or cells. Nanoparticle delivery systems have the potential to improve the targeting of therapeutic agents to specific cells.31,32,168 Alternatively, this challenge could be avoided in cases where it is possible to target circRNAs with highly specific expression patterns.

Toxicity of gold nanoparticles

Although gold nanoparticles (AuNPs) are convenient for delivering circRNA-targeting agents or circRNA plasmids in animal models, it is unclear how safe they are for clinical use. Previous studies on AuNPs draw inconsistent conclusions about their toxicity.169 It has been suggested that their toxic effects are dependent on the size of the particles, with smaller AuNPs causing more harmful effects.170 Thus, it is possible that the properties of AuNPs can be fine-tuned to meet safety requirements. Of note, an LNP-siRNA system has already been approved for the treatment of hereditary transthyretin amyloidosis30 and could potentially be used to deliver siRNA targeting disease-promoting circRNAs.

Mis-spliced products

CircRNA overexpression vectors are usually based on the pairing of intronic complementary sequences. This system can lead to mis-splicing of linear RNAs or circRNAs. The mis-spliced byproducts can cause nonspecific and potentially deleterious effects. Currently, there are still no vectors that can generate target circRNAs without mis-spliced products. Highly purified circRNA molecules synthesized in vitro could potentially be used to overcome the shortcomings of circRNA overexpression vectors. However, inherent problems with large-scale synthesis may limit the therapeutic potential of synthetic circRNAs.

Synthetic circRNA immunogenicity

In addition, synthetic circRNAs can induce immune system activation in vivo.171 It was suggested that foreign circRNAs are distinguished from endogenous circRNAs based on their lack of the m6A modification.138 Strategies are currently being explored to reduce synthetic circRNA immunogenicity, including introducing chemical modifications and coating them in RNA-binding proteins (RBPs).139

Other questions

https://www.sciencedirect.com/science/article/abs/pii/S0167779919301775

Will circular RNAs be optimally delivered using lipid-based or exosome-mediated delivery?

ActD treatment is commonly used to evaluate circular RNA stability, but its toxicity to cells limits long exposure studies. Could alternative methods of evaluating RNA stability aid in understanding the limits of circular RNA stability?

How difficult would it be to translate circular mRNA therapeutics in a targeted cell-specific manner? Does this involve the use of multiple input regulatory molecules as well as endogenous regulators?

To date all proteins generated from exogenous circular RNAs have been translated from a single open reading frame. For more complex molecules such as antibodies the best strategy for producing multiprotein complexes is not yet known: will it be a single circular mRNA with multiple open reading frames or multiple independent circles?

A comparative study evaluating the performance of in vitro versus in vivo generated circular RNAs is lacking. For example, what are the challenges in scaling up the production of synthetic circular RNAs?

Gene Therapy for Dravet Syndrome

Several years ago, gene therapy still seemed a distant possibility for Dravet syndrome. While preclinical and clinical gene therapy approaches were marching forward for other diseases, the size of the SCN1A gene hampered progress on traditional approaches to gene therapy for Dravet syndrome. However, advances to our basic understanding of genetics, along with an ever-expanding “genetic toolset,” have allowed researchers to develop new approaches to gene-based interventions, making truly disease-modifying therapies closer to a reality for Dravet syndrome.

In more than 80% of cases, Dravet syndrome is caused by a mutation in one copy of the SCN1A gene that encodes a sodium channel, Nav1.1 (Zuberi et al 2011, Wu et al 2015). Mutations in SCN1A that are associated with Dravet syndrome result in about 50% decreased expression or function of the Nav1.1 sodium channel (see Figure 1). This type of reduction in gene expression is referred to as haploinsufficiency (Catterall et al 2010). The traditional gene therapy approach to haploinsufficiency would be gene replacement. Gene replacement therapy uses the casing of a virus, a viral vector, to deliver DNA that encodes a healthy copy of the mutated gene to cells. However, this approach has proven difficult in Dravet syndrome because the SCN1A gene is quite large and that amount of DNA cannot fit in commonly used delivery vectors. Researchers are now circumventing this problem by delivering other genes that can increase the expression of the unaffected copy of SCN1A, targeting the gene at the RNA level instead of the DNA level, or optimizing other kinds of larger delivery vectors. In addition to the problem of the gene size, delivery to specific subsets of neurons in the brain is another challenge that researchers are working to overcome.

Targeting DNA Regulation

When a gene within your DNA (like SCN1A) is expressed, the first step is to make a strand of messenger RNA (mRNA) that can then be used as instructions to make the final product: a protein (the sodium channel, Nav1.1). In order for a specific cell to know when and how much of a gene to express, there are regulatory regions in the DNA (often called promoters or enhancers) that act as markers to control gene expression. Molecules that bind to these regulatory regions and turn gene expression on, off, up, or down, are called “transcription factors,” (because the process of making mRNA from DNA is called “transcription”). Researchers have been working to identify where these regulatory regions are for SCN1A and ways that they can modulate the activity of the regulatory regions to increase expression of SCN1A.

ETX101. Encoded Therapeutics has developed an approach to increase expression of the SCN1A gene with a new gene therapy called ETX101. Instead of delivering a new copy of SCN1A, ETX101 delivers a regulatory gene that acts to increase expression of SCN1A and, in turn, the sodium channel Nav1.1. The gene that ETX101 will be delivering to cells is an engineered transcription factor that will help to increase the expression of the SCN1A gene. Because this engineered transcription factor is much smaller than the SCN1A gene, ETX101 can be packaged within an adeno-associated viral (AAV) vector for delivery to cells. ETX101 is anticipated to be a one-time treatment delivered directly to the brain, where the engineered transcription factor will be expressed in the major type of neurons that utilize the SCN1A gene. The hope is that this will restore the function of these inhibitory interneurons, ameliorating seizures and other comorbidities, and preclinical work presented at scientific meetings has shown this approach to be effective in rodent models of Dravet syndrome. Additionally, injection of ETX101 to the brain of non-human primates shows broad distribution of ETX101 and favorable safety outcomes. Encoded Therapeutics hopes to begin a clinical trial, called ENDEAVOR, for ETX101 in patients with Dravet syndrome later in 2021.

CRISPR. Another approach currently being tested in preclinical cell and rodent models is also targeting the regulation of the SCN1A gene. Commonly, CRISPR technology is used to “cut and paste” sequences of DNA, and the therapeutic potential has been largely focused on the ability to cut out and correct a specific mutation in a gene. However, some researchers have been utilizing this technology in a different way to increase gene expression. ‘CRISPR associated protein 9,’ or Cas9, is used in conjunction with a “guide RNA sequence” to locate the target DNA segment and make a cut. A deactivated version of Cas9, called dCas9, no longer harbors the ability to cut DNA, but instead can be connected to molecules that increase gene expression. Several groups of academic researchers are investigating how this technology could be utilized to increase SCN1A expression by targeting dCas9 to specific regulatory regions for the SCN1A gene (Colasante et al 2020, Yamagata et al 2020). Work in cell lines and mouse models of Dravet syndrome has shown the effectiveness of this approach to increase SCN1A expression, and consequently levels of the Nav1.1 sodium channel. Additionally, experiments indicate that this treatment approach can improve neuronal communication and reduce seizure activity in Dravet syndrome mice. While encouraging, this work is still in preclinical development; there are still challenges to the delivery method and efficiency of increasing gene expression. The current experiments used injection of multiple delivery vectors to contain both the dCas9 and the guide RNA sequences with limited expression in the brain, or they have taken advantage of mouse genetics to ensure the efficient delivery to the correct cells. Despite the need for advancements to the technology for eventual human therapies, these proof-of-concept studies highlight that this approach could correct the haploinsufficiency of SCN1A and improve patient outcomes.

Targeting RNA Regulation

A lot of regulation can occur at the RNA level as well. Scientists are taking advantage of some of those regulatory instructions to increase the amount of Nav1.1 that the SCN1A mRNA creates by targeting alternative splicing, correction of nonsense mutations, and stabilization of SCN1A mRNA transcripts.

TANGO ASO. Stoke Therapeutics has developed a strategy for a disease-modifying approach called STK-001 that works at the level of RNA splicing. RNA splicing occurs to remove the sections copied from the DNA code that are not essential to the “recipe” for creating the protein product (in this case, Nav1.1). The SCN1A gene sometimes includes a section of the DNA code, called a poison exon, within the RNA transcript that tells the cell to trash the strand of RNA instead of using it to produce the Nav1.1 protein. Stoke Therapeutics’ approach, called Targeted Augmentation of Nuclear Gene Output (TANGO), sends in a small piece of RNA that blocks the inclusion of the poison exon, and thus increases the number of RNA strands that produce Nav1.1 (Lim et al 2020). The small piece of RNA, called an antisense oligonucleotide or ASO, can be packaged inside a lipid droplet that allows the ASO access into cells. This type of packaging is ideal, as it does not pose the same risks of off-target immune reactions as some other delivery methods. The other advantage to this approach is that only the cells that should be naturally expressing the SCN1A gene will be affected by the therapy, helping to reduce off-target effects. STK-001 is delivered by intrathecal injection (similar to a lumbar puncture or an epidural). Preclinical work showed efficacy to reduce seizures and mortality in a mouse model of Dravet syndrome (Han et al 2020). Clinical trials (called MONARCH and SWALLOWTAIL) began in late 2020 and early 2021 for STK-001 to determine the safety, pharmacokinetics, and efficacy in patients with Dravet syndrome. The trials will determine how often STK-001 needs to be administered; it is thought that STK-001 may need to be administered every several months, as ASOs are eventually broken down by the cells that take them up. We expect to hear the first reports on the STK-001 trials by the end of 2021.

tRNA. Tevard Biosciences recently partnered with Zogenix to advance two therapies that could correct the haploinsufficiency in Dravet syndrome. They are using a different kind of RNA, called transfer RNA or tRNA, to increase SCN1A gene expression. They are developing two different approaches. The first therapy will specifically target nonsense mutations. Nonsense mutations create a change in the DNA code that tells the cell to prematurely stop making the Nav1.1 protein, leading to a shortened version that either gets broken down by the cell or does not work as efficiently as it should. The therapy in development uses a tRNA that can overcome the mutation and allow the Nav1.1 protein to be made correctly. The other therapy they are developing also uses tRNA, but this approach helps to stabilize the SCN1A mRNA so that it can be used to create more copies of the Nav1.1 protein. Both of these approaches would be delivered in an AAV vector to cells in the brain. Currently, these approaches have only been demonstrated in cells, and the company is now working with animal models of Dravet syndrome. These therapies are exciting because of the potential they hold for broad application to other genes, but there is still work to be done in animal models.

Focusing on Gene Replacement

As mentioned above, one of the major challenges to overcome for gene therapy in Dravet syndrome has been the large size of the SCN1A gene, which does not fit into the vectors that would be most ideal for delivery to neurons in the brain. Several academic research collaborations are working on utilizing larger types of adenoviral vectors to deliver a replacement copy of the SCN1A gene. There have also been some groups working on splitting a replacement gene into two vectors that deliver the gene to the cell, where it can reassemble to encode the full Nav1.1 protein. All of these studies are still in early stages, but some work in cells and mouse models is beginning to show promising results. It is yet to be determined what the challenges of using these gene replacement approaches in humans might be, but the field is marching forward steadily.

In summary, there are several disease-modifying therapeutic approaches in various stages of development. It is encouraging to see so many different tactics being employed to overcome the challenges to correcting the haploinsufficiency of SCN1A in Dravet syndrome. With so many talented minds pushing forward from so many different angles, there is a lot of hope for the development of a therapy that can truly treat the root cause of Dravet syndrome and dramatically improve the outlook for patients.

More questions about this topic? Email DSF’s Scientific Director, Veronica Hood: veronica@dravetfoundation.org

References:
Catterall et al (2010) Journal of Physiology. DOI: 10.1113/jphysiol.2010.187484
Colasante et al (2020) Molecular Therapy. DOI: 10.1016/j.ymthe.2019.08.018
Han et al (2020) Science Translational Medicine. DOI: 10.1126/scitranslmed.aaz6100
Lim et al (2020) Nature Communications. DOI: 10.1038/s41467-020-17093-9
Wu et al (2015) PEDIATRICS. DOI: 10.1542/peds.2015-1807
Yamagata et al (2020) Neurobiology of Disease. DOI: 10.1016/j.nbd.2020.104954
Zuberi et al (2011) Neurology. DOI: 10.1212/WNL.0b013e31820c309b

Hitting SEND on mRNA delivery

A substantial proportion of the human genome is made up of retroelements — ancient transposable DNA tracts including retroviruses that have integrated into our genome during evolution. A recent study in Science reports the development of a modular system based on some of these endogenous retroelements that can be used for delivery of therapeutic mRNA cargo. This platform might support the burgeoning field of mRNA-based therapies, such as mRNA vaccines for SARS-CoV-2.

Many endogenous retroelements have lost their original function, but some have been co-opted for physiological processes, sometimes involving transfer of mRNA between cells. This led Segel et al. to wonder whether retroelements in the human genome might be programmed to deliver specific nucleic acids, which would then be translated into protein therapies inside the cell.

The researchers began by searching for homologues of the retroelement structural gene gag, and in particular for gag homologues containing a capsid domain, in the human genome. Capsid proteins form virus-like particles (VLPs) around secreted retroelement RNA, which could be a useful component for an RNA delivery platform.

By computational survey, Segel et al. looked for capsid-containing gag homologues that were common to the human and mouse genomes, reasoning that such hits would be most likely to serve physiological roles in mammalian cells. The researchers used Escherichia coli to express mouse versions of the identified gene candidates, and observed by electron microscopy that several of the resultant protein products formed VLPs and were secreted. Of these proteins, PEG10 showed the greatest propensity for VLP secretion.

PEG10 is derived from a homologue of a common type of retroelement called a long-terminal repeat retrotransposon. In vitro studies including CRISPR-mediated activation of endogenous Peg10 in mouse cells showed PEG10 binds to and facilitates secretion of Peg10 transcripts. Studies in transgenic mice suggested a function of PEG10 in mammals could be to stabilize mRNAs involved in neurodevelopment.

The authors sought to reprogramme PEG10 to bind and package a different RNA cargo. They chose the gene encoding Cre recombinase (Cre) as a test cargo, which they aimed to transfer to mouse N2a cells expressing a loxP–GFP reporter.

They found that flanking Cre with the untranslated regions of PEG10 provided a ‘packaging signal’ for Cre mRNA secretion in VLPs. Addition of the fusogenic envelope protein from vesicular stomatitis virus (VSV) — a mix-and-match process known as virus pseudotyping — enabled entry of the Cre mRNA cargo into target cells.

To make the system fully endogenous, the team looked for alternative fusogens to the VSV envelope protein for pseudotyping. They focused on the syncytins, which are fusogenic proteins in mammalian cells that evolved from retroviral envelope proteins. In the mouse RNA delivery system, replacement of the VSV fusogen with the mouse syncytin A gene enabled transfer of Cre mRNA cargo to target cells in vitro.

The authors named their tripartite system (the recoded PEG10 sequence, the target gene and the fusogen) selective endogenous encapsidation for cellular delivery (SEND).

To demonstrate the modular nature of SEND for delivery of a cargo mRNA of choice, the researchers used the system to deliver Cas9 mRNA to mouse N2a cells that constitutively express a single guide RNA against Kras. The SEND system achieved functional delivery of Cas9 mRNA to recipient cells, 60% of which contained insertions or deletions (indels) in Kras after delivery.

In addition, by co-packaging Cas9 mRNA and vascular endothelial growth factor A (VEGFA) guide RNA within the SEND system, the researchers created an all-in-one vector that produced indels at the VEGFA locus in 40% of HEK293 cells.

The ability to swap in different mRNA cargoes makes SEND a potentially broadly applicable platform for delivery of nucleic acids. Moreover, the authors note that the system could be less immunogenic than other methods of mRNA delivery, such as viral vectors, as SEND uses endogenous proteins. Indeed, PEG10 is highly expressed in the developing human thymus, which is a key site for the induction of T cell tolerance.

Future studies might characterize and develop other capsid proteins and fusogens encoded in the human genome to provide additional components to optimize the SEND platform.

Segel, M. et al. Mammalian retrovirus-like protein PEG10 packages its own mRNA and can be pseudotyped for mRNA delivery. Science 373, 882–889 (2021)

Machine learning solves RNA puzzles

========

RNA molecules fold into complex three-dimensional shapes that are difficult to determine experimentally or predict computationally. Understanding these structures may aid in the discovery of drugs for currently untreatable diseases. Townshend et al. introduced a machine-learning method that significantly improves prediction of RNA structures (see the Perspective by Weeks). Most other recent advances in deep learning have required a tremendous amount of data for training. The fact that this method succeeds given very little training data suggests that related methods could address unsolved problems in many fields where data are scarce.

=======

RNA is distinct among large biomolecules in that it has both informational coding ability, carried in its sequence, and the ability to form complex three-dimensional structures that can have catalytic and regulatory roles. The information-carrying component is widely appreciated. The pattern of base pairing—the first level of RNA structure—can be experimentally assessed and modeled with impressive accuracy (1, 2). By contrast, our understanding of the extent and roles of complex three-dimensional RNA structures remains rudimentary. RNA viral genomes are rich in motifs with complex three-dimensional structures with regulatory functions (3), and evidence increasingly supports the hypothesis that functional RNA structures are ubiquitous in organisms ranging from bacteria to humans. However, developing and testing hypotheses about the roles of RNA structure have been hindered by the inability to identify and model these structures. On page 1047 of this issue, Townshend et al. (4) report a machine-learning strategy for identifying native-like RNA folds.

Nearly all RNAs that form well-understood complex structures fall into a small number of classes: the ribosomal RNAs, the large and small ribozymes that catalyze RNA cleavage, bacterial riboswitches, and regulatory elements encoded by RNA viruses. Thus, there are limited examples for guiding identification and modeling of RNAs with complex three-dimensional structures. There are only four major RNA nucleotides, and the interactions that govern base pairing and simple helix formation are well understood. Once formed, RNA helices (secondary structure) often assemble as fairly rigid elements that interact hierarchically to form more complicated structures (tertiary structure) (see the figure). Despite these simplifying features, the modeling of complex RNA structures has proven to be difficult.

The RNA-Puzzles community exercise (5, 6) has been instrumental in illuminating the challenges involved: Groups try to predict an RNA structure from its sequence before learning the solved structure. Several rounds of RNA-Puzzles have revealed important themes. No single method consistently yields the best models, although certain approaches have better records than others, and most approaches are getting better. The best agreement tends to result when experimental or homology-based information is incorporated into the computational modeling. However, the median accuracy for small RNAs, with complex tertiary folds but without a close known homolog, has stayed stubbornly stuck in a range of ∼15- to 20-Å root mean square deviation [(RMSD) a measure of the similarity between known and modeled structures]. This agreement is much poorer than that now achieved for protein structures by machine learning (7), where native-like folds (∼2-Å RMSD or less) are achieved. Modeled RNA structures thus often recapitulate the overall fold of a target RNA but do not consistently reveal details of the tertiary structure. Current methods are not likely to be useful for applications such as understanding the biological mechanism of a structure or for designing ligands (or drugs) that modulate RNA function.
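For readers unfamiliar with the metric, RMSD is the square root of the mean squared distance between matched atoms in two structures. The minimal sketch below illustrates only that arithmetic. It assumes the two structures are already superimposed and that atoms are listed in the same order; structural comparisons in practice first perform an optimal superposition, which is omitted here. The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z).

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical toy coordinates in angstroms: every atom of the model
# is displaced by 2 A along z relative to the known structure.
known = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0)]
print(rmsd(known, model))  # 2.0
```

On this scale, a uniform 2-Å displacement of every atom corresponds to the native-like threshold quoted for proteins, whereas the 15- to 20-Å values reported for RNA imply much larger average deviations.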

The Atomic Rotationally Equivalent Scorer (ARES) approach of Townshend et al. is a deep neural network, a form of machine learning, and did not initially include preconceived notions of RNA structure. Indeed, the ARES framework is not specific to RNA and can be applied to other problems in molecular structure. Instead, ARES was given a small set of motifs with known RNA structure plus a large number of alternative (incorrect) variations of these same structures. ARES parameters were adjusted so that the program learned the functional and geometric arrangements of each atom and how these elements are positioned relative to each other. Layers in the neural network compute features from finer to coarser scales to recognize base pairs, helices, and more-complex structures. For example, ARES learned patterns of base pairing, the optimal geometry for RNA helices, and a subset of noncanonical tertiary motifs without being provided explicit information about these features of RNA structure.

Although ARES was trained on very simple RNA systems, the resulting ARES scoring function was able to predict structures of more complex RNAs, on average, to roughly a 12-Å RMSD. This degree of accuracy represents an overall improvement of ∼4 Å over prior scoring methods. ARES is still short of the level consistent with atomic resolution or sufficient to guide identification of key functional sites or drug discovery efforts, but Townshend et al. have achieved notable progress in a field that has proven recalcitrant to transformative advances.

There are three fundamental challenges for modeling complex RNA three-dimensional structures: generating reasonable structures that may represent a biological state, accurately scoring or identifying models that best represent the correct native state, and using these hopefully accurate models to discover new functional motifs and to develop hypotheses regarding the mechanisms by which RNAs with complex three-dimensional structures regulate biological processes. The ARES machine-learning approach addressed the second of these three challenges: Candidate structures still need to be generated for evaluation by ARES. With further development, deep learning strategies hold promise for creating new scoring functions that can guide structure generation in ways that might yield near-native structures. Another important goal is to use a machine-learning strategy to identify regions in large RNAs most likely to fold into three-dimensional structures.

Current computational-only algorithms are not able to predict the pattern of base pairing in large RNAs accurately, even though base pairs are simpler to predict than tertiary structure. However, secondary structures for large RNAs are routinely modeled to high accuracies by incorporating experimental information. New, efficiently executed experiments are now being developed that measure features of RNA tertiary structures. Another frontier, analogous to recent advances in secondary structure modeling, would thus be to incorporate experimental information into machine-learning strategies for modeling RNA tertiary structure.

Large-scale investigation of RNA structure to date, primarily focused on RNA secondary structure, has revealed several core principles. One is that the existence of regions within large RNAs with complex, higher-order structure is unremarkable. When these base pairing and tertiary structures affect biological functions, they create “an RNA structure code” with pervasive effects on gene regulatory circuits. Additionally, every RNA likely has a distinct structural personality, which implies that there are numerous ways by which RNA structure tunes the underlying function of an RNA. At the level of secondary structure, such tuning RNA structures tend to function like switches and attenuators that modulate binding by RNA and protein ligands (8–11). Finally, characterization of well-determined RNA secondary structures often leads to identification of centers of new biology. As it becomes possible to measure, (deeply) learn, and predict the details of the tertiary RNA structure-ome, diverse new discoveries in biological mechanisms await.

REFERENCES AND NOTES

  1. E. J. Strobel et al., Nat. Rev. Genet. 19, 615 (2018).
  2. K. M. Weeks, Acc. Chem. Res. 54, 2502 (2021).
  3. Z. A. Jaafar, J. S. Kieft, Nat. Rev. Microbiol. 17, 110 (2019).
  4. R. J. L. Townshend et al., Science 373, 1047 (2021).
  5. J. A. Cruz et al., RNA 18, 610 (2012).
  6. Z. Miao et al., RNA 26, 982 (2020).
  7. E. Pennisi, Science 373, 262 (2021).
  8. D. Long et al., Nat. Struct. Mol. Biol. 14, 287 (2007).
  9. M. Kertesz et al., Nat. Genet. 39, 1278 (2007).
  10. D. Dominguez et al., Mol. Cell 70, 854 (2018).
  11. A. M. Mustoe et al., Biochemistry 57, 3537 (2018).

Therapy based on functional RNA elements

Over the past several years, advances in RNA sequencing have led to an increased appreciation of the prevalence and function of noncoding RNAs, including long noncoding RNAs (lncRNAs). These are typically expressed in a tissue-specific manner in healthy tissues and are often dysregulated in disease, making them potential biomarkers and therapeutic targets. On page 662 of this issue, Li et al. (1) reveal the biological importance of a lncRNA in an inherited metabolic disorder called phenylketonuria (PKU) and demonstrate in mice that a molecule that mimics the functional region of this lncRNA is a promising therapeutic. This discovery suggests that short lncRNA fragments could overcome some of the challenges faced by other RNA therapeutic modalities.

RNA-based and RNA-targeting therapeutics have many advantages: They are cost-effective, are relatively simple to manufacture, can target otherwise undruggable pathways, and have demonstrated success in the treatment of several diseases. Although RNA therapeutics have a long and bumpy history, advances in the generation, purification, and cellular delivery of short oligonucleotides and long RNAs have led to regulatory approval of several RNA-focused therapies, including the much-celebrated messenger RNA (mRNA)–based COVID-19 vaccines.

The human genome encodes a large number of RNA molecules that do not encode functional proteins, including tens of thousands that are classified as lncRNAs (2). lncRNAs and mRNAs are virtually identical at the molecular level, although lncRNA production is typically much more tissue specific. Also, lncRNA genes evolve much faster than protein-coding ones (3). lncRNAs have diverse roles, including in gene regulation and as scaffolds for macromolecular assemblies. Some lncRNAs function in cis—that is, in the vicinity of their site of transcription—whereas others are trans-acting, and their function is not affected by their production site within the genome. Because lncRNAs are expressed in a cell-, tissue-, developmental stage–, or disease-specific manner, their modulation could have substantial, but focal, consequences, which are expected to be well tolerated. However, the progress in elucidating their functions and causally linking genetic changes in lncRNA loci to disease has been slow.

Antisense oligonucleotides (ASOs) are currently the most common approach for therapeutic targeting of RNAs. These are single-stranded oligonucleotides that base pair with a target RNA and can either lead to target degradation or alter target RNA structure and/or its ability to interact with other factors. Chemical modifications of ASOs make them highly stable and able to permeate cells, and considerable progress has been made in the improvement of their pharmacological properties, allowing development of effective therapeutics such as nusinersen for spinal muscular atrophy (4). However, the limited sequence conservation of lncRNAs between human and mouse poses a substantial challenge, because many human lncRNAs do not have recognizable mouse orthologs (3). For those that are conserved, it is often impossible to find an ASO sequence that will recognize both the human and the mouse sequences, which substantially complicates preclinical drug development.

In other cases, increased lncRNA expression is sought, either because the lncRNA is mutated in a disease or because an increase in its concentration carries benefits. One conceptual challenge is that for lncRNAs that function in cis, exogenous delivery to the entire cell will likely not sufficiently increase their concentration at the target locus and may hence remain inconsequential. In any case, a major challenge is the delivery of a large RNA molecule. This can be potentially overcome by identifying and using a functionally active fragment of the full lncRNA. Such a functional element can be a region in the lncRNA molecule that is responsible for interacting with other factors, possibly resulting in changes to their abundance or activity.

For example, the lncRNA Nron (noncoding repressor of NFAT) was identified in mice as a critical suppressor of bone resorption, which is a pathological mechanism in osteoporosis (5). Delivery of full-length Nron using a bone-resorption surface-targeting nucleic acid delivery system inhibits bone resorption but causes side effects in mice, including splenomegaly, probably because of a strong immune response to the delivered RNA. However, the delivery of just the conserved functional motif of Nron, which binds the E3 ubiquitin ligase cullin-4B, effectively reversed bone loss in mice without any obvious side effects, indicating its potential translational use in osteoporosis (5).

Li et al. developed a therapeutic strategy based on the activity of the HULC (hepatocellular carcinoma up-regulated long non-coding RNA) lncRNA which, as they demonstrate, increases the activity of phenylalanine hydroxylase (PAH), which is mutated in PKU. They used lncRNA mimics containing a short fragment of HULC sequence that is tagged with an N-acetylgalactosamine (GalNAc) moiety that facilitates delivery to hepatocytes. Two different lncRNAs, Pair and HULC, perform this function in mouse and human liver, respectively, yet both were able to function equivalently in cells from both species, and the mimics of the functional region in human HULC were effective in vivo at improving PAH function in the mouse liver, without any detectable adverse effects on liver or kidney function.

The use of mimics of lncRNA functional motifs to treat human disease has several advantages compared with other approaches (see the figure). In contrast to therapeutic mRNAs, which need to be translated by ribosomes, and similarly to ASOs, lncRNA mimics can be extensively modified, which can facilitate high in vivo stability and decrease immunogenicity. They can also be easily tagged with organ-targeting peptides for tissue-specific distribution. Functional RNA motifs often do not have strict sequence requirements, which allows flexibility in designing lncRNA mimics and minimizing undesired activities, such as triggering antiviral pathways that recognize different RNA modalities. Because endogenous lncRNA activities are often tissue specific, there is, in principle, a relatively low potential for toxicity. Lastly, as exemplified by Li et al., functional elements can have conserved functions even if their sequences are entirely different, and so the same element can be equivalently active in humans and mice, overcoming a major challenge for ASOs.

Several hurdles still need to be overcome before lncRNAs or fragments thereof realize their full therapeutic potential. Perhaps most important is the need for advances in the methods to deliver RNA molecules to specific tissues and cell types (as nanoparticles or through other vehicles), which will also benefit therapeutic mRNAs and ASOs (6). The repertoire of lncRNAs whose biology is properly understood and linked to specific pathological states also needs to be expanded. Lastly, for as long as the delivery of full-length lncRNAs remains a challenge, new approaches will be needed in computational and/or experimental identification of lncRNA functional domains and of minimal backbones that will facilitate stability and desired subcellular localization.

References and Notes
1. Y. Li et al., Science 373, 662 (2021).
2. M. K. Iyer et al., Nat. Genet. 47, 199 (2015).
3. I. Ulitsky, Nat. Rev. Genet. 17, 601 (2016).
4. X. Shen, D. R. Corey, Nucleic Acids Res. 46, 1584 (2018).
5. F. Jin et al., Nat. Commun. 12, 3319 (2021).
6. M. D. Buschmann et al., Vaccines 9, 65 (2021).

Cell–cell interactions revealed with RABID-seq

Methods to study interacting cells and their transcriptomes are difficult to apply in living organisms. To facilitate such in vivo studies, Francisco Quintana from Harvard Medical School in Boston and the Broad Institute in Cambridge, Massachusetts, and his team came up with the idea of rabies barcode interaction detection followed by sequencing (RABID-seq). “We’ve been interested in astrocytes and their roles in health and disease,” says Quintana, explaining that astrocyte responses are controlled by multiple factors, such as metabolism and environment. “But one of the most important factors is literally cell–cell interactions,” says Quintana. He explains that RABID-seq allows studying these interactions in a comprehensive and unbiased fashion.

In RABID-seq, astrocytes are infected with a rabies-virus-based library encoding barcoded mCherry. Specific infection of astrocytes is achieved by using rabies virus pseudotyped with the envelope protein EnvA and by transgenically expressing the EnvA receptor TVA in astrocytes only, so that the initial infection is limited to astrocytes. The rabies virus used is the RabΔG variant, which lacks a crucial gene for a structural protein. This protein is transgenically expressed in astrocytes, which allows astrocytes to produce infectious virus particles, while other cell types, after infection due to their interactions with astrocytes, cannot produce functional virus particles and therefore do not continue a chain of infection.

Once astrocyte-interacting cells are infected, the tissue is dissociated, astrocytes and their interaction partners are enriched by sorting for mCherry expression, and their transcriptomes are analyzed with single-cell RNA sequencing. The aforementioned barcodes are inserted in the 3′ untranslated region of the mCherry transcripts and are read out by single-cell RNA sequencing as well. Cells that interacted with each other will harbor the same barcodes.
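The barcode logic described above can be illustrated with a minimal sketch: cells are grouped by barcode, and any two cells found under the same barcode are reported as candidate interaction partners. This is not the published RABID-seq pipeline, which also has to handle issues such as sequencing errors and filtering. The function name, cell labels and barcode strings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def shared_barcode_pairs(cell_barcodes):
    """Map each pair of cells to the set of barcodes they have in common."""
    # Invert the input: barcode -> set of cells in which it was detected.
    cells_by_barcode = defaultdict(set)
    for cell, barcodes in cell_barcodes.items():
        for barcode in barcodes:
            cells_by_barcode[barcode].add(cell)

    # Every pair of cells under one barcode is a candidate interaction.
    pairs = defaultdict(set)
    for barcode, cells in cells_by_barcode.items():
        for a, b in combinations(sorted(cells), 2):
            pairs[(a, b)].add(barcode)
    return dict(pairs)

# Hypothetical toy data: barcodes recovered per cell after sequencing.
cells = {
    "astrocyte_1": {"BC01", "BC02"},
    "microglia_1": {"BC01"},
    "t_cell_1": {"BC02"},
    "neuron_1": {"BC03"},
}
pairs = shared_barcode_pairs(cells)
for pair, barcodes in sorted(pairs.items()):
    print(pair, sorted(barcodes))
```

In this toy example the astrocyte is linked to the microglial cell through BC01 and to the T cell through BC02, while the neuron, which shares no barcode with any other cell, is not linked to anything.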

Quintana says that establishing RABID-seq has been a highly collaborative effort and gives special credit to Iain Clark, Cristina Gutiérrez-Vázquez and Michael Wheeler, who are joint first authors of the publication. While RABID-seq may appear to be a straightforward combination of existing technologies, the team had to overcome hurdles to make the technology work. For instance, replication of the libraries was a challenge, as was the optimization of a computational pipeline to analyze the data. “The single-cell RNA-seq dataset allows you to establish cell types, cell subsets and activation status,” says Quintana, and the data can be mined for specific astrocyte populations and their interaction partners, ligands and receptors that might mediate these interactions, as well as signaling pathways that are upregulated.

Quintana and his team applied RABID-seq to study the interactions of astrocytes in experimental autoimmune encephalomyelitis (EAE) mice, which serve as a model for multiple sclerosis. In control mice, astrocytes interacted with other astrocytes, microglia and a few other cell types. In contrast, astrocytes in the EAE model also interacted with immune cells such as T cells, dendritic cells, monocytes and macrophages, which is consistent with the inflammation observed in the central nervous system of this mouse model. The researchers then focused on microglia–astrocyte interactions. They analyzed potential ligand–receptor interactions and identified the semaphorin–plexin pathway as a promising candidate for microglia–astrocyte communication. The researchers also found a role for EphB3 in the proinflammatory activity of astrocytes via its ligand ephrin-B3 in microglia.

One concern about rabies virus is its potential for causing deleterious effects in infected cells. “We didn’t see significant neurotoxicity,” says Quintana. In fact, he was more concerned “whether you would induce an immune response to the virus.” This was not a substantial problem in their studies, but Quintana cautions that care must be exercised when studying subtle effects.

RABID-seq has proven a useful tool in the hands of Quintana and his team. Now they are working on a second generation of RABID-seq. So far, RABID-seq relies on transgenic components, which complicates experiments with mouse lines that have a complex genetic background or prevents experiments with ex vivo human tissue samples. To overcome these hurdles, the team is establishing a RABID-seq version that makes use exclusively of viral tools to deliver the different components.

Research paper
Clark, I.C. et al. Barcoded viral tracing of single-cell interactions in central nervous system inflammation. Science 372, 360 (2021).

Microexon alternative splicing

Microexons are small (≤51 bp) exons that undergo extensive alternative splicing (AS) in neurons, microglia, embryonic stem cells, and cancer cells, giving rise to cell-type-specific protein isoforms. Because of their small size, microexons pose a unique challenge for the splicing machinery. They frequently lack exonic splicing enhancers/repressors and require specialized neighboring trans-regulatory and cis-regulatory elements bound by RNA binding proteins (RBPs) for their inclusion. The functional consequences of including microexons within mRNAs have been extensively documented in the central nervous system (CNS), and aberrations in their inclusion have been observed to lead to abnormal processes. Despite the increasing evidence that microexons affect cellular physiology within the CNS, mechanistic details illustrating their functional importance in diseases of the CNS are still limited.

PTBP, also known as hnRNPI, binds the polypyrimidine-rich region (U/CUCUCU) within introns and affects neuronal AS (Gil et al., 1991; Patton et al., 1991; Zheng, 2020). PTBPs have been extensively shown to be involved in AS of microexons in neurons in different contexts (Black, 1992; Chan & Black, 1995; Markovtsov et al., 2000). Neural progenitor cells abundantly express PTBP1, and during neurogenesis the expression of PTBP1 decreases while the expression of its paralog PTBP2 increases (Chan & Black, 1997; Makeyev et al., 2007; Spellman et al., 2007). As a result, the synchronization of PTBP paralogs is critical for neuronal development and the switching of neuronal programs. CLIP and RNA-seq data reveal that PTBP1 regulates microexon AS by binding upstream of the microexon (Y. I. Li et al., 2015). In the Neuro2A mouse neuroblastoma cell line, PTBP1 depletion caused microexon inclusion (~94%), whereas only 8% showed exclusion, implying that PTBP1 is a repressor of microexon inclusion (Y. I. Li et al., 2015). This is in line with previous work from Black’s lab showing that PTBP1 represses inclusion of the N1 microexon of c-src mRNA in non-neuronal cells (Black, 1992; Chan & Black, 1997; Min et al., 1995). Likewise, PTBP1 depletion caused microexon skipping within the eIF4G transcript in neurons (Gonatopoulos-Pournatzis et al., 2020). Another study demonstrated that in neural progenitor cells, microexon 5 of the BAK1 transcript is skipped, and that this skipping is promoted by PTBP1 binding to the intronic region in proximity to the 3′ splice site of the microexon. However, as neural progenitor cells differentiate into neurons, PTBP1 expression decreases, allowing the microexon to be included in the BAK1 transcript, which triggers the loss of BAK1 protein and enhances neuronal survival (Lin et al., 2020). Therefore, PTBP1 is a microexon AS regulator that plays crucial roles in neurons.

The physiological consequences or the causality of mis-spliced microexons have not been functionally examined. One possible way to resolve this is to use gene editing, e.g., with CRISPR/Cas9, to precisely remove individual microexons or flanking RNA elements and then examine the functional outcomes (Du et al., 2020; Yuan et al., 2018). This approach will improve our comprehension of how different small GTPase protein isoforms regulate cellular physiology and CNS function. The observation of microexon AS in autism spectrum disorders is the beginning of mining these splicing events, especially those of small GTPase regulators in CNS disorders at large, and of determining whether microexon AS defects are a common feature of other disorders.

Source: PMID 34155820

A glimpse at the glycoRNA world

RNA modifications, discovered decades ago, have important biological functions. The most functionally validated modification is the 5′ m7G cap of mRNAs that controls canonical translation (Wei et al., 1975). A wide variety of modifications are present in tRNAs that affect their folding as well as translation. In the past decade, there has been an explosion in the number of known RNA modifications, particularly in mRNAs, and the discovery of their biological roles has spawned the important field of epitranscriptomics (Nachtergaele and He, 2018). In this issue of Cell, Flynn et al. link glyco- and RNA biology with the discovery of a new biopolymer, glycoRNA, a class of RNAs that are glycosylated with sialic acids and fucose (Flynn et al., 2021).

Paradigm shifts often require the development and implementation of tools to dissect and study dark spaces in biology. One approach for isolating, analyzing, and imaging glycosylated biomolecules is to co-opt the cellular biosynthesis of glycans by providing N-azidoacetylmannosamine (Ac4ManNAz), allowing researchers to label sialic-acid-containing glycans with a bioorthogonal handle (Baskin et al., 2007; Saxon et al., 2002). Although this approach has been used broadly to detect glycosylated proteins, Flynn et al. report its first use to probe glycosylated RNAs. Indeed, glycoRNAs were detected in various human cell lines and in mouse liver and spleen tissues.

Although many modified mRNAs are found in the epitranscriptome, glycosylated mRNAs were not found. Rather, glycoRNAs are small nuclear (sn)RNAs, ribosomal (r)RNAs, small nucleolar (sno)RNAs, tRNAs, and Y RNAs, the latter of which comprise the greatest percentage of glycosylated RNA species. Further, fractionation and immunohistochemical imaging studies revealed that glycoRNAs are mainly associated with the cell surface, a finding experimentally supported by their loss from the cell surface upon treatment with an enzyme that cleaves sialic acid (Figure 1). That Y RNAs are glycosylated is particularly interesting. Small, conserved RNAs that form ribonucleoprotein complexes, Y RNAs are known antigens associated with autoimmune diseases such as lupus. Because of this disease association and conservation, a series of rigorous experiments were completed to validate Y RNA glycosylation. In particular, CRISPR-Cas9 knockout of Y RNAs in HEK293T cells, which did not affect cell growth as expected from previous studies, ablated Ac4ManNAz-labeling of cells.

Next, the authors investigated whether the same biosynthetic machinery that produces the N- and O-linked glycans used to glycosylate proteins also glycosylates RNA. They employed both genetic and pharmacological inhibition approaches. In cells where the glycan biosynthetic machinery is impaired by genetic manipulation, production of glycoRNA is impaired, which can be reversed by supplementation with exogenous glycan. Pharmacological inhibition of oligosaccharyltransferase also diminishes production of glycoRNA. Each study supports the conclusion that the glycan biosynthetic machinery also produces cellular glycoRNA.

Expression of glycoRNA on the cell surface suggests it may play a role in signaling. It has been assumed that all cell-surface interactions of the sialic acid-binding immunoglobulin-type lectin (Siglec) receptor family are due to binding to glycolipids or glycoproteins. The Siglecs are the largest family of sialoside-binding proteins in humans, and they have important roles in various diseases, from cancers to autoimmune disorders to host-pathogen interactions. Flynn et al. show that two members of the Siglec family (−14 and −11) (Crocker et al., 2007) have interactions with the cell surface that are sensitive to RNase treatment, suggesting that glycoRNAs mediate these interactions (Figure 1).

With knowledge of this biopolymer in hand, these rigorous and thorough studies lay the foundation to investigate the exact architecture and structure of glycoRNA; how the glycans are synthesized and incorporated into RNA; which RNAs are subject to glycosylation; and how the biosynthetic pathway is regulated. Most importantly, the precise biological functions of glycoRNAs can now be determined. It was only a few decades ago that both RNA and glycans were an afterthought as direct players in human biology. Now that they have chemically joined forces, we should look forward to learning about how glycoRNAs affect biological processes!

References

J.M. Baskin, J.A. Prescher, S.T. Laughlin, N.J. Agard, P.V. Chang, I.A. Miller, A. Lo, J.A. Codelli, C.R. Bertozzi. Copper-free click chemistry for dynamic in vivo imaging. Proc. Natl. Acad. Sci. USA, 104 (2007), pp. 16793-16797

P.R. Crocker, J.C. Paulson, A. Varki. Siglecs and their roles in the immune system. Nat. Rev. Immunol., 7 (2007), pp. 255-266

R.A. Flynn, K. Pedram, S.A. Malaker, P.J. Batista, B.A.H. Smith, A.G. Johnson, B.M. George, K. Majzoub, P.W. Villalta, J.E. Carette, et al. Small RNAs are modified with N-glycans and displayed on the surface of living cells. Cell, 184 (2021), pp. 3109-3124

S. Nachtergaele, C. He. Chemical modifications in the life of an mRNA transcript. Annu. Rev. Genet., 52 (2018), pp. 349-372

E. Saxon, S.J. Luchansky, H.C. Hang, C. Yu, S.C. Lee, C.R. Bertozzi. Investigating cellular metabolism of synthetic azidosugars with the Staudinger ligation. J. Am. Chem. Soc., 124 (2002), pp. 14893-14902

C.M. Wei, A. Gershowitz, B. Moss. Methylated nucleotides block 5′ terminus of HeLa cell messenger RNA. Cell, 4 (1975), pp. 379-386

OUTSTANDING QUESTIONS FOR TDP-43

TDP-43 regulates hundreds of transcripts. Do a small number of these target genes account for disease pathogenesis and progression, or are TDP-43 proteinopathies the result of many modest molecular ‘paper cuts’ collectively summing to major dysfunctions?

Can restoration of a small number of TDP-43 target RNAs (e.g., STMN2) serve as a clinical strategy for TDP-43 proteinopathies, or do therapeutic approaches need to focus on pathways upstream of TDP-43 to restore a broader set of transcripts?

More than 50 mutations associated with disease have been identified in TARDBP. What are the consequences of these mutations on TDP-43 function, and do they lead to distinct or common defects?

The ability of TDP-43 to phase separate into biomolecular condensates is well established, but how is this process regulated in response to cellular stresses? Are there signal transduction cascades that regulate the ability of TDP-43 to specifically respond to these insults? If so, what are they?

TDP-43 pathology has also been observed in astrocytes. Does TDP-43 pathology in non-neuronal cell types also lead to significant alterations in RNA metabolism? Does pathology in non-neuronal cells contribute to disease onset or progression?

Reduced STMN2 expression has been observed in other TDP-43 proteinopathies. Is the mechanism behind reduced STMN2 expression also the consequence of premature polyadenylation and inclusion of a cryptic exon, or is it the result of neuronal loss?

RNA therapeutic news roundup for 2020

Over the course of 2020, you heard a number of announcements from us about our collaboration with Ionis Pharmaceuticals on an ASO for prion disease. We published on our clinical strategy and discussions with regulators [Vallabh 2020a], preclinical data in mice [Minikel 2020], and the natural history of biomarkers from our clinical research study at MGH [Vallabh 2020b]. We got officially listed on Ionis’s pipeline and Dr. Anne Smith, who is leading clinical development, spoke to the patient community at CJD Foundation’s virtual conference. While those are all the updates directly relevant to our prion disease program, I’m always keeping my eye on interesting developments in the science of ASOs and other RNA-targeting therapeutics more broadly. Over the course of the past year there were several news stories and/or papers that caught my eye but that I never got around to blogging about. Here is a roundup.

an oral ASO against PCSK9

The ASO on which we’re collaborating with Ionis will be delivered intrathecally, meaning via a lumbar puncture. Currently this is the only realistic way to get ASOs into the brain. One question we often get from people in the prion disease community is “when will this be a pill?” The answer is not for a long time. Intrathecal delivery of ASOs to the CNS builds on years of work to develop comparable ASOs for spinal muscular atrophy and other diseases, and even if someone invents a way to deliver ASOs to the brain orally, developing and honing that technology, demonstrating its safety, and applying it to our disease in particular will take years. Still, it’s interesting to keep an eye on new technological developments that could eventually be relevant. Therefore, I was interested to see a conference abstract late last year reporting that an oral ASO, albeit not for a brain indication, is now in preclinical development [Gennemark 2020]. The actual drug in question is an ASO against PCSK9 being developed for a liver indication, hypercholesterolemia. The abstract describes preclinical development, with testing in rats, dogs, and monkeys. Bioavailability in the liver was about 7% in dogs. The compound doesn’t match rat sequence, but a proof-of-concept compound against a different target achieved 78% knockdown in rats. They mention that there was a reduction (though they don’t say how much) in plasma LDL in monkeys at doses of 28-56 mg/day. A PCSK9 ASO is currently in a clinical trial (NCT04641299) with subcutaneous delivery, but I haven’t yet heard anything about a human trial of oral delivery. This is exciting, but remember that the liver is perhaps the easiest tissue to hit with a drug, while brain is perhaps the hardest. An ASO pill for brain diseases is still likely a long ways off.

risdiplam: approval of a splice-modulating small molecule

Risdiplam, a small molecule designed to modulate splicing of SMN2 RNA in spinal muscular atrophy, obtained FDA approval. It is the third disease-modifying therapy to be approved for spinal muscular atrophy, after nusinersen (an ASO) and onasemnogene abeparvovec (an AAV gene therapy), and the first to be given as an oral tablet. But, much more excitingly, it is a novel type of therapy. Until now, the only approved small molecule drugs targeting RNA were linezolid and tedizolid, two antibiotics that bind a pocket in the bacterial 23S ribosomal RNA [Warner 2018]. Risdiplam binds the complex of the SMN2 RNA with the U1 snRNP [Campagne 2019], causing SMN2 exon 7 to be included, thus giving rise to a functional SMN protein. As a modulator of the splicing of a human RNA, it represents a whole new therapeutic mechanism and a flagship for small molecules targeting RNA. PTC Therapeutics, partnered with Roche/Genentech, spent many years developing splice modulators for SMA [Naryshkin 2014, Ratni 2016, Ratni 2018], and they even brought an earlier analogue, RG7800, to the clinic [Kletzl 2019] before pivoting to risdiplam. Both preclinical and Phase I clinical studies of risdiplam showed good safety, pharmacokinetics, and an ability to induce full-length SMN2 RNA just as hoped [Poirier 2018, Sturm 2018]. As far as I could tell, the Phase II/III data have not yet been published, so all we have to go on is Genentech’s press release and the FDA label. Those indicate that, for instance, 90% of risdiplam-treated infants survived after a year of treatment, a timepoint where, based on natural history data, only ~25% of untreated infants would have been expected to survive.

I blogged about splice-modulating small molecules five years ago, and noted that one concern was whether they could modulate the target with sufficient specificity. ASOs may target, say, a 20 nucleotide RNA sequence, which often is enough to have no perfect matches anywhere else in the transcriptome. In contrast, risdiplam appears to make direct contact with just 3 nucleotides in SMN2 RNA, and the RNA sequence motif for which the compound is active is no longer than 11 nucleotides [Campagne 2019]. PTC’s first SMN2 splice modulator resulted in differential expression of 12 genes [Naryshkin 2014], and Novartis’s competing compound, branaplam (then called NVS-SM1), affected expression of 175 genes [Palacino 2015]. I had wondered whether the efficacy on the one target of interest could really outweigh the safety implications of affecting tens or hundreds of other genes. The new clinical data appear to finally prove that risdiplam’s specificity is, in fact, sufficient to confer a rather favorable safety/efficacy balance. Some of the drug’s specificity may be conferred not directly by the sequence that the drug binds, but rather by the fact that the drug turns a weak 5’ splice site into a strong one [Campagne 2019]. Thus, in order to be affected by the drug, a splice site must not only have the right sequence, but also be poised at just the right level of baseline U1 snRNP binding, not too strong and not too weak, such that the drug makes a difference. Meanwhile, the fact that some splice-modulating small molecules are not quite perfectly specific has pointed to intriguing new applications. Novartis’s branaplam, originally developed for SMA, turns out to also downregulate expression of HTT, and they’ve now pivoted to developing it as a therapy for Huntington’s disease. 
Whether it will prove safe and effective in that indication remains to be seen, but at least, this appears to provide a precedent for the notion that RNA-targeting small molecules could potentially be used to knock down a gain-of-function disease gene — their application may not be limited to correcting splice defects.
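To put rough numbers on the specificity argument above (a 20-nucleotide ASO target versus an 11-nucleotide motif), here is a back-of-envelope sketch in Python. It assumes a random sequence model with equiprobable bases and an assumed, order-of-magnitude search space of 10^9 transcribed nucleotides. Both are illustrative assumptions rather than figures from the cited papers, and the function name is made up for this example.

```python
def expected_chance_matches(motif_len: int, search_space_nt: float) -> float:
    """Expected number of exact occurrences of ONE specific motif of length
    motif_len in search_space_nt nucleotides of random sequence with
    equiprobable A/C/G/U (edge effects ignored)."""
    return search_space_nt / 4 ** motif_len


# Assumption: rough size of the transcribed sequence space, for illustration only.
TRANSCRIPTOME_NT = 1e9

for k in (20, 11):
    hits = expected_chance_matches(k, TRANSCRIPTOME_NT)
    print(f"{k}-mer: about {hits:.3g} chance matches expected")
```

Under these assumptions a specific 20-mer is expected to turn up by chance far less than once, while a specific 11-mer is expected hundreds of times. Real transcriptomes are not random sequence, and as discussed above the sequence motif is not the only source of risdiplam's specificity, so this is intuition only.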
a dose-limiting toxicity in Angelman’s syndrome

A Phase I/II trial (NCT04259281) of GTX-102, an intrathecal ASO developed by Ultragenyx for Angelman’s syndrome, was paused this past fall after a significant safety event. An Ultragenyx press release stated that the dose levels in the trial were 3.3, 10, 20, and 36 mg. As reported by FierceBiotech, one patient in the 20 mg group and four in the 36 mg group had a lower limb weakness progressing to inability to walk about 1-4 weeks after dosing. This weakness gradually resolved, and the press release stated that some clinical improvement in disease symptoms was observed and even outlasted the lower limb weakness. Thus, there is no indication that development of GTX-102 will be halted, though it looks like dosing will be adjusted. The press release states that nothing like this was observed in the monkey toxicology studies that led up to clinical trials.

This development caught my notice because it might represent the first indication of a dose-limiting toxicity for an intrathecal ASO in humans. (None of Ionis’s intrathecal ASOs that have completed at least one trial in humans — nusinersen, tominersen, and tofersen — had any significant safety issues observed.) Still, it’s not yet clear how relevant the GTX-102 observations will be to ASOs more broadly. First, this adverse event may be specific to this one compound; there is not yet reason to suspect it represents any sort of broader class effect. Second, if it is a class effect, we don’t yet know what the “class” is — as far as I can tell, the exact chemical composition and sequence of GTX-102 have not been disclosed, though it is presumably one of the many ASOs enumerated in patent US20200370046A1. ASOs come in a lot of different chemistries, and some of them have toxic liabilities that others don’t — for instance there is literature on the specific safety issues with locked nucleic acid (LNA) ASOs [Burel 2016], which do not seem to occur with 2’MOE ASOs. Overall, the GTX-102 news certainly doesn’t dampen our enthusiasm for developing an ASO for prion disease, but it is something we’ll be keeping our eye on in case it affects how dosing levels are selected or what safety issues drug companies or regulators want to monitor for in future clinical trials.
tofersen in SOD1 ALS

Excitingly, we learned the results from the Phase I/II trial of tofersen, Ionis Pharmaceuticals’ ASO against SOD1 for ALS [Miller 2020]. Nusinersen, the splice-modulating ASO targeting SMN2 for spinal muscular atrophy, has been FDA-approved since 2016 and we now have lots of data on that drug. But, following tominersen against HTT for Huntington’s disease [Tabrizi 2019], tofersen is now only the second intrathecal ASO designed for knockdown, as opposed to splice modulation, of a target gene, to have clinical trial results read out in humans. The trial involved five treatment arms: patients received injections of either placebo or 20, 40, 60, or 100 mg of tofersen. Each patient got five injections spaced over 3 months, and was followed for another 3 months afterwards. Tofersen achieved clear target engagement: at 3 months after the first dose, the concentration of SOD1 protein in patient CSF at the highest dose was reduced by 33% to 36% (depending on whether you compare only to the patients’ own baselines, or also normalize to the placebo group). That’s comparable to tominersen, which achieved 40% knockdown of mutant huntingtin in CSF. But potentially more exciting findings are from the trial’s secondary endpoints. They measured three outcomes related to ALS disease progression: ALSFRS-R score (measuring overall ability to function), slow vital capacity (ability to breathe), and handheld dynamometry megascore (muscle strength). Overall, patients in the highest dose group experienced less disease progression on all three measures than patients in the placebo group. That may provide some whiff of signal that the ASO is not only doing its job of knocking down the disease-causing protein, but also affecting the disease process. Of course, it’s important to remember that this is super preliminary. 
This small trial was not powered to detect change in these outcomes, so the data are noisy and there is not a clean dose-response relationship across the different tofersen dose levels. For example, ALSFRS-R in the 20 mg group declined by just 0.76 points, while in the placebo group it declined by 5.63 points, even though the 20 mg dose did not achieve appreciable target engagement (-1% to +2% change in CSF SOD1, depending on how you normalize). That example shows how any signal in these exploratory outcomes is also wrapped up with a lot of noise. A Phase III trial (NCT02623699) is now underway to definitively answer the question of whether tofersen slows progression of SOD1 ALS.
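The two ways of normalizing target engagement mentioned above can be made concrete with a small Python sketch. The concentrations below are illustrative values chosen only so that the output lands in the ranges quoted in this post. In particular the placebo change of -3% is an assumption, and subtracting percentage points is just one plausible way to "normalize to placebo", not necessarily the trial's statistical method.

```python
def pct_change(baseline: float, followup: float) -> float:
    """Percent change of a biomarker relative to the patient's own baseline."""
    return 100.0 * (followup - baseline) / baseline


def placebo_adjusted(treated_pct: float, placebo_pct: float) -> float:
    """Difference in percent change versus placebo, in percentage points."""
    return treated_pct - placebo_pct


# Hypothetical CSF SOD1 concentrations in arbitrary units (not trial data).
placebo = pct_change(100.0, 97.0)     # assumed placebo-group change
high_dose = pct_change(100.0, 64.0)   # illustrative high-dose change
low_dose = pct_change(100.0, 99.0)    # illustrative low-dose change

print(high_dose, placebo_adjusted(high_dose, placebo))
print(low_dose, placebo_adjusted(low_dose, placebo))
```

With these made-up inputs, the high dose reads as a 36% reduction against baseline or 33 points against placebo, and the low dose flips sign from -1% to +2%, which is how a single arm can be reported with two slightly different numbers.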
intrathecal AAV in adult humans

Finally, one study that received a fair amount of press recently reported on a new potential modality: intrathecally delivered AAV-vectored microRNA [Mueller 2020]. In other words, a small RNA designed to knock down SOD1 RNA was placed in a viral vector and injected into the CSF of adults with SOD1 ALS. This is interesting because it addresses an area where there were previously no data: intrathecal AAV in adult humans. As background, an intravenous AAV drug is approved for infants — onasemnogene abeparvovec, a virus which delivers an intact copy of SMN1 for spinal muscular atrophy [Mendell 2017]. But rodent data have for years suggested that AAV9 uptake into brain neurons is good in neonates but very weak in adult animals [Foust 2009]. That has left open the question of whether a drug like onasemnogene could ever work for an adult disorder. Early primate studies didn’t provide much basis for optimism: when 1.8×10^12 viral genomes (vg) were delivered intrathecally in macaques, the distribution was broad across the brain but very low yield [Gray 2013]. They measured both GFP positivity and the ratio of viral genomes to host diploid genomes (vg/dg), and both were around 2%. Most scientists I’ve spoken with assume that we need to engineer better AAV vectors in order to make adult brain gene therapy a reality, as explained here. Nonetheless, some scientists have hypothesized that it might be possible to achieve meaningful levels of neuronal uptake with existing AAVs, such as AAV9, if the drug is delivered intrathecally at super high doses. To date there has been a shortage of data on this topic. A recent conference abstract hinted at meaningful target engagement in monkeys with an AAV gene therapy [Thomsen 2019], but was short on detail and I didn’t manage to see the actual talk at AAN2019.

The preclinical development of the SOD1 miRNA therapy in question here had previously reported promising results in primates. They intrathecally injected macaques with a whopping 3.5×10^13 viral genomes of AAVrh10 and the vg/dg ratio was nearly 100 in bulk spinal cord, or about 5.3 when they microdissected out motor neurons [Borel & Gernoux 2018]. The treatment resulted in ~50% apparent knockdown of SOD1. Scaling from monkeys to humans, they selected a dose level of 5×10^14 vg, again delivered intrathecally, for human studies [Mueller 2020]. The trial included just 2 patients, though, so ultimately it is hard to tell what the effect was. SOD1 protein in CSF was not lowered in either patient. In the one patient who died and underwent autopsy, the vg/dg ratio in spinal cord was right around 10, substantially lower than in the monkeys but still high enough to be potentially meaningful, and SOD1 protein level was nominally lower than in a few untreated patients, but it is tough to tell whether that is treatment-related or a fluke. The injection resulted in a considerable immune response, but this appeared to be managed effectively with immunosuppression (sirolimus and prednisone). Overall, it’s an interesting development, but we’ll need to see studies in many more patients to know whether intrathecal AAV is a potential therapeutic modality for adult CNS diseases.

Source: http://www.cureffi.org/2021/01/11/rna-therapeutic-news-roundup-2020/

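Nat Biotech: Effective delivery methods for therapeutic genome editing

Genome editing can be applied to treat a wide range of diseases, from Mendelian disorders caused by single-gene mutations to more complex idiopathic diseases (Figure 1). Sequence-specific nuclease technologies have greatly accelerated the development of editing-based therapies that knock out disease-causing genes or repair endogenous mutant genes. These therapies are gradually moving into clinical use, but in recent years effective delivery and carrier technologies have come to be seen as a key determinant of whether therapeutic genome editing succeeds. Examples include the adeno-associated virus and lentiviral vectors used in gene therapy, and the lipid nanoparticles and other non-viral carriers used to deliver nucleic acids and proteins.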

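Macromolecule delivery
Macromolecule delivery has constrained the field of nucleic acid therapeutics since its inception. For example, the breakthrough that removed the main bottleneck in treating monogenic recessive disorders such as hemophilia B, adenosine deaminase deficiency and Leber congenital amaurosis type 2 was the successful development of adeno-associated virus (AAV) vectors and of retroviral and lentiviral vector systems. As early as 1978, antisense oligonucleotides were found to inhibit mRNA translation in chicken embryo fibroblasts, yet because delivery technology lagged behind, an antisense therapy did not receive US FDA approval until 1998. Similarly, RNAi was approved for hereditary transthyretin-mediated amyloidosis (hATTR) thanks to lipid nanoparticle technology developed to deliver small interfering RNA (siRNA). In recent years, clinical use of gene therapy, antisense therapy and siRNA therapy has shown that genome editing tools could likewise be delivered successfully to patients. Recently, the groups of Professor David V. Schaffer and Professor Niren Murthy at the University of California, Berkeley, jointly published a review in Nature Biotechnology titled "The delivery challenge: fulfilling the promise of therapeutic genome editing". The review discusses in depth how genome editing technologies have worked in concert with delivery vectors to enable rapid preclinical and clinical progress, particularly for blood disorders. The authors also highlight the challenges that must be overcome to realize the full potential of therapeutic genome editing, especially for applications that require homology-directed repair (HDR) of an endogenous gene or in vivo delivery to the liver, muscle or central nervous system.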

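Genome editing enzymes and potential therapeutic mechanisms
Most genome editing technologies rely on DNA nucleases that target specific sites in the cell's genome, such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), meganucleases and the CRISPR-Cas9 system. The nuclease can be expressed from a cDNA. Its DNA-binding domain directs it to introduce a DNA double-strand break at a target site of roughly 20 nucleotides in the genome, and the break is then repaired by one of two mechanisms: non-homologous end joining (NHEJ) or homology-directed repair (HDR), which uses a homologous DNA sequence as a template. In a related technology, base editing, the DNA-binding domain of a catalytically inactive Cas9 (dCas9) is fused to an enzyme that can change a single target DNA base without making a double-strand break. The more recently developed prime editing system can correct not only point mutations but also small insertions and deletions, again without causing a double-strand break. HDR can also be used to treat recessive loss-of-function mutations in a particular gene. For example, a cDNA encoding the interleukin-2 receptor γ chain can be knocked into the corresponding endogenous locus, so that the therapeutic cDNA is transcribed under endogenous control while avoiding the genotoxic insertion of retroviral sequences.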

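What makes effective delivery of genome editing tools so difficult?
First, some of the enzymes needed for genome editing exceed the packaging capacity of commonly used viral gene delivery vectors, and when editing involves HDR the donor template adds substantially to the total cargo size. A second difficulty is that editing tools should be delivered only transiently and only to target cells, because prolonged activity may cause genotoxicity from off-target nuclease cutting and immune responses against these prokaryotic proteins. Non-viral delivery can help achieve transient exposure, whereas AAV and lentivirus have been developed extensively for gene replacement therapy partly because of their capacity for long-term gene expression. A further challenge is inherent to nuclease-based editing: DNA strand breaks can trigger a range of genotoxic effects. In addition, introducing the editing machinery can provoke specific immune responses. For example, the non-self protein Cas9 or synthetic gRNAs may each elicit de novo immune responses in animals. The donor DNA used for HDR may also induce innate immune responses, leading to cytotoxicity.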

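Virus-mediated delivery
Current genome editing approaches commonly use AAV, lentiviral and adenoviral vectors (Figure 1). AAV has a 4.7 kb single-stranded linear DNA genome that encodes two genes: rep, which mediates genome replication, and cap, which encodes the structural proteins of the viral capsid. Supplying rep and cap in trans makes it possible to produce a replication-incompetent vector, and different capsids allow delivery to different tissues and cell types in vivo. Lentiviral vectors are made by first removing the sequences of HIV-1 or another lentivirus that are required for viral activity, then inserting the desired gene sequence and expression cassette into the lentiviral backbone to yield a vector of about 10 kb. Lentiviruses can be pseudotyped with different envelope proteins to direct their tropism. Adenoviral vectors are based on viruses with a linear double-stranded DNA (dsDNA) genome. During vector construction, specific viral elements (such as E1) are removed to make room for the transgene.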

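1-Ex vivo delivery
The first human clinical trial of genome editing was a cell therapy based on T cells that had been transduced ex vivo with an adenoviral vector carrying a ZFN targeting CCR5. HIV uses CCR5 as a receptor for cell entry, so knocking it out yields T cells that HIV cannot infect. After autologous transplantation and interruption of anti-HIV small-molecule therapy, the CCR5-knockout T cells survived better than unmodified T cells, and circulating HIV RNA levels fell markedly in most participants. AAV has also been used for ex vivo delivery; the difference is that AAV can supply donor DNA even in the absence of a nuclease.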

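2-Liver delivery
As noted above, HDR-mediated integration of a cDNA into a highly expressed endogenous locus drives strong transgene expression, which suggests a strategy for treating hemophilia. After ZFN-mediated knock-in of the relevant cDNA into the highly expressed albumin locus, the liver can secrete high levels of factor VIII and factor IX. More recently, Cas9 has been used for genome editing in the liver. Given the size of SpCas9, however, the cargo must be split across two separate vectors, one encoding the nuclease and one encoding the gRNA, sometimes together with a donor template. Some shorter Cas9 variants, such as SaCas9, allow the nuclease and gRNA to be delivered in a single AAV vector. Adenovirus can also be used to deliver Cas9. Although adenoviral vectors are immunogenic, they can accommodate both the nuclease and a donor template in one vector.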
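The packaging constraint described above (SpCas9 needing a second vector for its gRNA, SaCas9 fitting alongside its gRNA) can be sketched as simple arithmetic in Python. The Cas9 coding lengths follow from the protein lengths (1,368 and 1,053 amino acids). Every other element size below is a hypothetical round number chosen for illustration, not a value from the review, and real constructs vary.

```python
AAV_CAPACITY_BP = 4700  # ~4.7 kb genome size quoted in the text

# Coding-sequence lengths: amino acids x 3 (stop codon ignored).
CAS9_CDS_BP = {"SpCas9": 1368 * 3, "SaCas9": 1053 * 3}

# Hypothetical sizes of the other elements in the cassette (assumptions).
PROMOTER_BP = 200       # compact promoter
POLYA_BP = 50           # short synthetic polyA signal
ITRS_BP = 290           # two inverted terminal repeats
GRNA_CASSETTE_BP = 350  # U6 promoter plus guide RNA scaffold


def cargo_size(nuclease: str, include_grna: bool) -> int:
    """Total vector genome length for one nuclease, with or without a gRNA cassette."""
    total = CAS9_CDS_BP[nuclease] + PROMOTER_BP + POLYA_BP + ITRS_BP
    if include_grna:
        total += GRNA_CASSETTE_BP
    return total


def fits_in_one_vector(nuclease: str, include_grna: bool = True) -> bool:
    """True if the whole cassette stays within the assumed AAV capacity."""
    return cargo_size(nuclease, include_grna) <= AAV_CAPACITY_BP


for name in CAS9_CDS_BP:
    print(name, cargo_size(name, True), fits_in_one_vector(name, True))
```

With these assumed element sizes, SpCas9 fits only when the gRNA cassette is moved to a second vector, while SaCas9 leaves room for it. Adding an HDR donor template would push either design further over the limit.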

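3-Nervous system delivery
The central nervous system is another important target for therapeutic genome editing. In particular, knocking out the pathogenic allele offers hope for many monogenic diseases with autosomal dominant inheritance. In one recent study, a dual AAV vector system was used, with one vector expressing SpCas9 and the other expressing the gRNA, to knock out mutant huntingtin. In a mouse model of Huntington's disease, this editing therapy reduced neurotoxicity and motor deficits.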

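4-Retinal delivery
The retina is another area where therapeutic genome editing looks likely to succeed, especially given that many monogenic retinal diseases have well-characterized biology, available animal models, clinical endpoints that can be read out in the near term, and a regulatory path established by the FDA approval of gene therapy for LCA2. The most common mutation in Leber congenital amaurosis type 10 creates a cryptic splice site that disrupts expression of the CEP290 protein. One approach that has entered a Phase I clinical trial uses a dual-vector AAV5 system to deliver SpCas9 and two gRNAs that excise the mutation from the genome.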

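5-Muscle delivery
Several elegant studies have used dual-vector systems based on AAV8 or AAV9, encoding SpCas9 and two gRNAs, to excise a dystrophin exon containing a premature stop codon as a treatment for Duchenne muscular dystrophy (DMD). The resulting exon skipping restores dystrophin expression and markedly improves skeletal muscle function.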

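6-Non-viral delivery
Although non-viral delivery of genome editing enzymes is less efficient than viral delivery, it has the advantage that nuclease activity is transient. More importantly, unlike viral vectors, non-viral carriers (for example, lipid-based nanoparticles) can be dosed repeatedly, which improves the chances of successful editing. In summary, although therapeutic genome editing has seen many breakthroughs in recent years, a series of problems must be solved before it can be fully realized in the clinic, including how to achieve high delivery efficiency, how to build high-capacity vectors that still deliver efficiently, and how to provide vectors that give transient, high-level expression.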

Original article: https://www.nature.com/articles/s41587-020-0565-5

Cell: The architecture of SARS-CoV-2 transcriptome

Jean and Peter Medawar wrote in 1977 that a virus is “simply a piece of bad news wrapped up in proteins.” In the case of SARS-CoV-2, the “bad news” is that the new coronavirus carries its genome in the form of a very long ribonucleic acid (RNA) molecule. As the world grapples with the COVID-19 pandemic, there has been little sense of direction in working out what this coronavirus is composed of. Being an RNA virus, SARS-CoV-2 enters host cells, replicates its genomic RNA and produces many smaller RNAs (called “subgenomic RNAs”). These subgenomic RNAs are used to synthesize the various proteins (spike, envelope, etc.) that are required to start a new generation of SARS-CoV-2. The smaller RNAs therefore make good targets for disrupting the new coronavirus’s ability to overcome our immune system. Although recent studies reported the sequence of the RNA genome, they only predicted where its genes might be, leaving researchers without a reliable map.

Figure 1 The life cycle of SARS-CoV-2



When the spike protein of SARS-CoV-2 binds to the receptor of the host cell, the virus enters the cell and the envelope is peeled off, which releases the genomic RNA into the cytoplasm. The ORF1a and ORF1b RNAs are made from genomic RNA, and then translated into pp1a and pp1ab proteins, respectively. Proteins pp1a and pp1ab are cleaved by protease to make a total of 16 nonstructural proteins. Some nonstructural proteins form a replication/transcription complex (RNA-dependent RNA polymerase, RdRp), which uses the (+) strand genomic RNA as a template. The (+) strand genomic RNA produced through the replication process becomes the genome of the new virus particle. Subgenomic RNAs produced through transcription are translated into the structural proteins (S: spike protein, E: envelope protein, M: membrane protein, and N: nucleocapsid protein) that form a viral particle. Spike, envelope and membrane proteins enter the endoplasmic reticulum, and the nucleocapsid protein combines with the (+) strand genomic RNA to become a nucleoprotein complex. They merge into the complete virus particle in the endoplasmic reticulum-Golgi apparatus compartment, and are released to the extracellular space through the Golgi apparatus and vesicles.

Led by Professors KIM V. Narry and CHANG Hyeshik, the research team of the Center for RNA Research within the Institute for Basic Science (IBS), South Korea, succeeded in dissecting the architecture of SARS-CoV-2 RNA genome, in collaboration with Korea National Institute of Health (KNIH) within Korea Centers for Disease Control & Prevention (KCDC). The researchers experimentally confirmed the predicted subgenomic RNAs that are in turn translated into viral proteins. Furthermore, they analyzed the sequence information of each RNA and revealed where genes are exactly located on a genomic RNA. “Not only to detailing the structure of SARS-CoV-2, we also discovered numerous new RNAs and multiple unknown chemical modification on the viral RNAs. Our work provides a high-resolution map of SARS-CoV-2. This map will help understand how the virus replicates and how it escapes the human defense system,” explains Professor KIM V. Narry, the corresponding author of the study.

Figure 2 Composition of genomic and subgenomic RNAs of SARS-CoV-2, and schematic diagram of virus particle structure



SARS-CoV-2 RNAs are known to consist of ORF1a, ORF1b, ORFS, ORFE, ORFM, ORFN, ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10. In this study, all RNAs except ORF10 were experimentally validated. The prediction that ORF10 exists seems to be wrong. Nine subgenomic RNAs (S, E, M, N, 3a, 6, 7a, 7b, 8) are indeed transcribed from the genomic RNA. Among them, the S, E, M, and N RNAs are each translated into the corresponding protein, and these proteins form the structure of the virus particle (S: spike protein, E: envelope protein, M: membrane protein, and N: nucleocapsid protein).

It was previously known that 10 subgenomic RNAs make up the viral particle structure. However, the research team confirmed that 9 subgenomic RNAs actually exist, invalidating the remaining one subgenomic RNA. Researchers also found that there are dozens of unknown subgenomic RNAs, owing to RNA fusion and deletion events. “Though it requires further investigation, these molecular events may lead to the relatively rapid evolution of coronavirus. Moreover, we find multiple unknown chemical modifications on the viral RNAs. It is unclear yet what these modifications do, but a possibility is that they may assist the virus to avoid the attack from the host,” says Prof. Kim.

The research team suggests that modified RNAs may have new properties that are different from unmodified RNAs even though they have the same genetic information in terms of RNA base sequence. They believe if they figure out the unknown characteristics of RNA, the findings may offer a new clue for combatting the new coronavirus. Newly discovered chemical modification will also help to understand the life cycle of the virus.

Figure 3 SARS-CoV-2 genomic (gRNA) and subgenomic RNAs (S, 3a, E, M, 6, 7a, 7b, 8, and N) and the location of RNA modifications. Modification levels are different between RNA transcripts, and the most frequent modification site is designated by red arrowhead.



Behind the success of the study is the research team’s pairing of two complementary sequencing techniques: DNA nanoball sequencing and nanopore direct RNA sequencing. Nanopore direct RNA sequencing allows the entire long viral RNA to be analyzed directly, without fragmentation. Conventional RNA sequencing methods usually require a step-by-step process of cutting the RNA and converting it to DNA before it can be read. DNA nanoball sequencing, meanwhile, can read only short fragments, but has the advantage of analyzing a large number of sequences with high accuracy. The two techniques turned out to be highly complementary for analyzing the viral RNAs.

“Now we have secured a high resolution gene map of the new coronavirus that guides us where to find each bit of genes on all of the total SARS-CoV-2 RNAs (transcriptome) and all modifications RNAs (epitranscriptome). It is time to explore the functions of the newly discovered genes and the mechanism underlying viral gene fusion. We also have to work on the RNA modifications to see if they play a role in virus replication and immune response. We firmly believe that our study will contribute to the development of diagnostics and therapeutics to combat the virus more effectively,” notes Professor KIM V. Narry.

 

Mind the Gapmer: Implications of Co-transcriptional Cleavage by Antisense Oligonucleotides

Figure 1. Strategic Use of Gapmer-Based Targeting to Effectively Discern lncRNA Functions at the Level of Local Transcriptional Activity Versus at the RNA Level

Gapmers (red combs) recognize nascent transcripts (blue lines) made by RNA polymerase II (green oval) and recruit RNase H1 (light blue pacman) to cleave the RNA. Targeting RNA regions made from most areas of the gene results in both a decrease in RNA levels and transcription termination on the gene (left portion of the figure). However, if the gapmer targets the 3′ portions of the gene (right side of the figure), RNA levels can be reduced without any discernable effect on transcription termination.

=======================================================

Since their first application over four decades ago (Stephenson and Zamecnik, 1978), antisense oligonucleotides (ASOs) have pioneered a feasible approach to reduce or modify RNA and protein expression. ASOs are short (15–20 bases), single-stranded nucleic acids that contain complementary sequences to RNA targets that can function in one of two ways. First, ASOs can be designed to prohibit the access of the splicing or translation machineries to the targeted transcript, thus altering or modifying protein expression. Alternatively, ASOs can contain complementary DNA regions that bind to the targeted mRNA or long non-coding RNA (lncRNA) and recruit RNase H1, an endonuclease that cleaves the RNA strand of RNA/DNA hybrids (Cerritelli and Crouch, 2009). This cleavage event makes the targeted RNA a substrate for highly processive cellular exoribonucleases and results in the degradation of the transcript. ASOs that are designed to degrade their targets in an RNase H1-dependent fashion are often designed as “gapmers,” which contain chemically modified RNA bases that flank both sides of a central 8- to 10-base DNA “gap” (Crooke et al., 1995). The DNA gap binds the targeted RNA and recruits RNase H1, while the flanking modified RNA bases enhance affinity and improve bioavailability of the oligonucleotide.
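The gapmer architecture described above (modified wings flanking a central DNA gap, the whole oligo complementary to the RNA target) can be sketched in a few lines of Python. The target sequence, the 10-base gap and the even split of the wings are assumptions made for illustration, and the function names are hypothetical. Real designs also weigh chemistry, off-targets and RNA structure.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(target_rna: str) -> str:
    """Reverse complement of an RNA target, written 5'->3' in DNA letters."""
    return "".join(COMPLEMENT[base] for base in reversed(target_rna.upper()))


def gapmer_layout(target_rna: str, gap_len: int = 10) -> tuple[str, str, str]:
    """Split the antisense oligo into (5' wing, central DNA gap, 3' wing).

    The wings stand for the chemically modified flanks; the gap is the
    DNA stretch that forms the RNA/DNA hybrid recognized by RNase H1.
    """
    oligo = antisense_dna(target_rna)
    wing_total = len(oligo) - gap_len
    if wing_total < 2:
        raise ValueError("target too short to leave a wing on each side")
    left = wing_total // 2
    return oligo[:left], oligo[left:left + gap_len], oligo[left + gap_len:]


# Hypothetical 20-nucleotide target region, not taken from any real transcript.
wing5, gap, wing3 = gapmer_layout("AUGGCCUUAGCAGGCUAACG")
print(f"5'-[{wing5}]-{gap}-[{wing3}]-3'")
```

For a 20-nucleotide target this gives a 5-10-5 arrangement, which is one common way of describing gapmers. Other wing and gap lengths in the ranges mentioned above work the same way.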

In several ways, ASOs can be a more attractive approach than using double-stranded RNAs (dsRNAs) and the RNA interference machinery to alter gene expression. Notably, single-stranded ASOs are less expensive than double-stranded RNAi mediators, and ASOs demonstrate gymnotic uptake by cells (movement across cellular membranes in the absence of lipid carriers). Several ASO therapeutics have been approved by the Food and Drug Administration (FDA) to date, including fomivirsen for cytomegalovirus-induced retinitis, eteplirsen for Duchenne muscular dystrophy, inotersen for familial amyloid polyneuropathy, and nusinersen for spinal muscular atrophy. More ASO-based therapeutics, including personalized ASOs to treat clinical syndromes caused by specific mutations, are under development. A recent example of how rapidly this field is moving is the action by the US Congress to compel the FDA to allow the use of a novel, untested ASO designed to treat Jaci Hermstad’s inherited form of amyotrophic lateral sclerosis (ALS) (Arnold, 2019).

Gapmer ASOs have been shown to function in both the nucleus and cytoplasm, making them an attractive reagent to determine the mechanism of action of nuclear lncRNAs. Several reports have previously shown that gapmers can target introns and chromatin-associated transcripts (Vickers et al., 2003, Ward et al., 2014). Thus, they have been used to discern whether lncRNAs are influencing gene expression via the transcript itself or simply by the process of transcription associated with producing the lncRNA. In this issue of Molecular Cell, complementary reports by Lee and Mendell (2020) and Lai et al. (2020) provide important new details for the mechanism of action associated with gapmer-induced transcript cleavage in the nucleus. These data provide insights into ASO design that are applicable to both lncRNA investigations as well as downstream therapeutic interventions.

The papers by Lee and Mendell (2020) and Lai et al. (2020) use gapmers directed against exons and introns of multiple pre-mRNA and lncRNA targets to make four main conclusions. First, gapmer-mediated RNase H1 cleavage clearly occurs co-transcriptionally on pre-mRNA and lncRNA on chromatin. Studies using nuclear run-on assays, pulse labeling to isolate nascent transcripts, and analysis of RNAs associated with chromatin fractions all demonstrate the propensity of gapmers to act on precursor RNAs early in their transcription by RNA polymerase II (Pol II). This confirms previous reports that indicated this could be the case (Liang et al., 2017). Second, gapmers targeting introns in nascent transcripts generally act more efficiently, in some cases much more efficiently, than gapmers that target exons. Results obtained by Lai et al. (2020) were particularly compelling for the Axtn10 gene in this regard. It is speculated that exons may often be inaccessible to gapmer ASOs due to steric hindrance of the splicing machinery. Third, both studies demonstrate that ASO-directed pre-mRNA and lncRNA cleavage is associated with the termination of Pol II transcription in an Xrn2-dependent fashion. These data confirm previous indications that Xrn2 and the “torpedo” model of transcription termination can be initiated by cleavage events in addition to those associated with polyadenylation (Fong et al., 2015). Finally, and perhaps most importantly, both studies demonstrate that gapmer ASOs targeted to the 3′ terminal regions of the pre-mRNA and lncRNA can knock down RNA expression without apparently affecting RNA Pol II association with the gene or transcription termination. This is presumably due to the natural speed of Pol II transcription completing its round on the gene before the gapmer and RNase H1 can assemble, cleave, and recruit Xrn2 to initiate its torpedo run. 
This finding indicates that the best—and perhaps only—way to use ASOs to differentiate lncRNA function at the RNA level versus transcription effects is to target terminal regions of the transcript to avoid confounding effects of the ASO on both transcription and RNA levels.

In summary, these two studies indicate where best to target pre-mRNAs and lncRNAs with ASOs to achieve efficient RNA knockdown while avoiding confounding effects of the ASO on transcription of the gene (Figure 1). There are numerous other interesting nuances and prospects for future studies that are also worth pointing out. First, it will be instructive to determine why exons are innately harder to target with ASOs than introns. In addition to steric hindrance by the splicing machinery, the phenomenon could be due to differences in chromatin patterns associated with exons versus introns along the gene, higher densities of RNA binding proteins in exonic rather than intronic portions of a transcript, or inherent RNA structures. This information could help paint a more detailed picture of co-transcriptional RNA processing in the nucleus. Second, the studies give some indication that 5′-3′ and 3′-5′ exoribonuclease velocity, relative to Pol II elongation rates, plays a role in the efficiency of nuclear quality control of gene expression. Understanding this interplay would likewise help develop a clearer picture of the interface between transcription and the eager ribonucleases waiting to pare and mold the ultimate outcome of RNA synthesis in the cell. Finally, Lee and Mendell (2020) outline an interesting “unrecognized experimental and therapeutic” opportunity that ASO-mediated transcription termination affords. We particularly encourage the reader to peruse their idea for a simple therapeutic designed to combat the clinical effect of mutations that act by promoting transcriptional read-through into neighboring genes. Given the applicability of ASOs at both the bench and the bedside, the more “sense” that we make of these antisense reagents, the better.

Zheng-Li Shi's group finds evolutionary similarity between the Wuhan virus and a bat virus: the source of the Wuhan pneumonia virus?

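Note: the spread of the Wuhan virus has been somewhat unexpected.

Question 1: where did the virus come from? According to the current consensus, bats are the host of the virus, but natural transmission to humans has to pass through an intermediate host, and some wild animals, such as civets, can be infected by coronaviruses. That raises two questions: where did the virus-carrying bat come from, and where did it encounter the intermediate host?

The first possibility is that traders who caught and handled animals carrying the coronavirus were infected while handling them, or that animals sold at the Huanan Seafood Market had come into contact with coronavirus-carrying bats in the wild and then spread the virus in the market. The problem is that bats are hard to find, and hard to catch, in Wuhan in autumn and winter. So did the bats come from elsewhere?

The second possibility is that wild animals from elsewhere were infected there with a bat-borne coronavirus and were then brought into the Huanan Seafood Market by traders. That would imply a wider range of spread. But judging from the evidence so far, the outbreak clearly started in Wuhan.

The third possibility is a laboratory leak. As early as 2013 it was reported that Shi's group, working with a US team, had used reverse genetics to package a SARS-like virus. Could Shi's team have screened bats captured around the country for the viral strain most infectious to human cells and then obtained an adapted strain in laboratory animals, with the virus escaping because the laboratory animals were not properly managed? What makes this seem possible is that Wuhan is not the only place in China where people eat bats or that has a wildlife market, so why did Wuhan become the origin of the outbreak? What invites speculation is that Wuhan has two virology research institutions, the Wuhan Institute of Virology of the Chinese Academy of Sciences and the virology laboratory at Wuhan University; they have a BSL-4 laboratory, have done reverse genetics on coronaviruses, and have gone out specifically to capture bats and isolate viruses, which makes it hard not to wonder. What makes it seem impossible is that if a Wuhan laboratory had leaked the virus, why was none of its own staff infected?

In any case, Professor Shi's paper can serve as a model for how to identify a newly emerging virus.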

===============================================================

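Over the past two decades, coronaviruses have caused two large-scale outbreaks: severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). It is generally thought that SARS-related coronaviruses (SARSr-CoV), which are found mainly in bats, could cause future outbreaks.

In a new study, researchers from the Wuhan Institute of Virology of the Chinese Academy of Sciences, Wuhan Jinyintan Hospital and the Hubei Provincial Center for Disease Control and Prevention report a series of pneumonia cases of unknown cause in Wuhan, Hubei Province, in central China. Starting from a local seafood market, the outbreak had by January 26, 2020 spread to 2,050 infected people in China, 56 of whom had died, and 35 infected people in 11 other countries. The results were published online in Nature on February 3, 2020, in a paper titled "A pneumonia outbreak associated with a new coronavirus of probable bat origin". Notably, Nature received the manuscript on January 20, 2020, accepted it on January 29, and published it online as an Accelerated Article Preview. The corresponding author is Zheng-Li Shi of the Wuhan Institute of Virology, Chinese Academy of Sciences.

The typical clinical symptoms in these patients were fever, dry cough, dyspnea, headache and pneumonia. After onset, the disease can lead to progressive respiratory failure due to alveolar damage (as seen on transverse chest CT images) and even death. Clinicians identified the disease as a viral pneumonia on the basis of clinical symptoms and other criteria, including elevated body temperature, reduced lymphocyte and white blood cell counts (although white blood cell counts were sometimes normal), new pulmonary infiltrates on chest radiography, and no obvious improvement after three days of antibiotic treatment. Most of the early cases appear to have had contact with the original seafood market, but the disease has since progressed to human-to-human transmission.

Samples from seven patients with severe pneumonia who were admitted to the intensive care unit (ICU) at the start of the outbreak (six of whom were sellers or deliverymen at the seafood market) were sent to the laboratory of the Wuhan Institute of Virology (WIV) for pathogen diagnosis. Because the outbreak occurred in the same setting as SARS, that is, in winter and at a market, Shi and her group, working as a coronavirus (CoV) laboratory, first tested the samples with pan-coronavirus PCR primers. They found five PCR-positive samples. One sample (WIV04), collected from bronchoalveolar lavage fluid (BALF), was then analyzed by metagenomics using next-generation sequencing (NGS) to identify potential pathogens.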

在总共10038758个读取片段(read),或者说人类基因组过滤后的总共1582个读取片段中,有1378个读取片段与SARSr-CoV序列相匹配(图1a)。通过从头组装和靶向PCR,他们获得了一个大小29891bp的冠状病毒基因组,它与SARS-CoV BJ01(GenBank登录号AY278488.2)具有79.5%的序列一致性(sequence identity)。将这些1582个读取片段与所获得的基因组进行重新映射可取得较高的基因组覆盖。这个基因组序列已被提交GISAID网站(登录号EPI_ISL_402124)。根据世界卫生组织(WHO)的名称,他们暂时将它称为新型冠状病毒2019(2019-nCoV)。随后从其他四名患者中使用下一代测序和PCR获得了另外四个2019-nCoV全长基因组序列(WIV02,WIV05,WIV06和WIV07)(GISAID登录号EPI_ISL_402127-402130),彼此之间的一致性高于99.9%。
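The read counts and identity figures above are simple ratios, and a short sketch can make the arithmetic concrete. The function name `percent_identity` and the toy aligned sequences below are hypothetical, and counting gap columns as mismatches is an assumption; it is only one of several conventions and not necessarily the one used in the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: gap columns ('-') count as mismatches and stay in the
    denominator. Other conventions exclude gap columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Read counts reported in the text.
filtered_reads = 1582
sarsr_cov_reads = 1378
matching_fraction = 100.0 * sarsr_cov_reads / filtered_reads

# Toy alignment (hypothetical, for illustration only): 8 of 10 columns match.
toy_identity = percent_identity("ACGTACGTAC", "ACGTTCG-AC")
```

With the counts quoted above, about 87% of the reads that survived human-genome filtering matched SARSr-CoV sequences.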

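Figure 1. Genomic characteristics of 2019-nCoV. Image from Nature, 2020, doi:10.1038/s41586-020-2012-7.

The 2019-nCoV genome consists of the six major open reading frames (ORFs) common to coronaviruses and a number of accessory genes (Figure 1b). Further analysis indicates that some 2019-nCoV genes share less than 80% nucleotide sequence identity with SARS-CoV. However, the seven conserved replicase domains in ORF1ab that are used for coronavirus species classification are 94.6% identical in amino acid sequence between 2019-nCoV and SARS-CoV, implying that the two belong to the same species.

They then found that a short RdRp region from a bat coronavirus, BatCoV RaTG13, previously detected in the intermediate horseshoe bat (Rhinolophus affinis) from Yunnan Province, shows high sequence identity to 2019-nCoV. They carried out full-length sequencing of this RNA sample (GISAID accession number EPI_ISL_402131). Simplot analysis shows that 2019-nCoV is highly similar to RaTG13 throughout the genome (Figure 1c), with an overall genome sequence identity of 96.2%.

Using aligned genome sequences of 2019-nCoV, RaTG13, SARS-CoV and previously reported bat SARSr-CoVs, no evidence of recombination events was detected in the 2019-nCoV genome. Phylogenetic analyses of the full-length genome and of the RNA-dependent RNA polymerase (RdRp) and S gene sequences all show that RaTG13 is the closest relative of 2019-nCoV, and that the two form a lineage distinct from other SARSr-CoVs (Figure 1d). The gene encoding the receptor-binding spike (S) protein of 2019-nCoV is highly divergent from those of other coronaviruses, with less than 75% nucleotide sequence identity to all previously described SARSr-CoVs, except for 93.1% nucleotide identity to the S gene of RaTG13. The S genes of 2019-nCoV and RaTG13 are longer than those of other SARSr-CoVs. Compared with SARS-CoV, the main differences in the 2019-nCoV S protein are three short insertions in the N-terminal domain and changes in four of the five key residues in the receptor-binding motif. Whether the insertions in the N-terminal domain of the 2019-nCoV S protein confer sialic-acid-binding activity, as in MERS-CoV, needs further study. The close phylogenetic relationship between 2019-nCoV and RaTG13 provides evidence that 2019-nCoV originated in bats.

They rapidly developed a qPCR detection method based on the receptor-binding domain of the S gene, the most variable region among coronavirus genomes (Figure 1c). Their data show that the primers designed for this assay can distinguish 2019-nCoV from all other human coronaviruses, including bat SARSr-CoV WIV1, which shares 95% identity with SARS-CoV. In the seven patients, qPCR and conventional PCR detected 2019-nCoV in six BALF samples and five oral swab samples at the first sampling. At the second sampling, however, 2019-nCoV was no longer detected in oral swabs, anal swabs or blood from these patients (Figure 2a). They note that other qPCR targets, including the RdRp or E genes, may be used for routine detection. On the basis of these findings, they conclude that the disease is probably transmitted via the airways, although other routes of transmission cannot be ruled out without extending the study to more patients.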

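Figure 2. Molecular and serological investigation of patient samples. Image from Nature, 2020, doi:10.1038/s41586-020-2012-7.

For serological detection of 2019-nCoV, they used a previously developed nucleocapsid protein (NP) from bat SARSr-CoV Rp3 as the antigen in IgG and IgM ELISAs; this protein shares 92% amino acid identity with the 2019-nCoV NP and shows no cross-reactivity with human coronaviruses other than SARSr-CoV. As a research laboratory, they were able to obtain only five serum samples from the seven virus-infected patients. They monitored antibody levels in one patient (ICU-06) at 7, 8, 9 and 18 days after disease onset and observed a clear trend of rising IgG and IgM titers (with a decrease on the last day) (Figure 2b). In a second experiment, they tested five of the seven virus-positive patients for viral antibodies about 20 days after disease onset. All patient samples, but not samples from healthy individuals, were strongly positive for viral IgG (Figure 2b). Three samples were also IgM-positive, indicating acute infection.

They then successfully isolated the virus (named 2019-nCoV BetaCoV/Wuhan/WIV04/2019, hereafter strain WIV04) in Vero and Huh7 cells using a BALF sample from patient ICU-06. A clear cytopathic effect was observed in cells after three days of culture. The identity of strain WIV04 was verified in Vero E6 cells by immunofluorescence microscopy with a cross-reactive viral NP antibody, by metagenomic sequencing, in which most reads mapped to the 2019-nCoV genome, and by qPCR, which showed that viral load increased from day 1 to day 3.

Under the electron microscope, viral particles in ultrathin sections of infected cells displayed typical coronavirus morphology. To further confirm the neutralizing activity of the viral IgG-positive samples, they carried out serum neutralization assays in Vero E6 cells using the five IgG-positive patient sera. All serum samples were able to neutralize 120 TCID50 of 2019-nCoV at a dilution of 1:40 to 1:80. They also found that the virus could be cross-neutralized by horse anti-SARS-CoV serum at a dilution of 1:80, but the potential for cross-reactivity with SARS-CoV antibodies needs to be confirmed with human anti-SARS-CoV serum.

Angiotensin-converting enzyme II (ACE2) is known to be the cell receptor for SARS-CoV. To determine whether 2019-nCoV also uses ACE2 as a cell entry receptor, they carried out virus infectivity studies using HeLa cells that did or did not express ACE2 proteins from humans, Chinese horseshoe bats, civets, pigs and mice. They found that 2019-nCoV can use all of these ACE2 proteins except mouse ACE2 as an entry receptor in ACE2-expressing cells, but cannot enter cells that do not express ACE2, indicating that ACE2 is probably the cell receptor for 2019-nCoV (Figure 3). They also showed that 2019-nCoV does not use other coronavirus receptors, namely aminopeptidase N and dipeptidyl peptidase 4.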

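Figure 3. Analysis of 2019-nCoV receptor usage. Image from Nature, 2020, doi:10.1038/s41586-020-2012-7.

This study provides the first detailed report on 2019-nCoV, the likely cause of the ongoing epidemic of acute respiratory syndrome in Wuhan, central China. The virus-specific nucleotide-positive results and the viral-protein seroconversion observed in all patients tested provide evidence of an association between the disease and the presence of this virus. However, many urgent questions remain. The association between 2019-nCoV and the disease has not yet been verified by animal experiments to fulfill Koch's postulates. The route by which the virus is transmitted between hosts is not yet known. It appears increasingly likely that the virus is spreading from person to person. The virus should be closely monitored for any evolution toward greater virulence. Given the lack of specific treatment and the relatedness of SARS-CoV and 2019-nCoV, some drugs and preclinical vaccines against SARS-CoV could probably be used against this virus. Finally, considering the wide spread of SARSr-CoVs in their natural reservoirs, future research should focus on active surveillance of these viruses across broader geographical regions. In the long term, broad-spectrum antiviral drugs and vaccines should be prepared for emerging infectious diseases caused by this cluster of viruses in the future. Most importantly, strict regulations should be put in place against the domestication and consumption of wildlife. (Bioon.com)

Reference: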

Peng Zhou et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 2020, doi:10.1038/s41586-020-2012-7.

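Nature looks ahead to the technologies of 2020

1. Hongwei Wang, structural biologist at Tsinghua University: “Better cryo-EM sample preparation”
Within two to three years, cryo-electron microscopy (cryo-EM) will become the most powerful tool for deciphering the structures of macromolecules. These structures are crucial for understanding biochemical mechanisms and for drug development, and solving them more efficiently could speed up that work.
In cryo-EM, flash-freezing biological samples in liquid nitrogen helps to preserve their water content and reduces the damage caused by the high-energy electron beam. But sample preparation is a bottleneck for cryo-EM: without a good sample, you cannot obtain useful information. Biological samples are usually proteins, and during freezing, proteins tend to come apart at the surface of the thin liquid layer. To prevent this unfolding, researchers are developing methods that anchor the proteins to a two-dimensional material, such as the carbon lattice graphene, before the liquid is applied. This allows smaller droplets to be used while keeping the proteins away from the air–water interface [1].
Solving a structure by cryo-EM usually requires collecting and analyzing as many as 10,000 images, which means several weeks to a month of work. Many of those images are flawed and have to be discarded; in theory, a few dozen images would suffice, and collecting and analyzing them would take less than a day. Such an increase in throughput could help us to understand disease mechanisms and to develop corresponding drugs more efficiently.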

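2. Sarah Woodson, biophysicist at Johns Hopkins University: “Improved RNA analysis”
Long-read RNA sequencing and live-cell imaging with aptamers are still maturing, but major advances are expected in the next year or two.
Short-read sequencing has transformed RNA biology; it can tell you, for example, which RNA sequences have been modified. Long-read sequencing (such as the technologies offered by Oxford Nanopore and Pacific Biosciences), however, can help to determine how abundant a particular modification is in the cell, and whether a change in one part of an RNA is correlated with a change in another part.
Aptamers are single-stranded DNA or RNA molecules that can bind fluorescent dyes. When they bind the dye, the fluorescence intensity increases. Aptamers have many applications; for example, researchers can use them to track the formation of RNA clusters inside cells.
Many diseases are associated with changes in RNA structure, but this is hard to study. Long-read sequencing and aptamers now make it possible to study the aggregation of RNA and protein in diseases including cancer and Alzheimer's disease. With these technologies, the features of a disease can be linked more closely to changes in RNA molecules in the cell.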

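3. Elhanan Borenstein, computational biologist at Tel Aviv University: “Decoding the microbiome”
Over the past decade, the human-associated microbiome has been explored by sequencing the genetic material of microbial communities. More recently, scientists have tried to integrate information on genes, transcripts, proteins and metabolites to understand what the microbes are doing. Studying metabolites can tell us how microbes affect our health, because host and microbes often interact through the metabolites they produce and consume.
Microbiome-metabolome studies (for example, metabolomic profiling of a set of stool samples) have grown explosively. Metagenomic sequencing identifies the species present in each sample and their abundances, while mass spectrometry and other techniques measure the concentrations of different metabolites. By combining the two, scientists hope to learn which member of the microbiome is doing what, and so whether particular microbes determine the levels of certain metabolites.
But these data are complex and multidimensional, and an entire network of interactions involving multiple species and pathways may ultimately produce a given set of metabolites. Computational methods exist that link microbiome and metabolome data and learn specific patterns in them. They range from simple correlation-based analyses to sophisticated machine-learning methods that use existing microbiome and metabolome data sets to make predictions.
Borenstein's laboratory takes a different approach. Rather than using statistical methods to find associations between microbes and metabolites, the lab builds mechanistic models of how a particular microbial composition affects metabolites and makes them part of the analysis. In practice, this asks: on the basis of genomic and metabolic information, how much do we know about each microbe's capacity to produce or take up a given metabolite? One can then predict the potential of a particular collection of microbes to produce or degrade specific metabolites, and compare those predictions with actual metabolomic data.
Research of this kind could improve microbiome-based therapies. For example, it can identify the microbes responsible for producing too much of a harmful metabolite or too little of a beneficial one.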
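The idea of predicting a community's potential to produce or degrade a metabolite can be illustrated with a deliberately simple score. This is a minimal sketch and not the laboratory's actual model: the function name, the taxon and metabolite labels, the abundances, and the +1/-1 capacity encoding are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict


def community_metabolic_potential(
    abundances: Dict[str, float],
    capacities: Dict[str, Dict[str, int]],
) -> Dict[str, float]:
    """Abundance-weighted net capacity of a community for each metabolite.

    capacities[taxon][metabolite] is +1 if the taxon is assumed to produce
    the metabolite, -1 if it is assumed to consume it (hypothetical encoding).
    """
    potential: Dict[str, float] = {}
    for taxon, abundance in abundances.items():
        for metabolite, effect in capacities.get(taxon, {}).items():
            potential[metabolite] = (
                potential.get(metabolite, 0.0) + abundance * effect
            )
    return potential


# Hypothetical relative abundances in one sample.
sample = {"taxon_A": 0.5, "taxon_B": 0.25, "taxon_C": 0.25}

# Hypothetical genome-derived capacities.
capacities = {
    "taxon_A": {"butyrate": 1},
    "taxon_B": {"butyrate": -1, "acetate": 1},
    "taxon_C": {"acetate": 1},
}

scores = community_metabolic_potential(sample, capacities)
```

In a real analysis, scores like these would be computed per sample and compared with the measured metabolite concentrations, which is the comparison step the text describes.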

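4. Christina Curtis, computational and systems biologist at Stanford University: “Tumors can be ‘computed’”
We cannot watch a tumor form; we see only the outcome. By the time a tumor is clinically detectable, many mutations have already occurred. By building computational models, scientists study the dynamics of tumor development while taking the spatial structure of the tissue into account. With such a model, a range of scenarios can be simulated, and “virtual tumors” can be generated with mutation patterns that mimic patient data. By comparing the simulated data with actual genomic data, it becomes possible to infer which parameters probably gave rise to a tumor.
In one study modeling the growth of colon cancer, researchers used tumor sequence data and computer simulations to examine the relationship between primary tumors and metastases [2]. The analysis indicated that in the vast majority of cases the cancer had already spread when the primary tumor contained only about 100,000 cells, too few to be detected by standard diagnostic methods such as colonoscopy.
With greater sensitivity and scalability, some modeling approaches can track lineage and spatial relationships as a tumor forms, revealing its origins, including how specific mutations perturb cellular homeostasis and drive the disease.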

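5. Alex Nord, geneticist at the University of California, Davis: “Optimizing gene therapy”
Enhancers and other regulatory DNA elements that control gene expression have been studied for about 15 years. Although more work is needed to complete this research, we can now control the genome more precisely on the basis of what we know about it.
Once enhancer sequences have been identified, scientists can use them to drive gene therapy in specific cell types. Some diseases arise because one copy of a gene is inactive or missing; the CRISPR-Cas9 gene-editing tool can target a transcriptional activator to the enhancer of the other copy and so boost its expression. Studies in mice show that such approaches can correct obesity caused by a gene defect, as well as other conditions such as fragile X syndrome and Rett syndrome [3]. Over the coming year, the research will remain at the mouse-model stage, but the technology is already attracting growing investment. The hope is that, as the technology develops, human gene therapy will see a breakthrough.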

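6. J. Christopher Love, chemical engineer at the Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology: “Single-cell sequencing”
How can we get drugs to patients faster and more easily? The technologies required are many-sided. On one side is discovery, for example single-cell sequencing. On the other is bringing the technology to patients, that is, manufacturing.
On the discovery side, scientists at MIT have developed a portable, inexpensive platform for high-throughput single-cell RNA sequencing [4]. Obtaining enough resolution to distinguish immune-cell subtypes, such as those with different antigen specificities, remains a challenge. Commercialization of single-cell RNA sequencing platforms is advancing. Sample preparation does not require a centrifuge to collect the cells; they only need to be placed in a tube, frozen in liquid nitrogen and shipped. Perhaps you would only need to ship a sample the size of a USB drive. This could make single-cell storage and genomic profiling possible for any sample, anywhere in the world.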

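7. Jennifer Phillips-Cremins, epigeneticist and bioengineer at the University of Pennsylvania: “Linking genome structure and function”
Stretched out end to end, the DNA of a single cell is about 2 meters long. It has to fit into a nucleus smaller in diameter than the head of a pin, so its folding cannot be random. The three-dimensional structures formed by chromosomes must be regulated in space and time across the lifespan of an organism.
Over the past decade, advances in genomics and imaging have made it possible to map genome folding at ultra-high resolution. The big question now is what these folding patterns do. How do they control fundamental processes such as gene expression, DNA replication and DNA repair?
Several synthetic-biology approaches allow the folded genome to be probed across scales of space and time. One method can carry a piece of DNA to a specific location in the nucleus, which will let scientists learn how the position of a DNA sequence in the nucleus governs gene function [5]. Another tool uses light-activated CRISPR–Cas9 to tether specific pieces of DNA together, even across long distances [6]. This can bring an enhancer into direct contact with a target thousands or even millions of bases away, so that the function of the regulatory sequence can be assessed directly. A third system, called CasDrop, uses another light-activated CRISPR-Cas9 system to pull specific pieces of DNA into a membraneless structure [7].
In the future, these 3D genome tools could be combined with CRISPR-based live-cell imaging so that the genome can be engineered and observed in real time in cells. Does function determine structure, or does structure determine function? This has long been a mystery, and these tools will help to answer it.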

Original article: https://www.nature.com/articles/d41586-020-00114-4

References

1. Liu, N. et al. J. Am. Chem. Soc. 141, 4016–4025 (2019).

2. Hu, Z. et al. Nature Genet. 51, 1113–1122 (2019).

3. Colasante, G. et al. Mol. Ther. https://doi.org/10.1016/j.ymthe.2019.08.018 (2019).

4. Gierahn, T. M. et al. Nature Meth. 14, 395–398 (2017).

5. Wang, H. et al. Cell 175, 1405–1417 (2018).

6. Kim, J. H. et al. Nature Meth. 16, 633–639 (2019).

7. Shin, Y. et al. Cell 175, 1481–1491 (2018).

Gene editing based on reverse transcription

Figure 1. Schematic Illustration of Prime Editing, as Proposed by Anzalone et al. (2019)

(A) The prime editor (PE), composed of Cas9-H840A fused to a reverse transcriptase (RT), and the pegRNA bind to target DNA.
(B) The nuclease domain of the editor nicks one DNA strand.
(C) The nicked strand binds to the primer binding site on the extended 3′ end of the pegRNA.
(D) The RT elongates the nicked DNA strand (incorporating the edit).
(E) The elongated strand competes for binding to the target DNA.
(F) A desired edit is installed after DNA repair of the heteroduplex DNA.

In recent years, powerful gene editing approaches have enabled broad interrogation of genomes and presented new avenues for human gene therapy (Jasin and Haber, 2016, Yeh et al., 2019, Rees and Liu, 2018). These technologies all fundamentally work through a similar process. First, using programmable enzymes, site-specific DNA damage is introduced into host cell genomes. This stimulates cellular DNA repair mechanisms, which then install permanent sequence alterations at targeted loci, with different repair processes enabling different types of “edits” (e.g., insertions, deletions, point mutations). The intricate organization and tight regulation of cellular DNA damage response mechanisms, however, has made controlling this process difficult. Perfect genome editing—achieving any desired change without collateral, undesired effects—has therefore remained an elusive goal. Writing in Nature in October, researchers from David Liu’s lab at the Broad Institute of Harvard and MIT reported an exciting new approach: prime editing (Anzalone et al., 2019).

Prime editing represents a significant departure from previous technologies, because unlike other implementations, this method uses exogenous reverse transcriptase activity to “write” DNA edits directly into genomic DNA (Figure 1). For this, prime editing uses two components: an RNA-programmable nickase (Streptococcus pyogenes Cas9-H840A) fused to a reverse transcriptase (e.g., engineered M-MLV RT) and a prime editing guide RNA (pegRNA) that specifies both the genomic target and edit sequence. Together, these components form a single “prime editor” that, when expressed in cells, copies pegRNA-encoded sequences into genomic DNA at pegRNA-specific, editor-induced DNA nicks. As Anzalone et al. (2019) demonstrated, this unique design gives prime editing a promising combination of capabilities, including the exciting ability to generate precise edits of different types within a majority of edited cells.
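The way a pegRNA specifies an edit can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design procedure of Anzalone et al. (2019): both input sequences are assumed to be written 5′→3′ on the nicked DNA strand, the function names and the example sequences and lengths are hypothetical, and real designs involve further considerations not modeled here. The 3′ extension is built as an RT template followed by a primer binding site (PBS), so that the PBS pairs with the 3′ end of the nicked strand and the RT template encodes the new DNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


def pegrna_extension(
    upstream_of_nick: str,
    edited_downstream: str,
    pbs_len: int,
    rtt_len: int,
) -> str:
    """Return a pegRNA 3' extension (RNA, 5'->3'): RT template then PBS.

    upstream_of_nick: nicked-strand DNA ending at the nick (5'->3').
    edited_downstream: desired DNA immediately after the nick,
        already containing the edit (5'->3').
    pbs_len, rtt_len: hypothetical lengths chosen by the caller.
    """
    if not 0 < pbs_len <= len(upstream_of_nick):
        raise ValueError("pbs_len is out of range")
    if not 0 < rtt_len <= len(edited_downstream):
        raise ValueError("rtt_len is out of range")
    pbs_target = upstream_of_nick[-pbs_len:]  # 3' end of the nicked strand
    new_dna = edited_downstream[:rtt_len]     # sequence the RT will write
    # The extension must pair with (primer + new DNA), so it is the reverse
    # complement of that sequence, written as RNA.
    extension_dna = reverse_complement(pbs_target + new_dna)
    return extension_dna.replace("T", "U")


# Toy example with hypothetical sequences.
extension = pegrna_extension("TTTACGT", "GGATTT", pbs_len=4, rtt_len=3)
```

In the toy example, the first three bases of the extension are the RT template and the last four are the PBS.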

To demonstrate their method, Anzalone et al. (2019) compared prime editing with two commonly used precision techniques: editing by double-strand break (DSB)-induced homology-directed repair (HDR) (Rouet et al., 1994) and base editing (Komor et al., 2016). HDR, when paired with engineered DNA donor templates, can be used to make precise DNA sequence changes (Jasin and Haber, 2016, Yeh et al., 2019). However, HDR is inefficient in non-dividing cells, and DSB-induced editing produces highly heterogeneous mixtures of on-target mutations containing mostly insertions and deletions (indels). Base editing, on the other hand, harnesses other endogenous mechanisms of DNA repair to install single point mutations without generating an excess of undesired indels (Rees and Liu, 2018). Established base editors, however, can make only a subset of possible edits (C→T, G→A, A→G, and T→C) and are limited to preexisting “windows” of genomic sequence. By contrast, Anzalone et al. (2019) showed that prime editing can precisely install all 12 possible base-to-base conversions and small insertions and deletions, both inside and outside of predicted base editing windows, with few byproducts at the targeted locus or at predicted off-target sites. This extraordinary set of features now paves the way for many applications, including those that require both highly precise and flexible editing capabilities (e.g., high-throughput interrogation of point mutations or therapeutic correction of diverse pathogenic alleles).
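The count of 12 base-to-base conversions, and the subset available to established base editors, can be enumerated directly. The sketch below uses only the conversions named in the text; the variable names are illustrative, and `>` stands in for the arrow.

```python
from itertools import permutations

BASES = "ACGT"
PURINES = {"A", "G"}

# Every ordered pair of distinct bases is one possible conversion.
all_conversions = [f"{a}>{b}" for a, b in permutations(BASES, 2)]

# Conversions listed in the text for established base editors.
base_editor_conversions = {"C>T", "G>A", "A>G", "T>C"}

# Transitions swap a purine for a purine or a pyrimidine for a pyrimidine.
transitions = {
    f"{a}>{b}"
    for a, b in permutations(BASES, 2)
    if (a in PURINES) == (b in PURINES)
}

# Conversions that, per the text, need prime editing.
beyond_base_editing = [
    c for c in all_conversions if c not in base_editor_conversions
]
```

The four base-editor conversions are exactly the four transitions, so the eight remaining conversions are the transversions.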

Given these advantages, prime editing holds remarkable promise, but the technology is unarguably still in its infancy and additional studies are therefore needed to fully realize its potential. Editing efficiency, for example, remains a key challenge. To optimize efficiency, Anzalone et al. (2019) used a second DNA nick, made on the unedited DNA strand (bottom strand in Figure 1), to bias repair of presumed editing intermediate structures (heteroduplex DNA) toward fixation of the edit. Yet even with this “PE3” system, Anzalone et al. (2019) achieved editing rates of ∼20%–50% in only one cell line (HEK293T), with editing in other cells (K562, HeLa, U2OS, and primary cortical neurons) typically lower (e.g., 7.1% in mouse cortical neurons).

Optimistically, unexplored technical considerations may yet improve prime editing efficiency. For their part, Anzalone et al. (2019) presented general recommendations for designing prime editing experiments (e.g., primer binding site and RT template lengths), but due to limited target site sampling (12 endogenous loci) and high variability of design criteria, they did not comprehensively elucidate “rules” for prime editing. Future analysis of editing parameters, including those not yet addressed (e.g., pegRNA stability, local sequence context, and chromosomal location of genomic targets), may therefore reveal strategies for the optimal use of this approach.

Altogether, prime editing presents many questions, including one of the most fundamental: How does it work? Anzalone et al. (2019) presented a compelling model, supported by evidence from both test tubes and cell culture, that editing proceeds through (1) induction of targeted nicks, (2) reverse transcription of pegRNA-encoded RT templates, and (3) fixation of edits by DNA repair (Figure 1). However, mechanistic details of many of the key steps in this process are not yet fully understood. How does the edit, which first exists as a single-stranded DNA flap, get incorporated into the genome? How does a second nick in the complementary DNA strand favor that process? What happens when those nicks are converted to DSBs (as would be expected at some frequency)? And how do different cell states and cell types affect each of these steps? Answers to these questions will undoubtedly aid prime editing efforts.

Beyond understanding how prime editing works, we must also understand what happens when prime editing does not work as intended. Indeed, unwanted effects, including off-target editing of DNA and RNA, have been a challenge of previous editing approaches (Fu et al., 2013, Tsai et al., 2015, Kim et al., 2019, Jin et al., 2019, Zuo et al., 2019). Given this, a primary focus of future prime editing development must include careful analysis of unintended effects, such as genome-scale examination of off-target editing and a deeper understanding of cellular impact (e.g., induced stress responses and adverse growth effects across cell types).

Ultimately, the final test of prime editing will be its future application(s). Given the excitement over this development, we are sure to see rapid and innovative use of prime editing in the research community. Utility for therapeutic applications, on the other hand, is likely to take years to evaluate, and in this area, prime editing has particular challenges. Although Anzalone et al. (2019) demonstrated that prime editing works in one primary cell type (mouse cortical neurons), efficiency was low and cell-specific effects remain entirely unknown. The large size of the editor fusion protein (∼240 kDa with Streptococcus pyogenes Cas9-H840A and M-MLV RT) presents practical problems for delivery, and clinical effects are impossible to predict (e.g., interaction with the immune system). Thus, despite prime editing’s enormous promise, only time will determine the scope of its impact.

REFERENCES
Anzalone, A.V., Randolph, P.B., Davis, J.R., Sousa, A.A., Koblan, L.W., Levy, J.M., Chen, P.J., Wilson, C., Newby, G.A., Raguram, A., and Liu, D.R. (2019). Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157.
Fu, Y., Foden, J.A., Khayter, C., Maeder, M.L., Reyon, D., Joung, J.K., and Sander, J.D. (2013). High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol. 31, 822–826.
Jasin, M., and Haber, J.E. (2016). The democratization of gene editing: Insights from site-specific cleavage and double-strand break repair. DNA Repair (Amst.) 44, 6–16.
Jin, S., Zong, Y., Gao, Q., Zhu, Z., Wang, Y., Qin, P., Liang, C., Wang, D., Qiu, J.L., Zhang, F., and Gao, C. (2019). Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295.
Kim, D., Kim, D.E., Lee, G., Cho, S.I., and Kim, J.S. (2019). Genome-wide target specificity of CRISPR RNA-guided adenine base editors. Nat. Biotechnol. 37, 430–435.
Komor, A.C., Kim, Y.B., Packer, M.S., Zuris, J.A., and Liu, D.R. (2016). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424.
Rees, H.A., and Liu, D.R. (2018). Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788.
Rouet, P., Smih, F., and Jasin, M. (1994). Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol. Cell. Biol. 14, 8096–8106.
Tsai, S.Q., Zheng, Z., Nguyen, N.T., Liebers, M., Topkar, V.V., Thapar, V., Wyvekens, N., Khayter, C., Iafrate, A.J., Le, L.P., et al. (2015). GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–197.
Yeh, C.D., Richardson, C.D., and Corn, J.E. (2019). Advances in genome editing through control of DNA repair pathways. Nat. Cell Biol. 21, 1468–1478.
Zuo, E., Sun, Y., Wei, W., Yuan, T., Ying, W., Sun, H., Yuan, L., Steinmetz, L.M., Li, Y., and Yang, H. (2019). Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289–292.

 

 

AAV capsid modification strategies for targeting the central nervous system

1 Introduction
The first recombinant adeno-associated virus (AAV) vectors were generated using both the protein capsid and the inverted terminal repeat (ITR) DNA sequence of AAV serotype 2 [1–4]. AAV2 vectors remain widely used today, and nearly all recombinant AAV genomes carry the AAV2 ITR sequence. However, a large number of alternative capsid variants have been identified from humans, baboons, chimpanzees, and rhesus, pigtailed, and cynomolgus macaques [5, 6]. These alternative capsids often have distinct tropism and antigenic profiles, although many have yet to be thoroughly studied. Traditionally, capsid variants are categorized according to their serotype. Each serotype is defined as an antigenically distinct viral capsid, as determined by serum cross-neutralization. In addition, several strategies can be used to alter the tropism of the AAV capsid, including chemical modification of the virus capsid, production of hybrid capsids, peptide insertion, capsid shuffling, directed evolution, and rational mutagenesis. The large array of capsid variants generated by these strategies, along with techniques that can enhance or alter their native tropisms, provides a rapidly expanding toolkit for gene transfer to the central nervous system (CNS). However, the number of options can be overwhelming, making it difficult to select the appropriate AAV vector for a specific application.

The recombinant AAV vectors described in this chapter can all be prepared using the same technique, described in Chapter 7, with the unique capsid sequence provided in trans during production. The basic T = 1 icosahedral architecture of the viral capsid does not differ among these serotypes and engineered vectors, although the proteins encapsidating the recombinant DNA are slightly different, resulting in limited structural changes. For many AAV serotypes cellular surface receptors or binding determinants have been identified, including sialic acid for AAVs 1, 4, 5, and 6 [7, 8], heparan sulfate proteoglycan (HSPG) for AAV2 [9], the laminin receptor for AAV8 [10], and galactose for AAV9 [11, 12]. In addition, human fibroblast growth factor receptor 1 and alphaV-beta5 integrin have both been proposed as co-receptors for AAV2 [13, 14], as has platelet-derived growth factor receptor for AAV5 [15]. These differences in receptor binding among capsid serotypes contribute to differences in tropism within the brain and other tissues. However, while differences in receptor affinity can drive variability among AAV serotypes, most, if not all, AAVs demonstrate broad tropism without absolute specificity, in part due to the wide presence of AAV receptors throughout the body. Different AAV variants can, however, differ in absolute levels of gene transfer to a specific tissue, as well as in their relative transduction strength among multiple tissues.

Several techniques can be used to generate novel AAV capsids with unique, targeted tropism. Chemical modification of the viral capsid with receptor-binding moieties can confer enhanced tropism, and chemical masking of native receptor-binding moieties can alter the normal tropism of AAV and shield the capsid from neutralizing antibodies. Hybrid capsids can be generated by co-expressing cap genes from different serotypes during production, combining the unique properties of both parental serotypes. Peptide insertion of novel receptor-binding elements on the capsid surface can alter the native tropism of AAV, and insertion of fluorescent proteins can be used to tag vector particles. Capsid shuffling and directed evolution can be used to create and screen a library of unique capsid variants for a desired trait, such as tropism for a specific cell type. Finally, rational modification of the viral capsid via site-directed mutagenesis can alter tropism, confer evasion of neutralizing antibodies, and increase transduction efficiency.

In this chapter, we describe the differing tropisms of AAV serotypes in the CNS and retina, the various factors that can influence AAV tropism, the techniques that can be used to alter the tropism of the vector, and the engineered variants that have been developed for use in the nervous system. This will provide an in-depth guide for selecting the optimal capsid serotype or engineered variant for specific experimental or therapeutic applications in the CNS.

2 Selection of the Capsid Serotype
Nervous cell tropism varies among AAV capsid serotypes. In primary cultures of rat nervous cells, AAV5 appears to possess a strong glial tropism, and gene expression rarely colocalizes with the neuronal marker NeuN [16]. AAV serotypes 1, 2, 6, 7, 8, and 9 transduce both neurons and astrocytes in primary culture [16, 17]. AAVs 1, 6, and 7 appear to have the strongest neuronal tropism in vitro, with 75% or more of transduced cells representing neurons [17]. AAV9, however, has relatively weak neuronal tropism in vitro, with less than 50% of transduced cells representing neurons [17]. AAV5 is therefore recommended for transduction of cultured astrocytes, and AAVs 1, 6, and 7 are recommended for transduction of cultured neurons.

Following intraparenchymal brain injection, AAVs 1, 2, 5, 7, 8, 9, and rh.10 all exhibit strong neuronal tropism, as gene expression rarely colocalizes with markers of astrocytes or oligodendrocytes [18–21]. However, others have observed astroglial transduction with AAVs 1, 2, 5, 6, and 8 [22–24], and AAV8 has also been observed to transduce oligodendrocytes within the cortex [24]. AAV4 possesses strong glial tropism in vivo, and primarily drives gene expression within glial fibrillary acidic protein (GFAP)-positive astrocytes [25]. In addition, AAVrh.43 appears to possess stronger glial tropism in vivo than AAV8 [26]. Thus, while most AAV serotypes exhibit strong neuronal tropism following direct intraparenchymal brain injection, glial transduction has been observed in some cases, and AAVs 4 and rh.43 appear to possess stronger astroglial tropism than most AAV serotypes.

While most AAV serotypes appear to preferentially transduce neurons within the brain, the relative strength of neuronal transduction varies greatly. When compared against other serotypes, AAVs 2 and 4 typically mediate weaker and less widespread neuronal gene expression [19, 22, 27–31]. Thus, AAVs 2 and 4 are not recommended for widespread transduction of neurons. AAV2 diffuses less readily through both the brain and spinal cord parenchyma when compared against other serotypes, and therefore mediates transduction over a smaller area [19, 28, 32, 33]. This property can be harnessed for the targeting of small nuclei. The strongest and most widespread neuronal transduction is observed with AAV serotypes 1, 9, and rh.10 [19, 21, 23, 30, 34, 35]. Several novel serotypes, including pi.2, rh.8, hu.11, hu.32, and hu.37, also appear to mediate strong and widespread neuronal transduction in the brain [36]. However, these serotypes have not yet been extensively tested, and can also transduce glia within the white matter [36]. AAV serotypes 1, 9, and rh.10 are therefore recommended for targeting of neurons via intraparenchymal brain injection. Furthermore, AAV2 is recommended for the targeting of small brain regions due to its reduced diffusion in brain tissue.

Tropism also varies among AAV serotypes when administered to the cerebrospinal fluid (CSF), either via intrathecal or intracerebroventricular injection. AAV4 strongly transduces ependymal cells when administered to the ventricles, and demonstrates greater ependymal cell tropism than AAVs 2 or 5 [22]. Intracerebroventricular injection of AAV4 is therefore recommended for targeting of ependymal cells. In contrast, AAV7 and AAV9 can bypass the ependymal layer following intrathecal injection, penetrating the parenchyma and transducing neurons throughout the cortex, cerebellum, and spinal cord [37, 38]. AAVs 1, 2, 4, 5, 6, and 8 do not appear to extensively penetrate the brain parenchyma when administered to the CSF [22, 39, 40]. However, AAV6 mediates widespread transduction of spinal motor neurons following intrathecal injection [40]. Further, AAV8 possesses a unique tropism for large-diameter neurons of the dorsal root ganglion (DRG), and drives specific gene expression within these cells following intrathecal injection [39]. Thus, CSF injection of AAV4 can mediate transduction of ependymal cells, CSF injection of AAV7 or 9 can mediate widespread transduction of cortical, cerebellar, and spinal neurons, CSF injection of AAV6 can mediate transduction of spinal motor neurons, and CSF injection of AAV8 can mediate transduction of large-diameter DRG neurons.

When the spinal cord is targeted directly via intraparenchymal injection, AAVs 1, 5, and 9 demonstrate the strongest neuronal tropism, while AAVs 2, 6, and 8 demonstrate weaker neuronal tropism [19, 40]. AAV8 retains its tropism for large-diameter DRG neurons after intraparenchymal injection [39]. AAVs 1, 5, and 9 are therefore recommended for intraparenchymal targeting of discrete populations of spinal motor neurons, while AAV8 is recommended for intraparenchymal targeting of large-diameter DRG neurons.

When administered intravenously, AAVs 9, rh.10, rh.8, and rh.43 can penetrate the blood–brain barrier and drive gene expression throughout the nervous system [41–44]. These serotypes possess both neuronal and glial tropism when administered systemically, but transduction is primarily neuronal in neonatal animals and primarily glial in adults [41–43]. Penetration of the intact blood–brain barrier and transduction of brain tissue is limited with other serotypes, including AAVs 1, 2, 5, 6, and 8 [44–47]. See also Chapters 16 and 17 for discussion of systemic AAV administration.
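The recommendations in this section can be collected into a small lookup table for quick reference. The sketch below is an illustrative summary only: the key labels and the function name `recommend` are hypothetical, the entries paraphrase the recommendations stated above, and the table is no substitute for the caveats on vector preparation and promoter choice discussed in the text.

```python
from typing import Dict, List, Tuple

# (delivery route, target) -> serotypes recommended in the text.
RECOMMENDED_SEROTYPES: Dict[Tuple[str, str], List[str]] = {
    ("culture", "astrocytes"): ["AAV5"],
    ("culture", "neurons"): ["AAV1", "AAV6", "AAV7"],
    ("intraparenchymal brain", "neurons"): ["AAV1", "AAV9", "AAVrh.10"],
    ("intraparenchymal brain", "small nuclei"): ["AAV2"],
    ("CSF", "ependymal cells"): ["AAV4"],
    ("CSF", "cortical, cerebellar, and spinal neurons"): ["AAV7", "AAV9"],
    ("CSF", "spinal motor neurons"): ["AAV6"],
    ("CSF", "large-diameter DRG neurons"): ["AAV8"],
    ("intraparenchymal spinal cord", "spinal motor neurons"): [
        "AAV1", "AAV5", "AAV9",
    ],
    ("intraparenchymal spinal cord", "large-diameter DRG neurons"): ["AAV8"],
    ("intravenous", "CNS-wide"): ["AAV9", "AAVrh.10", "AAVrh.8", "AAVrh.43"],
}


def recommend(route: str, target: str) -> List[str]:
    """Return the serotypes listed for a route/target pair, or an empty list."""
    return RECOMMENDED_SEROTYPES.get((route, target), [])


ependymal_choice = recommend("CSF", "ependymal cells")
unknown_choice = recommend("oral", "neurons")
```

An empty result means only that the text gives no recommendation for that combination, not that no serotype would work.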

Tropism also differs among AAV serotypes following ocular administration via subretinal injection, which is generally an efficient method for outer retina transduction. AAVs 1, 2, 4, 5, 7, 8, and 9 transduce cells of the retinal pigmented epithelium (RPE) [48–52]. AAVs 1, 2, and 5 transduce these RPE cells with similar efficiency when directly compared [48]. In addition, AAVs 1, 2, 5, 7, 8, and 9 transduce photoreceptors (PRs), while AAV4 does not [48–53]. AAV8 also possesses tropism for ganglion cells and cells of the inner nuclear layer [50, 54], and demonstrates greater tropism for PRs than AAV2 [54]. In nonhuman primate subretinal injections, AAV8 efficiently targets rod PRs [54], while AAV9 is superior for cone PRs and retains tropism for rod PRs [53]. AAV5 has also been shown to target both rod and cone PRs, but has not been compared directly in this setting against other serotypes [55].

Surprisingly, nervous cell tropism can also vary among vector preparations of the same serotype, even when injected under identical conditions. For example, CsCl-purified AAV8 exhibited strong astroglial tropism following intraparenchymal brain injection, while iodixanol-purified AAV8, injected under identical conditions, transduced only neurons [23]. Variability among vector preps or among injection conditions may therefore explain the differences in tropism that are frequently observed among experiments.

As a result of this variability, it is not possible to confidently restrict gene expression to neuronal or glial populations based solely on the capsid serotype. Thus, a cell type-specific promoter should be utilized if astrocyte-, oligodendrocyte-, or neuron-specific transduction is desired. However, most serotypes preferentially transduce neurons following intraparenchymal brain injection, and therefore a pancellular promoter can be used to drive neuronal gene expression, so long as the potential transduction of glia is not problematic. On the other hand, if strong glial expression is desired, a cell type-specific promoter should be utilized.

In some cases it may be desirable for AAV to undergo axonal transport, either to increase the spread of gene transfer, or to retrogradely target a specific subpopulation of projection neurons. For example, anterograde transport of AAV9 injected into the ventral tegmental area of the brain can greatly enhance transgene distribution [34], and retrograde transport of AAV1 injected into muscle or sciatic nerve can specifically label discrete pools of motor neurons [56]. Axonal transport is a fundamental property of AAV vectors, and thus any vector that is endocytosed at high levels by projection neurons is likely to transduce distal brain regions [57, 58]. AAV1 appears to be more effective than AAVs 2, 3, 4, 5, or 6 for retrograde transduction of motor neurons following muscle or sciatic nerve injection [56]. Further, AAVs 1 and 5 demonstrate greater retrograde transduction of brainstem than AAVs 2, 8, or 9 following injection of the transected spinal cord [59]. Within the brain, AAV9 is most frequently observed to undergo axonal transport, and is the recommended choice if distal transduction is desired [21, 23, 34, 36, 57, 60]. In addition, AAVs 1, 8, and rh.10 also undergo axonal transport within the brain and can be used to drive distal transduction [21, 28–30, 34, 57, 60].

3 Modification of the Capsid
To further improve the utility of AAV as a gene therapy vector, research has concentrated on altering capsid properties such as tropism, targeting specificity, and antigenicity. This can be achieved by (1) chemical modification of the capsid; (2) assembling mosaic capsids consisting of subunits from two or more different serotypes; (3) peptide insertion; (4) capsid shuffling; or (5) rational design.

Chemical modifications to improve the tropic properties of adenoviral and lentiviral vectors have resulted in moderate success (reviewed in refs. [61, 62]). Similar strategies have been applied to AAV, although to a lesser extent. Bispecific antibodies capable of binding both AAV2 and αIIbβ3 integrin can increase the transduction of αIIbβ3 integrin-expressing cells by 70-fold [63]. In another study, linkage of biotin-coated AAV to an EGF-streptavidin fusion protein increased the transduction efficiency of EGFR-expressing SKOV3.ip1 cells more than 100-fold [64]. However, despite these promising results, enhancement of AAV tropism has not been achieved in vivo. Chemical capsid modifications can also be used to mask receptor-binding domains on the AAV capsid, de-targeting the virus from its native receptors, allowing infection through alternate receptors, and shielding the capsid from neutralizing antibodies. Indeed, moderate success has been achieved by coating capsids with poly(ethylene glycol) [65], poly-[N-(2-hydroxypropyl) methacrylamide] [66], and α-dicarbonyl compounds [67]. However, chemically modified capsids have yet to be widely used in vivo, and their utility in the CNS remains limited.

AAV capsids are assembled as icosahedral particles from 60 subunits of the VP1, VP2, and VP3 structural proteins. Hybrid capsids are designed to harness the structural similarity among AAVs, combining beneficial properties from two or more different serotypes by co-expressing their capsid proteins during vector production [68]. Hybrid capsids of AAV1 and AAV2 (AAV1/2) can mediate stronger transgene expression in lung and muscle in vivo than either of the parental serotypes alone [69]. Further, AAV1/2 appears to combine the tropism of AAV2 for TH-positive dopamine neurons with the ability of AAV1 to diffuse more widely through brain tissue, mediating strong transduction of dopamine neurons in the substantia nigra, and has been used to model Parkinson’s disease in the rat [70]. Hybrid capsids can also be used to transfer binding affinities from their parental serotypes, such as HSPG binding (AAV2 or AAV3) or mucin binding (AAV4 or AAV5) [71]. However, although the composition of hybrid capsids can be influenced by expressing the parental cap genes at specific ratios, the composition of individual capsids cannot be directly controlled, and undesired capsid arrangements are likely to occur. Furthermore, direct genetic manipulation of the cap gene provides a more precise method by which the properties of different serotypes can be combined, and thus hybrid AAV capsids are rarely utilized.

The earliest successful alterations of AAV tropism by capsid engineering relied on insertion of short peptides into the AAV capsid [72]. Inserted peptides are displayed on the capsid surface and provide affinity for a receptor specifically expressed by the target cell type. Simultaneous disruption of the native capsid tropism can increase the likelihood of specific interaction with the novel target receptor. However, early attempts to provide AAV5 with the ability to bind HSPG indicated that conferring efficient receptor binding does not necessarily confer efficient transduction of the target tissue. Mutant AAV5 virions, despite being able to bind HSPG as efficiently as AAV2, lost their native infectivity and thus did not demonstrate increased tropism for HSPG-expressing cells [73]. In addition to specific insertions of known receptor binding peptides, insertion screens of random peptide libraries have also been utilized. In both cases, it must be ensured that the insertion does not negatively affect vector production, infectivity, or other properties required for gene transfer. Several regions of the capsid are amenable to insertions, including the N-termini of VP1 and VP2, as well as the various loop regions shared by all VP proteins [72, 74–78]. A peptide inserted into the common C-terminal domain shared by all three VP proteins will be displayed on every capsid subunit (60 copies per viral particle), whereas a VP1 or VP2 insertion will only be present in up to 6 or 12 copies per capsid, respectively. This is an important consideration, as the density of receptor-binding peptides on a capsid can affect its tropism [79]. Although initial experiments focused on the insertion of small peptides (5–15 amino acids), later studies have shown that insertions of full length proteins, such as GFP or mCherry, can also be tolerated without loss of virus function [78, 80]. 
However, despite these promising in vitro proof-of-principle studies, few AAV mutants with enhanced tropism in vivo have been published. To date, improved targeting of skeletal muscle [81, 82], cardiac muscle [83], vasculature [84–86], lung [87, 88], diseased brain endothelial cells [89], retina [90], ovarian cancer cells [91] and breast cancer cells [88] has been reported. Most of these peptide insertions are targeted between amino acids 587 and 588 of AAV2, the region that mediates HSPG binding [73, 92], in order to disrupt the function of this region and de-target AAV2 from its native tropism. The AAV2-7m8 mutant, which was generated via random insertion of a seven amino acid sequence, can efficiently target most retinal cell types following intravitreal injection [90].

In addition to peptide insertion, the development of capsid shuffling [93] and directed evolution [94], discussed in Chapter 11, has generated many promising novel AAV variants. Briefly, cap genes of different AAV serotypes are nuclease digested, mixed together, and randomly reassembled to produce mutated chimeric genomes, which are subsequently selected for a specific function or tropism via directed evolution screening. Capsids with improved transduction of heart [95], lung [96, 97], Müller glia in the retina [98], CNS [99–101], and neural and pluripotent stem cells [102, 103] have been described, with numerous others yet unpublished. Directed evolution was also used to screen the random insertion of short peptides into AAV2 VP3, resulting in the AAV2-7m8 mutant capsid, which is capable of transducing all retinal layers after intravitreal injection [90]. In addition to AAV2-7m8, ShH10 can specifically transduce Müller cells from the vitreous [98].

An improved understanding of AAV structure and biology has enabled researchers to modify vector function by rationally targeting mutations of amino acids on the viral capsid, rather than selecting clones with the desired property from a library of mutants. Some rationally designed mutants combine the desired functions of different serotypes, while others disrupt the domains responsible for unwanted characteristics. The former group includes AAV2i8, a chimera of AAV2 and AAV8 [104], AAV2.5, a chimera of AAV2 and AAV1 [105], and chimeras of AAV1 and AAV6 differing by single amino acid changes [106]. These studies indicate that changing only a small number of amino acids is sufficient to generate capsids with the characteristics of both parental serotypes. In addition, AAV2i8 and AAV2.5 possess unique antigenic properties [104, 105]. Similarly, AAV6.2, a novel vector with improved transduction in mouse airways, was generated via targeted single amino acid changes to AAV6 [107, 108]. Disruption of native AAV properties by targeted amino acid mutations has primarily focused on masking the capsid from neutralizing antibodies, which can inhibit AAV-mediated gene transfer. Several mutations that alter serum antibody recognition and neutralization of AAV2 while retaining normal vector function have been identified [109, 110]. Rational mutations have also been designed to increase transduction efficiency, reducing the vector dose required for clinically relevant transgene expression and avoiding the immune response associated with large viral loads. It is hypothesized that phosphorylation of the AAV capsid leads to ubiquitination and subsequent proteasome-mediated degradation of the vector particle, reducing transduction efficiency [111]. Indeed, mutating several surface exposed tyrosine and threonine residues significantly increased the transduction efficiency of AAVs 2, 5, and 8 [111, 112]. 
As hypothesized, proteasomal degradation of these capsid mutants was reduced, resulting in increased viral nuclear transport and transgene expression [111, 113, 114]. In addition, novel transduction patterns are observed when AAV2 tyrosine mutants are applied to the mouse retina, in particular following intravitreal injection, which may eliminate the need for surgically challenging subretinal vector administration [115, 116]. Tyrosine mutations of different serotypes also demonstrate improved transduction of mesenchymal stem cells [113] and the mouse brain [117, 118]. A similar strategy to disrupt AAV2 phosphorylation by targeting serine, threonine, or lysine residues can increase liver transduction in mice [119]. Although tyrosine mutations improve retinal transduction following intravitreal injection, in the only comparison published to date, AAV2-7m8 demonstrated more efficient transduction from the vitreous [90]. Finally, double tyrosine-mutant AAV9 vectors containing AAV3 ITRs and the neuron-specific synapsin promoter appear to possess stronger neuronal tropism than AAV9 in the murine CNS following systemic delivery, although these mutant vectors were not compared directly against AAV9 or other variants [117].

4 Additional Methods to Refine Gene Targeting
The broad natural tropism of the AAV capsid can be enhanced or made more specific by harnessing cell type-specific promoters. For example, the 1.3 kb CaMK2a promoter can drive transgene expression in glutamatergic excitatory neurons with high specificity [120]. Further, the 1.8 kb neuron-specific enolase promoter [121], the 470 bp human synapsin-1 promoter [120, 122], the 229 bp MeCP2 promoter [123], and the 2 kb herpes simplex virus 1 latency associated transcript promoter [124] can all drive neuron-specific gene expression. The GFAP promoter can drive astrocyte-specific expression, and the myelin basic protein (MBP) promoter can drive oligodendrocyte-specific expression [26, 125]. However, in order for these promoters to be effective, the AAV capsid must possess tropism for the target cell type. For example, AAV4 carrying a CaMK2a promoter is unlikely to drive strong expression within excitatory neurons, as AAV4 does not transduce this cell type efficiently [22]. Similarly, AAV2 carrying a GFAP promoter was found to drive expression primarily in neurons following intra-parenchymal brain injection, likely due to the limited astroglial tropism of AAV2 [121]. See also Chapter 6 for discussion of cell type-specific promoters.

Woodchuck hepatitis virus posttranscriptional regulatory elements (WPREs) are frequently included in the recombinant AAV genome, and can increase the strength of transgene expression [121, 126]. However, WPREs drive greater expression not only in the target cell type, but also in off-target cells that endocytose a lesser number of AAV particles. This can decrease the specificity of an engineered vector or a cell type-specific promoter. Although this effect has not been thoroughly studied, WPRE elements are not recommended when high specificity for a single cell type is desired. WPREs have also been implicated as a contributing or causal factor in oncogenesis in preclinical studies, likely via an ORF within the WPRE [127]. Modified versions that eliminate this ORF have been developed [128].

The injected dose and volume can also influence AAV tropism. Raising the injected dose increases the number of AAV particles that are endocytosed by all cells local to the injection site, driving stronger gene expression within off-target cells. For example, when 1.2 × 10^11 genome copies (GC) of AAV1 carrying a human synapsin 1 promoter were intraparenchymally injected, gene expression was highly specific for inhibitory neurons [120]. However, raising the injected dose to 1.7 × 10^12 GC resulted in similar levels of gene expression within excitatory and inhibitory neurons, and further raising the dose to 8.4 × 10^12 GC resulted in gene expression primarily within excitatory neurons [120]. This is likely due to increased uptake of AAV1 by excitatory neurons at higher injected doses. Decreasing the injected volume while maintaining the injected dose is likely to have a similar effect, as this will apply AAV more focally, driving stronger gene expression within a smaller number of cells. On the other hand, decreasing the injected dose, or applying AAV more diffusely by increasing the injected volume, is likely to increase specificity by reducing the number of vector particles that are endocytosed per cell. Low doses of AAV are therefore recommended when cell-specific expression is desired. If strong or widespread transduction is required, a dose escalation experiment can be performed to identify the injection parameters that result in the strongest gene expression without loss of specificity. See also Chapter 14 for discussion of intraparenchymal injection. Similar findings were observed in the retina with AAV8, as increased dose shifted tropism from RPE alone to both RPE and PRs [54].

AAV tropism can also be modified via the inclusion of microRNA (miRNA) target sites within the AAV genome [129–131]. By utilizing miRNAs that are expressed only in certain cell types, gene expression can be specifically reduced within these target cells. For example, subretinally injected AAV5 typically transduces both RPE cells and PRs, as described in Subheading 2, item 7. However, binding sites for the RPE-specific miR-204 can block AAV5-mediated gene expression in RPE cells, resulting in PR-specific expression [130]. Further, binding sites for the PR-specific miR-124 can block gene expression in PRs, resulting in RPE-specific expression [130]. Thus, gene expression can also be restricted from specific populations via miRNA binding sites within the recombinant AAV genome.

Original article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4993104/

https://jneurodevdisorders.biomedcentral.com/articles/10.1186/s11689-018-9234-0

https://www.cell.com/molecular-therapy-family/methods/fulltext/S2329-0501(19)30011-7

https://www.alacrita.com/whitepapers/adeno-associated-virus-gene-therapy-landscape

https://www.genemedi.net/i/aav-packaging

https://www.frontiersin.org/articles/10.3389/fnana.2019.00093/full

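Single-cell sequencing: mining and extending the data

Single-cell sequencing is in full swing, but how can we make full use of the data that have already been published and mine new information from them? Just as with GWAS in its day, how do we validate the disease-associated SNPs that were identified? Joshua Sanes of Harvard University offers his own suggestions: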

  • Development: How do types diversify and mature?
  • Injury: Can we discover cell-type-specific signatures that give us insights into selective vulnerability to injury?
  • Humans: Can methods from mice be used to classify neurons in non-human primates and humans?
  • Disease: What cells express genes that predispose to or cause disease?
  • Evolution: Cell classes are conserved among vertebrates. How about cell types?

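RNA-seq: a 10-year review and outlook

Abstract

Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. As next-generation sequencing (NGS) technologies have developed, so too has the range of RNA-seq applications. RNA-seq can now be used to study many aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications, such as spatial transcriptomics (spatialomics), are also being actively explored. Combined with emerging third-generation long-read and direct RNA-seq technologies and better computational analysis tools, RNA-seq is enabling an increasingly complete understanding of RNA biology, from when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.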

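Introduction

Since its inception, RNA-seq has been applied in molecular biology to help understand gene function at many levels. Today RNA-seq is most often used to analyse differential gene expression (DGE), and the basic steps of the standard workflow for obtaining differentially expressed genes have changed little:

* It begins in the wet lab: RNA is extracted, mRNA is enriched or rRNA is depleted, cDNA is synthesized and a sequencing library is constructed.
* The library is then sequenced on a high-throughput platform (usually Illumina) to a depth of 10–30 million reads per sample.
* The final step is computational: reads are aligned to and/or assembled into transcripts, reads overlapping each transcript are counted for quantification, samples are filtered and normalized, and genes and/or transcripts that differ between sample groups are identified statistically.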

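Early RNA-seq experiments generated DGE data from bulk cell populations (for example, cells from a tissue or organ) and were applied to many species, including maize (Zea mays), Arabidopsis thaliana, Saccharomyces cerevisiae, mouse (Mus musculus) and human (Homo sapiens). Although the term RNA-seq often covers many different RNA-related methods and biological applications, DGE analysis remains its primary application (Table 1), and RNA-seq is the routine tool for DGE studies.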

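The widespread adoption of RNA-seq has advanced understanding of many aspects of biology, for example by revealing the complexity of mRNA splicing and the mechanisms by which non-coding RNAs and enhancer RNAs regulate gene expression. The development of RNA-seq has been driven by technological advances (both wet-lab and computational), and it yields more information with less bias than earlier microarray-based methods. To date, as many as 100 distinct applications have been derived from the standard RNA-seq protocol. Most are based on Illumina short-read sequencing, but recent methods based on long-read RNA-seq and direct RNA sequencing (dRNA-seq) can address questions that Illumina short-read technology cannot.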

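In this article, we first describe the 'baseline' workflow: analysing DGE with short-read RNA-seq. We cover library preparation for short-read sequencing, experimental design considerations and the computational analysis workflow, and explore why this approach is so widely used. We then describe the development and applications of single-cell and spatial transcriptomics. We give examples of how RNA-seq is used in key areas of RNA biology, including the kinetics of transcription and translation, RNA structure, and RNA–RNA and RNA–protein interactions. Finally, we briefly look ahead to the future of RNA-seq, asking whether single-cell and spatial transcriptomics will become routine and under what circumstances long reads will replace short-read RNA-seq. Space is limited, so some areas of RNA-seq are not covered here, notably the non-coding transcriptome, prokaryotic transcriptomes and the epitranscriptome.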

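Figure 1A: Overview of library preparation for three RNA-seq approaches: short-read sequencing (black), long-read cDNA sequencing (green) and long-read direct RNA-seq (blue). The complexity and bias of library preparation vary with the intended application. The short-read and long-read cDNA protocols share many steps; for example, adaptor ligation is common to all of the protocols. All three approaches are affected by sample quality and by computational issues upstream and downstream of library preparation.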

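Figure 1B: Comparison of the three main sequencing technologies.
Illumina workflow (left): after library preparation, individual cDNA molecules are clustered on a flow cell and sequenced by synthesis using 3ʹ-blocked, fluorescently labelled nucleotides. In each cycle, a high-speed camera images the emitted fluorescence to determine which nucleotide was incorporated; read lengths are 50–500 bp.
Pacific Biosciences workflow (middle): after library preparation, each molecule is bound by a polymerase immobilized at the bottom of a nanowell. Sequencing then proceeds by synthesis, with read lengths of up to 50 kb.
Oxford Nanopore workflow (right): after library preparation, individual molecules are loaded onto a flow cell, where the motor proteins attached during adaptor ligation dock with biological nanopores. The motor protein controls the passage of the RNA strand through the nanopore, causing changes in current from which the base sequence is inferred; reads are 1–10 kb long.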

Figure 1C: Comparison of short-read, long-read and direct RNA-seq analysis.

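In humans, more than 90% of genes (gene n) are alternatively spliced to produce at least two distinct expressed isoforms (transcripts x and y). Whereas long-read sequencing can capture each isoform directly and therefore provides more complete information, short-read sequencing is limited in isoform detection by the ambiguity of short-read mapping. In short-read cDNA sequencing, many reads map to exons shared by two different isoforms, so their true origin cannot be determined. Junction reads that span two or more exons improve isoform analysis, but they are uninformative when two isoforms share the same splice junction. These issues all add complexity to the analysis and interpretation of results. Long-read cDNA methods can detect full-length isoforms directly, removing or substantially reducing these detection biases and improving the accuracy of differential isoform expression analysis.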
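The mapping ambiguity described above can be illustrated with a toy model. This is a minimal sketch, assuming a hypothetical gene with two isoforms (`x` with exons e1-e2-e3 and `y` with exons e1-e3) and reads represented simply as the ordered exons they cover; the exon names and the `compatible_isoforms` helper are illustrative assumptions, not part of any real aligner.

```python
# Toy model: which isoforms could a read have come from?
# Isoform structures and exon names are hypothetical, for illustration only.
ISOFORMS = {
    "x": ("e1", "e2", "e3"),  # includes the alternative exon e2
    "y": ("e1", "e3"),        # skips e2
}


def junctions(exons):
    """Return the set of exon-exon junctions in an ordered exon chain."""
    return set(zip(exons, exons[1:]))


def compatible_isoforms(read_exons, isoforms=ISOFORMS):
    """List isoforms whose exons and junctions are consistent with a read.

    `read_exons` is the ordered tuple of exons that the read covers.
    """
    hits = []
    for name, exons in isoforms.items():
        if len(read_exons) == 1:
            ok = read_exons[0] in exons
        else:
            ok = junctions(read_exons) <= junctions(exons)
        if ok:
            hits.append(name)
    return hits


short_read_in_shared_exon = compatible_isoforms(("e1",))
junction_read_e1_e2 = compatible_isoforms(("e1", "e2"))
junction_read_e1_e3 = compatible_isoforms(("e1", "e3"))
full_length_read = compatible_isoforms(("e1", "e2", "e3"))
```

In this toy case a short read that falls entirely within the shared exon e1 is compatible with both isoforms, whereas junction reads and a full-length read each point to a single isoform.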

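All of the above methods rely on conversion to cDNA, a step that erases information about RNA base modifications and permits only crude estimates of polyadenylation (poly(A)) tail length. By contrast, direct RNA-seq can analyse full-length isoforms directly, measure base modifications (such as N6-methyladenosine (m6A)) and determine poly(A) tail length.

Advances in RNA-seq technologies

More than 95% of the data in the NCBI Sequence Read Archive (SRA) come from Illumina short-read sequencing (Table 2). Because nearly all published mRNA-seq data were generated by short-read sequencing, we regard this as the standard RNA-seq approach and discuss its main workflow and limitations next. However, for isoform detection studies (Fig. 1; Table 1), steadily improving long-read cDNA sequencing and dRNA-seq technologies are set to challenge the dominance of short-read sequencing.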


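Short-read cDNA sequencing for differential gene expression analysis

Short-read sequencing is the most common way to detect and quantify gene expression transcriptome-wide, partly because it is cheaper and easier to implement than expression microarrays, but mainly because it yields high-quality expression data across the whole transcriptome. The core steps of a DGE analysis using Illumina short-read sequencing are RNA extraction, cDNA synthesis, adaptor ligation, PCR amplification, sequencing and data analysis (Fig. 1). Because mRNA fragmentation and bead-based library purification favour fragments of 150–200 bp, the cDNA fragments produced by this protocol are generally under 200 bp. Each sample is sequenced to an average of 20–30 million reads, expression is quantified for each gene or transcript, and differentially expressed genes are then identified statistically (see the RNA-seq data analysis section). Short-read RNA-seq is robust: repeated comparisons of short-read technologies for RNA-seq show good intra-platform and inter-platform correlation. However, several steps in sample preparation and computational analysis can introduce bias. These limitations can affect the ability to address specific biological questions, such as correctly identifying and quantifying multiple isoforms of a gene. This is particularly relevant for studies of very long or highly variable isoforms; in the human transcriptome, for example, 50% of transcripts are longer than 2,500 bp, with lengths ranging from 186 bp to 109 kb. Although short-read RNA-seq can be used for detailed analysis of longer transcripts, the required methods are difficult to scale to transcriptome-wide analysis. Other biases and limitations arise from the computational methods used, such as how reads that map to multiple genomic locations are handled. A newer approach called synthetic long-read sequencing enables full-length mRNA sequencing and addresses some of these problems: cDNA molecules are tagged with unique molecular identifiers (UMIs) before short-read library preparation, so that full-length mRNAs can be reconstructed despite the short read length. Isoforms of up to 4 kb can be identified and quantified in this way. Nevertheless, the most effective way to overcome the inherent limitations of short-read cDNA sequencing is long-read cDNA sequencing or dRNA-seq.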

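Long-read cDNA sequencing

Although Illumina is currently the dominant RNA-seq platform, Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) can perform single-molecule long-read sequencing of complete RNA molecules after reverse transcription to cDNA. Because this removes the need to assemble short RNA-seq reads, it resolves some of the problems associated with short reads: mapping ambiguity is reduced and longer transcripts can be identified, both of which improve detection of isoform diversity. These methods also reduce the high false-positive rate of splice-junction detection introduced by many short-read RNA-seq computational tools.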

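Iso-Seq, based on PacBio technology, can generate full-length cDNA reads for transcripts of up to 15 kb, which has helped to discover large numbers of previously unannotated transcripts and to confirm, by full-length sequencing, earlier gene predictions based on cross-species homology. In the standard Iso-Seq protocol, a template-switching reverse transcriptase converts high-quality RNA into full-length cDNA for sequencing. The cDNA is then PCR amplified and used to build a PacBio single-molecule, real-time (SMRT) library. Because short transcripts diffuse more quickly to the active surface of the sequencing chip and thereby introduce bias, size selection is recommended for transcripts of 1–4 kb so that long and short transcripts in this range are sampled more equally. PacBio sequencing also requires a large amount of template, so large-volume PCR is needed and the reaction must be optimized to limit overamplification. After PCR, end repair and ligation of PacBio SMRT adaptors, long-read sequencing can proceed; size-selection bias can be further controlled by adjusting the loading conditions of the sequencing chip.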

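ONT cDNA sequencing can also sequence full-length transcripts, and is applicable to single cells. Template-switching reverse transcription is again used to prepare full-length cDNA, and PCR amplification before adaptor attachment is optional. Direct cDNA sequencing eliminates PCR bias and gives higher-quality results, whereas PCR-amplified cDNA libraries give a higher sequencing yield (more reads) and suit samples with limited RNA. The transcript-length bias seen with PacBio sequencing has not so far been observed in ONT cDNA sequencing.

Both of these long-read cDNA methods are limited by the standard template-switching reverse transcriptase, which converts truncated as well as full-length RNAs into cDNA. Reverse transcriptases are available that convert only 5ʹ-capped mRNAs into cDNA, which improves data quality by reducing the amount of cDNA generated from transcripts truncated by RNA degradation, RNA shearing or incomplete cDNA synthesis. However, these reverse transcriptases have a negative effect on read length on the ONT platform.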

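Long-read direct RNA sequencing

As discussed above, long-read platforms, like the baseline short-read platform, require mRNA to be converted to cDNA before sequencing. Oxford Nanopore recently demonstrated that its nanopore technology can sequence RNA directly, that is, without modification, cDNA synthesis or PCR amplification during library preparation. This removes the biases of those steps and preserves epigenetic modification information on the RNA; the approach is known as dRNA-seq. Preparing a library directly from RNA requires two adaptor ligation steps. First, a duplex adaptor with an oligo(dT) overhang is annealed and ligated to the mRNA poly(A) tail. An optional (but generally recommended) reverse transcription step follows, which improves sequencing throughput. The second ligation attaches the sequencing adaptor carrying the motor protein. The library is then loaded onto a MinION, and RNA is sequenced from the 3ʹ poly(A) tail towards the 5ʹ cap. Early studies showed dRNA-seq read lengths of around 1,000 bp, with maximum lengths of over 10 kb. Compared with short reads, long reads improve isoform detection and allow poly(A) tail length to be estimated for alternative polyadenylation analysis. The nanopolish-polya tool can analyse nanopore data to measure poly(A) tail length across genes or isoforms. Such analyses show that transcripts with retained introns have marginally longer poly(A) tails than fully spliced transcripts. Although dRNA-seq is still in its infancy, its potential to detect RNA base modifications directly is expected to drive new discoveries in epitranscriptomics.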

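Comparing long-read and short-read technologies

Although long-read technologies have clear advantages over short reads for transcript analysis, they also have limitations. Compared with mature short-read platforms, long-read technologies have much lower throughput and higher error rates. Moreover, their main advantage, the ability to sequence more individual transcripts in full, depends on high-quality RNA libraries. These limitations can affect the sensitivity and specificity of experiments that rely heavily on long-read sequencing.

The main limitation of current long-read methods is low throughput. A single run on the Illumina platform can generate 10^9–10^10 reads, whereas an RNA-seq run on the PacBio or ONT platforms yields only 10^6–10^7 reads. This low throughput limits the size of experiments (the number of samples) that can use long-read sequencing and reduces the sensitivity of differential gene expression detection. Not all applications require high read depth, however. If the goal is isoform discovery and characterization, read length matters more than depth. One million PacBio circular consensus sequencing (CCS) reads are enough to cover highly expressed genes longer than 1 kb, and the same holds for ONT. Read depth is therefore mainly an issue for genes with low to intermediate expression. The low-throughput limitation is most apparent in large-scale DGE studies in functional genomics. To obtain enough statistical power to detect changes in transcriptome expression accurately, multiple biological replicates of multiple sample groups must be sequenced. For these applications, long-read technologies are unlikely to replace short reads unless their throughput increases by two orders of magnitude. As the number of full-length RNA-seq reads increases, the sensitivity of transcript detection will approach that of the Illumina platform, but with higher specificity. Combining Illumina short-read RNA-seq with PacBio long-read Iso-Seq (and possibly also ONT methods) can increase the number, sensitivity and specificity of full-length RefSeq-annotated isoform detections while retaining the quality of transcript quantification. Although long-read RNA-seq methods currently cost more per experiment, they can detect isoforms that short-read methods miss, particularly in regions that are difficult to sequence but clinically relevant, such as the highly polymorphic human major histocompatibility complex (MHC) or the androgen receptor.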

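The second major limitation of long-read platforms is their high error rate, which is one to two orders of magnitude higher than that of mature Illumina sequencers. Long-read data also contain more insertion–deletion errors. These error rates and error types are a serious concern for variant calling, but matter less for transcriptome analysis as long as transcripts and isoforms can be distinguished. For error-sensitive applications, mitigation strategies exist. The typical errors on the PacBio SMRT platform are random and can be corrected by increasing read depth through CCS. In this approach, cDNAs are size selected and circularized by adaptor ligation so that each molecule can be sequenced multiple times, producing a continuous long read of 10–60 kb that contains many copies of the original cDNA. These long reads are computationally split into individual cDNA subreads, which are aligned to one another to generate a consensus sequence. The more times the inserted cDNA is sequenced, the lower the error rate after correction; studies have shown that CCS can reduce error rates to a level comparable to, or even lower than, that of short reads. However, devoting the platform's sequencing capacity to re-reading the same molecule exacerbates the throughput problem, as fewer unique transcripts are sequenced.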
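The consensus idea behind CCS can be sketched with a per-position majority vote. This is a deliberately simplified illustration, assuming the subreads are already aligned, have equal length and contain only substitution errors; real CCS software also has to handle insertions and deletions, and the `consensus` helper and example sequences here are hypothetical.

```python
from collections import Counter


def consensus(subreads):
    """Majority vote per column over pre-aligned, equal-length subreads."""
    if len({len(s) for s in subreads}) != 1:
        raise ValueError("subreads must be aligned to the same length")
    # zip(*subreads) yields one column (tuple of bases) per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*subreads)
    )


# Five passes over the same (made-up) cDNA insert; three passes carry
# one random substitution error each, at different positions.
subreads = [
    "ACGTTGCA",
    "ACGTAGCA",  # error at position 4
    "ACCTTGCA",  # error at position 2
    "ACGTTGCA",
    "ACGTTGGA",  # error at position 6
]
ccs_read = consensus(subreads)
```

Because the errors are random, they rarely coincide at the same position, so each one is outvoted by the other passes; more passes give a more reliable vote, at the cost of sequencing fewer distinct molecules.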

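The sensitivity of long-read RNA-seq is also affected by several other factors. First, the RNA molecules used for library preparation need to be full-length transcripts, which is not always achievable because RNA can be sheared during extraction and handling or degraded during the experiment. In short-read RNA-seq this results in a manageable 3ʹ bias, but for researchers aiming to analyse the full-length transcriptome with long reads, even low levels of RNA degradation limit the results. Stringent quality control after RNA extraction is therefore required. Second, median read length is further limited by technical issues and biases in library preparation, such as truncation during cDNA synthesis or the synthesis of truncated cDNA from degraded mRNA. Recently developed highly processive reverse transcriptases, which offer better strand specificity and more even 3ʹ–5ʹ transcript coverage, may improve this step. Although not yet widely adopted, these enzymes also improve coverage of structurally stable RNAs (such as tRNAs), which is difficult to achieve with the reverse transcriptases used in oligo-dT and whole-transcriptome analysis (WTA) methods. Third, biases inherent to long-read platforms (for example, libraries with long inserts are sequenced less readily on the sequencing chip) can reduce coverage of longer transcripts.

Long-read sequencing, whether cDNA-based or RNA-based, overcomes the main shortcoming of short reads for isoform analysis through its read length. Long-read methods can generate full-length transcript reads spanning from the poly(A) tail to the 5ʹ cap. As a result, transcript and isoform analysis no longer depends on reconstructing transcripts from short reads or inferring their existence; instead, each read represents the RNA molecule from which it was derived. DGE analysis based on full-length cDNA sequencing or dRNA-seq will depend on increases in PacBio and ONT throughput. Combining long-read RNA-seq with deep short-read RNA-seq is rapidly being adopted for more comprehensive analyses, much like the hybrid approach used for genome assembly. As research progresses, long-read and dRNA-seq methods are likely to show that, even in well-characterized species, the genes and transcripts identified so far are only the tip of the iceberg. As the methods mature and throughput increases, differential isoform analysis based on long reads will become routine. The impact of synthetic long-read RNA-seq and other technological developments on the field remains to be seen. For now, Illumina short-read RNA-seq continues to dominate, and the remainder of this article focuses on short-read sequencing.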

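Improving RNA-seq library preparation

RNA-seq methods grew out of earlier expressed-sequence tag and expression microarray technologies and were originally used to analyse polyadenylated transcripts. However, next-generation sequencing revealed limitations of these methods that were not apparent in microarray data. Consequently, many improvements to library preparation were reported soon after RNA-seq was first published. For example, fragmenting RNA rather than cDNA reduces 3ʹ/5ʹ bias, and strand-specific library preparation better distinguishes genes transcribed from the sense and antisense strands; both improvements give more accurate estimates of transcript abundance. RNA fragmentation and strand-specific libraries quickly became standard in most RNA-seq library preparation kits. Here we briefly describe other improvements to RNA-seq methods so that researchers can choose according to their biological question or sample characteristics. These include RNA enrichment methods that do not rely on oligo-dT, methods that specifically enrich the 3ʹ or 5ʹ ends of transcripts, the use of UMIs to distinguish PCR duplicates, and library preparation methods for degraded RNA. Combinations of these methods (together with dRNA-seq and the methods described later for analysing other aspects of RNA) allow researchers to unravel the transcriptome complexity produced by alternative poly(A) (APA) site usage, alternative promoters and alternative splicing.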

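Alternatives to poly(A) enrichment

Most published RNA-seq data are based on oligo-dT enrichment of transcripts with poly(A) tails, which focuses the analysis on the protein-coding portion of the transcriptome (生信宝典 note: some lncRNAs also have poly(A) tails). However, in addition to introducing 3ʹ bias, this approach misses many non-coding RNAs that lack a poly(A) tail, such as miRNAs and enhancer RNAs. Using total RNA without any selection is not appropriate either, because up to 95% of the reads would then derive from rRNA. Researchers therefore choose between oligo-dT enrichment for mRNA-seq and rRNA depletion for whole-transcriptome analysis (WTA). Short non-coding RNAs (such as miRNAs) are neither enriched by oligo-dT nor well covered by WTA, so studying them requires dedicated isolation and library preparation methods, typically gel excision or bead-based size selection followed by direct adaptor ligation (sequential RNA ligation, which usually yields strand-specific libraries) (生信宝典 note: this point deserves particular attention).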

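WTA generates RNA-seq data that include coding RNAs and some non-coding RNAs. WTA is also suitable for degraded samples in which the poly(A) tail has been separated from the rest of the transcript. rRNA can be removed in two ways: by separating rRNAs from total RNA (so-called pull-out methods) or by degrading rRNA with RNase H. Both approaches use sequence-specific and species-specific oligonucleotide probes complementary to cytoplasmic rRNAs (5S, 5.8S, 18S and 28S rRNA) and mitochondrial rRNAs (12S and 16S rRNA). To simplify the processing of human, rat, mouse or bacterial (16S and 23S rRNA) samples, these probes are premixed and added to the extracted total RNA, where they hybridize to rRNA for subsequent removal. Other highly abundant transcripts, such as globin or mitochondrial RNAs, can be depleted in a similar way. In pull-out methods, the probes are biotinylated and streptavidin-coated magnetic beads are used to remove probe–rRNA complexes from the total RNA solution, leaving the remaining RNA for library preparation; kits include Ribo-Zero (Illumina, USA) (生信宝典 note: Illumina does pick bold names) and RiboMinus (Thermo Fisher, USA). RNase H methods, such as NEBNext RNA depletion (NEB, USA) and RiboErase (Kapa Biosystems, USA), use RNase H to degrade the oligo-DNA:RNA hybrids. A recent comparison showed that, given high-quality RNA, both approaches can reduce the rRNA fraction of the resulting data to below 20%. However, the study also reported that RNase H methods were more consistent than pull-out methods. It further noted that transcript-length biases should be considered when performing DGE analysis on data generated with different kits. The authors also described another method, similar to RNase H, that performs well but had not previously been reported: ZapR, a proprietary Takara Bio technology that uses an enzyme to degrade rRNA fragments in RNA-seq libraries. One limitation of rRNA depletion relative to oligo-dT RNA-seq is that it requires higher sequencing depth, mainly because some rRNA carries over into the library.

Both oligo-dT and rRNA-depletion methods can be used for downstream DGE analysis, and researchers tend to continue with whichever method is established in their laboratory or is easiest to use. The choice nevertheless deserves some consideration, especially for degradation-prone samples: WTA methods detect more transcripts, but at a higher experimental cost than oligo-dT methods.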

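Enriching RNA 3ʹ ends for Tag RNA-seq and alternative polyadenylation analysis

The standard short-read Illumina method requires 10–30 million reads per sample for high-quality DGE analysis. For researchers who are interested only in gene-level expression and who have many samples and many replicates, or who have limited sample material, 3ʹ tag counting is recommended. Because sequencing is concentrated at the 3ʹ end of transcripts, less depth is required, which lowers cost or allows more samples to be sequenced together. Enriching 3ʹ ends also allows detection of changes in poly(A) site usage within individual transcripts that result from alternative polyadenylation of the pre-mRNA.

In 3ʹ mRNA-seq methods, each transcript yields a single fragment (tag read), usually from its 3ʹ end, and the number of tag reads is in principle proportional to transcript abundance. Tag-sequencing protocols such as QuantSeq (Lexogen, Austria) are generally simpler than standard RNA-seq protocols. They use random or oligo-dT-containing primers for PCR amplification to select the 3ʹ ends of transcripts while adding adaptor sequences, which removes the need for poly(A) enrichment, rRNA depletion and adaptor ligation steps. This approach can achieve sensitivity similar to that of standard RNA-seq at much lower read depth, so more samples can be multiplexed. Data analysis is also simplified because exon junction detection and gene-length normalization are not required (生信宝典 note: these do in fact still need consideration, because splicing also occurs at transcript ends and in UTRs, depending on the read length and the structure of the gene in question; with aligners that soft-clip, such as STAR or BWA, they can be ignored). However, 3ʹ mRNA-seq methods can be affected by internal priming at homopolymeric regions of transcripts, which leads to amplification of spurious fragments; they also allow only very limited isoform analysis, which can offset the cost advantage of their low depth requirement, especially for samples that can be used only once.

Alternative polyadenylation (APA) of mRNAs produces isoforms with 3ʹ UTRs of different lengths. For a given gene, this does more than generate additional isoforms: cis-regulatory elements within the 3ʹ UTR affect regulation of the transcript itself. Methods for studying APA therefore help researchers understand miRNA regulation, mRNA stability and localization, and mRNA translation. APA methods need to enrich the 3ʹ ends of transcripts to boost signal and sensitivity, for which the 3ʹ mRNA-seq tag methods described above are well suited. Other methods include polyadenylation site sequencing (PAS-seq), in which mRNA is first fragmented to about 150 bp and cDNA is then generated by template switching with oligo-dT-containing primers; 80% of the resulting reads derive from 3ʹ UTRs. TAIL-seq avoids oligo-dT altogether: before RNA fragmentation, rRNA is depleted and a 3ʹ adaptor is ligated to the poly(A) tail of the transcript. After fragmentation, a 5ʹ adaptor is added to complete the library. APA can also be assessed with RNA–protein interaction methods such as cross-linking immunoprecipitation (CLIP) sequencing, and with dRNA-seq.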

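Enriching RNA 5ʹ ends for transcription start-site mapping

Methods that enrich and sequence 7-methylguanosine 5ʹ-capped RNA are used to identify promoters and transcription start sites (TSSs), and can complement DGE analysis. Several methods exist, but few are in routine use. In CAGE (cap analysis of gene expression) and RAMPAGE (RNA annotation and mapping of promoters for analysis of gene expression), first-strand cDNA is synthesized with random primers, the mRNA 5ʹ cap is biotinylated, and 5ʹ cDNA is then enriched with streptavidin. CAGE uses a type II restriction enzyme that cuts 21–27 bp downstream of the 5ʹ adaptor to generate short cDNA tags, whereas RAMPAGE uses template switching to produce slightly longer cDNAs that are enriched and sequenced. Single-cell-tagged reverse transcription sequencing (STRT-seq) can identify TSSs in single cells; it uses biotinylated template-switching oligonucleotides to synthesize cDNA, which is captured on beads, fragmented near the 5ʹ end and sequenced. The 5ʹ-end tagging technology used in CAGE was developed at RIKEN in Japan to maximize the recovery of full-length cDNAs in early functional genomics work. The RIKEN-led Functional Annotation of the Mammalian Genome (FANTOM) project used CAGE to identify TSSs in more than 1,300 human and mouse primary cells, tissues and cell lines, which demonstrates the power of the method. CAGE also performed best in a recent method comparison. However, the authors of that comparison noted that TSSs identified by 5ʹ-end capture sequencing alone include many false positives, and recommended validation with an independent method such as DNase I mapping or H3K4me3 chromatin immunoprecipitation followed by sequencing (ChIP-seq).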

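Using unique molecular identifiers to detect PCR duplicates

RNA-seq data sets generally have high duplication rates, that is, many reads map to the same position in the transcriptome. In whole-genome sequencing, reads mapping to the same position are assumed to be technical noise from PCR amplification and usually only one is retained; in RNA-seq, such duplicate reads are kept because they may represent true biological signal. A highly expressed transcript can be present in millions of RNA copies in a sample, so identical fragments are to be expected when it is sequenced as cDNA. For this reason, computational removal of reads that map to the same position during alignment is not recommended. This is especially true for single-end sequencing, in which fragments need to share only one identical end to be called duplicates; in paired-end sequencing, both ends of the fragments must coincide, which is much less likely. Nevertheless, PCR bias during cDNA library preparation does introduce some duplicate reads, and it is difficult to estimate the proportion of PCR duplicates relative to biological duplicates and to use it as a quality-control measure for correcting RNA-seq results.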

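UMIs have been proposed as a way to deal with amplification bias. Adding random UMIs to cDNA molecules before amplification allows PCR duplicates to be identified and computationally removed without discarding the duplicates that arise from genuine expression, thereby improving gene expression quantification and the estimation of allele-specific transcription. Two reads are considered technical duplicates if they carry the same UMI and map to the same position in the transcriptome (for single-end data, the two reads are two sequenced reads; for paired-end data, they are two read pairs, each comprising a left and a right read).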
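The duplicate rule above (same UMI and same mapping position) can be sketched in a few lines. This is a minimal illustration, assuming each aligned read has already been reduced to a `(transcript_id, position, umi)` tuple; that tuple layout, the `count_molecules` helper and the example reads are assumptions made for the sketch, and it ignores sequencing errors within the UMI itself.

```python
from collections import Counter


def count_molecules(reads):
    """Count unique molecules per transcript after UMI deduplication.

    `reads` is an iterable of (transcript_id, position, umi) tuples.
    Reads sharing all three values are treated as PCR duplicates of
    one original cDNA molecule and counted once.
    """
    unique_molecules = set(reads)
    return Counter(transcript for transcript, _pos, _umi in unique_molecules)


reads = [
    ("geneA", 100, "ACGT"),
    ("geneA", 100, "ACGT"),  # same position and UMI: PCR duplicate
    ("geneA", 100, "TTGA"),  # same position, different UMI: distinct molecule
    ("geneB", 250, "ACGT"),  # same UMI, different position: distinct molecule
]
molecule_counts = count_molecules(reads)
```

Position-only deduplication would have collapsed the three `geneA` reads into one, discarding a genuine second molecule; keeping the UMI in the key preserves it.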

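UMIs have been shown to improve the statistical analysis of DGE by reducing the variance and false-positive rate of detected expression changes. Because amplification bias is more severe in single-cell data, UMIs are essential for the reliability of single-cell results. UMIs are also very useful for variant calling from RNA-seq data. Highly expressed transcripts more readily reach the high coverage needed for variant calling, especially when duplicate reads are included, and UMIs can be used to remove reads introduced by PCR amplification and so correct allele frequency estimates. UMIs have become standard in library preparation kits for single-cell RNA-seq (scRNA-seq) and are increasingly used in bulk RNA-seq.

Improving the analysis of degraded RNA

Advances in RNA-seq library preparation have also improved the analysis of low-quality or degraded RNA, such as RNA from clinical samples stored as formalin-fixed, paraffin-embedded (FFPE) blocks. Low-quality RNA leads to uneven gene coverage, higher DGE false-positive rates and higher duplication rates, and is negatively correlated with library complexity. Library preparation methods have been adapted to minimize the effects of RNA degradation. These methods are particularly important for developing RNA-seq-based diagnostics, such as assays analogous to OncotypeDX (not currently a sequencing assay), which predicts breast cancer recurrence from a 21-gene RNA signature. Although several methods are available, comparative studies show that two perform best: RNase H and RNA exome. As described above, the RNase H method uses a nuclease to digest rRNA in RNA:DNA hybrids while leaving degraded mRNA for sequencing. The RNA exome method uses oligonucleotide probes to capture RNA-seq library molecules, much like exome sequencing. Both methods are simple to use and reduce rRNA contamination while retaining degraded and fragmented mRNA, yielding high-quality and highly reproducible gene expression data. 3ʹ-end tag sequencing and amplicon sequencing (PCR amplification of more than 20,000 exons) can also be used to analyse degraded RNA, but neither is as widely used as the RNase H method.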

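Designing better RNA-seq experiments

Careful design of DGE RNA-seq experiments is essential for obtaining high-quality, biologically meaningful data. Particular consideration should be given to the number of biological replicates, the sequencing depth and the choice of single-end or paired-end sequencing.

Replication and experimental power
Experiments must include enough biological replicates to capture the biological variability among samples within a group. Confidence in quantitative analysis depends more on biological replication than on read depth or read length. Although RNA-seq has lower technical variability than microarray platforms, the stochastic variance inherent to biological systems means that bulk RNA-seq experiments must be replicated. Additional replicates help to identify outlier samples, which can then be removed or down-weighted if necessary before downstream analysis. Determining the optimal number of replicates requires careful consideration of several factors, including the minimum expected effect size, within-group variation, acceptable false-positive and false-negative rates and the maximum available sample size, and can be aided by RNA-seq experimental design tools or power calculators (http://www.biostathandbook.com/power.html).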

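Choosing the number of biological replicates: (1) why replicates are necessary; (2) how many replicates are needed?

Determining the right number of replicates is not always easy. A 48-replicate yeast study showed that, when only three replicates were included in the analysis, many DGE tools detected just 20–40% of the differentially expressed genes. That study suggested that at least six biological replicates should be used, considerably more than the three or four typically reported in the RNA-seq literature. A more recent study suggested that four replicates may be sufficient, but stressed the need to measure biological variance, for example in a pilot study, before settling on the number of replicates. For highly diverse samples, such as clinical tissue from cancer patient tumours, more replicates are likely to be needed to detect changes with high confidence.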

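Determining the optimal read depth
Once RNA-seq libraries have been constructed, the read depth must be decided. Read depth is the number of sequenced reads obtained per sample. For bulk RNA DGE experiments in eukaryotic genomes, around 10–30 million reads per sample are generally required. However, comparative analyses across multiple species have shown that, for the top 50% most highly expressed genes, sequencing just 1 million reads per sample gives expression estimates similar to those from 30 million reads. If only relatively large changes in the most highly expressed genes are of interest, and there are adequate biological replicates, less sequencing may be enough to generate hypotheses that drive follow-up experiments. After sequencing, the estimated read depth can be verified by checking the distribution of reads among samples and plotting saturation curves, which also indicate whether additional sequencing would increase sensitivity. As sequencer yields have increased, it has become standard practice to pool all samples of an experiment and sequence them together (even in the same lane) to control technical variation. The total number of reads required is the product of the number of samples and the desired number of reads per sample; if necessary, the pooled library is sequenced as many times as needed to reach that total. Pooling requires careful measurement of the concentration of each RNA-seq library and assumes that the amount of cDNA in the pooled samples is similar (low variance), so that the total reads are evenly distributed among samples. Before committing to an expensive multi-lane run, it is worth running a single lane to confirm that variance among samples is low.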
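The pooling arithmetic above (total reads = number of samples × reads per sample) can be worked through as follows. This is a minimal sketch; the sample number, target depth and the per-run yield of 400 million reads are hypothetical values chosen for illustration, not specifications of any instrument.

```python
import math


def plan_pooled_sequencing(n_samples, reads_per_sample, reads_per_run):
    """Return the total reads needed and the number of runs of the pool."""
    total_reads = n_samples * reads_per_sample
    runs_needed = math.ceil(total_reads / reads_per_run)
    return total_reads, runs_needed


# Hypothetical experiment: 24 libraries pooled, 20 million reads each,
# on a run assumed to yield 400 million reads.
total_reads, runs_needed = plan_pooled_sequencing(
    n_samples=24,
    reads_per_sample=20_000_000,
    reads_per_run=400_000_000,
)
```

With these numbers the pool needs 480 million reads, so it would be sequenced twice; this calculation assumes the libraries are evenly balanced, which is why the text recommends checking the balance on a single lane first.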

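Choosing sequencing parameters: read length and single-end or paired-end sequencing
The final sequencing parameters to determine are read length and whether to generate single-end or paired-end reads.

In many sequencing applications, read length has a large effect on data usability, as longer reads cover more of the sequenced DNA. For DGE analysis by RNA-seq, the key factor is the ability to determine which gene in the transcriptome each read came from. Once a read can be placed unambiguously, longer reads add little value to quantification-based analyses. For more qualitative RNA-seq analyses, such as identifying specific isoforms, longer reads can be more helpful.

The question of single-end versus paired-end reads is similar. In single-end sequencing, one end (3ʹ or 5ʹ) of each cDNA fragment is used to generate a read, whereas paired-end sequencing generates two reads per fragment (one 3ʹ and one 5ʹ). Long paired-end reads are preferred in experiments that need to sequence as many nucleotides as possible. In DGE analysis, users only need to count the reads mapping to each transcript, so not every base of each transcript fragment needs to be sequenced. For example, a comparison of 'short' 50-bp single-end reads with 'long' 100-bp paired-end reads for DGE analysis showed that single-end reads give consistent results, because single-end reads are sufficient to identify the gene of origin of most sequenced fragments. The same study showed that short single-end reads reduce the ability to detect isoforms, as fewer reads span exon–exon junctions. Paired-end sequencing also helps to disambiguate read mapping and is suited to alternative-exon quantification, fusion transcript detection and novel transcript discovery, particularly in poorly annotated transcriptomes.

In practice, the choice between single-end and paired-end sequencing often depends on cost or on the sequencing technology available to the user. Before the release of the Illumina NovaSeq, single-end sequencing generally cost less per million reads than paired-end sequencing, so more replicates or greater depth could be obtained for the same budget. If a choice has to be made between a larger number of shorter single-end reads and fewer longer and/or paired-end reads, the increase in read depth matters more for improving the sensitivity of DGE detection.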

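RNA-seq data analysis

Over the past decade, the number of computational methods for analysing RNA-seq data to determine differential expression has multiplied, and even for simple RNA-seq DGE there is considerable variation in practice at every stage of the analysis. Moreover, differences in the methods used at each stage, and in how they are combined into a pipeline, can substantially affect the biological conclusions drawn from the data. The optimal combination of tools depends on the specific biological question and on the available computational resources. Although many evaluation criteria are possible, we focus on how accurately tools and techniques identify differentially expressed genes. This assessment requires at least four distinct analysis stages (Fig. 2; Table 2). The first stage aligns the raw reads produced by the sequencing platform to the transcriptome. The second stage quantifies the number of reads associated with each gene or transcript to build an expression matrix; this may involve one or more distinct substeps of alignment, assembly and quantification, or it may generate the expression matrix from read counts in a single step. There is usually a third stage, which filters out lowly expressed genes and, crucially, normalizes to remove technical differences between samples. The final stage of DGE is statistical modelling of sample groups and other covariates to calculate confidence statistics for differential expression.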

Figure 2

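Stage 1: alignment and assembly of sequencing reads

After sequencing, the starting point for analysis is the FASTQ files containing the sequenced bases. The most common first step is to align the reads to a known transcriptome (or annotated genome), converting each read into one or more genomic coordinates. Traditionally this has been done with one of several aligners, such as TopHat, STAR or HISAT, all of which rely on a reference genome. Because the sequenced cDNA derives from RNA and may span exon boundaries, alignment to a reference genome (which contains introns as well as exons) must be splice-aware, that is, it must allow large gaps within reads.

If no high-quality genome annotation with known exon boundaries is available, or if reads are to be associated with transcripts rather than genes, a transcriptome assembly step is needed after alignment. Assembly tools such as StringTie and SOAPdenovo-Trans use the gaps in aligned reads to infer exon boundaries and possible splice sites. De novo transcript assembly is particularly useful for species whose reference genome annotation is missing or incomplete, or for studies of aberrant transcripts (for example, in tumour tissue). Transcriptome assembly benefits from paired-end and/or longer reads, which are more likely to span splice junctions. However, de novo transcriptome assembly from RNA-seq data is generally not required for determining DGE (生信宝典 note: assembly is essential for reference-free analysis).

More recently, computationally efficient 'alignment-free' tools such as Sailfish, Kallisto and Salmon have emerged, which associate reads directly with transcripts without a separate quantification step. These tools perform very well for quantifying highly abundant (and longer) transcripts, but are less accurate for low-abundance or short transcripts. (Related reading: an in-depth evaluation of 39 tools in 120 combinations, on which transcriptome analysis tools perform best.)

The strategy that each alignment tool uses to assign ambiguous reads affects the final expression estimates. These effects are particularly pronounced for multi-mapped reads, which could have come from several different genes, pseudogenes or transcripts. A comparison of 12 gene expression estimation methods showed that some alignment methods underestimate the expression of many clinically relevant genes, mainly because of how they handle ambiguous reads. Modelling how to assign multi-mapped reads correctly remains an active area of research in the computational analysis of RNA-seq data. A common practice is to filter these reads out before quantification, but this biases the results. Other approaches include generating 'merged' expression features that encompass overlapping regions of shared mapping, and computing a per-gene estimate of mapping uncertainty for use in downstream confidence calculations.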

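Stage 2: quantifying transcript abundance

Once reads have been aligned to the genome or transcriptome, the next step is to assign them to genes or transcripts to obtain an expression matrix. Several comparison studies have shown that the quantification method has the largest effect on the final results, larger even than the choice of aligner. Quantification of a single gene (that is, of all isoforms of that gene) relies on the transcriptome annotation to count the reads that overlap known genes. Assigning short reads to specific isoforms, however, requires estimation with a statistical model, particularly because many reads do not span splice junctions and cannot be assigned unambiguously to a specific isoform. Even when only gene-level differential expression is of interest, quantifying isoform differences gives more accurate results, especially when a gene mainly expresses isoforms of different lengths under different conditions. For example, if the isoform of a gene expressed in one sample group is half the length of the isoform expressed in another group but is expressed at twice the rate, purely gene-based quantification will be unable to detect the difference in expression.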
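The isoform-length example above can be checked with simple arithmetic. This is a minimal sketch, assuming that the expected number of fragments from a transcript is proportional to its length multiplied by its copy number; the lengths, copy numbers and the `reads_per_kb_per_molecule` constant are hypothetical values chosen for illustration.

```python
def expected_reads(length_bp, molecules, reads_per_kb_per_molecule=5):
    """Expected fragment count, proportional to length x abundance."""
    return length_bp * molecules * reads_per_kb_per_molecule // 1000


# Group 1 expresses a 1 kb isoform at 200 copies;
# group 2 expresses a 2 kb isoform of the same gene at 100 copies.
reads_group1 = expected_reads(length_bp=1000, molecules=200)
reads_group2 = expected_reads(length_bp=2000, molecules=100)

# Gene-level counts are identical, so a count-based comparison sees no change.
gene_level_ratio = reads_group1 / reads_group2

# Dividing by isoform length (reads per kb) recovers the two-fold difference.
per_kb_group1 = reads_group1 / (1000 / 1000)
per_kb_group2 = reads_group2 / (2000 / 1000)
length_aware_ratio = per_kb_group1 / per_kb_group2
```

The two groups yield the same raw gene-level count even though the number of transcript molecules differs two-fold, which is why isoform-aware quantification matters when isoform usage changes between conditions.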

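Commonly used quantification tools include RSEM, Cufflinks, MMSeq and HTSeq, as well as the alignment-free direct quantification tools mentioned above. Read-count-based tools such as HTSeq or featureCounts often discard many aligned reads, including those with multiple mapping locations or those overlapping multiple expression features. This can eliminate homologous and overlapping transcripts from subsequent analysis. RSEM allocates ambiguous reads using an expectation–maximization model, whereas alignment-free methods such as Kallisto include these reads in their quantification, which can bias the results. Transcript abundance estimates can be converted to read-count equivalents, and some of the tools that do so rely on the tximport package. The result of the quantification step is a merged expression matrix with one row per expression feature (gene or transcript) and one column per sample, whose values are either actual read counts or estimated abundances.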
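The layout of the merged expression matrix (one row per feature, one column per sample) can be sketched with plain Python data structures. This is an illustrative sketch only; the `build_expression_matrix` helper, sample names and counts are hypothetical, and real pipelines would normally read the per-sample output files of a quantification tool instead.

```python
def build_expression_matrix(per_sample_counts):
    """Merge per-sample {feature: count} dicts into a feature x sample matrix.

    Returns (features, samples, matrix), where matrix[i][j] is the count of
    features[i] in samples[j]; features absent from a sample get 0.
    """
    samples = sorted(per_sample_counts)
    features = sorted(
        {feature for counts in per_sample_counts.values() for feature in counts}
    )
    matrix = [
        [per_sample_counts[sample].get(feature, 0) for sample in samples]
        for feature in features
    ]
    return features, samples, matrix


# Hypothetical per-sample counts, as a quantification tool might report them.
per_sample_counts = {
    "control_1": {"geneA": 120, "geneB": 15},
    "treated_1": {"geneA": 95, "geneC": 40},
}
features, samples, matrix = build_expression_matrix(per_sample_counts)
```

Filling missing features with zero keeps every row the same length, which is what downstream filtering and normalization steps expect.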

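Stage 3: filtering and normalization

Read counts for genes or transcripts usually need to be filtered and normalized to account for differences in read depth, expression patterns and technical biases. Filtering to remove genes with low abundance across all samples is straightforward and has been shown to improve the detection of truly differentially expressed genes. Normalizing the expression matrix is more complex. Simple transformations can adjust abundances for the effects of GC content and read depth. Early methods such as RPKM are now recognized to be insufficient and have been replaced by methods that correct for more subtle differences between samples, such as quartile or median normalization. (Related reading: is your differential expression method inappropriate?)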

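Comparative studies have shown that the choice of normalization method can have a major effect on the final results and biological conclusions. Most computational normalization methods rely on two key assumptions: first, that the expression levels of most genes vary little across replicate groups; and second, that total mRNA levels do not differ significantly between sample groups. When these assumptions do not hold, whether and how to normalize needs careful consideration. For example, if a particular set of genes is highly expressed in one sample group, and the same genes plus an additional set of genes are expressed in another group, then simply normalizing for sequencing depth is inappropriate, because in the second group the same number of reads is spread across more genes. Normalization methods such as the trimmed mean of M-values (TMM) used by edgeR can handle this situation. Choosing an appropriate normalization method is difficult; one option is to run the analysis with several methods and compare the consistency of the results. If the results are highly sensitive to the normalization method, the data should be explored further to identify the source of the differences. Care must be taken that such a comparison is not used to select the normalization method whose results best fit the original hypothesis.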
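The composition problem described above can be reproduced with a small numerical sketch. The function below is a simplified trimmed mean of M-values: it omits the precision weights and the trimming on average expression that edgeR applies, so it illustrates the idea rather than reimplementing edgeR. The counts and the name `simplified_tmm_factor` are hypothetical.

```python
import math

def simplified_tmm_factor(sample, reference, trim=0.3):
    """Scaling factor from the trimmed mean of log2 ratios (M-values)
    of library-size-scaled counts; genes with a zero count are skipped."""
    n_s, n_r = sum(sample), sum(reference)
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    k = int(len(m_values) * trim)
    kept = m_values[k:len(m_values) - k] or m_values
    return 2 ** (sum(kept) / len(kept))

# Group 1 expresses 10 genes; group 2 expresses the same 10 genes at the
# same true level plus 10 additional genes, at the same sequencing depth.
reference = [100] * 10 + [0] * 10
sample = [50] * 20

factor = simplified_tmm_factor(sample, reference)
depth_only = sample[0] / sum(sample)              # looks two-fold lower
tmm_scaled = sample[0] / (sum(sample) * factor)   # matches the reference
print(factor, depth_only, tmm_scaled, reference[0] / sum(reference))
```

With depth-only scaling the shared genes appear to be expressed at half the level in the second group; rescaling the library size by the trimmed-mean factor removes that artefact.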

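One way to address such problems is to use spike-in control RNAs, that is, exogenous RNA sequences added at predefined concentrations during library preparation. Spike-ins commonly used for RNA-seq include the External RNA Controls Consortium mix (ERCCs), spike-in RNA variants (SIRVs) and sequencing spike-ins (Sequins). Because the spike-in RNA concentrations are known in advance and are directly related to the number of reads produced, they can be used to calibrate the expression levels of transcripts in the sample. It has been argued that experiments with large global changes in expression cannot be analysed correctly without spike-in controls. In practice, however, it can be difficult to add spike-ins consistently at the preset level, and they are more reliable for normalizing read counts at the gene level than at the transcript level, because an individual isoform may be expressed at markedly different concentrations across samples. Spike-in controls are not yet widely used in published RNA-seq DGE experiments, but this may change as single-cell experiments spread, because spike-ins are widely used in single-cell RNA-seq, provided that the technique can be further optimized to perform consistently.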
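Calibration against spike-ins can be sketched as a straight-line fit in log-log space between known input concentration and observed reads, which is then inverted for endogenous transcripts. The concentrations, read counts and function names below are hypothetical, and real spike-in data are far noisier than this idealized example.

```python
import math

def fit_loglog(concentrations, reads):
    """Ordinary least-squares fit of log10(reads) on log10(concentration)."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(r) for r in reads]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_concentration(read_count, slope, intercept):
    """Invert the calibration line for a transcript of unknown input."""
    return 10 ** ((math.log10(read_count) - intercept) / slope)

# Hypothetical spike-in dilution series (arbitrary concentration units)
spike_conc = [1, 10, 100, 1000]
spike_reads = [20, 200, 2000, 20000]

slope, intercept = fit_loglog(spike_conc, spike_reads)
estimated = estimate_concentration(500, slope, intercept)
print(round(slope, 3), round(estimated, 3))
```

A slope close to 1 indicates that reads scale linearly with input over the tested range; deviations at the low end are one practical sign that the spike-ins were not added consistently.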

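Phase 4: differential expression analysis

Once the expression matrix is available, statistical models can be built to assess which transcripts show significant changes in expression. Several commonly used tools accomplish this task; some work on gene-level expression counts and others on transcript-level counts. Gene-level tools typically rely on counts of aligned reads and use generalized linear models to evaluate complex experimental designs. They include edgeR, DESeq2 and limma+voom, which are computationally efficient and give results that are consistent with one another. Tools that assess differential isoform expression, such as CuffDiff, MMSEQ and Ballgown, tend to need more computational resources and give more variable results. However, the choices made before the differential expression tool is applied (alignment, quantification, filtering and normalization) have a greater effect on the final results.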


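Beyond bulk RNA analysis

RNA-seq of tissues and/or cell populations has revolutionized our understanding of biology, but it cannot easily resolve specific cell types and does not retain spatial information, both of which are crucial for understanding the complexity of biological systems. The methods that let users move beyond bulk RNA are very similar to standard RNA-seq protocols, but the questions they can address are very different. Single-cell sequencing has revealed previously unknown cell types in diseases that were thought to be well characterized, such as pulmonary ionocytes, which may be relevant to the pathology of cystic fibrosis. Spatially resolved RNA-seq has likewise yielded new insights into cell-cell interactions in solid tissues, for example by revealing a small population of cells expressing fetal marker genes in adult heart tissue. Bulk RNA-seq will remain a dominant and valuable tool for the foreseeable future. However, single-cell experimental and analytical methods are being adopted rapidly by researchers, and as spatial RNA-seq methods mature they too are likely to become part of the routine RNA-seq toolkit. Both approaches will improve our ability to probe the complexity of multicellular organisms, and both will probably need to be used in combination with bulk RNA-seq. Here, we briefly describe the main single-cell and spatially resolved transcriptomics methods, how they differ from bulk RNA-seq, and the new issues users need to consider.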

Figure 3

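Single-cell analysis
scRNA-seq was first reported in 2009, using single oocytes isolated in Eppendorf tubes containing lysis buffer. Its application to new biological questions, and the available experimental and computational methods, have developed so quickly that even recent reviews are rapidly outdated. Every scRNA-seq method requires solid tissue to be dissociated, single cells to be isolated (by very different means), and their RNA to be tagged and amplified for sequencing, and all of these steps are derived from bulk RNA-seq protocols.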

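Mechanical disaggregation combined with enzymatic digestion by collagenase and DNase gives the highest proportion of viable cells in single-cell suspensions, but this yield is highly tissue-specific and is best determined empirically and with great care. Once a single-cell suspension has been prepared, single cells can be isolated by various methods (Fig. 3a); because most laboratories have access to flow cytometry, the most accessible approach is to sort cells directly into microtitre plates containing lysis buffer. For higher-throughput experiments, several techniques for isolating cells exist, but they require specific single-cell instruments to be built or purchased. Individual cells can be physically captured in microfluidic chips or loaded into nanowell devices according to a Poisson distribution; isolated and encapsulated together with reagents in droplets by droplet-based microfluidics (as in Drop-Seq and InDrop); or tagged with in situ sequence barcodes, as in single-cell combinatorial indexing RNA sequencing (sci-RNA-seq) and split-pool ligation-based transcriptome sequencing (SPLiT-seq). After isolation, single cells are lysed to release RNA into solution for cDNA synthesis and RNA-seq library preparation. The RNA from each cell is generally amplified by PCR during library preparation. This amplification introduces PCR bias, which needs to be corrected using UMIs. Because reverse transcription is subject to Poisson sampling, only 10-20% of transcripts are reverse transcribed, which limits the sensitivity of transcript detection; nevertheless, the various methods all generate usable data. Beyond the wet lab, computational methods are also developing rapidly, and guidelines for the design of scRNA-seq experiments have recently appeared. The rapid pace of methodological development means that technical comparisons of scRNA-seq methods quickly become outdated. Nevertheless, Ziegenhain et al. provide an overview of scRNA-seq methods, highlight the importance of UMIs in data analysis and report which of the six methods compared is the most sensitive. Their study, however, does not include the widely adopted 10x Genomics technology.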
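The UMI correction mentioned above amounts to counting distinct UMIs rather than reads for each cell and gene. The following sketch assumes reads have already been assigned to a cell barcode and a gene; it ignores sequencing errors in the UMIs, which production tools usually handle by merging near-identical UMIs. All names and sequences are illustrative.

```python
from collections import defaultdict

def umi_counts(reads):
    """Collapse PCR duplicates: count distinct UMIs per (cell, gene)."""
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical reads as (cell barcode, gene, UMI)
reads = [
    ("CELL_A", "GeneX", "AAAA"),  # same molecule amplified three times
    ("CELL_A", "GeneX", "AAAA"),
    ("CELL_A", "GeneX", "AAAA"),
    ("CELL_A", "GeneX", "CCGT"),  # a second molecule
    ("CELL_B", "GeneX", "AAAA"),  # same UMI in another cell is independent
]

molecules = umi_counts(reads)
print(molecules)
```

Four reads from CELL_A collapse to two molecules, so a transcript that happened to amplify well is not mistaken for a highly expressed one.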

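The main factors users should consider when choosing an scRNA-seq method include whether they need full-length transcript coverage, and the trade-off between profiling more cells (breadth), sequencing more transcripts per cell (depth) and the experimental budget. Full-length scRNA-seq methods usually have lower throughput, because each cell has to be processed independently until the final scRNA-seq library is obtained. However, they allow users to study alternative splicing and allele-specific expression. Non-full-length methods sequence only the 3′ or 5′ ends of transcripts, which limits their ability to detect isoform expression, but because cells can be pooled after cDNA synthesis, the number of cells that can be analysed is 2-3 orders of magnitude higher. The breadth of single-cell sequencing refers to the number of cells, tissues or samples profiled together, whereas depth refers to how much of the transcriptome is covered for a given number of sequencing reads. Although the number of cells that can be sequenced in an experiment is determined by the method chosen, there is some flexibility; as the number of cells analysed rises, the increased sequencing cost usually limits the depth of transcriptome profiling. Different scRNA-seq systems can therefore be assessed along the two dimensions of breadth and depth. In general, plate-based or microfluidic methods capture the fewest cells but detect more genes per cell, whereas droplet-based systems can profile the largest number of cells, with some projects analysing more than one million cells at a time.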
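The breadth-versus-depth trade-off is simple arithmetic once the sequencing budget is fixed. The read budget and cell numbers below are hypothetical and serve only to show the scale of the difference.

```python
def reads_per_cell(total_reads, n_cells):
    """Average sequencing depth per cell for a fixed read budget."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return total_reads // n_cells

BUDGET = 400_000_000  # hypothetical number of reads from one sequencing run

plate_based = reads_per_cell(BUDGET, 400)        # few cells, deep coverage
droplet_based = reads_per_cell(BUDGET, 100_000)  # many cells, shallow coverage
print(plate_based, droplet_based)
```

On the same run, 400 cells would each receive a million reads, whereas 100,000 cells would receive a few thousand reads each.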

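The development of scRNA-seq is driving large-scale cell atlas projects that aim to identify all cell types in an organism or tissue. The Human Cell Atlas and the NIH BRAIN Initiative aim to sequence all cell types present in the human body and the brain, respectively. The Human Cell Atlas aims to sequence 30 million to 100 million cells in its first phase, and its breadth and depth will increase as the technology develops. Recent results from the project include the discovery of pulmonary ionocytes and the finding that childhood and adult kidney cancers arise from different cell types. Researchers should also be aware that scRNA-seq can be applied to almost any organism. A recent single-cell analysis of protoplasts from Arabidopsis thaliana root cells showed that even the tough cell wall of plant cells is not a barrier to isolating and sequencing single cells. scRNA-seq is rapidly becoming a standard part of the biologist's toolkit and, within 10 years, may be as widely used as bulk RNA-seq is today.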

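Spatially resolved RNA-seq methods
Current bulk and scRNA-seq methods provide highly detailed data about tissues or cell populations, but neither retains the spatial position of cells, which limits the ability to determine how cellular context relates to gene expression. Two approaches to spatial transcriptomics are "spatial encoding" and "in situ transcriptomics". Spatial encoding methods record spatial information during RNA-seq library preparation, either by isolating spatially restricted cells (for example by laser-capture microdissection (LCM)) or by barcoding RNA according to its position before isolation (capturing mRNA directly from tissue sections). In situ transcriptomics methods generate expression data by sequencing or imaging RNA within cells in tissue sections. We refer interested readers to recent reviews for more detail.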

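LCM combined with RNA-seq has been used successfully to isolate and sequence single cells or specific regions from tissue sections. Although it requires specialized equipment, LCM is widely available in many institutions. It can achieve high spatial resolution, but it is laborious and therefore difficult to scale. In the Spatial Transcriptomics (10x Genomics, USA) and Slide-seq methods, RNA is captured directly from frozen tissue sections for sequencing using oligo-arrayed microarray slides or densely packed oligo-coated beads, respectively. The oligos comprise a spatial barcode, a UMI and an oligo-dT primer, which together uniquely identify each transcript and its location. Sequencing reads are mapped back to slide coordinates to generate spatial gene expression information. Spatial Transcriptomics has been shown to work on tissues from several species, including mouse brain, human breast cancer tissue, human heart tissue and Arabidopsis thaliana inflorescence tissue. Slide-seq is a recently developed technology that has been shown to work on frozen sections of mouse brain. These direct mRNA capture methods do not require specialized equipment, have relatively simple analysis methods and are likely to be applicable at scale to many tissues. However, two important issues remain. First, the technology can be applied only to fresh-frozen tissue. Second, resolution is limited by the size of the array and the spacing of the capture oligos or beads; current arrays measure 6.5 × 7 mm and 3 × 3 mm, respectively, which limits the size of the tissue section that can be profiled. Spatial Transcriptomics capture spots are 100 µm in diameter and spaced 100 µm apart, so they are neither small enough nor dense enough to achieve single-cell resolution. Slide-seq beads are much smaller, at only 10 µm in diameter, and are densely packed, giving tenfold greater spatial resolution; about half of the beads yield data from a single cell. Computational integration of scRNA-seq data from dissociated tissue with spatially encoded data can improve resolution, but the technology needs to develop further before this becomes a routine RNA-seq tool.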
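Mapping reads back to slide coordinates can be pictured as a lookup from spatial barcode to position, followed by UMI-aware counting. The sketch below assumes reads have already been aligned to genes; the barcodes, coordinates and the `spatial_expression` helper are hypothetical and do not reflect any vendor's file format.

```python
from collections import Counter

def spatial_expression(reads, barcode_to_xy):
    """Count distinct molecules per (slide position, gene).

    reads: iterable of (spatial barcode, UMI, gene)
    barcode_to_xy: known slide coordinates of each spatial barcode
    """
    seen = set()
    counts = Counter()
    for barcode, umi, gene in reads:
        if barcode not in barcode_to_xy:
            continue  # barcode not on the slide: discard the read
        if (barcode, umi, gene) in seen:
            continue  # PCR duplicate of a molecule already counted
        seen.add((barcode, umi, gene))
        counts[(barcode_to_xy[barcode], gene)] += 1
    return counts

barcode_to_xy = {"BC1": (0, 0), "BC2": (0, 1)}
reads = [
    ("BC1", "U1", "Actb"),
    ("BC1", "U1", "Actb"),  # duplicate
    ("BC1", "U2", "Actb"),
    ("BC2", "U1", "Gfap"),
    ("BCX", "U9", "Actb"),  # unknown barcode
]

spot_counts = spatial_expression(reads, barcode_to_xy)
print(dict(spot_counts))
```

The result is an expression matrix indexed by position instead of by sample, which is what makes downstream spatial analyses possible.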

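Alternatives to the spatially resolved RNA-seq methods above include in situ sequencing and imaging-based approaches using single-molecule fluorescence in situ hybridization. Compared with RNA-seq methods, these approaches generate narrower transcriptome profiles (fewer transcripts are detected), but they detect RNA directly, and targeted methods can profile low-abundance transcripts. They also provide information on tissue architecture and the microenvironment, and can generate subcellular data. Although much progress has been made, the main limitations of imaging-based methods are the need for high-resolution or super-resolution microscopy combined with automated fluidics, and imaging times that can run to many hours or even days. Whereas sequencing costs have fallen faster than Moore's law would predict, the opportunities to scale imaging-based systems to high-throughput processing are limited.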

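At present, all of the spatial transcriptomics methods described above are limited by an inability to generate deep transcriptome data, by limited cellular resolution and/or by very high costs (in time and/or money), but the methods are improving rapidly and have already been applied to clinical samples. Computational methods specific to spatial transcriptomics analysis are beginning to appear. In addition, advances in in situ RNA sequencing and imaging-based methods have made it possible to obtain transcriptome data from 10^3 to 10^5 cells, similar to the number of cells accessible with droplet-based single-cell methods. Future developments are likely to make spatial transcriptomics accessible to a wider range of users. Most users, however, may not need true single-cell or subcellular resolution. Instead, demand for detecting more transcripts and for applicability to a broad range of tissues or samples may drive the development of these technologies in particular niches. If these limitations can be overcome, spatial transcriptomics is likely to be widely adopted.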

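Beyond steady-state RNA analysis

DGE studies use RNA-seq to measure steady-state mRNA levels, which are maintained by balancing the rates of mRNA transcription, processing and degradation. However, RNA-seq can also be used to study the processes and dynamics of transcription and translation, and these studies offer new perspectives on gene expression.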

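Measuring active transcription by capturing nascent RNA
Gene expression is inherently a dynamic process, and DGE analysis cannot detect the subtle and rapid changes that occur during complex transcriptional responses, nor can it identify unstable non-coding RNAs such as enhancer RNAs. RNA-seq can be used to map TSSs and quantify nascent RNA as it is being transcribed, which makes it possible to study RNA dynamics. Compared with DGE analysis, however, the study of nascent RNA is challenging because of its short half-life and low abundance. The importance of understanding RNA dynamics has therefore led to the development of several methods for analysing nascent RNA. These methods have revealed the extent of divergent transcription at promoters; shown that promoter-proximal pausing of transcriptionally engaged RNA polymerase II (Pol II) is a key step in the regulation of gene expression; and shown that nascent RNA can directly regulate transcription and that its sequence and structure influence transcriptional elongation, pausing and stalling, as well as the binding of chromatin-modifying enzymes and enhancer RNAs. Nascent RNA-seq methods, which aim to distinguish newly transcribed RNA from other RNA, fall broadly into three categories: run-on methods, Pol II immunoprecipitation (IP)-based methods and metabolic labelling methods (Fig. 4).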

Figure 4

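Run-on methods rely on the incorporation of nucleotide analogues during transcription, which allows nascent RNA to be enriched from total RNA and transient RNA transcription to be measured (Fig. 4a). Global run-on sequencing (GRO-seq) and precision nuclear run-on sequencing (PRO-seq) achieve this by incorporating 5-bromouridine 5′-triphosphate (BrU) or biotin-modified nucleotides, respectively, into nascent RNA during transcription. Nuclei are isolated and endogenous nucleotides are washed away before the exogenous biotin-labelled nucleotides are added and transcription is resumed. Sequencing the nascent transcripts enriched by immunoprecipitation or affinity purification reveals the position and activity of transcriptionally engaged RNA polymerases across the transcriptome. Depending on the number of labelled nucleotides incorporated during the run-on, GRO-seq achieves only 10-50 bp resolution, which reduces the precision of TSS mapping. PRO-seq achieves single-base resolution because transcription stops after a biotinylated nucleotide is incorporated, which identifies the site of incorporation. Run-on methods are conceptually simple, in that only RNA molecules that have incorporated the modified nucleotides should be enriched for sequencing, but in practice the presence of background non-nascent RNA increases the read depth required. These methods have revealed the extent of divergent or bidirectional transcription initiation at promoters and identified the role of enhancer RNAs in regulating gene expression. By incorporating specific enrichment for 5′-capped RNAs, GRO-cap, PRO-cap and small 5′-capped RNA sequencing (START-seq) increase the sensitivity and specificity of detecting transcription initiation, capture RNAs that would otherwise be removed by co-transcriptional processing, and reduce the background signal from post-transcriptionally capped RNAs.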

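Pol II IP methods, such as native elongating transcript sequencing (NET-seq) and native elongating transcript sequencing for mammalian chromatin (mNET-seq), pull down Pol II-associated RNA using anti-FLAG antibodies (for FLAG-tagged Pol II) or various antibodies against the Pol II C-terminal domain (CTD). Sequencing the nascent RNA associated with these chromatin complexes can be used to map TSSs, although non-nascent Pol II-associated RNA and background mRNA increase the sequencing depth required and confound the analysis. NET-seq can lack specificity, because any RNA strongly associated with Pol II can contaminate the nascent RNA enrichment, as shown by the presence of tRNA and small nucleolar RNA in NET-seq data. The use of multiple CTD antibodies in mNET-seq reveals how CTD modifications regulate transcription, detects RNA-processing intermediates and allows specific Pol II nascent RNAs to be positioned relative to TSSs. However, these capabilities come at the cost of more complex experiments, the need for larger numbers of cells and higher overall sequencing costs.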

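Metabolic pulse-labelling with the nucleotide analogue 4-thiouridine (4sU) allows nascent RNA to be identified (Fig. 4c). However, in methods that require long labelling times, most transcripts become labelled, which limits sensitivity. By specifically targeting the 3′ end of the RNA (that is, the most recently transcribed RNA, nearest the RNA polymerase), transient transcriptome sequencing (TT-seq) and thiol(SH)-linked alkylation for metabolic sequencing of RNA (SLAM-seq) reduce the signal from 5′ RNA. TT-seq restricts the labelling time to 5 minutes, so that only the 3′ ends of new transcripts are labelled, and adds an RNA fragmentation step before biotin affinity purification to enrich for labelled RNA. SLAM-seq incorporates 3′ mRNA-seq library preparation (although it can also be used with other library preparations, such as miRNA libraries), so that only labelled, newly transcribed RNA is sequenced rather than whole transcripts. In addition, in SLAM-seq iodoacetamide is added after RNA extraction to alkylate the 4sU residues that have been incorporated into nascent RNA. This modification induces reverse-transcription-dependent thymine-to-cytosine (T>C) nucleotide conversions, which are detected as "mutations" in the sequencing analysis and thereby directly identify the sites of 4sU incorporation. However, low incorporation rates mean that only a small number of 4sU sites are converted to cytosine, which limits sensitivity. Two other methods, TUC-seq and TimeLapse-seq, also use T>C mutation analysis but do not enrich for 3′ ends. They have been used to explore transcriptional responses to cellular perturbation and to measure RNA half-lives.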
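Identifying labelled reads from T>C conversions can be sketched as a position-by-position comparison of each read with the reference. This toy version assumes an ungapped alignment on the transcript strand and ignores sequencing errors and SNPs, which dedicated SLAM-seq analysis tools have to model; the sequences and function names are illustrative.

```python
def t_to_c_positions(reference, read):
    """Positions where the reference has T and the aligned read has C."""
    if len(reference) != len(read):
        raise ValueError("this sketch assumes an ungapped alignment")
    return [i for i, (ref_base, read_base) in enumerate(zip(reference, read))
            if ref_base == "T" and read_base == "C"]

def is_labelled(reference, read, min_conversions=1):
    """Call a read 'new' if it carries enough T>C conversions."""
    return len(t_to_c_positions(reference, read)) >= min_conversions

reference = "ATGTCTTA"
new_read = "ATGCCTCA"   # conversions at positions 3 and 6
old_read = "ATGTCTTA"   # identical to the reference

conversions = t_to_c_positions(reference, new_read)
print(conversions, is_labelled(reference, new_read), is_labelled(reference, old_read))
```

Raising `min_conversions` trades sensitivity for specificity, which matters because only a minority of 4sU sites are converted in any one read.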

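The methods for nascent RNA analysis have not yet been compared directly. All sequencing methods for detecting nascent RNA are adversely affected by non-specific background and/or contamination with degraded RNA, which means that greater sequencing depth is needed. The effect of non-nascent RNA is reduced in PRO-seq, TT-seq and SLAM-seq by sequencing only the 3′ ends of RNA, but there is little evidence that any one method outperforms the others. Affinity pull-down is laborious and requires more starting RNA than metabolic labelling, but determining the appropriate pulse-labelling time is complicated, and short labelling times yield little RNA for analysis, which limits sensitivity. Recent developments in tissue-specific RNA labelling and in new computational methods for "mutation" analysis may persuade users to switch from biochemical (biotin-based) enrichment to bioinformatic enrichment when distinguishing nascent RNA from other RNA. Further development of nascent RNA methods, and their combination with other methods such as spatial transcriptomics or RNA–RNA and RNA–protein interaction methods, will deepen our understanding of the transcription process.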

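Measuring active translation with ribosome profiling
The main emphasis of RNA-seq is on the species and quantities of mRNA present in a sample, but the presence of an mRNA does not correspond directly to the production of protein. Two methods, polysome profiling and Ribo-seq, allow us to move beyond transcription and study the translatome. Translation of mRNA by ribosomes is highly regulated, and protein levels are determined largely by translational activity. Polysome profiling and Ribo-seq make it possible to study how many ribosomes are bound to a transcript and how they are distributed along it (Fig. 5). This lets us infer which transcripts are being actively translated at a particular time or in a particular cell state. Both methods assume that the density of ribosomes on an mRNA correlates with the level of protein synthesis. Comparisons between samples reveal ribosome dynamics over time after treatment, during development, or in diseases in which translation is dysregulated, such as fibrosis, prion disease or cancer.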

Figure 5

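Polysome profiling uses sucrose-gradient ultracentrifugation to separate mRNAs bound by multiple ribosomes (the polysomal fraction) from mRNAs bound by one or no ribosome (the monosomal fraction); each fraction is then used for RNA-seq library preparation (Fig. 5a). mRNAs detected at higher abundance in the polysomal fraction than in the monosomal fraction are considered to be more actively translated. The method allows the translational status of individual mRNAs to be inferred and also generates a high-resolution view of ribosome occupancy and density (although it cannot determine where the ribosomes are located). The original method has since been improved in several ways. For example, non-linear sucrose gradients improve the collection of polysomal mRNA at the interfaces between sucrose solutions of different concentrations; Smart-seq library preparation allows polysomal mRNA to be profiled from as little as 10 ng; and higher-resolution sucrose gradients combined with deep sequencing allow transcript isoform-specific translation to be detected. However, polysome profiling generates only relatively low-resolution translation profiles and requires specialized equipment, which limits its wider use.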

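Ribo-seq is based on RNA footprinting and was originally developed in yeast. It uses cycloheximide to inhibit translation elongation, causing ribosomes to stall on mRNAs. Digesting the mRNA with RNase I leaves ribosome-protected footprints of 20-30 nucleotides, which are then used to build RNA-seq libraries (Fig. 5b). Ribo-seq generates a high-resolution translation profile, detecting both the abundance and the position of ribosomes on individual transcripts. Because it reveals the positional distribution of ribosomes along a transcript, which polysome profiling cannot, it can detect translational pausing events that regulate protein expression. Refinements to Ribo-seq include optimized buffers and enzymes, which reveal the 3-nt periodicity of Ribo-seq data more clearly, and the use of barcodes and UMIs to identify single-molecule events. Although specific tools have recently been developed for finding open reading frames, for differential or isoform-level translation analysis and for studying codon bias, standard RNA-seq tools can still be used for the computational analysis. The main limitations of Ribo-seq are its reliance on ultracentrifugation and the need to determine digestion conditions empirically because nuclease activity varies between batches.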
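The 3-nt periodicity mentioned above can be checked by tallying footprint positions by reading frame. This sketch assumes each footprint has already been reduced to a single transcript coordinate (for example an offset-corrected P-site position, which is an assumption of this example rather than a step described in the text); the coordinates are made up.

```python
from collections import Counter

def frame_fractions(positions, cds_start):
    """Fraction of footprint positions in each frame (0, 1, 2) relative
    to the first nucleotide of the coding sequence."""
    frames = Counter((p - cds_start) % 3 for p in positions)
    total = sum(frames.values())
    return [frames[f] / total for f in range(3)]

cds_start = 100
# Hypothetical footprint positions: eight in frame 0, two in frame 1
positions = [100, 103, 106, 109, 112, 115, 118, 121, 101, 110]

fractions = frame_fractions(positions, cds_start)
print(fractions)
```

A strong excess in one frame is the signature of translating ribosomes; an even split across the three frames would suggest that the reads are not true footprints or that digestion conditions need optimizing.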

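The methods described above cannot distinguish signals from translation initiation, elongation and termination, but modifications of Ribo-seq have made it possible to study translation dynamics further. Quantitative translation initiation sequencing (QTI-seq) maps translation initiation sites by chemically "freezing" and enriching initiating ribosomes while removing elongating ribosomes from the associated mRNA. (Translator's note: the original text reads "maps transcription initiation sites", which is presumably a slip.) Translation complex profile sequencing (TCP-seq) maps translation initiation sites by enriching RNA bound to the 40S small ribosomal subunit before it is assembled into the mature ribosome. Because this method preserves the integrity of the ribosome, the 80S ribosome fraction can also be analysed and compared, giving a more complete picture of translation dynamics (Fig. 5b).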

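All of the translatome methods are conceptually similar: they assume that ribosome density on an mRNA correlates with the level of protein synthesis. Their sample preparation protocols differ, but all require large numbers of input cells. Ultimately, a full understanding of mRNA translation will probably require combining them with RNA-seq, to determine gene expression levels, and with proteomics, to determine protein levels. The original article recommends further reviews for readers who want a more detailed account of translatome analysis.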

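Beyond gene expression analysis

RNA plays an important part in the regulation of other biomolecules and of biological processes such as splicing and translation, which involve interactions between RNA and various proteins and/or other RNA molecules. RNA-seq can be used to interrogate intramolecular and intermolecular RNA–RNA interactions (RRIs), as well as interactions between RNA and proteins, providing deeper insight into transcription and translation (Fig. 6). The various methods developed for interactome analysis have one thing in common: enrichment of the interacting RNA. Some methods make use of native biological interactions, whereas others induce transient or covalent bonds between the molecules of interest. Most use antibodies, affinity purification or probe hybridization to enrich the RNA for sequencing. Here, we briefly describe RNA-seq-based analysis of the structurome and the interactome.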

Figure 6

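Probing RNA structure via intramolecular RNA interactions
Ribosomal RNA and tRNA make up the majority of cellular RNA. Together with other structured non-coding RNAs, they act in many cellular processes, from gene regulation to translation. There are two main approaches to interrogating RNA structure: nuclease-based methods and chemical probing. Nuclease digestion was first used to determine an RNA structure (that of tRNA-Ala) in 1965. Chemical methods were developed over the following 40 years, such as selective 2′-hydroxyl acylation analysed by primer extension (SHAPE), which determined the structure of tRNA-Asp at base-pair resolution. However, only when the various nuclease and chemical methods were combined with RNA-seq did it become possible to analyse structure across the whole transcriptome rather than for individual RNAs, which is deepening our understanding of the complexity and importance of the structurome. Here, we focus on the main differences between nuclease and chemical probing methods (Fig. 6a). For a more comprehensive account, see the review by Strobel et al.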

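Nuclease methods, such as parallel analysis of RNA structure (PARS) and fragmentation sequencing (FRAG-seq), use enzymes that digest either single-stranded RNA (ssRNA) or double-stranded RNA (dsRNA). The RNA that remains after nuclease digestion is used for RNA-seq library preparation. Structured (double-stranded) and unstructured (single-stranded) regions are then identified by computational analysis of the resulting RNA-seq data. Nucleases are easy to use and allow both ssRNA and dsRNA to be interrogated, but they give lower resolution than chemical methods because nuclease digestion is random. In addition, the large size of nucleases restricts their entry into cells, making them unsuitable for in vivo studies.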

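Chemical mapping methods use chemical probes that react with RNA molecules to mark structured or unstructured nucleotides. These marks either block reverse transcription or cause misincorporation in the cDNA, so that structural information can be obtained by sequencing and analysing the RNA-seq reads. SHAPE followed by sequencing (SHAPE-seq) marks unpaired ssRNA by reacting with the ribose 2′-hydroxyl of the RNA backbone, although base stacking in hairpin loops can reduce labelling efficiency. Structure-seq and dimethyl sulfate sequencing (DMS-seq) label adenine and cytosine residues with DMS, which blocks reverse transcription and allows RNA structure to be inferred from analysis of the resulting truncated cDNAs. SHAPE and mutational profiling (SHAPE-MaP) and DMS mutational profiling with sequencing (DMS-MaPseq) both use optimized conditions that improve reverse transcriptase processivity and prevent cDNA truncation. Instead, the chemical marks cause misincorporation events, and these "mutations" are analysed in the RNA-seq data to reveal RNA structure. Chemical probes are small molecules and can therefore be used in vivo to study structures that are more biologically meaningful, although the dynamic intracellular environment also makes the data more variable. Chemical methods can also be used to analyse the structure of nascent RNA and reveal the order of co-transcriptional RNA folding.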

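Nuclease and reverse-transcription blocking methods generally produce short RNA fragments and report only a single digestion site or chemical mark, whereas misincorporation and mutational profiling methods can report several chemical marks per sequencing read. None of these methods is free of bias: reverse-transcription blocking is never 100% efficient, and chemical marks intended to induce mutations can instead block cDNA synthesis, and both factors affect the interpretation of the data. Spike-in controls might improve the quality of structurome analysis but are not yet widely used. A comparison of SHAPE methods revealed differences in efficiency that were apparent only in in vivo experiments, highlighting the care needed when comparing such complex methods.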

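These methods are revealing new roles for RNA structure in the mechanisms that regulate genes and proteins. For example, analysis of DMS data found that RNA structure can regulate alternative polyadenylation (APA) and may slow translation in catalytically active regions, giving the protein more time to fold and reducing misfolding events. A combination of structural RNA-seq methods will probably be needed to obtain a complete picture of the structurome. As the field develops, links between RNA structure and development or disease states may be discovered. Recent results suggest that aberrant RNA structures may have a regulatory role in repeat-expansion diseases. Ultimately, structurome analysis could enable the development of small molecules that target well-characterized RNA structures, opening up a new area of therapeutic development.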

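Probing intermolecular RNA–RNA interactions (RRIs)
Intermolecular RRIs play an important part in post-transcriptional regulation, for example in miRNA targeting of 3′ UTRs. Tools have been developed to study intermolecular RRIs in both targeted and transcriptome-wide analyses. These methods share a common workflow, in which RNA molecules are crosslinked to preserve their interactions before fragmentation and proximity ligation (Fig. 6b). Most, but not all, of the chimeric cDNAs generated by the different methods derive from ligation between stably base-paired (that is, interacting) RNA molecules. Targeted methods, such as crosslinking, ligation and sequencing of hybrids (CLASH), RNA interactome analysis and sequencing (RIA-seq) and RNA antisense purification followed by RNA sequencing (RAP-RNA), can generate deep interaction maps for a single RNA. CLASH uses IP enrichment to analyse RRIs mediated by specific protein complexes, whereas RIA-seq uses antisense oligonucleotides to pull down RNAs that interact with the target RNA. Neither method distinguishes between direct and indirect RRIs, which complicates biological interpretation. To improve the resolution of RRI analysis, RAP-RNA uses psoralen and other crosslinkers, followed by RNA capture with antisense oligonucleotides and high-throughput RNA-seq, to detect both direct and indirect RRIs. Although this approach does allow a more specific analysis, it requires multiple libraries to be prepared (one per crosslinker).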

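Transcriptome-wide methods are broadly similar to targeted methods: interacting RNAs are crosslinked in vivo and enriched. Enrichment improves specificity by reducing the amount of non-interacting RNA carried into the ligation reaction. It can be achieved by 2D gel purification (as in psoralen analysis of RNA interactions and structures (PARIS)), by biotin affinity purification (as in sequencing of psoralen crosslinked, ligated and selected hybrids (SPLASH)) or by removing uncrosslinked RNA through RNase R digestion (as in ligation of interacting RNA followed by RNA-seq (LIGR-seq)). After ligation, the crosslinks are reversed before RNA-seq library preparation and sequencing. PARIS yields the largest number of interactions but requires 75 million sequencing reads per sample, many more than the other RRI methods and more than twice the average read depth of a DGE experiment.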

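Integrated analysis of RNA interaction data allows multiple interaction types to be explored together and has revealed variation in the distribution of RRIs between RNA species. Overall, 90% of RRIs involve mRNA. Nearly half involve miRNA or long non-coding RNA, and most of these interactions target mRNA. Comparison of these data sets revealed strong method-specific biases for particular RNA species, with the result that the methods detect few interactions in common. A complete picture of RRIs may therefore require more than one method. RRI methods also have several limitations. Perhaps the most challenging is that RRIs are dynamic and are affected by structural conformation and by other intermolecular interactions, so results are difficult to interpret without replication. Intramolecular interactions add noise to the analysis of intermolecular RRIs, which means that highly structured RNAs (such as rRNA) need to be filtered out. Other issues include the disruption of interactions during RNA extraction, which calls for stable crosslinking; however, the most commonly used RRI crosslinkers, psoralen and 4′-aminomethyltrioxsalen (AMT), crosslink only pyrimidines and do so with low efficiency, which reduces sensitivity. Moreover, the proximity ligation step is inefficient and can ligate both interacting and non-interacting RNAs, further reducing sensitivity.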

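Studying RNA–protein interactions
ChIP-seq has become an indispensable tool for exploring DNA–protein interactions. A similar IP approach can be used to study RNA–protein interactions: an antibody against the protein of interest is used to capture its bound RNA for analysis (originally in combination with microarrays) (Fig. 6c). The most obvious difference between the various RNA–protein interaction methods is whether, and how, the interacting RNA and protein are crosslinked: some methods avoid crosslinking (native IP), others crosslink with formaldehyde and others use ultraviolet (UV) light. The simplest method is RNA immunoprecipitation and sequencing (RIP-seq), which often but not always uses antibodies against native proteins and does not involve RNA fragmentation. Its simplicity makes the method easy to adopt. RIP-seq can yield biologically meaningful results, but it has two major drawbacks. First, the gentle wash conditions used to preserve RNA–protein interactions mean that a relatively high level of non-specifically bound fragments is also enriched. Second, the absence of an RNA fragmentation step reduces the resolution of binding-site mapping. RIP-seq results are therefore highly variable and depend on the natural stability of the RNA–protein interaction. Formaldehyde crosslinking, which creates reversible covalent bonds between an RNA and its interacting proteins, improves stability and reduces the pull-down of non-specific RNA, but formaldehyde also generates protein–protein crosslinks. This effect can be mitigated by mild crosslinking with 0.1% formaldehyde (tenfold lower than is used in ChIP-seq studies), which has given high-quality results for several protein targets.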

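The 254-nm UV crosslinking introduced in CLIP was a crucial development that improved both the specificity of RNA–protein interaction methods and the resolution of binding-site identification. UV crosslinking creates covalent bonds between protein and RNA at their sites of interaction but, importantly, does not crosslink interacting proteins to one another. This stabilizes RNA–protein binding and allows the use of stringent enrichment steps that would otherwise disrupt RNA–protein interactions, reducing background signal. The CLIP protocol has since become the basis for many further methods. Individual-nucleotide resolution CLIP (iCLIP) incorporates UMIs into library preparation to remove PCR duplicates. It also exploits the premature termination of cDNA synthesis that commonly occurs at the crosslinked nucleotide, amplifying the truncated cDNAs to obtain a quantitative, nucleotide-resolution map of crosslink sites. Photoactivatable-ribonucleoside-enhanced CLIP (PAR-CLIP) achieves nucleotide-resolution maps of RNA–protein interactions by using 4sU and 365-nm UV crosslinking. 4sU is incorporated into endogenous RNA during cell culture, and irradiation with 365-nm UV generates crosslinks only at sites of 4sU incorporation, which gives high specificity. Detecting reverse-transcription-induced T>C substitutions in the resulting sequence data provides base-resolution mapping and distinguishes crosslinked from non-crosslinked fragments, further reducing background signal. More recent improvements to CLIP have increased its efficiency and sensitivity. Infrared CLIP (irCLIP) replaces radioisotope detection with infrared gel visualization and bead-based purification. These changes simplify the protocol and allow RNA–protein interaction analysis with as few as 20,000 cells (iCLIP typically requires 1-2 million cells). Enhanced CLIP (eCLIP) dispenses with quality control and visualization of the RNA–protein complexes, combines sample barcodes with the RNA adaptors so that multiple samples can be pooled earlier, and replaces gels with beads for fragment enrichment. These changes are intended to simplify the protocol for users, and eCLIP experiments have been performed for nearly 200 proteins as part of the ENCODE project. However, neither irCLIP nor eCLIP has yet been widely adopted, partly because some of their gain in sensitivity may come at the expense of specificity; in support of this, the PTBP1 binding sites detected by these two methods show reduced enrichment of the binding motif and of regulated exons. Because the large amount of publicly available data provides a new resource for computational analysis, the methods used for quality control, filtering, binding-site identification (peak calling) and normalization of CLIP data deserve careful consideration, as they all affect the biological interpretation of the data. Interested readers are referred to the recommended reviews.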

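Some RRI methods and all RNA–protein interaction methods rely on IP enrichment, so they can be applied only to proteins for which good antibodies are available, and non-specific antibody binding remains an issue, although one that is not confined to this field. RNA structure also affects RNA–protein interactions; some proteins recognize specific RNA secondary structures or compete with those structures for access to the RNA, which complicates the translation of in vitro findings to in vivo biology. In addition, RRI and RNA–protein interaction methods generally report an average of the interactions at a particular transcript or position. Further developments in experimental methods, computational methods and single-molecule sequencing may help to resolve this underlying biological variation.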

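Conclusions

Wang, Gerstein and Snyder predicted that RNA-seq would "revolutionize the analysis of eukaryotic transcriptomes". Even they might be surprised, however, at how many aspects of RNA biology the technology has been extended to. Today we can analyse many facets of RNA biology, which is essential for a functional understanding of the genome and for studying development and the molecular dysregulation that causes cancer and other diseases. Although the discovery phase of the biology is far from over, RNA-seq-based tests are already in clinical use. Single-cell sequencing has become standard in many laboratories, and spatial single-cell omics is likely to follow a similar path as the methods develop further. For most researchers, long-read sequencing has the potential to replace Illumina short-read RNA-seq as the default method. For this to happen, long-read technologies still need major improvements in throughput and error rate. If long-read sequencing becomes as cheap and reliable as short-read sequencing, it will be the preferred choice for identifying mRNA isoforms, except in samples with degraded RNA. With this in mind, any prediction about how RNA-seq will develop over the next decade is likely to be too conservative.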

Prime Editing: Adding Precision and Flexibility to CRISPR Editing

There are over 75,000 pathogenic genetic variants that have been identified in humans and catalogued in the ClinVar database. Previously developed genome editing methods using nucleases and base editors have the potential to correct only a minority of those variants in most cell types. A new technique from David Liu’s lab at the Broad Institute could add more precision and flexibility to the CRISPR editing world.


This new approach, published in Nature earlier this week, is called prime editing. It’s a “search-and-replace” genome editing technique that mediates targeted insertions, deletions, and all possible base-to-base conversions. It can also combine different types of edits with one another. All of this is possible without double strand breaks (DSBs) or donor DNA templates. How does this work? First, an engineered prime editing guide RNA (pegRNA) that both specifies the target site and contains the desired edit(s) engages the prime editor protein. This prime editor protein consists of a Cas9 nickase fused to a reverse transcriptase. The Cas9 nickase part of the protein is guided to the DNA target site by the pegRNA. After nicking by Cas9, the reverse transcriptase domain uses the pegRNA to template reverse transcription of the desired edit, directly polymerizing DNA onto the nicked target DNA strand. The edited DNA strand replaces the original DNA strand, creating a heteroduplex containing one edited strand and one unedited strand. Lastly, the editor guides resolution of the heteroduplex to favor copying the edit onto the unedited strand, completing the process.

Let’s examine the parts in more detail.

The prime editor: A fusion between Cas9 and reverse transcriptase

To reduce the number of components that prime editing introduces into the cell, the team fused the M-MLV reverse transcriptase (RT) with the Cas9 H840A nickase to create the prime editor (PE). They found that orientation matters: fusing the RT to the C-terminus of the Cas9 nickase resulted in higher editing efficiency. They called this complex PE1.

Building on prior reverse transcriptase research (Baranauskas et al., 2012; Arezi and Hogrefe, 2009), the Liu lab created and evaluated 19 PE1 variants with RT mutations known to increase activity, enhance binding between the template and primer binding site, increase processivity, or improve thermostability. What came out on top? The Cas9 nickase fused to a pentamutant of M-MLV RT. They called this system PE2, which had prime editing efficiencies on average 2.3- to 5.1-fold (though up to 45-fold) higher across different genomic sites compared to PE1.

The pegRNA: A template and guide all in one

The other important component of prime editing is the prime editing guide RNA (pegRNA). The pegRNA is a guide RNA that also encodes the RT template, which includes the desired edit and homology to the genomic DNA locus. Sequence complementary to the nicked genomic DNA strand serves as a primer binding site (PBS). This PBS sequence hybridizes to the target site and serves as the point of initiation for reverse transcription.

To optimize pegRNAs, the team found that extending the pegRNA primer binding site to at least eight nucleotides enabled more efficient prime editing in HEK293T cells.
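The layout of the pegRNA 3′ extension can be sketched in a few lines. The example assumes the usual convention that the nick lies 3 nt upstream of the PAM on the PAM-containing strand, and that the extension is written 5′ to 3′ as the RT template followed by the PBS. The target sequence, the edit, the 13-nt PBS length and the function names are illustrative assumptions (sequences are written as DNA), not a validated design.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def peg_extension(pam_strand, nick_index, edited_downstream, pbs_length=13):
    """Return the pegRNA 3' extension (5'->3', as DNA letters).

    pam_strand: PAM-containing target strand, 5'->3'
    nick_index: index of the first base 3' of the nick
    edited_downstream: desired sequence 3' of the nick (edit plus homology)
    """
    if pbs_length < 8:
        raise ValueError("the text above suggests a PBS of at least 8 nt")
    # The PBS pairs with the bases immediately 5' of the nick
    pbs_target = pam_strand[nick_index - pbs_length:nick_index]
    rt_template = reverse_complement(edited_downstream)
    return rt_template + reverse_complement(pbs_target)

# Illustrative target: 20-nt protospacer, TGG PAM, then flanking sequence
pam_strand = "GGCCCAGACTGAGCACGTGA" + "TGG" + "CAGAGG"
nick_index = 17                      # 3 nt upstream of the PAM
unedited_downstream = pam_strand[nick_index:nick_index + 10]  # "TGATGGCAGA"
edited_downstream = "TCATGGCAGA"     # hypothetical G-to-C edit at position +2

extension = peg_extension(pam_strand, nick_index, edited_downstream)
print(extension)
```

The full pegRNA would then be the spacer, the sgRNA scaffold and this extension in that order; the scaffold sequence is deliberately left out here rather than guessed.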

Prime Editor 3 (PE3): Resolving mismatched DNA to favor the edit

Once the prime editor incorporates the edit into one strand, there’s a mismatch between the original sequence on one strand and the edited sequence on the other strand. To guide heteroduplex resolution to favor the edit, the Liu lab turned to a strategy they previously used when they developed base editing (Komor et al., 2016). By nicking the non-edited strand, they can cause the cell to remake that strand using the edited strand as the template.

A third prime editing system called PE3 does just this by including an additional sgRNA. Using this sgRNA, the prime editor nicks the unedited strand away from the initial nick site (to avoid creating a double strand break), increasing editing efficiencies 2-3 fold with indel frequencies between 1-10%.

Advantages of prime editing

Less constrained by PAM sequence location
The prime editor extends the reach of CRISPR genome editing because it can edit near or far from PAM sites, making it less constrained by PAM availability than other methods. The PAM-to-edit distance can be over 30 base pairs for prime editing. Since PAM sites occur every ~8 base pairs on either DNA strand, many previously developed base editors (Table 1 from Rees and Liu, 2018) with a <8 base pair editing window cannot edit within what Fyodor Urnov refers to as “PAM deserts” in the genome.

More versatile and precise than base editing (in certain circumstances)
Base editors developed thus far can only create a subset of changes (C->T, G->A, A->G, and T->C). Prime editing allows for all 12 possible base-to-base changes.
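The count of 12 base-to-base changes, and the subset reachable with current base editors, can be enumerated directly. The variable names are illustrative.

```python
from itertools import permutations

BASES = "ACGT"

# Every ordered pair of distinct bases is one possible point change
all_changes = {f"{a}>{b}" for a, b in permutations(BASES, 2)}

# Changes that cytosine and adenine base editors can make, as listed above
base_editor_changes = {"C>T", "G>A", "A>G", "T>C"}

# Changes that need prime editing (or another method)
prime_only = sorted(all_changes - base_editor_changes)

print(len(all_changes), prime_only)
```

The four base-editor changes are the transitions; the eight that remain are all transversions.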

Prime editing is also more precise. Base editors, for example, will edit all the C’s or A’s within the base editing window, while prime editors make a specific edit defined by the pegRNA. In cases when bystander editing is unacceptable, prime editors can be used to avoid this possibility.

However, there are instances where traditional base editors are preferred. For instance, if the target nucleotide lies within the canonical base editing window, base editing has higher efficiency and fewer indels than prime editing. For targets that fall outside that window, prime editing is more efficient because it depends less on PAM placement.

Fewer byproducts and more efficient than homology directed repair
Homology directed repair (HDR) stimulated by double strand breaks has been widely used to generate precise changes. However, the efficiency of Cas9 cleavage is relatively high while the efficiency of HDR is relatively low, meaning that most Cas9-induced DSBs are repaired by non-homologous end joining. As a result, Cas9 treatment causes most products to be indels while the efficiency of HDR is typically less than 10%. In contrast, prime editing can offer ~20-50% efficiency in HEK293T cells with 1-10% indels. In other tested cell types, including post-mitotic primary mouse cortical neurons, the authors report lower prime editing efficiencies, but still see much higher ratios of desired edits to indel byproducts than Cas9-initiated HDR.

What’s next for prime editing?

While prime editing is an exciting step towards more versatile genome editing, it’s new at this point and warrants many additional studies. In their paper, the Liu lab points out the need to investigate off-target prime editing in a genome-wide manner, identify any inadvertent effects the prime editors may have on the cells, and assess in vitro and in vivo delivery strategies. It’s exciting to see the amount of discussion on Twitter about prime editing, and we look forward to seeing what comes next.

Application of Prime Editing

Editing from 1 to 44 bases

Prime editing supports edits ranging from single-base changes to insertions (knock-ins) of up to 44 nt and deletions (knock-outs) of up to 80 nt.

Although fluorescent protein tags are too large to insert, the prime editing platform can add a FLAG tag or a 6-histidine tag, which is useful for isolating an endogenous protein in its native complexes. It also offers a solution when no antibody is available to localize the protein within cells.

Note also that a loxP site for the Cre-Lox system is only 34 nt long, comprising two 13-bp recognition regions and one 8-bp spacer. Larger insertions could therefore be made by combining prime editing with the Cre-Lox system in succession, using Cre mRNA.

Knock-outs of up to 26 codons are also possible with the prime editing platform.
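The size limits quoted above can be checked against a few common sequence elements. The 44-nt and 80-nt limits come from the text; the element lengths are standard values (a FLAG tag is 8 residues, a hexahistidine tag 6, and GFP roughly 238), and the dictionary itself is only an illustration.

```python
MAX_INSERTION_NT = 44
MAX_DELETION_NT = 80

candidate_insertions = {
    "FLAG tag (8 codons)": 8 * 3,
    "6xHis tag (6 codons)": 6 * 3,
    "loxP site (13 + 8 + 13)": 13 + 8 + 13,
    "GFP coding sequence (about 238 codons)": 238 * 3,
}

# Which elements fit within a single prime edit?
fits = {name: length <= MAX_INSERTION_NT
        for name, length in candidate_insertions.items()}

# Largest in-frame deletion, in whole codons
max_deleted_codons = MAX_DELETION_NT // 3

print(fits, max_deleted_codons)
```

Short epitope tags and a loxP site fit within one edit, whereas a fluorescent protein is more than an order of magnitude too long, which is why the two-step Cre-Lox route is suggested for larger insertions.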

Knock-in with no donor

CRISPR-Cas9 gene editing achieves knock-in (KI) by using a donor template to repair the double-strand break caused by the endonuclease activity of Cas9. Without a donor, the classic CRISPR-Cas9 system leads only to knock-out (KO), and therefore loss of function.

Prime editing provides a means to generate changes of 1 to 44 bases without a donor, so transient expression of the prime editing complex is sufficient. There is no risk of genotoxicity from random insertion of a plasmid donor, and delivery into cells is simpler and should therefore be more efficient. These two points open interesting therapeutic perspectives for diseases linked to the more than 75,000 known pathogenic variants.

Further reading: https://www.genengnews.com/insights/genome-editing-heads-to-primetime/

References

Anzalone, Andrew V., et al. “Search-and-replace genome editing without double-strand breaks or donor DNA.” Nature (2019): 1-1. PubMed PMID: 31634902.

Arezi, Bahram, and Holly Hogrefe. “Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer.” Nucleic acids research 37.2 (2008): 473-481. PubMed PMID: 19056821. PubMed Central PMCID: PMC2632894.

Baranauskas, Aurimas, et al. “Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants.” Protein Engineering, Design & Selection 25.10 (2012): 657-668. PubMed PMID: 22691702.

Komor, Alexis C., et al. “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.” Nature 533.7603 (2016): 420. PubMed PMID: 27096365. PubMed Central PMCID: PMC4873371.

Rees, Holly A., and David R. Liu. “Base editing: precision chemistry on the genome and transcriptome of living cells.” Nature reviews genetics 19.12 (2018): 770-788. PubMed PMID: 30323312. PubMed Central PMCID: PMC6535181.

By: https://blog.addgene.org/prime-editing-crisp-cas-reverse-transcriptase

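Overview:

Pathogenic genetic variants in humans include both point mutations and insertions or deletions of bases [1]. In 2016, David Liu's laboratory at Harvard University developed the cytosine base editors (CBEs), a new class of single-base editors that convert C·G base pairs to T·A [2]. In late 2017 the same laboratory built the adenine base editor (ABE), which converts A·T base pairs to G·C. This is significant for the treatment of the many genetic diseases caused by point mutations, so base editors have broad prospects in gene therapy [3]. However, apart from C·G-to-T·A and A·T-to-G·C conversions, effective tools have been lacking for other types of base substitution and for insertions and deletions: conventional homology-directed repair (HDR) requires an exogenous double-stranded or single-stranded DNA template, and is complex and inefficient, which has greatly limited work in this area. More efficient and more versatile precision genome editing tools are therefore urgently needed.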

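On 21 October 2019, David Liu's laboratory published a paper in Nature entitled "Search-and-replace genome editing without double-strand breaks or donor DNA". It describes a new class of precision genome editing tools, prime editors (PEs). Without any additional DNA template, PEs can effectively make all 12 types of single-base conversion and can also precisely insert and delete multiple bases (insertions of up to 44 bp and deletions of up to 80 bp). This versatility represents a major advance for the genome editing field.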

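PE builds on the CRISPR-Cas9 system, with two modifications. First, the single guide RNA (sgRNA) is extended with an additional RNA sequence at its 3′ end; the resulting RNA is called the pegRNA. Second, a Cas9 nickase (the H840A mutant, which cuts only the PAM-containing strand of the target DNA) is fused to a reverse transcriptase to give a new fusion protein. The 3′ extension of the pegRNA has two roles: one segment serves as the primer binding site (PBS), which is complementary to the 3′ end of the nicked target DNA strand and initiates reverse transcription, and the other segment is the template for reverse transcription (RT template), which carries the desired point mutation, insertion or deletion and so enables precise editing (Fig. 1).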

Figure 1. Structure of the modified pegRNA

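The basic principle of PE is shown in Fig. 2. First, guided by the pegRNA, the Cas9 H840A nickase cuts the PAM-containing strand of the target DNA. The nicked target strand then hybridizes to the complementary PBS sequence at the 3′ end of the pegRNA, and the reverse transcriptase begins reverse transcription along the RT template. When the reaction is complete, the nick site carries 5′ and 3′ flap structures in dynamic equilibrium: the 3′ flap strand carries the desired mutation, whereas the 5′ flap strand carries no mutation. In cells, 5′ flaps are readily recognized and excised by structure-specific endonucleases, and after DNA ligation and repair the target site carries the precise edit.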

Figure 2. Basic principle of the PE genome editing tool

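After validation in vitro and in yeast, the researchers fused wild-type Moloney murine leukaemia virus (M-MLV) reverse transcriptase to the C terminus of the Cas9 H840A nickase to build the first-generation precision editor, PE1. In 293T cells, PE1 introduced point mutations with 0.7-5.5% efficiency and insertions or deletions of bases with 4-17% efficiency, leaving considerable room for improvement. The researchers then optimized the M-MLV reverse transcriptase to obtain the second-generation editor, PE2, whose efficiency for point mutations and for insertions and deletions was more than twofold higher than that of PE1.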

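PE1 and PE2 edit only one strand of the double-stranded DNA; the other, non-edited strand needs further DNA repair to complete the edit. Nicking the non-edited strand with a Cas9 nickase is an established way to increase the efficiency with which that strand is repaired. The researchers therefore added to PE2 an sgRNA that directs nicking of the non-edited strand, giving the new PE3 and PE3b systems (Fig. 3). The new systems were nearly threefold more efficient than PE2, with editing efficiencies of up to 78% in 293T cells. Because two guide RNAs are used, however, the risk of random insertions and deletions (indels) with PE3 is also higher, which is a shortcoming to be addressed in future.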

Figure 3. Principle of PE3 and PE3b: the additional sgRNA of PE3 recognizes the unedited genomic DNA, whereas the additional sgRNA of PE3b recognizes only the edited genomic DNA.

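Comparing the different tools, the researchers found that PE3 and PE3b are somewhat less efficient at single-base editing than the base editors CBE and ABE but give more precise edits, and that they are more efficient than conventional HDR with a lower risk of indels. PE3 and PE3b also edited precisely in three further cell lines (U2OS, K562 and HeLa) and in primary mouse cortical neurons, indicating that the new system is broadly applicable.
Overall, the PE tool developed in this study is a major breakthrough in precision genome editing, with great potential for arbitrary single-base conversion and for inserting or deleting short stretches of bases, and should greatly advance both basic biomedical research and clinical gene therapy research.

Expert commentary: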

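About 75,000 loci in the human genome are associated with genetic disease [4]. The rapid development of genome editing has made it possible to treat genetic diseases by modifying the genome directly. Using the CRISPR/Cas9 system to create a DNA double-strand break at the pathogenic locus, and relying on the cell's NHEJ or HDR DNA repair pathways, researchers have already treated human genetic diseases in animal models [5]. However, inducing DNA double-strand breaks simultaneously in large numbers of cells can lead to genomic translocations and inversions [6], activation of p53 signalling [7] and other potential problems that damage the genome. To remove these drawbacks, David Liu and colleagues fused a catalytically impaired Cas9 to a deaminase and, guided by an sgRNA, achieved base substitution at the target site, establishing the base editing system [8,9]. Depending on the deaminase used, base editors are classed as cytosine or adenine base editors, which convert C to T or A to G in the genome, respectively, without creating DNA double-strand breaks. Their relative safety and efficiency have led to rapid adoption of base editors for genome editing research in animals, plants and human cells, and for exploring treatments for genetic diseases in animal models. Recently, however, several laboratories have reported that base editors cause substantial genome-wide [10,11] and transcriptome-wide [12,13] off-target editing. Improved versions can eliminate the transcriptome-wide off-target effects [12,14], but base editors have a narrow editing window and can make only C-to-T, G-to-A, A-to-G and T-to-C changes; they cannot make C-to-A, C-to-G, G-to-C, G-to-T, A-to-C, A-to-T, T-to-A or T-to-G changes, nor insertions or replacements. These limitations greatly restrict the use of base editors for treating genetic diseases.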

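Recently, David Liu's team published a paper in Nature entitled "Search-and-replace genome editing without double-strand breaks or donor DNA", establishing a new genome editing system called prime editing. Without creating DNA double-strand breaks and without a DNA template, it efficiently produces DNA insertions, deletions and any single-base substitution at target sites in yeast and mammalian cells.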

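They fused nCas9 (H840A) to M-MLV reverse transcriptase, and assembled the pegRNA from a genome-targeting sequence, a scaffold sequence, the new genetic information and a single-stranded DNA binding sequence. Guided by the pegRNA, the fusion protein binds a specific genomic sequence and nCas9 uses its nickase activity to cut the strand that is not complementary to the pegRNA. The single-stranded DNA binding sequence carried by the pegRNA base-pairs with the nicked non-complementary strand, whose exposed free 3′-hydroxyl is bound by the reverse transcriptase, which then synthesizes DNA by base pairing using the RNA as template. The newly synthesized DNA is subsequently incorporated into the genome, completing an edit that efficiently introduces insertions, deletions or single-base substitutions at the target site. This system is called PE1. Building on PE1, the team modified the amino acid sequence of the M-MLV reverse transcriptase (D200N+L603W+T330P+T306K+W313F) to improve its thermostability, processivity and DNA:RNA binding, raising editing efficiency and giving the PE2 system. Building on PE2, they introduced a second sgRNA, oriented opposite to the pegRNA and positioned about 50 bp downstream of it, so that each guide directs a nick on a different strand; this PE3 system further increased editing efficiency.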

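Compared with earlier genome editing systems, prime editing avoids DNA double-strand breaks, improves editing efficiency and extends the range of possible edits. However, the large size of the prime editing components limits their clinical use in vivo. In addition, because the reverse transcriptase, a core component, is overexpressed in cells, its safety remains a concern.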

1. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat Rev Genet (2018).
2. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016).
3. Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017).
4. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 44, D862-D868, doi:10.1093/nar/gkv1222 (2016).
5. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821, doi:10.1126/science.1225829 (2012).
6. Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat Biotechnol 36, 765-771, doi:10.1038/nbt.4192 (2018).
7. Haapaniemi, E., Botla, S., Persson, J., Schmierer, B. & Taipale, J. CRISPR-Cas9 genome editing induces a p53-mediated DNA damage response. Nat Med 24, 927-930, doi:10.1038/s41591-018-0049-z (2018).
8. Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464-471, doi:10.1038/nature24644 (2017).
9. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424, doi:10.1038/nature17946 (2016).
10. Zuo, E. et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289-292, doi:10.1126/science.aav9973 (2019).
11. Jin, S. et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292, doi:10.1126/science.aaw7166 (2019).
12. Zhou, C. et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275-278, doi:10.1038/s41586-019-1314-0 (2019).
13. Grunewald, J. et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433-437, doi:10.1038/s41586-019-1161-z (2019).
14. Grunewald, J. et al. CRISPR DNA base editors with reduced RNA off-target and self-editing activities. Nat Biotechnol 37, 1041-1048, doi:10.1038/s41587-019-0236-6 (2019).

The Evolution of Phase-Separated TDP-43 in Stress

Two articles in Neuron (Volume 102, Issue 2, 17 April 2019) reveal important insights into the potential mechanisms giving rise to the aggregates of TDP-43 that characterize more than 95% of the cases of amyotrophic lateral sclerosis (ALS; Mann et al., 2019, Gasset-Rosa et al., 2019). RNA binding proteins (RBPs) organize their RNA targets by coalescing to create structures called RNA granules. There are many types of RNA granules, but the stress granule (SG) stands out in disease relevance because multiple prior studies suggest that TDP-43 and other disease-linked RBPs co-localize with SGs, which suggests that they might regulate the translational stress response. The linkage of RBPs with RNA granules is conceptually appealing because it places consolidation of RBPs within a physiological pathway that is regulated, providing a model for understanding disease pathogenesis and offering novel targets for therapeutic intervention. However, few groups have observed colocalization of TDP-43 pathology with SG proteins in human brain, which reasonably raises questions about whether TDP-43 pathology truly arises from SG biology.

Figure 1. Paths to Formation of Pathological TDP-43 Granules
(A) Basal conditions with TDP-43 in the nucleus.
(B) Stress or pathological seeds induce formation of cytoplasmic TDP-43 granules, as well as some TDP-43 not associated with RNA or stress granules.
(C) The TDP-43 matures into pathological TDP-43 granules.
(D) Neuropathology accrues as unresolvable components accumulate.

The manuscripts by the Donnelly and Cleveland groups powerfully complement and validate a model in which the pathological TDP-43 granules are distinct from SGs, although in some cases they appear to transition through an SG intermediate (Mann et al., 2019, Gasset-Rosa et al., 2019). To investigate TDP-43 aggregation, Donnelly’s team follows the optogenetic approach pioneered by the Brangwynne lab, which used Cry2 to enable optically induced RNA granules composed of FUS and other RBPs (Shin et al., 2017). In the current report, Donnelly’s team added Cry2 to the N terminus of TDP-43 and was able to optically induce TDP-43 oligomers that share similarities with the TDP-43 pathology seen in ALS. These granules have the properties of gels, in that they are immobile, unlike classic RNA granules, in which the RBPs remain highly mobile because they are phase separated into liquid droplets; this process is termed liquid-liquid phase separation (LLPS). The optically oligomerized TDP-43 forms consolidated structures that contain phosphorylated TDP-43 and p62, which will be referred to here as pathological TDP-43 granules. Deletion constructs showed that the low complexity domain (LCD) also possessed the ability to form oligomers; these oligomers are dynamic at first but over 4 h evolve into immobile aggregates that contain TDP-43 phosphorylated at Ser 409/410 (pTDP-43) and p62, which is similar to the neuropathology of ALS.

The twist in the story and physiological insights come from their observations that RNA binding to TDP-43 inhibits formation of the pathological TDP-43 granules. The Donnelly team shows that including RNA recognition motifs (RRMs) prevents the tendency of LCD TDP-43 to aggregate. In addition, while stress induces most TDP-43 to form aggregates associated with SG markers, some stress-induced TDP-43 aggregates contain neither RNA nor SG markers; the heterogeneity of TDP-43 aggregates in stress is something that is commonly observed, but this work provides a useful model for understanding this biology. They emphasize the presence of a SG-independent pathway by showing that optically induced granules of WT TDP-43 also do not co-localize with SGs. This latter observation highlights that TDP-43 can form inclusions through a pathway separate from SGs but also might reflect the ability of the Cry2 system to force the aggregation without the requirement of a physiological pathway to stimulate the process. Finally, they show that a modified oligomer that binds TDP-43 with very high affinity can prevent formation of pathological TDP-43 granules and toxicity, which suggests the presence of crosstalk between the mechanisms producing TDP-43 SGs and pathological TDP-43 granules.

The Cleveland group uses independent approaches yet arrives at similar conclusions. They show that TDP-43 droplets can be induced by sonicated fibrils of aggregated TDP-43 or FUS, but not SOD1. This provides an immediate link to the idea of propagation, which is well accepted for prion diseases and considered quite possible for synucleinopathies and tauopathies, but is more controversial for TDP-43-opathies (Stewart et al., 2012). A striking aspect of these fibril-induced TDP-43 granules is that they remain as dynamic droplets for up to a month and only form gels with insoluble TDP-43 upon exposure to a stress, such as arsenite. This insoluble TDP-43 contained pTDP-43, suggesting immediate parallels to disease pathology. The pathological transition of TDP-43 observed by both the Cleveland and Donnelly groups is consistent with a recent study from Bonini’s group, who also showed that stress triggered two types of TDP-43 inclusions, one associated with SGs and one containing pTDP-43 (McGurk et al., 2018). The theme of stress and aggregation also appears in work from Polymenidou’s team, who observed that stress induced a rapid transition of nuclear TDP-43 from soluble oligomers to insoluble, aggregated TDP-43, which they speculated was not functional (Afroz et al., 2017).

These accumulating studies suggest that pathological TDP-43 granules are distinct from SGs yet exhibit crosstalk with SG pathways (Figure 1). For instance, genetic regulators of SGs, such as ataxin-2 and tankyrase 1/2, inhibit the accumulation of TDP-43 pathology in vivo (Becker et al., 2017, McGurk et al., 2018). The SG is a membraneless organelle that forms when RNA translation stalls. SGs are characterized by the presence of mRNA, 40S ribosomal proteins, and particular RNA binding proteins, such as TIA1, G3BP, Caprin, or UBAP2L. Cleveland’s group looks at proteins that co-localize with the TDP-43 granules; importantly, many of their experiments use cell lines in which GFP has been introduced into the endogenous TDP-43 by CRISPR, which avoids overexpression artifacts. Using these lines, they find that almost 100% of the TDP-43 granules initially co-localize with classic SG markers, including RNA, TIA1, G3BP1, or UBAP2L, but after 90 min most of the TDP-43 transitions to granules that no longer co-localize with the SG markers. The observation of SG-independent TDP-43 granules parallels those of the Donnelly and Bonini groups but also highlights that TDP-43 can evolve from an SG into a pathological TDP-43 granule. Indeed, in the Cleveland study, a large fraction of the pathological TDP-43 granules cluster around bona fide stress granules, which emphasizes their origin and suggests a close relationship with SGs.

This maturation of proteins from SGs, as well as differential effects of varied aggregates, is also observed with tau, where we showed that in hippocampal neuron cultures tau oligomers only associate with SGs transiently during the first hour after exposure to exogenous tau seeds, and then form aggregates not associated with SGs (Jiang et al., 2019). Translating cell culture studies to in vivo results, though, can be tricky. The biological response to tau propagation in vivo appears to be lengthened compared to the response in cultured cells. Propagated oligomeric (but not fibrillar) tau produces SGs whose co-localization with tau remains evident 3 months after injection, which contrasts with the rapid kinetics in cultured cells (Jiang et al., 2019). The similar responses of TDP-43 and tau suggest that the maturation of pathological aggregates might be a generalizable phenomenon, but the relationship between this maturation model and human neuropathology remains a question.

Exogenous fibrils also induce cytoplasmic aggregates composed of multiple nuclear and nuclear pore proteins: is this pathological crossover seeding? Pathological seeding and propagation of disease pathology are thought to contribute significantly to the pathophysiology of prion diseases, tauopathies, and synucleinopathies. The typical case of ALS presents with clinical symptoms in very different regions, but from there, the pathology appears to extend progressively into adjacent areas (Stewart et al., 2012). This pathological duality likely represents what occurs in most neurodegenerative diseases, with the relative proportion of stochastic and propagation events differing among diseases and even among patients.

The study of cellular responses to seeding by exogenous fibrils represents an important aspect of the work by Cleveland’s group. Seeding by sonicated TDP-43 or FUS aggregates, but not SOD1 aggregates, is sufficient to induce TDP-43 pathology and cytoplasmic aggregates of multiple nucleoporins (Nups), as well as significant toxicity. The ability of exogenous FUS aggregates to elicit seeding of cellular TDP-43 aggregates suggests the occurrence of cross-seeding, a phenomenon first noted for some strains of prions in yeast but also seen for α-synuclein and tau. Cross-seeding is further suggested because exogenous TDP-43 or FUS appears able to cross-seed cellular aggregates of nuclear pore proteins, such as Nup62, Nup107, and RanGap1. Interestingly, many of the nuclear pore proteins did not co-localize with TDP-43 aggregates, suggesting that the aggregation occurred through an independent process.

The involvement of nuclear pore proteins is particularly important because their contributions to the pathology of multiple types of neurodegenerative disease, including ALS, frontotemporal dementia (FTD), and Alzheimer’s disease (AD), have been increasingly noted since the first observation that C9orf72 repeat expansions disrupt nuclear transport (Zhang et al., 2015). Cleveland’s group found that aggregates of RNA binding proteins, nuclear pore proteins, and possibly other proteins could not be reversed by cycloheximide, which is known to inhibit SG formation. However, it is notable that Lloyd and colleagues found that aggregates of similar proteins that were induced by other stresses could be reversed by other SG inhibitors, such as ISRIB (Zhang et al., 2018). This raises the possibility that even the cytoplasmic aggregates of nuclear pore proteins might exhibit crosstalk with the SG pathway.

These studies provide important advances in our knowledge of the mechanisms of formation of pathological aggregates (Figure 1). They highlight a pathological TDP-43 granule that is not a SG but, in some cases, evolves through a SG and in other cases evolves independently of SGs. Future studies will need to elucidate the relative proportion of pathological TDP-43 aggregates that accumulate through each pathway in patients with ALS.

REFERENCES
Afroz, T., Hock, E.M., Ernst, P., Foglieni, C., Jambeau, M., Gilhespy, L.A.B., Laferriere, F., Maniecka, Z., Plückthun, A., Mittl, P., et al. (2017). Functional and dynamic polymerization of the ALS-linked protein TDP-43 antagonizes its pathologic aggregation. Nat. Commun. 8, 45.
Becker, L.A., Huang, B., Bieri, G., Ma, R., Knowles, D.A., Jafar-Nejad, P., Messing, J., Kim, H.J., Soriano, A., Auburger, G., et al. (2017). Therapeutic reduction of ataxin-2 extends lifespan and reduces pathology in TDP-43 mice. Nature 544, 367–371.
Gasset-Rosa, F., Lu, S., Yu, H., Chen, C., Melamed, Z., Guo, L., Shorter, J., Da Cruz, S., and Cleveland, D.W. (2019). Cytoplasmic TDP-43 de-mixing independent of stress granules drives inhibition of nuclear import, loss of nuclear TDP-43, and cell death. Neuron 102, this issue, 339–357.
Jiang, L., Ash, P.E.A., Maziuk, B.F., Ballance, H.I., Boudeau, S., Abdullatif, A.A., Orlando, M., Petrucelli, L., Ikezu, T., and Wolozin, B. (2019). TIA1 regulates the generation and response to toxic tau oligomers. Acta Neuropathol. 137, 259–277.
Mann, J.R., Gleixner, A.M., Mauna, J.C., Gomes, E., DeChellis-Marks, M.R., Needham, P.G., Copley, K.E., Hurtle, B., Portz, B., Pyles, N.J., et al. (2019). RNA binding antagonizes neurotoxic phase transitions of TDP-43. Neuron 102, this issue, 321–338.
McGurk, L., Gomes, E., Guo, L., Mojsilovic-Petrovic, J., Tran, V., Kalb, R.G., Shorter, J., and Bonini, N.M. (2018). Poly(ADP-ribose) prevents pathological phase separation of TDP-43 by promoting liquid demixing and stress granule localization. Mol. Cell 71, 703–717.e9.
Shin, Y., Berry, J., Pannucci, N., Haataja, M.P., Toettcher, J.E., and Brangwynne, C.P. (2017). Spatiotemporal control of intracellular phase transitions using light-activated optoDroplets. Cell 168, 159–171.e14.
Stewart, H., Rutherford, N.J., Briemberg, H., Krieger, C., Cashman, N., Fabros, M., Baker, M., Fok, A., DeJesus-Hernandez, M., Eisen, A., et al. (2012). Clinical and pathological features of amyotrophic lateral sclerosis caused by mutation in the C9ORF72 gene on chromosome 9p. Acta Neuropathol. 123, 409–417.
Zhang, K., Donnelly, C.J., Haeusler, A.R., Grima, J.C., Machamer, J.B., Steinwald, P., Daley, E.L., Miller, S.J., Cunningham, K.M., Vidensky, S., et al. (2015). The C9orf72 repeat expansion disrupts nucleocytoplasmic transport. Nature 525, 56–61.
Zhang, K., Daigle, J.G., Cunningham, K.M., Coyne, A.N., Ruan, K., Grima, J.C., Bowen, K.E., Wadhwa, H., Yang, P., Rigo, F., et al. (2018). Stress granule assembly disrupts nucleocytoplasmic transport. Cell 173, 958–971.e17.

Beyond the much-discussed methylation, RNA also carries glycosylation modifications

A new pre-print from Carolyn Bertozzi’s lab shows that some RNA molecules are glycosylated. At least some of these glyco-RNA molecules might reside inside the ER lumen.

There are four major types of biomolecules: proteins, nucleic acids, lipids and sugars. Apart from serving as an energy source, sugars (glycans) are also attached to proteins and lipids, and these modifications are important for their function, mostly in relation to the membrane surface or secretion.

Using metabolic labeling of human cells with a glycan precursor, Ryan Flynn, a postdoc in Carolyn Bertozzi’s lab, found that RNA is also labeled by this precursor.

This is really exciting since RNA was never shown to be glycosylated before, so this opens a whole new level of regulation on RNA. In their pre-print they proved, convincingly I think, that this is a bona fide glycosylation on RNA molecules, that it occurs only on guanine residues, and that at least some of the enzymes required for protein glycosylation are involved. They further show that the bulk of glycosylated RNAs are Y RNAs and small nucleolar RNAs, in particular Y5 RNA and U3 snoRNA. Finally, they perform some crude cell fractionation with a biochemical assay to show that the glycoRNA is found in the membrane fraction, and at least some of it is in the lumen of membrane organelles, most likely the ER, but they don’t prove that. This will require more in-depth fractionation to look at the Golgi, endosomes, lysosomes, etc.

What does the glycoRNA do on the surface or in the lumen of these organelles? Which of the glycoRNA species is found where? Is this related to the fact that Y RNAs are found in exosomes or secreted as free RNP? Is the glycoRNA actually found in biofluids?

Flynn et al. show that in a CRISPR-edited human cell line in which Y5 RNA is knocked out there is a reduction in total glycoRNA, indicating that this RNA is one of the major glycoRNA molecules. The cells grow fine, so it’s not essential. But I think a better question is what will happen if you mutate the glycosylated G’s (but make compensatory mutations to keep the structure). Will this affect the known function of the Y RNA as an RNP regulator or in DNA replication? Will it reveal defects in secretion? In ER function? One can then pull down the Y RNA and find the associated proteins, and then compare that to the unglycosylated mutant. This will probably help in answering questions about the function.

The authors suggest that maybe the glycosyl moiety helps the RNA to associate with membranes, maybe even go through them. I have always wondered how RNA in exosomes gets out after the exosome is engulfed by acceptor cells, and typically goes through the endo-lysosomal pathway. Maybe that’s how the RNA leaves the exosome to do whatever it is doing in acceptor cells. I wonder if adding a GFP-mimic RNA aptamer like Mango could help in determining the localization of the Y RNA (or an unglycosylated mutant).

Overall I think that this is a groundbreaking discovery and I’m sure that we will find more glycoRNAs in different cell types, in different organisms, with a variety of functions.

And then maybe we will find phosphorylated RNA, ubiquitinated RNA, and who knows what else…

Source: https://greenfluorescentblog.wordpress.com/2019/10/09/sugar-coated-rna/

Beyond CRISPR: What’s current and upcoming in genome editing

How did “genome editing” become a household phrase so quickly? This question, posed by Jerel Davis of the investment firm Versant Ventures, opened a gene-editing panel at the 2019 Life Science Innovation Northwest (LSINW) conference in Seattle, Washington. “Genome editing is a juxtaposition of two discoveries,” explained panelist Philip Gregory from the gene and cell therapy company Bluebird Bio: Nucleases can make double-stranded DNA breaks (DSBs) at specific sequences, and DSBs activate repairs that can change DNA.

DSB repair has two mechanisms. Nonhomologous end joining (NHEJ) links ends together, often creating insertions and deletions (indels) in the process. In genome editing, this can be used to knock out gene function. Homology-directed repair (HDR) fixes DSBs using DNA with a similar sequence. Providing cells with external homologous donor DNA introduces edits via HDR.

Many genome-editing systems work by activating DSB repair at specific sites using engineered zinc-finger nucleases (ZFNs), transcription activator-like effector-based nucleases (TALENs), or meganucleases (1). Currently, the dominant genome-editing method is CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-CRISPR-associated protein 9) (2). How do researchers choose among these systems?

“The primary consideration is the end product,” says Jon Hennebold of Oregon Health & Science University in Portland. Hennebold leads a multisite U.S. National Institutes of Health–funded program on genome-editing efficiency and safety. Companies use proprietary genome-editing systems optimized for specificity to reduce off-target effects (mutations at unintended sites). Most academic labs can get the product they want with CRISPR, which is fast and easy. “You can order the components and get started in 24 to 48 hours,” Hennebold says. “Other methods don’t have that commercial support.”

CRISPR: It gets the job done

“Academic labs have no reason to work with other methods,” says Charles Gersbach, a biomedical engineer at Duke University in Durham, North Carolina. “For plain-vanilla genome editing, Cas9 and a gRNA will get the job done.” Cas9, an enzyme from bacterial antiviral systems, makes DSBs at DNA sites that are complementary to a guide RNA (gRNA) and also have a nearby protospacer-adjacent motif (PAM) sequence. CRISPR repeats aren’t needed for editing, so Cas9 plus a gRNA can knock out genes by NHEJ. Providing a DNA fragment promotes HDR-mediated edits.
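
As a rough illustration of the targeting rule described above (a site that matches the gRNA and sits next to a PAM), the sketch below scans one DNA strand for candidate sites. It rests on assumptions not stated in the article: a 20-nucleotide protospacer and an NGG PAM immediately 3' of it, which is the commonly cited pattern for Streptococcus pyogenes Cas9. The function names and the example sequence are hypothetical.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cas9_sites(seq, protospacer_len=20, pam_regex="[ACGT]GG"):
    """List (start, protospacer, pam) candidates on the given strand.

    Assumption: the PAM lies immediately 3' of the protospacer and matches
    NGG. Other Cas enzymes use different PAMs, so pam_regex is a parameter.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    example = "TTGACCTGAAGCTTGGATCCGTACGATCGATTAGGCATGCCGGATTACA"
    for strand, s in (("+", example), ("-", reverse_complement(example))):
        for start, protospacer, pam in find_cas9_sites(s):
            print(strand, start, protospacer, pam)
```

Scanning the reverse complement as well covers sites on the opposite strand. A real design tool would also score candidates for off-target matches, which this sketch does not attempt.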

Ru Gunawardane, director of Stem Cells and Gene Editing at Seattle’s Allen Institute for Cell Science and an LSINW panelist, says CRISPR has been “a game changer” in fulfilling the institute’s mission of understanding how cells act in normal, disease, and treatment conditions. Researchers at the institute use CRISPR to tag organelle markers in stem cells with fluorescent proteins, then track these fusion proteins and their interactions under different situations. Currently, their work includes differentiating tagged cells into cardiomyocytes.

“We’ve tagged 40 to 50 sites so far,” Gunawardane says. “Once you have the CRISPR platform, all you have to change is the gRNA and the template for introducing the tag at the right location in the genome.” However, institute researchers do months of downstream quality control, such as live imaging and sequencing, before using the cells experimentally or making them available for research.

Caixia Gao, a plant biologist at the Chinese Academy of Sciences in Beijing, says CRISPR is also common in her field. “All methods are very efficient at making site-specific mutations,” she says, “but CRISPR takes the least time and has the lowest costs.”

CRISPR alternatives

CRISPR-mediated genome editing has drawbacks, though. The PAM requirement limits target sequences. Cas9 is large, so its gene is difficult to deliver to cells via vectors such as adeno-associated viruses commonly used in gene therapy. Scientists worry about off-target effects, although experts note that concerns about unintended mutations are often based on calculations from studies on improving editing. These studies may deliberately use low-specificity conditions to facilitate monitoring progress.

To ensure the highest confidence in their products, companies invest time and money in custom genome-editing methods focused on efficiency and specificity. Initial investments pay off, industry scientists say, by preventing problems later in development.

ZFNs are the genome-editing reagents used by the genomic medicine company Sangamo, based in Brisbane, California. Chief Technology Officer Ed Rebar explains that Sangamo’s core editing reagent is a ZFN dimer. The typical target site is 36 basepairs. Each ZFN is a chimeric protein of the nuclease domain from the FokI restriction enzyme and an array of zinc-finger DNA-binding domains built by “mixing and matching” from Sangamo’s archive of thousands of two-finger subunits. Strategies for diversifying the ZFN architecture for high targeting capability include attaching the FokI domain to the N- or C-terminus of the zinc-finger array and inserting base-skipping linkers between fingers. With Sangamo’s high-throughput, automated process for generating ZFNs, Rebar says, “Starting from a target gene name, we can generate an initial set of editing reagents within two weeks.”

In a demonstration study, Rebar and colleagues designed ZFNs that introduced indels at 25 of 28 bases in a promoter relevant to studying hemoglobinopathies (3). Despite this precision and the advantage of being smaller than Cas9, ZFNs are not as commonly used as CRISPR-based methods. Sangamo provides ZFNs via industry and academic partnerships but holds the modules, expertise—and patents—for making them.

TALENs attach FokI to arrays of DNA-binding modules, originally from plant pathogens, that each target a single basepair. TALENs are smaller than Cas9, but larger than ZFNs. The modules have high DNA-binding affinity but include repeated sequences that create cloning challenges.

Dan Carlson is chief scientific officer at Recombinetics, a St. Paul, Minnesota–based biotechnology company that uses TALENs and CRISPR to generate animals and cell lines for clinical research models and agriculture. Using these methods, he says, “we can target almost any site in a genome.” With in-house resources, even TALENs take only “a few hundred bucks and about a week” to generate, Carlson adds, so scientists choose the method that is most reproducible, consistent, and specific, based on pilot studies. These initial investments ensure the company is responsible with resources, he says. “It costs too much to sort out problems on the back end.”

Meganucleases, also called homing endonucleases, are smaller than Cas9, despite their name, which refers to recognition sequences that can be up to 40 basepairs in length. Hybrid megaTALs combine the simple assembly of TALENs with the DNA-cleavage specificity of meganucleases. Two biotech companies that use meganuclease-based methods are Bluebird Bio in Cambridge, Massachusetts, and Precision BioSciences in Durham, North Carolina.

Barry Stoddard, a structural biologist at Fred Hutchinson Cancer Research Center, Seattle, has a panel of 50–60 meganucleases that his lab engineers to recognize specific sequences. “It takes one day to make CRISPR to target a gene,” he says, “and 100 days to make a meganuclease.” Still, Stoddard gets many requests for engineered meganucleases, because their precision is highly valued for applications such as developing therapeutics for which “100 days is nothing.”

To DSB or not to DSB

Relying on HDR for editing risks introducing indels or chromosomal translocations. Even with a precisely targeted nuclease, with HDR, “you’re at the mercy of the cell,” Stoddard observes. For editing without the unpredictability of HDR, he adds, watch for developments in site-specific recombinases (SSRs).

“SSRs do the whole thing,” says Marshall Stark, molecular geneticist at the University of Glasgow in Scotland. “They break and rejoin the DNA with no need for host factors.” Even in cells with low or no HDR, SSRs can integrate exogenous DNA at a targeted site. Under optimal conditions, Stark says, “SSRs can be extremely efficient, with recombination approaching 100% in a few minutes.” SSRs can make switch-like changes such as inverting a DNA segment’s orientation. This makes them valuable for creating electronic circuit–like pathways that control cell behavior for industrial purposes or synthetic biology applications, such as making biocomputers. However, SSRs have complex, rare, 30- to 50-basepair target sites.
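
The switch-like inversion mentioned above can be pictured with a toy string model: the segment between two recombinase sites is replaced by its reverse complement, and a second inversion restores the original orientation. This is only a sketch under simplifying assumptions. The site sequences below are short, made-up placeholders (real SSR sites are the 30- to 50-basepair sequences described above), the sites are treated as fixed markers, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def invert_between_sites(genome, left_site, right_site):
    """Invert the segment flanked by left_site and right_site.

    Toy model: the first occurrence of each site is used, and the sites
    themselves are left unchanged.
    """
    i = genome.find(left_site)
    if i == -1:
        raise ValueError("left site not found")
    start = i + len(left_site)
    j = genome.find(right_site, start)
    if j == -1:
        raise ValueError("right site not found")
    segment = genome[start:j]
    return genome[:start] + reverse_complement(segment) + genome[j:]


if __name__ == "__main__":
    # Hypothetical placeholder sites and payload segment.
    left, right, payload = "ATGCA", "TCAGT", "GGGAC"
    genome = "CC" + left + payload + right + "CC"
    flipped = invert_between_sites(genome, left, right)
    restored = invert_between_sites(flipped, left, right)
    print(genome, flipped, restored, sep="\n")
```

Because applying the same operation twice returns the starting sequence, the segment's orientation behaves like a one-bit state, which is the property the circuit-like applications rely on.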

Stark names two approaches to adapting SSRs for more widespread genome editing: (1) using directed evolution that selects for new target specificities and (2) making fusion proteins. For example, he and others are attaching SSR-derived recombinase domains to zinc-finger modules that bind specific DNA sequences. The technology is “still at the investigative level,” Stark notes. “If you have a particular target in mind, it’s still a lot of work to make a recombinase for it.”

For genome-editing without DSBs, researchers use Cas9 that is still directed by gRNAs but does not cut DNA or makes only single-stranded nicks. Cas9 variants are fused to transcription activators or repressors, or to enzymes that alter chromatin structure by modifying DNA or DNA-packaging histone proteins to change gene expression. This epigenome editing resembles natural gene regulation, Gersbach says, “without risk of off-target changes to DNA sequences.” The method is a basic research tool for studying epigenetics and has potential therapeutic uses, such as reactivating the silenced gene that causes the intellectual disability Fragile X syndrome (4).

Base editing makes single-basepair changes while avoiding unintended mutations from DSB repair. It works in cells without HDR. Innovations in this method are published regularly, but the first base editors developed by David Liu’s group at Harvard University consisted of a disabled Cas9, which targets a DNA sequence, fused to an enzyme that converts cytosine to uracil. The fusion protein changes a cytosine–guanine pair to thymine–adenine. Another base editor, which Liu’s lab generated through protein engineering and directed evolution, changes adenine–thymine to guanine–cytosine. Just these two base-editing systems can make one-third of all possible basepair changes, Liu asserts, and potentially correct 62% of known human pathogenic point mutations.
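
The "one-third" figure can be checked with a short enumeration. The sketch below counts every single-base substitution (12 in total) and then the ones reachable by the two editors, remembering that a change on one strand implies the complementary change on the other. The variable and function names are illustrative and are not taken from any base-editing software.

```python
from itertools import permutations

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# All ordered pairs (from_base, to_base) with from_base != to_base.
all_changes = set(permutations(BASES, 2))


def strand_equivalents(src, dst):
    """A substitution on one strand appears as its complement on the other."""
    return {(src, dst), (COMPLEMENT[src], COMPLEMENT[dst])}


# Cytosine editor: C-G pair becomes T-A, read as C->T or G->A by strand.
cytosine_editor = strand_equivalents("C", "T")
# Adenine editor: A-T pair becomes G-C, read as A->G or T->C by strand.
adenine_editor = strand_equivalents("A", "G")

covered = cytosine_editor | adenine_editor
fraction = len(covered) / len(all_changes)

if __name__ == "__main__":
    print(sorted(covered), len(covered), len(all_changes), fraction)
```

The two editors together cover 4 of the 12 substitutions, the four transitions (C to T, G to A, A to G, T to C), which is where one-third comes from. The 62% figure depends on the spectrum of known pathogenic mutations and cannot be derived this way.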

As of summer 2019, more than 100 research papers described experiments using base editing, Liu says, “including several that corrected animal models of human genetic diseases by directly reversing point mutations.” For example, one editor corrected a mutation that causes phenylketonuria (5). Liu and others are diversifying base editing—expanding base-changing options, increasing specificity, and improving activity in live animals and at target sites that require distinguishing between highly similar sequences.

Our genome-editing future

As a scientist using genome-editing technology, Carlson hopes researchers apply it for the good of humankind and the planet. He hopes the public understands that getting a final product is actually a long process. The biotechnology company Recombinetics got media attention for using TALENs to breed polled (hornless) cows—which saves farmers the trouble of dehorning them. The project started in 2012, and Carlson says the company continues to work on making the editing more efficient.

Given its popularity and availability, CRISPR dominates genome-editing predictions. CRISPR-based systems will continue to improve incrementally, Carlson says. Researchers regularly publish about improved gRNAs with higher efficiency or specificity. Multiple Cas-type enzymes have been discovered or engineered with different PAMs or activities (6). For example, Cas13 targets RNA and is the foundation of RNA base editing. This method and Liu’s DNA base editing are licensed to Beam Therapeutics, whose cofounders include Liu and Feng Zhang, who developed CRISPR for mammalian cells.

CRISPR methodological improvements include treating cells with small molecules during editing to nudge DSB repair away from NHEJ and toward HDR. Controllable systems switch on Cas9 using light or small molecules, limiting its activity in order to reduce off-target effects. Researchers are scouring the microbial world for new Cas-type enzymes and entirely new genome-editing systems. “We’re still identifying new molecules with editing capacity and we don’t fully understand the editing tools we have,” Hennebold says. “We still have a lot to learn.”

The practice of using CRISPR to correct disease-causing mutations is growing: Editas Medicine and Allergan announced human in vivo CRISPR-therapy trials for an inherited blindness. A potential hurdle to therapeutic CRISPR is the possibility of human immune responses to its bacterial components. For instance, a majority of tested blood samples showed existing immune responses to Cas9, which is commonly taken from Staphylococcus or Streptococcus bacteria (7).

The genome-editing wish list includes better methods for multiplexing—editing more than one gene at a time. For example, multiplexing would speed developments in T-cell–based immunotherapy, which works for many patients but requires altering multiple genes. And plant scientists often want to create “stacks” of linked genes that are inherited together as a package for resistance to disease, pests, and other agricultural threats. Multiplexing would accelerate creating these products.

In principle, multiplexing is simple with CRISPR, requiring only the introduction of a single Cas enzyme and of gRNAs and template DNAs for each targeted gene. Gunawardane has tried CRISPR multiplexing to tag multiple genes in the same cell and says it’s achievable but, in practice, gets increasingly complicated with each added gene. Systems using SSRs, ZFNs, or meganucleases may offer advantages such as smaller components that allow easier introduction.

Ask scientists about genome-editing challenges and they mention delivery of components into cells. They say to watch for transient systems that deliver editing enzymes as proteins instead of their genes so that the proteins are degraded after acting instead of being continuously expressed. Limiting activity in this manner could reduce off-target effects. Gao notes that DNA-independent delivery of genome-editing systems could alleviate concerns about genetically modified organisms (GMOs). “Proteins can’t integrate into the genome,” she says, “so if no foreign DNA is delivered at all, the resulting plants should be considered non-GMO.”

CRISPR is already very powerful, and so many people are working on it and other genome-editing systems that they’ll inevitably continue to improve, Gao says. “Scientists like to make new tools and new technology,” she says, “so we’re really seeing progress every day. Now, we say we can edit any target in principle, but in five years it will be true.”

References

1. A. J. Bogdanove et al., Nucleic Acids Res. 46, 4845–4871 (2018), doi: 10.1093/nar/gky289.

2. A. C. Komor, A. H. Badran, D. R. Liu, Cell 168, 20–36 (2017), doi: 10.1016/j.cell.2016.10.044.

3. D. E. Paschon et al., Nat. Comm. 10, 1133 (2019), doi: 10.1038/s41467-019-08867-x.

4. X. S. Liu et al., Cell 172, 979–992 (2018), doi: 10.1016/j.cell.2018.01.012.

5. L. Villiger et al., Nat. Med. 24, 1519–1525 (2018), doi: 10.1038/s41591-018-0209-1.

6. A. Pickar-Oliver, C. A. Gersbach, Nat. Rev. Mol. Cell Biol. 20, 490–507 (2019), doi: 10.1038/s41580-019-0131-5.

7. C. T. Charlesworth et al., Nat. Med. 25, 249–254 (2019), doi: 10.1038/s41591-018-0326-x.

Source: https://www.sciencemag.org/features/2019/09/beyond-crispr-what-s-current-and-upcoming-genome-editing

AAV transduction is enhanced in the presence of sub-neutralizing concentrations of antibodies

Although it has been widely observed in the field, there are few reports in the literature about the mechanism and prevalence of antibody-dependent enhancement of AAV. One paper has observed that mouse anti-AAV2 antiserum is capable of enhancing transduction in monocytic cell lines such as THP-1 and U937, and that blocking FcγRI and FcγRII with anti-FcγRI and FcγRII antibodies decreases this enhancement (275). Enhancement has also been observed in mouse bone marrow macrophages in the presence of mouse serum from mice pre-immunized with AAV2 (276). This group determined that enhancement in these cell lines was due to complement protein C3, as recombinant C3 can bind to AAV2 capsids and heat inactivation abrogated this effect. These two reports, although supporting the ability of AAV to undergo enhancement, do little to define the mechanism of enhancement observed in non-immune cells and whether this enhancement occurs in vivo. For other viruses, enhancement has been reported to occur either through the Fc receptor mediating uptake into immune cells for viruses such as Dengue virus (277, 278), or complement-bound antibodies mediating entry into non-immune cells such as Ebola (279) and Parvovirus B19 (280). This is observed in situations where there are low affinity antibodies, such as during secondary infection with a different serotype than the primary infection (281). We have observed enhancement in vitro of up to 10-fold from serum samples that are neutralizing against other AAV serotypes. By understanding more about the mechanisms by which sub-detectable levels of antibodies affect transduction, we can develop methods to circumvent their activity in ways that are translatable to the clinic. We have used both in vitro and in vivo studies in mice in an attempt to dissect the role anti-AAV antibodies are playing in AAV entry outside of a classical neutralization mechanism.

Adapted From: 

A Genome-Wide Knock-Out Screen Identifies Novel Host Cell Entry Factor Requirements for Divergent Adeno-Associated Virus Serotypes. A dissertation presented by Amanda Mary Dudek, Harvard University, September 2018.

[REFS]

275. Mori S, Takeuchi T, Kanda T. 2008. Antibody-dependent enhancement of adeno-associated virus infection of human monocytic cell lines. Virology 375:141-147.
276. Zaiss AK, Cotter MJ, White LR, Clark SA, Wong NC, Holers VM, Bartlett JS, Muruve DA. 2008. Complement is an essential component of the immune response to adeno-associated virus vectors. J Virol 82:2727-2740.
277. Rodrigo WW, Jin X, Blackley SD, Rose RC, Schlesinger JJ. 2006. Differential enhancement of dengue virus immune complex infectivity mediated by signaling-competent and signaling-incompetent human Fcgamma RIA (CD64) or FcgammaRIIA (CD32). J Virol 80:10128-10138.
278. Moi ML, Lim CK, Takasaki T, Kurane I. 2010. Involvement of the Fc gamma receptor IIA cytoplasmic domain in antibody-dependent enhancement of dengue virus infection. J Gen Virol 91:103-111.
279. Takada A, Feldmann H, Ksiazek TG, Kawaoka Y. 2003. Antibody-dependent enhancement of Ebola virus infection. J Virol 77:7539-7544.
280. von Kietzell K, Pozzuto T, Heilbronn R, Grossl T, Fechner H, Weger S. 2014. Antibody-mediated enhancement of parvovirus B19 uptake into endothelial cells mediated by a receptor for complement factor C1q. J Virol 88:8102-8115.
281. de Alwis R, Williams KL, Schmid MA, Lai CY, Patel B, Smith SA, Crowe JE, Wang WK, Harris E, de Silva AM. 2014. Dengue viruses are enhanced by distinct populations of serotype cross-reactive antibodies in human immune sera. PLoS Pathog 10:e1004386.

Strategies for addressing immune responses to AAV

These are my notes from Dr. Roland Herzog’s lecture at the Broad Institute’s New Therapeutic Modalities Workshop series on September 20, 2019.

AAV is a preferred gene therapy vector for several reasons including its safety profile. AAV made in the lab and for drugs is produced without helper virus and is devoid of viral coding sequences, so that it would be replication-incompetent even in the presence of helper virus. There are now a variety of capsids with tropisms for different tissues of interest for treatment of genetic diseases. AAV is used in two approved gene therapy drugs, voretigene neparvovec for RPE65 loss of function and onasemnogene abeparvovec for SMN1 loss of function. Therapies are under development for MTM1 loss of function (X-linked myotubular myopathy) and for F8 and F9 loss of function (hemophilia A and B respectively).

Blood coagulation is driven ultimately by fibrin polymerization, but triggering fibrin polymerization first requires a whole sequence of steps known as the blood coagulation cascade. Key steps in this process are mediated by Factor IX (F9) and its co-factor Factor VIII (F8). Both of these are X-linked, and worldwide about 1 in 5,000 males born has hemophilia. Purified clotting factor (either from human donors or recombinant protein) is standard of care but has a short duration of action and has to be administered 3x per week, at a cost of $300,000 per year.

A longstanding dream is to have a one-time therapy for hemophilia using viral expression of the missing coagulation factor in the liver. There is a clear dose-response from the genetics of the disease: for F9, <1% wild-type clotting activity corresponds to severe disease, 1-5% to moderate disease, and >5% to mild disease [George 2017]. Early trials achieved just 5% wild-type clotting activity, but have now demonstrated duration of that expression up to 8 years post-treatment. More recent trials have achieved 33% wild-type activity [George 2017].
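The severity bands quoted above can be written as a small lookup. This is a minimal Python sketch, not a clinical tool: the function name `f9_severity` is hypothetical, and the handling of values exactly at 1% and 5% is an assumption, since the notes do not say which band the boundaries belong to.

```python
def f9_severity(activity_pct: float) -> str:
    """Map F9 clotting activity (% of wild-type) to the severity bands quoted in the notes."""
    if activity_pct < 0:
        raise ValueError("activity must be non-negative")
    if activity_pct < 1:
        return "severe"
    if activity_pct <= 5:  # assumption: boundary values 1% and 5% count as moderate
        return "moderate"
    return "mild"  # the notes give no upper bound for this band


# Untreated severe disease, then the activity levels the notes quote for
# early and more recent trials.
for pct in (0.5, 5, 33):
    print(f"{pct}% of wild-type activity -> {f9_severity(pct)}")
```

Under these assumptions, moving a patient from below 1% to 5% activity already shifts the expected phenotype out of the severe band, which is why even modest expression levels were considered meaningful.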

Although AAV are less immunogenic than many viruses, their immunogenicity continues to be a major hurdle in a few ways. First, immune responses have been observed in trials. Second, treated patients develop seropositivity against the AAV and so might not be able to receive a second treatment if a limit to single-treatment AAV durability is ever encountered. Third, patients with pre-existing seropositivity against the AAV type of interest are always excluded from clinical trials.

How does the body recognize and respond to AAV? The protein capsid is the obvious source of immunogenicity and we know that the different AAV serotypes correspond to different changes in the capsid. But there are at least three other possible sources of immune response: the viral DNA genome, the therapeutic transgene being packaged, or the dsRNA that is generated upon transcription of the viral genome.

The viral genome itself can be immunogenic through stimulation of innate immunity via TLR9 [Martino 2011]. TLR9 responds to unmethylated CpG content in DNA, which is higher in pathogens than in human DNA, and depleting CpG content from the AAV vector can reduce this response [Faust 2013]. And George Church’s lab has a spinout, Ally Therapeutics, employing another approach to reduce innate immune response to viral vector DNA.

Immune response to the therapeutic transgene is particularly a concern in individuals with null mutations, where their immune system has never seen that protein before. Antibodies against dystrophin and α-1 antitrypsin have been detected in patients with these deficiencies [Mendell 2010, Calcedo 2017]. Evidence from several models suggests that the nature of the host’s mutation is a major determinant of immune response to the transgene [Cao 2009, Rogers 2014]. Dose (number of viral genomes) also matters a lot [Kumar 2017]. In a dog model of hemophilia, the introduction, into muscle, of wild-type Factor IX itself was immunogenic to dogs with a null mutation, but not to those with a missense mutation [Herzog 1999, Herzog 2001]. This immune reaction could be avoided by directing the gene therapy to the liver instead of muscle [Mount 2002]. There is now evidence that gene transfer to the liver may be a viable strategy, more broadly, for encouraging immune tolerance to a therapeutic transgene [Markusic 2013, Perrin 2016]. Treatment with rapamycin may also be able to help suppress immune response to transgenes [Moghimi 2011].

One recent report indicates that a dsRNA-sensing mechanism is involved in innate immune response to AAV [Shao 2018], apparently because dsRNA is formed when the viral genome is transcribed in transduced cells.

When the body responds to the AAV capsid, killer T cells (CTLs) may not only attack the virus, but may also destroy transduced host cells [Manno 2006, Mingozzi 2007, Martino 2013].

In Q&A, the topic came up of excluding seropositive patients from AAV gene therapy clinical trials. Dr. Herzog noted that the assays used to assess seropositivity are not at all standardized. Practically every company invents their own assay for the data they submit to FDA. In a typical format, AAV expressing a reporter gene, usually luciferase, is added to cells growing in a 96-well plate, with patient serum mixed in at a series of dilutions. The cells treated with saline or negative serum will light up with luciferase, but if the patient has neutralizing antibodies, then the cells where AAV was neutralized by their serum will remain dark, and the highest serum dilution at which lack of luciferase expression is observed tells you the titer. The problem is that different AAVs have different tropisms for different cultured cell lines, and so the multiplicity of infection (MOI, number of copies of AAV per cell) needs to be different in different assays. But if the MOI is 100x higher in one assay than another, then the patient antibody titer required to neutralize luciferase expression is also 100x higher, and so “titer” cannot at all be compared between assays. If a physician wanted to make an informed decision of whether to treat a seropositive patient with an approved AAV therapy based on whether there existed good safety and efficacy data for patients with that titer of seropositivity, they would have a difficult task in front of them.
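The titer readout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any company's actual assay: the 50% cutoff for calling a well "neutralized", the function name `neutralizing_titer`, and all luminescence readings are hypothetical.

```python
from typing import Optional, Sequence


def neutralizing_titer(
    dilutions: Sequence[int],
    signals: Sequence[float],
    control_signal: float,
    cutoff: float = 0.5,  # assumption: a well counts as neutralized below 50% of control
) -> Optional[int]:
    """Return the highest reciprocal serum dilution that still neutralizes, or None."""
    if len(dilutions) != len(signals):
        raise ValueError("dilutions and signals must have the same length")
    neutralized = [
        d for d, s in zip(dilutions, signals) if s < cutoff * control_signal
    ]
    return max(neutralized) if neutralized else None


# Hypothetical plate: reciprocal dilutions (1:5 ... 1:160) and luciferase readings.
dilutions = [5, 10, 20, 40, 80, 160]
signals = [100, 200, 400, 3000, 8000, 9500]
control = 10000  # wells treated with saline or negative serum

titer = neutralizing_titer(dilutions, signals, control)
print(f"titer = 1:{titer}")
```

Because both the cutoff and the MOI are chosen per assay, the number returned here is only meaningful within one assay format, which is the comparability problem raised in the Q&A.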

To what extent does seropositivity affect the safety and efficacy of the drug? Some investigators have viewed seropositivity as mostly an efficacy issue. In fact, there have even been efforts to spike active AAV with a large amount of empty AAV capsid, on the theory that the empty virus sops up the immune response, allowing more of the active AAV to evade neutralization and transduce the desired cells. However, there have also been some reports of complement activation in AAV-treated patients in clinical trials, suggesting that immune response could potentially rise to the level of a safety issue.

 

AAV Engineering Identifies a Species Barrier That Highlights a Portal to the Brain

In this issue of Molecular Therapy, Hordeaux et al.(1) present a new chapter in the fascinating story of gene transfer across the blood-brain barrier (BBB). Few in our field, or beyond it, would have predicted the twists and turns in the story of this unexpected biology with transformative therapeutic implications.

The tale starts in 2008 with a series of observations in which a 25 nm proteinaceous adeno-associated virus (AAV) particle at a high dose could transduce targets in the peripheral and central nervous system (CNS) via systemic routes of administration in mice(2,3) and larger animals(3,4). Moreover, the efficiencies of gene transfer supported remarkable levels of correction of disease models, most notably in spinal muscular atrophy (SMA)(3,5). A decade later, these findings culminated in a successful phase 1/2 study for SMA type 1 and the likelihood of a drug approval for this otherwise fatal disorder(6). This initial academic success was translated commercially by Avexis (now acquired by Novartis), and this SMA program is currently being reviewed for drug approval by the US Food and Drug Administration (FDA).

These findings and developments have energized a field of vector discovery to further improve on this unique ability that allows gene transfer to the CNS and peripheral nervous system via a non-invasive injection route. That potential was reached in 2016 when Deverman et al.(7) identified an AAV variant that leapfrogged AAV9 in its ability to traverse the BBB.

AAV-PHP.B was selected from a diverse library of AAV9 variants that incorporated random 7-mer peptides in a known permissive insertion site of the rigid icosahedral AAV capsid. Using a system called Cre recombination-based AAV-targeted evolution (CREATE), developed in the Gradinaru lab at Caltech, Deverman et al. were able to enrich those AAVs that landed in glial fibrillary acidic protein (GFAP)-positive astrocytes in the brains of mice that were injected intravenously with the library. The clever design of CREATE gave them a handle on specificity and sensitivity: the in vivo selection was performed in GFAP-transgenic C57BL/6 mice and the CREATE-library of vectors enabled selective amplification of Cre-recombined AAV genomes with high sensitivity in astrocytes in the CNS. After several rounds of purifying selection, PHP.B arose as a dominant species. Validation studies confirmed the efficiency of transduction by AAV-PHP.B following intravenous administration to be between one and two orders of magnitude greater than AAV9. PHP.B has since been widely adopted as a research tool in neuroscience. The AAV9 variant was also quickly considered as a therapeutic vector system and licensed to various gene therapy companies, and its remarkable improvements in efficiency compared to AAV9 in mice suggested a tantalizing set of opportunities to unlock targets in neurological and neuromuscular disorders, to reduce dose requirements to achieve therapeutic levels of transduction in the CNS, and to minimize dose-related toxicities. The authors continued on this successful track and have since generated other variants with improved properties(8).

The next twist in this already remarkable story was unfortunately more sobering. In a Molecular Therapy report last year, Hordeaux et al.(9) presented data that the BBB advantage of PHP.B over AAV9 did not translate to a non-human primate model or to another mouse strain, BALB/c. Concerns about species barriers had been raised in the past for other AAVs, but such barriers had not been observed in such a dramatic and narrow manner.

In this issue of Molecular Therapy, Hordeaux et al.(1) present the next surprising chapter in the story of AAV and BBB, following up on their initial observation demonstrating a species barrier. In an elegant study that combines both old and new school genetics, the authors were able to segregate the enhanced PHP.B BBB-crossing phenotype to the C57BL/6 Ly6a (also known as Sca-1) haplotype. Confirmatory studies in Ly6a knockout (KO) mice, the mapping of the phenotype to a small set of single nucleotide polymorphisms (SNPs) in the BALB/c Ly6a, and the observation that AAV-PHP.B’s CNS infectivity was not affected following an intracerebroventricular injection of BALB/c mice gave further credence to its specific role in BBB transport of this AAV variant. Remarkably, the 18-kDa mouse glycosyl phosphatidylinositol-anchored cell surface protein is well-known for its role in hematopoiesis and its utility as a marker of stem cells and progenitors, but not for its role in BBB transport.

So where does this tale of unexpected biology leave us after all its twists and turns? For starters, AAV-PHP.B was, is, and remains a powerful research tool for neuroscientists, and the current study really defines its utility to strains of mice with the C57BL/6 Ly6a haplotype. Second, the development of AAV-PHP.B by Deverman et al.(7) illustrates the potential of AAV engineering and the dynamic range in activity or potency that can be obtained. It puts the bar quite high for the rest of us working in a dense AAV discovery space. It also highlights how methodological sophistication, such as CREATE, may be required to identify AAVs with a specific transduction performance, such as PHP.B. Third, the consecutive studies by Hordeaux et al.(1,9) underline the limits of vector engineering, particularly those using methods of directed evolution such as the one that generated AAV-PHP.B. These potent selective screens are subject to the genetic biases of the model upon which they are used and, hence, the output reagents require significant scrutiny and validation before consideration for translation. The exquisite sensitivity of CREATE and other such systems might inadvertently heighten the concern that model-specific biases are introduced. Future studies should reveal where the verdict lands on this glass-half-full, glass-half-empty assessment of the utility of vector engineering approaches such as directed evolution. Lastly, the fact that we can tie a mechanism and a host factor to aspects of AAV biology and tropism in vivo is a remarkable step forward. While this study may present findings on a vector with relatively limited therapeutic utility given the species restriction, the implication of Ly6a in BBB transport is novel and opens the door to use it and its functional homologs as a target portal to the brain for AAV and other therapies in humans.

REFERENCES
1. Hordeaux, J., Yuan, Y., Clark, P.M., Wang, Q., Martino, R.A., Sims, J.J., Bell, P., Raymond, A., Stanford, W.L., and Wilson, J.M. (2019). The GPI-Linked Protein LY6A Drives AAV-PHP.B Transport across the Blood-Brain Barrier. Mol. Ther. 27, this issue, 912–921.
2. Foust, K.D., Nurre, E., Montgomery, C.L., Hernandez, A., Chan, C.M., and Kaspar, B.K. (2009). Intravascular AAV9 preferentially targets neonatal neurons and adult astrocytes. Nat. Biotechnol. 27, 59–65.
3. Duque, S., Joussemet, B., Riviere, C., Marais, T., Dubreil, L., Douar, A.M., Fyfe, J., Moullier, P., Colle, M.A., and Barkats, M. (2009). Intravenous administration of self-complementary AAV9 enables transgene delivery to adult motor neurons. Mol. Ther. 17, 1187–1196.
4. Bevan, A.K., Duque, S., Foust, K.D., Morales, P.R., Braun, L., Schmelzer, L., Chan, C.M., McCrate, M., Chicoine, L.G., Coley, B.D., et al. (2011). Systemic gene delivery in large species for targeting spinal cord, brain, and peripheral tissues for pediatric disorders. Mol. Ther. 19, 1971–1980.
5. Foust, K.D., Wang, X., McGovern, V.L., Braun, L., Bevan, A.K., Haidet, A.M., Le, T.T., Morales, P.R., Rich, M.M., Burghes, A.H., and Kaspar, B.K. (2010). Rescue of the spinal muscular atrophy phenotype in a mouse model by early postnatal delivery of SMN. Nat. Biotechnol. 28, 271–274.
6. Mendell, J.R., Al-Zaidy, S., Shell, R., Arnold, W.D., Rodino-Klapac, L.R., Prior, T.W., Lowes, L., Alfano, L., Berry, K., Church, K., et al. (2017). Single-Dose Gene-Replacement Therapy for Spinal Muscular Atrophy. N. Engl. J. Med. 377, 1713–1722.
7. Deverman, B.E., Pravdo, P.L., Simpson, B.P., Kumar, S.R., Chan, K.Y., Banerjee, A., Wu, W.L., Yang, B., Huber, N., Pasca, S.P., and Gradinaru, V. (2016). Cre-dependent selection yields AAV variants for widespread gene transfer to the adult brain. Nat. Biotechnol. 34, 204–209.
8. Chan, K.Y., Jang, M.J., Yoo, B.B., Greenbaum, A., Ravi, N., Wu, W.L., Sánchez-Guardado, L., Lois, C., Mazmanian, S.K., Deverman, B.E., and Gradinaru, V. (2017). Engineered AAVs for efficient noninvasive gene delivery to the central and peripheral nervous systems. Nat. Neurosci. 20, 1172–1179.
9. Hordeaux, J., Wang, Q., Katz, N., Buza, E.L., Bell, P., and Wilson, J.M. (2018). The Neurotropic Properties of AAV-PHP.B Are Limited to C57BL/6J Mice. Mol. Ther. 26, 664–668.

Cell: CAG repeat length, not polyQ length, determines the timing of Huntington’s disease onset

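Huntington’s disease (HD) is an inherited, fatal disorder in which nerve cells in the brain are damaged over time. It can become apparent at any point in life, but usually begins when a person is in their thirties or forties. In a new study, researchers from the Genetic Modifiers of Huntington’s Disease Consortium challenge the generally accepted view of what determines the timing of HD onset. The findings were published in Cell on August 8, 2019, in a paper titled “CAG Repeat Not Polyglutamine Length Determines Timing of Huntington’s Disease Onset”.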

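HD and many other neurodegenerative diseases are caused by inherited expanded CAG repeats, in which the CAG codon encodes the amino acid glutamine. Age at onset in these diseases is inversely correlated with the length of the expanded CAG repeat, and has been thought to result from increased toxicity of the polyglutamine encoded by the CAG repeat in the DNA.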

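When James Gusella, PhD, Jong-Min Lee, PhD, Marcy MacDonald, PhD, and colleagues in the Molecular Neurogenetics Unit of the Center for Genomic Medicine at Massachusetts General Hospital analyzed information from more than 9,000 HD patients, they found that the timing of HD onset is determined by a property of the expanded CAG repeat in an individual’s DNA rather than by the length of the polyglutamine. In addition, the researchers found that several genes involved in DNA maintenance and repair can modify the timing of HD onset, so that it occurs earlier or later than expected from the length of the inherited CAG repeat.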
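The article does not spell out how CAG repeat length and polyglutamine length can differ in the first place. One standard reason, assumed here as background, is that CAA is a synonymous codon that also encodes glutamine, so a CAA interruption shortens the uninterrupted CAG tract without shortening the polyglutamine stretch. The Python sketch below illustrates this; the function names and both example sequences are hypothetical and are not patient data.

```python
GLUTAMINE_CODONS = {"CAG", "CAA"}  # both codons encode glutamine in the standard genetic code


def _longest_run(seq: str, allowed: set) -> int:
    """Longest in-frame run of consecutive codons drawn from `allowed`."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    best = run = 0
    for i in range(0, usable, 3):
        run = run + 1 if seq[i:i + 3] in allowed else 0
        best = max(best, run)
    return best


def uninterrupted_cag_length(seq: str) -> int:
    """Number of consecutive CAG codons in the longest pure CAG tract."""
    return _longest_run(seq, {"CAG"})


def polyglutamine_length(seq: str) -> int:
    """Number of consecutive glutamine codons (CAG or CAA)."""
    return _longest_run(seq, GLUTAMINE_CODONS)


# Hypothetical alleles: one with a CAA interruption near the end, one without.
interrupted = "CAG" * 42 + "CAA" + "CAG"
pure = "CAG" * 44

for name, allele in (("interrupted", interrupted), ("pure", pure)):
    print(name, uninterrupted_cag_length(allele), polyglutamine_length(allele))
```

Both hypothetical alleles encode 44 glutamines, but their uninterrupted CAG tracts are 42 and 44 repeats long. Alleles of this kind are what allow the effects of DNA repeat length and protein length to be told apart.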

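Gusella said, “Our data support the hypothesis that the key property of the CAG repeat is its propensity to expand further as the individual ages, leading to longer and longer CAG repeats in particular brain cells until a critical threshold length is reached and toxicity results. By focusing attention on the DNA repeat itself early in the disease process, rather than on the protein it encodes, our findings change how scientists think about HD and other DNA repeat diseases.”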

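These findings suggest that the CAG repeat itself, or the DNA maintenance processes that alter its expansion in neurons, may be potential targets for developing treatments that delay or prevent the onset of HD and other repeat diseases. Gusella said, “There are already many approaches that could be used to alter the length or purity of the CAG repeat in HD, and to develop drugs that inhibit or activate specific DNA maintenance proteins.”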

References:

1.Jong-Min Lee et al. CAG Repeat Not Polyglutamine Length Determines Timing of Huntington’s Disease Onset. Cell, 2019, doi:10.1016/j.cell.2019.06.036.

2.Insights on timing of Huntington’s Disease onset
https://medicalxpress.com/news/2019-08-insights-huntington-disease-onset.html

First CRISPR product enters the clinic

Editas Medicine and its partner Allergan have advanced AGN-151587 into a phase I/II trial for patients with Leber congenital amaurosis type 10, a rare and inherited form of blindness.

AGN-151587, previously called Edit-101, is the first CRISPR–Cas9 genome-editing medicine that is administered directly to patients. Doctors inject the adeno-associated virus-based candidate subretinally, so that it can cut out a mutation in the CEP290 gene in photoreceptor cells in the eye. Spark Therapeutics and Novartis’s voretigene neparvovec, the first gene therapy to gain approval in the US, corrects a different form of the inherited eye disease, by introducing a normal copy of the RPE65 gene to patients with Leber congenital amaurosis type 2.

Several other CRISPR-focused companies have prioritized ex vivo applications of their technologies.

With CRISPR Therapeutics and partner Vertex Pharmaceuticals’ CTX001, for example, patients’ haematopoietic stem cells are harvested and then engineered ex vivo with CRISPR to boost the production of fetal haemoglobin, before being re-infused into patients. CRISPR is used in this case to cut the DNA that encodes BCL11A, a transcription factor that otherwise represses fetal haemoglobin expression. The partners launched phase I/II development of CTX001 in patients with β-thalassaemia and with sickle cell disease last year, making it the first ex vivo CRISPR-based candidate to enter the clinic in Europe and the US.

CRISPR Therapeutics and Tmunity Therapeutics have also independently started testing cancer-killing cellular therapies that are engineered using CRISPR. CRISPR Therapeutics’ phase I candidate CTX110 is an off-the-shelf CD19-targeting chimeric antigen receptor-T cell, and the company credits its use of CRISPR as a means of achieving enhanced precision and efficiency during the manufacturing of this therapeutic. Tmunity’s phase I NY-ESO-1-redirected T cells are autologous T cell receptor (TCR)-engineered therapeutics. The company uses CRISPR during the production of these cells to disrupt expression of endogenous TCRα, TCRβ and PD-1.

Intellia Therapeutics and partner Regeneron are working towards a 2020 investigational new drug application for their in vivo NTLA-2001, a systemically delivered CRISPR treatment for transthyretin amyloidosis.

A novel approach to reverse proteinopathies

Autosomal dominant tubulointerstitial kidney disease-MUC1 (ADTKD-MUC1) is a slowly progressive disease, for which there are no therapies and the pathogenetic mechanism is unknown. Writing in Cell, Dvela-Levitt et al. report that ADTKD-MUC1 is a toxic proteinopathy and identify a small molecule that clears the accumulated toxic protein by modifying its intracellular trafficking.

Mucin 1 (MUC1) is a transmembrane mucin expressed in epithelial cells. In patients with ADTKD-MUC1, frameshift (fs) mutations in MUC1 produce a truncated protein, MUC1fs.

To dissect how MUC1fs causes ADTKD, the authors first examined the subcellular localization of wild-type MUC1 (MUC1wt) and MUC1fs in kidney biopsy samples from a patient heterozygous for the MUC1 frameshift mutation. MUC1wt was present in the apical membrane in tubule and collecting duct cells, whereas MUC1fs accumulated intracellularly, suggesting that MUC1fs is retained in the secretory pathway and does not transit to the plasma membrane. These results were confirmed in heterozygous knock-in mice expressing a human MUC1fs allele (+/fs mice), in kidney organoids generated from patient induced pluripotent stem cells and in a patient-derived immortalized tubular epithelial cell line (+/MUC1fs cells). Importantly, +/fs mice developed a similar tubulointerstitial pathology to that in patients with ADTKD-MUC1.

Immunofluorescence analysis with a panel of secretory pathway markers revealed that MUC1fs accumulates in the cis-Golgi and in vesicles marked with the cargo receptor TMED9, which is involved in Golgi–endoplasmic reticulum (ER) retrograde transport.

Next, the authors investigated whether MUC1fs accumulation triggered the unfolded protein response (UPR), which attempts to maintain proteostasis in the face of ER stress. Analysis of RNA sequencing data and follow-up studies probing the UPR at the protein level indicated that the ATF6 branch of the UPR was upregulated in +/MUC1fs cells compared with control (+/+) cells. Furthermore, ATF6 inhibition exacerbated MUC1fs accumulation in +/MUC1fs cells and led to increased apoptosis compared with control cells. In fact, treatment with the general ER stressor thapsigargin increased apoptosis in +/MUC1fs cells in vitro and in vivo. Thus, MUC1fs accumulation causes a toxic proteinopathy, and activation of ATF6 attempts to protect tubular epithelial cells from this toxicity.

The authors screened a drug repurposing library of 3,713 drugs to identify compounds that could relieve the block in MUC1fs intracellular transport (that is, reduce MUC1fs levels by >30%, without cellular toxicity). Additional screens using a wider range of doses and eliminating compounds that reduced MUC1wt levels or MUC1fs mRNA levels, or did not rescue thapsigargin-induced apoptosis in +/MUC1fs cells, whittled down the 203 hits to a single compound, BRD4780.
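The triage logic described above can be sketched as a two-stage filter. This is a minimal Python illustration under stated assumptions: the `Compound` record, its field names, and the example compounds are all hypothetical, and the real screen also used dose ranges that are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Compound:
    """Hypothetical per-compound screen summary; field names are illustrative only."""
    name: str
    muc1fs_reduction: float    # fractional drop in MUC1fs protein, e.g. 0.45 = 45%
    toxic: bool                # cellular toxicity observed
    reduces_muc1wt: bool       # also lowers wild-type MUC1
    reduces_muc1fs_mrna: bool  # acts on transcript level rather than trafficking
    rescues_apoptosis: bool    # rescues thapsigargin-induced apoptosis


def is_primary_hit(c: Compound) -> bool:
    # Primary criterion from the text: >30% MUC1fs reduction without toxicity.
    return c.muc1fs_reduction > 0.30 and not c.toxic


def passes_counter_screens(c: Compound) -> bool:
    # Follow-up criteria from the text: spare MUC1wt, spare mRNA, rescue apoptosis.
    return (not c.reduces_muc1wt) and (not c.reduces_muc1fs_mrna) and c.rescues_apoptosis


def triage(library: List[Compound]) -> Tuple[List[Compound], List[Compound]]:
    hits = [c for c in library if is_primary_hit(c)]
    leads = [c for c in hits if passes_counter_screens(c)]
    return hits, leads


# Hypothetical mini-library; none of these are real screen results.
library = [
    Compound("cmpd_A", 0.45, False, False, False, True),   # passes everything
    Compound("cmpd_B", 0.50, False, True, False, True),    # also lowers MUC1wt
    Compound("cmpd_C", 0.10, False, False, False, True),   # effect too weak
    Compound("cmpd_D", 0.60, True, False, False, False),   # toxic
    Compound("cmpd_E", 0.35, False, False, True, True),    # lowers mRNA instead
]

hits, leads = triage(library)
print("primary hits:", [c.name for c in hits])
print("leads:", [c.name for c in leads])
```

Separating the primary filter from the counter-screens mirrors the funnel in the text, where a broad first pass is followed by criteria that exclude compounds acting through unwanted mechanisms.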

Administration of BRD4780 to +/fs mice reduced MUC1fs but not MUC1wt levels in the kidneys, and similar results were obtained in kidney organoids derived from patients. In a time course, BRD4780 promoted transit of MUC1fs from the early secretory pathway to endosomes and lysosomes, but only in the presence of a functional secretory pathway. In fact, TMED9 levels were increased in +/MUC1fs cells, and TMED9 deletion phenocopied the effect of BRD4780 on MUC1fs accumulation. Mechanistically, the authors provided evidence suggesting that BRD4780 binds directly to TMED9, which releases MUC1fs from the early secretory pathway, enabling transit of MUC1fs into lysosomes to be degraded.

This ability of BRD4780 to promote removal of misfolded proteins that accumulate in the secretory pathway extends to other proteinopathies, both of the kidneys and other organs. For example, in cell lines, BRD4780 treatment reduced the levels of mutant uromodulin, which causes ADTKD–UMOD, and mutant rhodopsin, which causes retinitis pigmentosa. Thus, BRD4780 might have wider therapeutic potential in treating toxic proteinopathies.

This study identifies a novel approach to clear toxic protein accumulations in various, currently untreatable proteinopathies, and BRD4780 represents a valuable therapeutic lead.

References
Dvela-Levitt, M. et al. Small molecule targets TMED9 and promotes lysosomal degradation to reverse proteinopathy. Cell 178, 1–15 (2019)

Gene therapy strategies targeting the brain

Genome editing has rapidly transformed biomedical research and has demonstrated therapeutic promise via successes in tissue culture, ex vivo, embryonic editing, and animal models of human disease. For successful translation, a genome-editing therapeutic must be safe, effective, and ideally straightforward to manufacture. DNA encoding the RNA and protein components of a CRISPR-derived genome editing enzyme such as Cas9 can be delivered by adeno-associated virus (AAV) with high efficacy, but safety may be a concern and the manufacturing burden is substantial [1]. One emerging alternative is the delivery of genome-editing enzymes in the form of a pre-assembled ribonucleoprotein (RNA and protein, or RNP) complex. This approach is appealing because it ensures a tight therapeutic window: the RNP will be degraded in less than 24 h. By contrast, viral expression can result in prolonged expression of the genome-editing enzyme that persists for days. This has been associated with an increased prevalence of unintended off-target edits compared with RNP-based editing [2]. The nuclear localization signal (NLS) has routinely been used to ensure transport of RNP from the cytosol to the nucleus, but transporting a large genome-editing enzyme from the cell exterior to the cytosol presents a distinct challenge. Several strategies have been successful in promoting the cellular import of Cas9 RNP, such as modification of the Cas9 protein to include incidentally membrane-disrupting NLS sequences [3], or appending negatively charged domains to the Cas9 protein to promote its interaction with polymers that promote cellular entry [4]. The Murthy lab has developed an approach that uses a nucleating gold nanoparticle conjugated to single-stranded DNA to recruit Cas9 RNP, all of which is coated in a cationic polymer that facilitates delivery across the cell membrane, dubbed CRISPR-Gold [5].

The brain is an appealing site for initial forays into therapeutic genome editing because it is anatomically insular, allowing straightforward surgical access and providing an immunoprivileged status that ameliorates the risks associated with introduction of viral vectors [1] and/or genome-editing enzymes [6]. In a recent study by Lee and colleagues [7], the Lee and Murthy labs collaborated in using CRISPR-Gold to deliver either Cas9 or the analogous Cas12a (Cpf1) to the mouse brain. CRISPR-Gold carrying either Cas9 or Cas12a was stereotactically injected into the mouse hippocampus or striatum, performing efficient genome editing as detected by fluorescent reporters. A mouse model of fragile X syndrome (FXS), based on an Fmr1 knockout (KO), was used for experiments probing the ability of genome editing to treat autism. FXS is a common, inherited single-gene form of autism spectrum disorder (ASD), and drug treatments are largely inadequate. Importantly, the mGluR5 gene has emerged as a promising candidate for genetic therapy, since it can contribute to FXS as well as other ASDs. To test whether reduction of mGluR5 could diminish autism-associated phenotypes in the FXS model mice, CRISPR-Gold bearing a Cas9 RNP targeting mGluR5 was injected into the striatum. In treated striatal cells, 15% of mGluR5 loci were disrupted, leading to a ∼40% reduction in mGluR5 mRNA and protein abundance, as measured by qPCR and immunostaining, respectively. Behavioral studies of edited FXS model mice showed a marked reduction in two established hallmarks of mice with autistic phenotypes: marble-burying and spontaneous jumping. This promising result was bolstered by the observation that CRISPR-Gold treatment had no discernible impact on mouse locomotion. Other tests for toxicity showed that CRISPR-Gold treatment was not associated with cell death in vivo or changes in the properties of cultured neurons.

It is illustrative to evaluate these CRISPR-Gold results in comparison with AAV-mediated delivery, which was quickly adopted by pioneering genome editors for use in the brain (Figure 1). Delivery of AAV encoding Cas9 and its single guide (sg)RNA (Cas9/AAV) has been particularly successful in generating models of neurodegenerative diseases and other diseases of the brain and nervous system. A 2015 report from the Zhang laboratory described Cas9/AAV-mediated editing of the Mecp2 gene in the brains of mice, resulting in disruptive edits in a majority (68%) of the cells in the injected tissue. The observed robust viral distribution throughout the tissue and editing in postmitotic neurons allowed generation of a mouse model of Rett syndrome bearing the corresponding behavioral phenotype [8]. AAV has also been applied in therapeutic models; for example, the Davidson laboratory edited the disease-causing allele in a transgenic mouse model of Huntington’s disease, observing reductions in the levels of mutant huntingtin protein of up to 80% following an injection of Cas9/AAV into the brain [9]. A similar approach was recently reported for in vivo editing of the mutant alleles of the APP gene that underlies Alzheimer’s disease, another condition with dominant inheritance [10]. Cas9/AAV vectors were injected into the hippocampus of transgenic adult mice expressing multiple copies of the human mutant APP allele, and selectively generated indels (1.3%) in the mutant allele, allowing a decrease in pathogenic amyloid-β protein levels in the brains of the mice [10]. Neither example of Cas9/AAV editing of disease-causing mutant alleles demonstrated an associated therapeutic phenotype in mice, as was convincingly demonstrated with CRISPR-Gold in the recent report by Lee and colleagues. However, the model systems differ, and it is reasonable to anticipate that Cas9/AAV-mediated editing might perform comparable editing in an FXS model system.

One apparent advantage of AAV-mediated delivery is that the viral particles spread throughout the brain in mice and primates. By contrast, RNP as delivered in isolation [3] or by CRISPR-Gold [7] tends to edit only cells within an area of several cubic millimeters. This suggests a potential hurdle for translation. Another potential concern related to the use of CRISPR-Gold in humans is its introduction of heavy metal, which is known to be toxic. However, this issue is tempered by the knowledge that the gold constitutes a minuscule fraction of the nanoparticle assembly by weight, and that genome editing is ideally a one-time treatment that avoids the accumulation of gold that would be associated with a treatment that is repeatedly dosed. With additional development, RNP delivery may prove itself as a leading strategy for therapeutic genome editing of the brain.

【References】
1. Colella, P. et al. (2018) Emerging issues in AAV-mediated in vivo gene therapy. Mol. Ther. Methods Clin. Dev. 8, 87–104
2. Kim, S. et al. (2014) Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res. 24, 1012–1019
3. Staahl, B.T. et al. (2017) Efficient genome editing in the mouse brain by local delivery of engineered Cas9 ribonucleoprotein complexes. Nat. Biotechnol. 35, 431–434
4. Wang, M. et al. (2016) Efficient delivery of genome-editing proteins using bioreducible lipid nanoparticles. Proc. Natl. Acad. Sci. U. S. A. 113, 2868–2873
5. Lee, K. et al. (2017) Nanoparticle delivery of Cas9 ribonucleoprotein and donor DNA in vivo induces homology-directed DNA repair. Nat. Biomed. Eng. 1, 889–901
6. Chew, W.L. (2017) Immunity to CRISPR Cas9 and Cas12a therapeutics. Wiley Interdiscip. Rev. Syst. Biol. Med. 10, e1408
7. Lee, B. et al. (2018) Nanoparticle delivery of CRISPR into the brain rescues a mouse model of fragile X syndrome from exaggerated repetitive behaviours. Nat. Biomed. Eng. 2, 497–507
8. Swiech, L. et al. (2015) In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9. Nat. Biotechnol. 33, 102–106
9. Monteys, A.M. et al. (2017) CRISPR/Cas9 editing of the mutant huntingtin allele in vitro and in vivo. Mol. Ther. J. Am. Soc. Gene Ther. 25, 12–23
10. György, B. et al. (2018) CRISPR/Cas9 mediated disruption of the Swedish APP allele as a therapeutic approach for early-onset Alzheimer’s disease. Mol. Ther. Nucleic Acids 11, 429–440

Via: https://www.sciencedirect.com/science/article/pii/S1471491418301473?via%3Dihub

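Advances in small-molecule drugs that target RNA

Yesterday, Arrakis Therapeutics, a biopharmaceutical company focused on developing RNA-targeting small-molecule drugs, announced the close of a $75 million Series B round. In an interview, the company's founder, president and CEO, Dr. Michael Gilman, also disclosed that its drug discovery platform has produced excellent screening results against the mRNA encoding myc, the famously "undruggable" oncoprotein. The news has once again put RNA-targeting small molecules in the spotlight.

So why do we need small-molecule drugs that target RNA? What should be kept in mind when discovering and developing them? And what progress has the field made recently? Drawing on public sources, the WuXi AppTec WeChat team explores these questions with readers today.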

11aa.png

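Why develop small-molecule drugs that target RNA?

The vast majority of small-molecule therapies target proteins, and over the past decades this strategy has delivered many good new drugs; by one estimate, nearly 99% of oral drugs target disease-causing proteins. However, the approach requires the small molecule to bind a specific site or "pocket" on the protein, and most proteins (close to 85%) have no site suitable for small-molecule binding. By conventional means they are "undruggable".

Moreover, proteins account for only a tiny share of the information in the genome. Only 1.5% of the human genome encodes proteins, and disease-related proteins make up just 10-15% of those. If small-molecule targets could extend beyond proteins, drug discovery would undoubtedly be transformed.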

11bb.png

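▲ Targeting RNA would give us more druggable options (image source: reference [8])

RNA is one such potential target. In normal cells RNA has essential physiological functions: mRNA carries the genetic information of genes and directs protein synthesis, while non-coding RNA regulates gene expression. Targeting RNA has several advantages. Because RNA sits upstream of protein, an RNA-targeting drug could directly raise or lower translation efficiency and so get around the problem of "undruggable" proteins. RNA is also extremely abundant in the human genome: sequences that give rise to non-coding RNA account for 70% of the genome, an order of magnitude more than protein-coding sequence. And the recent success of several proof-of-concept studies gives grounds for optimism.

Principles for developing RNA-targeting small molecules

According to a review in Nature Reviews Drug Discovery, current RNA-targeting small molecules fall into three classes according to the RNA structure they target:

  1. Multiple closely packed helices
  2. Irregular secondary structures, usually containing bulges
  3. Triplet repeats

The first class already includes several candidate compounds, among them four molecules: linezolid, ribocil, branaplam and SMA-C5. These molecules "target complex RNA structural modules, and each has a high QED (quantitative estimate of drug-likeness) score", which indicates drug-like potential. Notably, all of them were found through phenotypic screening and only later shown to bind RNA. So although they mark a promising start for RNA-targeted drug discovery, they do not yet let us derive clear rules for targeting RNA.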

11cc.png

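▲ Four molecules in the first class that have already shown potential (image source: reference [8])

For the other two classes, researchers screened from the outset for small molecules that target RNA. The screening approaches include high-throughput screening, focused compound libraries, structure-inspired design, fragment-based screening and computational modeling. These efforts have yielded a large and varied set of potential RNA-targeting molecules and deepened our understanding of the principles for targeting RNA. This is also the direction pursued by emerging biopharmaceutical companies such as Arrakis.

How can RNA be targeted?

Because RNA differs markedly from protein in structure, whether RNA is "druggable" is a question that has to be answered. Unlike proteins, RNA is built mainly from four kinds of nucleotides, carries a large amount of charge and is more hydrophilic. Once folded, however, RNA adopts complex three-dimensional structures, and these structures may offer enough druggable conformations for small molecules to recognize and bind.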

11dd.png

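▲ Small-molecule drugs can also recognize RNA structures (image source: reference [8])

Small-molecule RNA-targeting drugs that were discovered "by accident" support this view. From a classical medicinal chemistry standpoint, linezolid and ribocil, mentioned above, are outstanding molecules: they obey Lipinski's rule of five, have a small topological polar surface area (tPSA) and cross cell membranes well. They also show no obvious toxicity. Most importantly, they bind "pockets" in the structure of their RNA targets. This closely parallels many protein-targeting small molecules and supports the development of small molecules that target RNA.

As with protein-targeted drugs, targeting RNA requires an understanding of RNA structure. The authors of the review point out that a good RNA target should have enough structural "information content". The molecules that currently show drug potential all target complex RNA structures, whereas targeting a simple RNA structure may compromise affinity and specificity.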

11ee.png

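▲ A good RNA target should have enough "information content" (image source: reference [8])

Eight guidelines for screening RNA-targeting small molecules

To guide future screening efforts, the researchers compiled the following guidelines:

  • Focus on RNA modules that are sufficiently complex and structurally distinctive. Such modules are likely to offer high-quality "pockets" for small molecules to bind, and they are more common in large RNA molecules.
  • Be cautious in defining the basic chemical structures of RNA-targeting small molecules. More research is needed before these can be defined, and decades of experience in targeting proteins may prove useful.
  • Success with ribosomal RNA does not necessarily carry over to other RNAs. Ribosomal RNAs are highly abundant in cells, and molecules that target them may have special properties.
  • Interpret results with particular care for small molecules that are highly basic, intercalating or highly hydrophobic. These compounds may bind RNA with high affinity but can have severe off-target effects.
  • Look out for molecules that target RNA-protein interactions, such as branaplam and SMA-C5. They may offer excellent specificity and affinity.
  • Find structures with "high information content". One advantage of RNA over protein is that many quantitative chemical methods are available to help with this work.
  • Building on existing tools, develop new ones to better understand complex RNA structures.
  • Identify targets with a clear therapeutic mechanism and search for potential new drugs against them.

Recent progress in the field

It is encouraging that the research directions of emerging biopharmaceutical companies in this field line up with these recommendations. Over the past two years or so, Arrakis, mentioned above, has built a dedicated platform for screening and optimizing RNA-targeting small molecules. It includes an RNA target screening system named Tryst, which predicts sites and structures on an RNA sequence that are suitable for small-molecule binding, and the Pearl-seq system, which helps analyze how small molecules bind RNA and select the best candidate compounds for further development.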

11ff.png

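According to Dr. Michael Gilman, the platform is now ready to screen against many different RNAs at scale. The first RNA the company has focused on is the mRNA encoding the myc protein. Myc was among the first oncogenes identified, discovered nearly 40 years ago, and the transcription factor it encodes is associated with many human cancers. Arrakis has used several screening approaches to look for small molecules that bind myc RNA and, in Dr. Gilman's words, has found quite a few remarkable compounds that target this RNA. The company's strategy is to find small molecules that bind the RNA and block its translation into protein. To be translated by the ribosome, an RNA has to unfold its tertiary structure into a linear form; if a small molecule locks the tertiary structure in place so that it cannot be unfolded, protein production can be prevented.

Expansion Therapeutics, founded by Dr. Matthew D. Disney, a professor at The Scripps Research Institute, has also made excellent progress in developing small molecules that target triplet repeats. The company's drug discovery tools can screen thousands of different RNA structures against a small-molecule library covalently immobilized on a chip. The screen identifies high-quality small molecules that bind specific RNA folds, and these can then be optimized by conventional medicinal chemistry.

▲ Expansion's screening platform (image source: Expansion company website)

The research team led by Dr. Disney has also developed a small molecule that cleaves RNA rich in CUG triplet repeats. CUG repeat expansion is the genetic variant that causes myotonic dystrophy type 1 (DM1), an incurable neuromuscular disease. In a study recently published in PNAS, the CUG-repeat-targeting small molecule cugamycin selectively targeted the disease-causing RNA structure and improved DM1 symptoms in mice in preclinical animal experiments.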

11gg.png

Other emerging biotechnology companies are also exploring RNA-targeting small molecules. For example, Skyhawk Therapeutics was formally founded in Massachusetts last year to develop RNA-targeting small molecules that correct exon-skipping errors in RNA splicing, with the aim of treating neurological diseases and cancer. Last year the company signed a five-year research collaboration with Celgene.

These therapies are all still in preclinical development and remain to be validated in human clinical trials. We look forward to that day arriving soon, so that we can learn more about the potential of RNA-targeting small molecules to treat human disease.

References:

[1] Expansion Therapeutics. Retrieved April 19, 2019, from https://www.expansionrx.com/expansion-repeat-disorders/

[2] Arrakis Therapeutics. Retrieved April 19, 2019, from http://arrakistx.com/

[3] Drugging the undruggable and other challenges on the road to targeting RNA with pills. Retrieved April 19, 2019, from https://www.statnews.com/2019/04/19/michael-gilman-rna-arrakis/

[4] Ribometrix. Retrieved April 19, 2019, from http://www.ribometrix.com/index.html

[5] RNA R&D boom: these 12 companies are advancing RNA-targeting small-molecule drugs (in Chinese). Retrieved April 19, 2019, from https://mp.weixin.qq.com/s/df8pOurBsAeXs57CBHuJOQ

[6] In-depth Nature review: small molecules targeting RNA, do you know all these principles? (in Chinese). Retrieved April 19, 2019, from https://mp.weixin.qq.com/s/kpIoktgNTmlXNCfz_mCosQ

[7] RNA-targeting Compound Shows Ability to Limit Muscle Damage in Early Myotonic Dystrophy Type 1 Study. Retrieved April 19, 2019, from https://musculardystrophynews.com/2019/04/08/rna-cutting-compound-shows-ability-to-limit-muscle-damage-in-myotonic-dystrophy-type-1-in-early-study/

[8] Warner et al. (2018). Principles for targeting RNA with drug-like small molecules. Nature Reviews Drug Discovery. https://doi.org/10.1038/nrd.2018.93.

Buffering transition

Aberrant aggregation of normally soluble proteins into insoluble amyloid is involved in the onset and the progression of many neurodegenerative diseases, including amyotrophic lateral sclerosis (ALS). In the neurons of ALS patients, aggregation of prion-like RNA-binding proteins (RBPs) usually occurs in the cytoplasm rather than the nucleus. To investigate what prevents prion-like RBPs from forming solid-like aggregates in the nucleus, Maharana et al. used in vitro phase-separation assays and fluorescence correlation spectroscopy to show that RNA blocks the liquid–solid phase transition of prion-like RBPs, which is a prerequisite of aggregation. In living cells, assemblies of prion-like RBPs are observed by reducing the concentration of nuclear RNAs, by increasing intranuclear protein expression, or by impairing the RNA-binding ability of proteins. Photobleaching experiments showed that RNA kept condensates formed by prion-like RBPs in a dynamic state and prevented the formation of solid pathological assemblies. Overall, the findings suggest that nuclear RNA buffers the phase separation behavior of prion-like RBPs, provide insight into the chemical characteristics of this type of protein, and deepen the understanding of the pathological cause of prion-like RBP-related neurodegenerative diseases.

https://www.ncbi.nlm.nih.gov/pubmed/29650702

222.jpg

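TDP-43 phase separation and neurodegenerative disease

Protein inclusions, made up of misfolded proteins or protein fragments, are found in many neurodegenerative diseases and are a hallmark of Alzheimer's disease (AD), Parkinson's disease (PD), frontotemporal dementia (FTD), Huntington's disease (HD) and amyotrophic lateral sclerosis (ALS). These aberrantly aggregating proteins contain intrinsically disordered regions (IDRs), sometimes also called low-complexity (LC) domains, and IDRs are one of the key markers of a protein's ability to undergo phase separation (also called phase transition).

The mislocalized, misfolded proteins found in patient samples from neurodegenerative diseases include TAR DNA-binding protein 43 (TDP-43). TDP-43 forms abnormal aggregates in degenerating motor neurons in ALS and is an important pathological marker of both ALS and FTD. TDP-43 contains an IDR, which suggested to scientists that phase separation may underlie the neurodegeneration caused by TDP-43 mutations.

Recently, Neuron published two back-to-back studies on TDP-43 that examine, from different angles, how TDP-43 phase separation relates to cell death, clearance of nuclear TDP-43, nucleocytoplasmic transport and the regulation of phase separation itself. They are "Cytoplasmic TDP-43 De-mixing Independent of Stress Granules Drives Inhibition of Nuclear Import, Loss of Nuclear TDP-43, and Cell Death" from Don W. Cleveland's group at the University of California, San Diego, and "RNA Binding Antagonizes Neurotoxic Phase Transitions of TDP-43" from Christopher J. Donnelly's group at the University of Pittsburgh.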

Xnip2019-03-07_22-44-33.png

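So what is phase separation? It is the process by which components self-assemble into membraneless organelles such as P granules, nucleoli, stress granules and Cajal bodies: liquid-like assemblies that can fuse with one another, minimize their surface tension and dynamically exchange material with the surrounding solution (see "Cell publishes a guide to studying phase separation").

The Cleveland group found that, under physiological conditions in several cell lines, nuclear TDP-43 forms distinct phase-separated droplets, whether it is endogenous protein detected by antibody staining or exogenously expressed protein (Figure 1A-1B). These small particles fuse and split, and fluorescence recovery after photobleaching showed that they are highly dynamic.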

222.jpeg

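Figure 1. Phase separation of TDP-43 under physiological conditions. A) Immunostaining of TDP-43 in different cell lines; the green particles are phase-separated TDP-43 droplets. B) Phase-separated droplets formed in cells by TDP-43 fused to a fluorescent protein.

During cellular aging or disease, the loss of nuclear pore complexes increases the amount of TDP-43 that accumulates in the cytoplasm. To mimic this, the Cleveland group built a TDP-43 lacking its nuclear localization sequence (NLS). After drug induction, this cytoplasmic TDP-43 formed distinct phase-separated droplets, and these droplets recruited TDP-43 that would normally be imported into the nucleus (the two forms were tagged with different fluorescent proteins). This suggested to the authors that, as cells age, abnormal TDP-43 droplets that gradually accumulate in the cytoplasm accelerate the clearance of TDP-43 from the nucleus and ultimately cause neurodegenerative disease.

To mimic cellular stress, the authors treated cells with arsenite, which induces RNA-containing stress granules. Observation at different times after induction showed that although cytoplasmic TDP-43 phase separation initially arises together with stress granules, over longer times the TDP-43 droplets show almost no colocalization with stress granules and are not labeled by stress-granule-specific antibodies. Cytoplasmic TDP-43 droplets therefore form independently of stress granules. However, arsenite-induced TDP-43 droplets that colocalize with stress granules are far less dynamic than the stress-granule-independent droplets and barely recover after photobleaching (Figure 2). Thus, after arsenite induction TDP-43 forms gel-like droplets in the cytoplasm.

333.jpeg

Figure 2. TDP-43 droplets induced by arsenite have the properties of a gel or protein aggregate: they are less dynamic and barely recover after photobleaching.

After arsenite induction, NLS-deleted TDP-43 located in the cytoplasm gradually depletes normal TDP-43 from the nucleus and recruits it into cytoplasmic TDP-43 droplets, which ultimately causes a marked drop in neuronal survival. In neurodegenerative disease caused by TDP-43 mutations, cytoplasmic TDP-43 accumulates because nucleocytoplasmic transport is disrupted, so TDP-43 is cleared from the nucleus, and during this process phosphorylated TDP-43 also accumulates heavily in the cytoplasmic droplets.

When healthy neurons experience transient stress, phase-separated TDP-43 droplets form in the cytoplasm; these droplets are enriched in phosphorylated TDP-43 and impair nucleocytoplasmic transport. Further stress or aging converts TDP-43 into gel-like particles or protein aggregates, TDP-43 is completely cleared from the neuronal nucleus, and the resulting cell death leads to neurodegenerative disease (Figure 3). Identifying the factors that may cause TDP-43 to accumulate and mislocalize, together with the events that accompany them, may therefore help reduce neurodegenerative disease caused by mutation or aberrant accumulation of this protein.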

444.jpeg
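Figure 3. Model of TDP-43 phase separation and the eventual death of neurons.

The Donnelly group studied TDP-43 phase separation and its link to neurodegenerative disease from a different angle. They first made use of Cry2olig, a variant of the photolyase homology region of an Arabidopsis cryptochrome that oligomerizes under blue light (Figure 4A). They fused Cry2olig to the N terminus of full-length TDP-43. Without blue light, TDP-43 was concentrated mainly in the nucleus; under blue light, it was gradually cleared from the nucleus and formed protein inclusions in the cytoplasm (Figure 4B), and these cytoplasmic aggregates were poorly dynamic. Immunofluorescence staining showed that the inclusions were rich in p62 (a pathological marker of ALS) and phosphorylated TDP-43 (Figure 5), in agreement with spinal cord sections from ALS patients. By fusing Cry2olig to TDP-43, the authors thus established a cellular model that closely reproduces the TDP-43 phase separation seen in neurodegenerative disease.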

555.jpeg

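Figure 4. Schematic of the TDP-43-Cry2olig fusion protein (A), and mislocalization of TDP-43 after blue light induction: it is cleared from the nucleus and forms phase-separated protein inclusions (B).

5556.jpeg

Figure 5. p62 staining and phosphorylated TDP-43 staining of spinal cord sections from ALS patients.

In 2017 the Brangwynne group established the optoDroplet system [1], a light-inducible system based on Cry2WT that tests whether a protein's LCD (or IDR) can drive phase separation. The Cry2olig used by the Donnelly group is functionally similar to Cry2WT, but Cry2WT is more sensitive to blue light, acts at a lower saturation concentration and is comparatively harder to control. They therefore tested only the TDP-43 LCD in this system and found that it clearly forms reversible phase-separated droplets (Figure 6).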

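666.jpeg

Figure 6. The TDP-43 LCD undergoes reversible, blue-light-induced phase separation in the optoDroplet system.

When they placed full-length TDP-43 in the optoDroplet system, however, no phase separation occurred, even though full-length TDP-43 is itself capable of phase separation. The authors asked whether this was caused by the RNA-binding domain (the RNA-recognition motifs, RRMs) in full-length TDP-43, since earlier experiments had shown that deleting the region containing the RRMs increases the ability of TDP-43 to form protein aggregates. They first placed the RRM region alone in the optoDroplet system and found that it did not phase separate (Figure 7). Combining the phase-separating LCD with the RRM region did not produce phase separation either. But when five sites in the RRMs known to markedly reduce RNA binding by TDP-43 were mutated, RRM-LCD phase separated clearly, even though these five mutations only reduce RNA binding rather than abolish it.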

777.jpeg

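Figure 7. RNA binding by TDP-43 blocks its light-induced phase separation.

This made the authors curious about how RNA regulates TDP-43 phase separation. To test the idea, they purified full-length TDP-43 and TDP-43-5FL (carrying the five RRM mutations) in vitro and synthesized an RNA sequence that TDP-43 binds specifically. As the amount of added RNA increased, the ability of wild-type full-length TDP-43 to form small droplets dropped markedly, whereas TDP-43-5FL showed no clear response to the added RNA (Figure 8).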

888.jpeg

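Figure 8. RNA binding by TDP-43 blocks its phase separation.

In summary, the Donnelly group established an improved light-controlled system for studying TDP-43 phase separation in living cells and found that RNA regulates the formation of TDP-43 inclusions. Their work also demonstrated the toxicity of aberrant phase separation in human neurons. They plan to study other properties of neurotoxic TDP-43 phase separation and the downstream processes that lead to neurodegenerative disease. Finally, their finding that RNA bound specifically by TDP-43 can suppress aberrant TDP-43 phase separation may offer a lead for future treatments of neurodegenerative disease.

Both papers link TDP-43 phase separation to the irreversible protein aggregation that is a pathological hallmark of neurodegenerative disease, and both point to possible therapeutic approaches. Future research on phase separation should help explain the mechanisms of many diseases and support drug discovery and the development of treatments for them.

Source: http://www.jintiankansha.me/t/JKnSywp6iR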

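Applications of Cas12a and Cas13a in diagnostics

Background on the CRISPR-Cas system

Facing the threat of bacteriophages, bacteria evolved the CRISPR-Cas immune system, which specifically targets phages and other foreign genetic material. CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats and consists of many short, conserved repeats separated by spacers. As shown in Figure 1, repeats are sequences native to the bacterium that can bind both Cas proteins and spacer sequences, whereas spacers are sequences from viruses that previously infected the bacterium (or its ancestors). When a phage infection occurs, most bacteria die and a small fraction survive thanks to genetic variation. Some of these survivors cut up the phage DNA and insert fragments into the repeat region as spacers, acquiring something akin to the "immune memory" of higher organisms.

As the mechanism of CRISPR-Cas has been worked out and new Cas enzymes (Cas12/Cas13/Cas14) have been discovered, scientists have found the system to be very powerful: it enables precise and efficient genome editing, such as knocking out, inserting or replacing a gene. The sequence-specific recognition of CRISPR systems has been applied in a growing number of fields, including medicine, food, agriculture and industrial biotechnology. Most of these applications are built on Cas9. The newly discovered Cas12/Cas13/Cas14 differ from Cas9 in ways that make it possible to apply CRISPR to rapid pathogen diagnosis and tumor gene detection.

Cas12a: a new "magic scissors" for single-stranded DNA

This April, Professor Jennifer Doudna, often called the "goddess of CRISPR", reported in Science that once an enzyme of the Cas12 family binds its target sequence under gRNA guidance, it switches to an activated state and indiscriminately cleaves other single-stranded DNA in the reaction. This property of Cas12a can be used in molecular diagnostics to detect tumor genes or specific pathogens. A single-stranded substrate carrying a reporter group is added to the reaction; if Cas12a recognizes the target sequence (from the pathogen or tumor gene of interest), it cleaves the single-stranded substrate and releases the fluorescent reporter.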

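Cas12a enables accurate HPV genotyping

If the sample contains very little of the target gene, however, the Cas12a-gRNA complex has a low probability of encountering the target sequence. The target must then be amplified first to raise its abundance. PCR (polymerase chain reaction) is commonly used for this purpose, but it requires a dedicated thermal cycler. Another amplification technique, recombinase polymerase amplification (RPA), amplifies at constant temperature without complex heating and cooling cycles.

Professor Doudna combined the collateral single-stranded DNA cleavage of Cas12a with RPA to develop a technique named DETECTR (DNA Endonuclease-Targeted CRISPR Trans Reporter). The study showed that after anal swab sampling and 10 minutes of isothermal amplification, Cas12a detection of the amplified product could identify human papillomavirus (HPV) within 1 hour and accurately distinguish two closely related types, HPV16 and HPV18. DETECTR provides another strong platform for point-of-care testing (POCT) of pathogens and tumor genes.

How does CRISPR-Cas12a differ from CRISPR-Cas9? Cas9 was one of the first Cas enzymes discovered and remains the most thoroughly studied and widely used, with great promise in genome editing and disease treatment. However, CRISPR-Cas9 lacks a nuclease activity for cleaving single-stranded nucleic acids in this way and cannot be used for this kind of in vitro detection. Cas12/13/14, by contrast, generally carry a second nuclease activity that is activated when the protein correctly binds its target sequence and then cleaves small probe molecules, converting the sequence information of the analyte into a fluorescent signal. For medical tests that need to detect a known sequence, this can reach sensitivity beyond that of ordinary real-time quantitative PCR and removes the dependence on real-time PCR instruments.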

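CRISPR-Cas13a: an out-of-the-ordinary RNA-cutting machine

In June 2016, the Feng Zhang laboratory, working with Eugene Koonin's laboratory and others, published a paper in Science that first described an RNA-targeting Cas enzyme, C2c2 (later renamed Cas13a). The paper showed that when the CRISPR-Cas13a system is introduced into E. coli, it cuts phage nucleic acid and helps the bacterium resist phage invasion, because Cas13a contains two conserved RNA endonuclease domains called HEPN domains. The authors also found that in vitro, after binding a single-stranded target RNA, Cas13a additionally cleaves other single-stranded RNA molecules in the reaction ("collateral cleavage").

In September, Professor Jennifer Doudna published a paper in Nature that further described the cleavage mechanism of Cas13a. Unlike Cas9, Cas13a has two enzymatic activities: it processes precursor crRNA into mature crRNA, and it cleaves target RNA. Once Cas13a correctly binds its target RNA sequence, its non-specific cleavage activity is switched on and it cleaves the fluorescent reporter in the reaction, converting the sequence information of the analyte into a fluorescent signal.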

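"Sherlock" achieves single-molecule detection of dengue and Zika

In April 2017, the Feng Zhang team published further Cas13a work in Science. Exploiting the collateral cleavage that follows Cas13a binding to target RNA, they combined it with reverse transcription, recombinase polymerase amplification (RPA) and in vitro transcription to develop a detection technique named SHERLOCK (Specific High-sensitivity Enzymatic Reporter unLOCKing). Detection after target amplification markedly increases sensitivity. The name comes from the popular British series Sherlock and implies that, with this technique, medical testing can be as precise as the great detective.

As in the Cas12a strategy, both scientists independently chose RPA to amplify the target sequence, which addresses the problem of scarce target in the sample. The difference is that Cas12a detects a DNA target whereas Cas13a detects an RNA target, so a Cas13a reaction must include an in vitro transcription step. For either Cas12a or Cas13a, whether a reverse transcription step is needed depends on whether the analyte is DNA or RNA: if it is RNA, its sequence must first be reverse transcribed into DNA before amplification and detection.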
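The step logic described above can be written down as a small Python sketch. It only encodes what the paragraph states (reverse transcription for RNA analytes, RPA for amplification, in vitro transcription before Cas13a detection); the function name, the step labels and the choice of reporter wording are illustrative assumptions, not part of DETECTR or SHERLOCK themselves.

```python
from typing import List


def detection_workflow(effector: str, analyte: str) -> List[str]:
    """Return the ordered assay steps for a collateral-cleavage readout.

    effector: "Cas12a" (detects DNA) or "Cas13a" (detects RNA).
    analyte:  "DNA" or "RNA", the nucleic acid present in the sample.
    Step labels are hypothetical and used for illustration only.
    """
    effector_key = effector.lower()
    analyte_key = analyte.upper()
    if effector_key not in {"cas12a", "cas13a"}:
        raise ValueError(f"unsupported effector: {effector}")
    if analyte_key not in {"DNA", "RNA"}:
        raise ValueError(f"unsupported analyte: {analyte}")

    steps: List[str] = []
    # RPA amplifies DNA, so an RNA analyte is reverse transcribed first.
    if analyte_key == "RNA":
        steps.append("reverse transcription (RNA -> DNA)")
    steps.append("RPA isothermal amplification")
    if effector_key == "cas13a":
        # Cas13a recognizes RNA, so the amplified DNA is transcribed back to RNA.
        steps.append("in vitro transcription (DNA -> RNA)")
        steps.append("Cas13a detection with single-stranded RNA reporter")
    else:
        steps.append("Cas12a detection with single-stranded DNA reporter")
    return steps


if __name__ == "__main__":
    for step in detection_workflow("Cas13a", "RNA"):
        print(step)
```

For an RNA virus read out with Cas13a, the sketch lists four steps, which matches the reverse transcription, RPA and in vitro transcription combination described for SHERLOCK.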

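"Sherlock" plus "Hudson": from the laboratory to on-site testing

This April, work on CRISPR-Cas13a led by Pardis Sabeti of the Broad Institute, with Feng Zhang as a participant, appeared as a cover story in Science. The team combined SHERLOCK with a new technique, HUDSON (Heating Unextracted Diagnostic Samples to Obliterate Nucleases), to achieve point-of-care detection of dengue and Zika viruses and meet the need for rapid on-site testing outside the laboratory. In Sherlock, the landlady Mrs. Hudson always looks after Sherlock. In the same way, HUDSON assists SHERLOCK: a rapid two-step heat and chemical treatment of clinical samples inactivates nucleases and virus while releasing the viral nucleic acid.

Together the two techniques raise sensitivity to 1 copy/µl, need no complex nucleic acid extraction and can detect dengue virus in several kinds of clinical sample within 2 hours. The combination of SHERLOCK and HUDSON makes this impressively fast, accurate and portable diagnosis possible. The wave of technical innovation also prompts deeper reflection. Amid the rush of gene technology and universal connectivity, using technology as the tool, ethics as the measure and the health of all humanity as the mission is perhaps the destination we most hope to reach. It is too early for CRISPR editing of babies' genes, but the time is right for in vitro diagnostics.

References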

1.Abudayyeh, O. O., Gootenberg, J. S., Konermann, S., Joung, J., Slaymaker, I. M., Cox, D. B., … & Severinov, K. (2016). C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector. Science, 353(6299), aaf5573.

2.East-Seletsky, A., O’Connell, M. R., Knight, S. C., Burstein, D., Cate, J. H., Tjian, R., & Doudna, J. A. (2016). Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection. Nature, 538(7624), 270.

3.Gootenberg, J. S., Abudayyeh, O. O., Lee, J. W., Essletzbichler, P., Dy, A. J., Joung, J., … & Myhrvold, C. (2017). Nucleic acid detection with CRISPR-Cas13a/C2c2. Science, eaam9321.

4.Zuo, X., Fan, C., & Chen, H. Y. (2017). Biosensing: CRISPR-powered diagnostics. Nature Biomedical Engineering, 1(6), 0091.

5.Myhrvold, C., Freije, C. A., Gootenberg, J. S., Abudayyeh, O. O., Metsky, H. C., Durbin, A. F., … & Garcia, K. F. (2018). Field-deployable viral diagnostics using CRISPR-Cas13. Science, 360(6387), 444-448.

6.Sashital, D. G. (2018). Pathogen detection in the CRISPR–Cas era. Genome medicine, 10(1), 32.

Novel Vaccine Technologies: Essential Components of an Adequate Response to Emerging Viral Diseases

The availability of vaccines in response to newly emerging infections is impeded by the length of time it takes to design, manufacture, and evaluate vaccines for clinical use. Historically, the process of vaccine development through to licensure requires decades; however, clinicians and public health officials are often faced with outbreaks of viral diseases, sometimes of a pandemic nature that would require vaccines for adequate control. New viral diseases emerge from zoonotic and vectorborne sources, such as Middle East Respiratory Syndrome coronavirus and Chikungunya, and while these diseases are often detected in resource-rich countries, they usually begin in low- and mid-income countries.1 Therefore, part of the timeline for a vaccine involves surveillance and detection of new pathogens in remote areas and transfer of specimens to laboratories capable of vaccine development.

Development of vaccines for viral infections has historically been an empirical and iterative process based on the use of attenuated or inactivated whole virus. This requires unique methods of cultivation for each virus, development of animal models for vaccine testing, and a prolonged process of fine-tuning product formulation and immunogenicity, and for live-attenuated vaccines, pathogenicity. Thus, preclinical vaccine development can take years, followed by several more years of early-phase clinical testing and defining of dose and schedule. Moreover, efficacy testing and registration with regulatory agencies often takes another 5 to 10 years. In total, 15 to 20 years would be a typical timeframe from virus discovery to vaccine availability if the process proceeds smoothly and there are no major biological or logistical challenges.

Fortunately, during the last decade, there have been substantial technological advances for conceiving, developing, manufacturing, and delivering vaccines. Rapid genetic sequencing allows both early identification of new pathogens and the identity of the genes encoding structural proteins that can form the basis for vaccine immunogen development. Also, rapid isolation of human monoclonal antibodies has proven to be extremely helpful in defining epitopes that are the targets of protective immunity.

Additional tools of modern vaccinology include (1) delineation of atomic-level structures of viral proteins that facilitates structure-enabled immunogen design and protein engineering; (2) cell sorting and sequencing technologies that allow single-cell analysis of immune responses; and (3) genetic knock-in technologies that allow construction of animal models with human antibody genes for vaccine testing. These tools have already provided the potential not only for solving long-standing problems in vaccinology, such as the development of a new candidate vaccine for respiratory syncytial virus, but they have facilitated rapid development of new candidate vaccines for emerging pathogens such as the Zika virus and pandemic strains of influenza virus. Synthetic vaccinology and platform manufacturing are important innovations that can speed the initial vaccine immunogen design and vaccine development process, and shorten the time needed for manufacturing and initial regulatory approval to begin phase 1 testing.

Synthetic vaccinology is the process of using viral gene sequence information to accelerate vaccine development.2 For example, if a new influenza virus emerges anywhere in the world and is identified through genomic sequencing, the digitally transferred information can be used to synthesize nucleic acids encoding the viral surface proteins (hemagglutinin and neuraminidase). The process of gene synthesis is now extremely rapid and relatively inexpensive. Thus, within a few weeks, DNA plasmids encoding viral proteins can be available for preclinical testing. These genetic vectors (DNA and mRNA) can be used directly for immunization whereby intramuscular immunization leads to muscle cells producing the viral proteins. Alternatively, the genetic vectors can be used to express recombinant protein antigens, in vitro, that can be used for immunization.

Similarly, if an outbreak of a new flavivirus becomes an epidemic or even a pandemic threat, as with Zika in 2015, the gene sequences that encode the viral surface proteins premembrane and envelope can be rapidly identified and form the basis for vaccine immunogen design strategies, based on prior knowledge of flavivirus structure and mechanisms of neutralization.3 Once a structurally authentic immunogen is available, the protein or genetic vectors encoding the protein can be used to immunize animals. In addition, the vaccine proteins can be used as probes to identify monoclonal antibodies secreted by B cells of convalescent humans. Such antibodies are valuable not only for refining vaccine immunogen designs, but also for development of diagnostic assays and potentially for use in passive transfer as therapeutic agents. Thus, development of reagents, diagnostics, candidate vaccines, and immune assessment assays can be done without having the actual virus in hand. This has particular value for viruses with extreme pathogenicity because it avoids the need for high-level containment in laboratory and manufacturing facilities.

Platform manufacturing technologies allow more rapid production and clinical implementation once the vaccine immunogen design is established. The term platform is used in many ways; however, in vaccine production, it implies that the method for generating and presenting a vaccine immunogen can be applied across multiple pathogens. In essence, the cell substrates, production approach, purification processes, and analytical assays used as release criteria for products made under current Good Manufacturing Procedures are the same even though the immunogen may change. DNA or mRNA nucleic acid vaccines are good examples of how platform manufacturing can shorten timelines from pathogen identification to phase 1 clinical trials.4 DNA vaccine delivery and immunogenicity have evolved and improved over the last 2 decades, making it a viable platform for vaccination.

For DNA plasmid vaccines, the manufacturing process is well established, and their toxicity profile is well understood. The National Institute of Allergy and Infectious Diseases Vaccine Research Center has developed candidate DNA vaccines for several viral disease threats during outbreaks, including SARS coronavirus in 2003, H5N1 avian influenza in 2005, H1N1 pandemic influenza in 2009, and most recently for Zika virus in 2016. Once these pathogens were identified, the time from viral sequence selection to initiation of the phase 1 clinical trial was shortened from 20 months to slightly longer than 3 months.

Other examples of vaccine platform technologies include viral vector–based approaches where genes encoding viral proteins are incorporated into viral vectors (eg, adenovirus, poxvirus, vesicular stomatitis virus, or paramyxovirus vectors) for gene-based immunogen expression and delivery, or chimeric replication-competent viruses in which the vaccine antigens of one virus are expressed in a common replication-competent virus allowing uniform manufacturing processes (eg, yellow fever or other flavivirus antigens expressed in dengue virus, or human parainfluenza or pneumovirus antigens expressed in bovine parainfluenza or Sendai virus vectors).

Traditional approaches, such as live-attenuated virus vaccines (eg, Sabin polio) or whole-inactivated virus vaccines (eg, Salk polio) would not qualify as platform approaches because the requirements for growth in cell culture and purification are usually different among virus families. Protein-based approaches are also likely to have different requirements for purification and formulation, and they may not be amenable to platform approaches unless the display of proteins on nanoparticles or other carrier systems brings more uniformity to downstream manufacturing approaches. Having a standard manufacturing approach reduces the time needed for current Good Manufacturing Procedures process development and simplifies regulatory approval because the safety database that has accumulated for a given platform can be applied to multiple vaccine products.

In summary, emerging viral diseases with pandemic potential are a perpetual challenge to global health. The time-honored approach to vaccinology, which depends predominantly on isolating and growing the pathogen, has not adequately met this challenge. To effectively prepare for and respond to these continually emerging threats, it will be critical to exploit modern-day technological advances, preemptively establish detailed information on each family of viral pathogens, and invest in more infrastructure for surveillance in developing countries to expedite pathogen identification and jump-start the process of vaccine development using these new technologies.2 Failure to do so will result in the untenable situation of not optimally using vaccinology in the response to newly emerging infectious disease threats.

via: JAMA. 2018;319(14):1431-1432. doi:10.1001/jama.2018.0345

[Refs]

[1] Jones KE, Patel NG, Levy MA, et al. Global trends in emerging infectious diseases. Nature. 2008;451(7181):990-993.
[2] Graham BS, Sullivan NJ. Emerging viral diseases from a vaccinology perspective: preparing for the next pandemic. Nat Immunol. 2018;19(1):20-28.
[3] Dowd KA, Ko SY, Morabito KM, et al. Rapid development of a DNA vaccine for Zika virus. Science. 2016;354(6309):237-240.
[4] Ulmer JB, Geall AJ. Recent innovations in mRNA vaccines. Curr Opin Immunol. 2016;41:18-22.

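Cell publishes a guide to studying phase separation

Original article:

https://www.sciencedirect.com/science/article/pii/S0092867418316490

The cell is the basic structural and functional unit of living organisms. How its many components assemble at the right time and place to perform their functions is a problem the cell must solve in all of its basic activities. To this end, cells evolved organelles, both membrane-bound (mitochondria, the nucleus, lysosomes and so on) and membraneless (such as the nucleolus). Membrane-bound organelles enclose specific proteins, nucleic acids and other molecules so that they function within a defined space; if these molecules escape, the consequences can be severe (for example, release of cytochrome c into the cytoplasm triggers apoptosis, and release of nucleic acids into the cytoplasm activates innate immune signaling pathways). How membraneless organelles form, and what their physicochemical nature is, puzzled researchers for many years.

In 2009, Hyman and Brangwynne published "Germline P granules are liquid droplets that localize by controlled dissolution/condensation" in Science [1]. They proposed that phase separation offers a way for specific molecules to gather inside the cell and so create a degree of order within its crowded interior, which opened a new line of thinking on this long-standing problem. Research in recent years indicates that liquid-liquid phase separation (LLPS) may be the physicochemical basis for the formation of membraneless organelles such as P granules, the nucleolus and stress granules (Figure 1).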

111.png

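Figure 1: Phase-separated structures in the cell (from reference [2]).

LLPS has also been reported to play a very important role in the onset and progression of diseases such as cancer and neurodegenerative disease. Phase separation has become a hot topic in the life sciences, and the number of related papers has surged in recent years (see the further reading at the end of this article: the Bioart commentary series). Brangwynne has also given a three-part lecture on iBiology, "Liquid Phase Separation in Living Cells", which describes in detail the history of the field and his laboratory's pioneering work in it. Link: https://www.ibiology.org/biophysics/liquid-phase-separation-in-living-cells/

Cliff Brangwynne (Princeton & HHMI) introducing phase separation research on iBiology

Like many emerging fields, however, this one still lacks unified standards and guidelines for experimental design and methods, which has caused some confusion, and a consistent framework running from theory to concrete experimental design is badly needed. Recently, Simon Alberti, Amy Gladfelter and Tanja Mittag jointly published "Considerations and Challenges in Studying Liquid-Liquid Phase Separation and Biomolecular Condensates" in Cell. The article systematically lays out the theoretical basis of LLPS and gives concrete guidance on designing in vivo and in vitro LLPS experiments.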

222.png

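For the functions of phase separation, readers can consult another article, "Biomolecular condensates: organizers of cellular biochemistry", written by Hyman, Michael Rosen and colleagues and published in Nature Reviews Molecular Cell Biology, which describes more systematically the principles of phase separation and the biological functions reported so far [2]. In 2018, Simon Alberti and colleagues also published "A User's Guide for Phase Separation Assays with Purified Proteins", which explains in detail the considerations for purifying proteins for in vitro phase separation experiments, along with related tips [3]. Readers are advised to consult it when running experiments. The Cell article is more of an overall guide to experimental design: it proposes standardized experimental steps for the field, identifies mechanisms that still need to be worked out, and stresses the fundamental goal of phase separation research, which is to establish its specific biological functions.

Basic concepts of phase separation, its forms, and how to predict whether a protein will phase separate

Whether LLPS occurs depends strongly on the concentration and physicochemical properties of the biological macromolecules in solution (proteins, DNA, RNA and so on) and on the solution environment: temperature, pH, salt concentration, salt type and the presence of other macromolecules. The authors use the phase diagram below to relate these conditions to whether the solution phase separates (Figure 2).

333.png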

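Figure 2: Schematic phase diagram

In simple terms, when the macromolecule concentration in solution is below a specific value c, the system cannot phase separate at any temperature, pH or other condition. Above this concentration, phase separation occurs under suitable pH, temperature and other conditions. The macromolecule then exists in two forms: at low concentration in the surrounding solution, and at much higher concentration inside the "droplets". As conditions change, the two forms interconvert. In other words, phase separation is a highly dynamic process.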
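As a worked illustration of the two coexisting forms, the following Python sketch applies the lever rule (mass conservation) to a simple one-component system at fixed conditions. It assumes that the dilute phase sits at the saturation concentration and the dense phase at a fixed droplet concentration; the function name and the example numbers are hypothetical and are not taken from the Cell article.

```python
def dense_phase_fraction(c_total: float, c_sat: float, c_dense: float) -> float:
    """Volume fraction occupied by droplets, from the lever rule.

    c_total: overall macromolecule concentration.
    c_sat:   concentration of the dilute phase (saturation concentration).
    c_dense: concentration inside the droplets.
    All three must share one unit; the values are assumed to describe a
    single set of conditions (temperature, pH, salt).
    """
    if not 0 <= c_sat < c_dense:
        raise ValueError("require 0 <= c_sat < c_dense")
    if c_total <= c_sat:
        return 0.0  # below saturation: one homogeneous phase, no droplets
    if c_total >= c_dense:
        return 1.0  # the whole sample is at droplet-like concentration
    # Mass balance: c_total = f * c_dense + (1 - f) * c_sat
    return (c_total - c_sat) / (c_dense - c_sat)


if __name__ == "__main__":
    # Hypothetical numbers in micromolar.
    for c in (1.0, 5.0, 51.0):
        print(c, dense_phase_fraction(c, c_sat=2.0, c_dense=100.0))
```

Raising the total concentration above saturation changes how much of the sample is droplet, while in this simple picture the concentrations of the two phases stay fixed. Multicomponent cellular systems need not behave this way.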

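LLPS can produce not only liquid droplets but also, on further transition, gel-like material. The gel state is often irreversible. This offers a new way of thinking about how amyloid-like fibers form in vivo in conditions such as Alzheimer's disease and a new concept for designing drugs against them. Studies have shown that mutations in some proteins accelerate the transition from LLPS to the gel state.

Liquid-liquid phase separation is a general property of proteins and nucleic acids under certain conditions, yet most such phase separation could never occur in a normal cell. Just as only a small fraction of proteins form amyloid under physiological conditions, only a small fraction of protein sequences can phase separate in living cells. To date we still know very little about the genetic and biological properties that control phase separation, so particular caution is warranted before concluding that phase separation is taking place.

Many studies in recent years have looked for general features of molecules that phase separate under physiological conditions. One result is the scaffold and client model. Scaffold molecules are regarded as the drivers of phase separation, while molecules that join the droplets after they form are called clients. Phase separation of scaffolds and clients requires an interaction network, usually built from protein-protein and protein-RNA interactions.

Two types of protein promote such networks. One is characterized by multiple folded domains, such as the SH3 domains of Nck, which bind short linear motifs (SLiMs). The other is characterized by intrinsically disordered regions (IDRs). The two types have much in common, most importantly that the proteins interact through multiple domains or motifs. A first estimate of a protein's valency from its sequence can therefore indicate its capacity to phase separate and its saturation concentration. Specific RNAs can also drive phase separation, and some IDR-containing proteins contain several RNA-binding domains while their target RNAs contain several protein-binding sequences. Proteins and RNA can thus form multivalent interactions in many ways, and these interactions determine the ability of particular proteins and nucleic acids to phase separate.

How a multivalent protein interaction network phase separates can be worked out from high-resolution structural data. How IDRs drive phase separation is harder to understand.

IDRs are common in phase-separating proteins. They are often depleted in aromatic and aliphatic amino acids and cannot form a single, relatively low-energy folded structure. Instead, they sample many conformations of similar energy. The primary sequence largely determines a protein's ability to phase separate, the forces that drive it, the critical concentration and the viscoelasticity of the resulting phase. Sequence features that affect phase separation include the length, number and patterning of IDRs and the features of the sequence between IDRs. Typical determinants include the composition and patterning of hydrophobic residues. Although IDRs contain relatively few hydrophobic residues, these act as the "stickers" in phase separation and tune the saturation concentration as temperature changes. Charged residues also affect phase separation: pairs of oppositely charged proteins can condense by complex coacervation.

Another common feature of IDRs is that they consist of low-complexity regions (LCRs), such as repeats of a single amino acid. One typical LCR is the prion-like LCR, which is rich in polar residues such as serine, tyrosine, glutamine and asparagine and lacks charged residues. Another is the RGG domain common in RNA-binding proteins, which is rich in arginine and can regulate LCR-RNA interactions. LCRs interact through charge-charge, π-π and cation-π interactions.

Prediction algorithms are therefore now widely used to identify IDRs in proteins and assess their capacity to phase separate (Figure 3).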

444.png

 

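Figure 3: Tools for protein sequence analysis and phase separation prediction

Taking FUS as an example, the IUPred algorithm identifies the first 250 amino acids, aa 365-420 and aa 450 to the C terminus as potential IDRs; the PLAAC algorithm finds QGSY-rich and G-rich repeats at the N terminus of FUS; and the D2P2 algorithm separates the foldable domains of FUS from its disordered regions. The team of Professor Anthony A. Hyman has tested these predictions experimentally.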
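To make the idea of sequence-based screening concrete, here is a deliberately simple Python sketch that scans a sequence with a sliding window and reports stretches enriched in the polar residues named above for prion-like LCRs (Q, N, S, Y, plus G for the QGSY- and G-rich repeats). This is an assumption-laden toy heuristic, not the IUPred, PLAAC or D2P2 algorithm; the residue set, window size, threshold and the example sequence are all illustrative choices.

```python
from typing import List, Tuple

# Residues treated as "prion-like" for this toy scan (an assumption).
PRION_LIKE = frozenset("QNSYG")


def window_fraction(seq: str, window: int = 20,
                    residues: frozenset = PRION_LIKE) -> List[float]:
    """Fraction of residues from `residues` in every window along `seq`."""
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    count = sum(aa in residues for aa in seq[:window])
    fractions = [count / window]
    for i in range(window, len(seq)):
        # Slide by one: add the entering residue, drop the leaving one.
        count += (seq[i] in residues) - (seq[i - window] in residues)
        fractions.append(count / window)
    return fractions


def enriched_regions(seq: str, window: int = 20,
                     threshold: float = 0.7) -> List[Tuple[int, int]]:
    """Merge windows at or above `threshold` into 0-based half-open spans."""
    spans: List[Tuple[int, int]] = []
    for start, frac in enumerate(window_fraction(seq, window)):
        if frac >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Hypothetical sequence: a QGSY-rich stretch followed by a charged stretch.
    toy = "QGSY" * 6 + "KELD" * 6
    print(enriched_regions(toy, window=8, threshold=1.0))
```

A composition scan like this can only flag candidate regions for closer inspection. Dedicated predictors use more than composition, and experimental validation is still required.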

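Reconstituting phase separation in vitro

Phase separation can be produced in vitro from purified proteins, nucleic acids and other components under defined conditions, and in vitro reconstitution is important to the field. (In 2012, Michael Rosen and Steven McKnight at UT Southwestern Medical Center independently found that such molecules form droplets in a test tube through weak interactions, the first demonstration that phase separation can be reproduced in vitro with simple biochemistry [4, 5]. Professor Pilong Li of Tsinghua University was first author of Michael Rosen's paper.)

In vitro phase separation can be observed very simply with an ordinary light microscope. A phase-separating solution turns from clear to turbid, and under the microscope it contains droplets that resemble oil drops in water. The authors recommend coating slides with PEG or lipids to observe and record phase separation more reliably. Proteins or RNA can also be fluorescently labeled, or different components labeled with different fluorophores, so that fluorescence or confocal microscopy gives better images; this also allows FRAP experiments and time-lapse imaging of LLPS dynamics. When using this approach, note that some RNAs and RNA-binding proteins may be crosslinked by the laser illumination during imaging.

Phase separation can also be detected by measuring solution turbidity or by centrifugation and sedimentation. For experimental details, readers are referred to the experiments in Figure 4 of the Cell paper from Professor Mingjie Zhang of the Hong Kong University of Science and Technology, "Phase Transition in Postsynaptic Densities Underlies Formation of Synaptic Complexes and Synaptic Plasticity" [6]. (Bioart also invited Professor Wenyu Wen, a former PhD student of Professor Zhang and now a PI at the Institutes of Biomedical Sciences, Fudan University, to write a systematic commentary on Professor Zhang's work; see the further reading.)

In vitro phase separation experiments do have limitations. Their greatest strength is that every component, its concentration and the external conditions (temperature, pH and so on) can be strictly controlled, which makes phase separation easier to study. It follows that the purity of the proteins, RNA, DNA and other components is critical. The authors therefore propose standards for the expression, purification, storage and handling of protein and nucleic acid samples used in in vitro phase separation experiments.

Considerations for protein purification

Proteins for in vitro phase separation experiments can be expressed and purified from E. coli, yeast or insect cells, or obtained from in vitro transcription/translation systems. The intrinsically disordered regions (IDRs) that are common in phase-separating proteins are very easily degraded during purification, so such proteins need special care. Purification of IDR-containing proteins usually calls for protease inhibitors and needs to be fast, to prevent degradation and aggregation. Alternatively, the target protein can be expressed at high levels into bacterial inclusion bodies, which protects it from host proteases; in that case the protein must be refolded into a physiological buffer before any phase separation experiment. Expressing the protein with a solubility tag such as MBP can also help yield good protein, but the tag should be cleaved off before the phase separation experiment so that it does not affect the result.

Post-translational modifications (PTMs) are also very important for phase separation. Proteins purified from bacteria generally carry very few PTMs, whereas proteins purified from eukaryotic cells are generally better modified. For eukaryotically expressed proteins, it is advisable to identify the PTMs by mass spectrometry or similar methods before experiments, both to ensure reproducibility and to understand the molecular mechanism of phase separation in more depth.

During purification, phase separation of the target protein should generally be avoided. If, however, phase-separated protein can be redissolved under certain conditions, repeated cycles of phase separation and dissolution can themselves be used to purify it. The purified protein should be checked by SDS-PAGE and mass spectrometry to confirm its identity and purity. The authors recommend storing the protein in a buffer in which it does not phase separate, usually at neutral pH with either high or very low salt, and with a reducing agent. A commonly used buffer is: pH buffer, 50 mM HEPES pH 7.5; salt, 300 mM NaCl or 500 mM KCl, or no salt; reducing agent, 1 mM TCEP or 5 mM DTT. The authors recommend aliquoting the protein, flash-freezing it in liquid nitrogen and storing it at low temperature. Avoid freeze-thaw cycles, which cause aggregation and partial denaturation and change the protein's properties. For phase separation experiments, the pH, salt concentration and temperature need to be optimized for each protein and kept as close to physiological conditions as possible. For detailed tips, readers are referred to Alberti et al. (2018), "A User's Guide for Phase Separation Assays with Purified Proteins".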
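Preparing the storage buffer quoted above from concentrated stocks is a matter of C1·V1 = C2·V2 arithmetic, sketched below in Python. The target concentrations (50 mM HEPES, 300 mM NaCl, 1 mM TCEP) come from the text; the stock concentrations, the 50 mL batch size and the function name are assumptions made only for the example.

```python
from typing import Dict, Tuple


def stock_volumes(final_volume_ml: float,
                  recipe: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Volumes (mL) of each stock, plus water, for a buffer of given volume.

    recipe maps a component name to (stock_mM, final_mM). Uses
    C1 * V1 = C2 * V2 for every component and fills up with water.
    """
    volumes: Dict[str, float] = {}
    for name, (stock_mm, final_mm) in recipe.items():
        if stock_mm <= 0 or final_mm < 0 or final_mm > stock_mm:
            raise ValueError(f"invalid concentrations for {name}")
        volumes[name] = final_volume_ml * final_mm / stock_mm
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested recipe")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    # Final concentrations from the text; stock concentrations are assumed.
    storage_buffer = {
        "HEPES pH 7.5": (1000.0, 50.0),  # 1 M stock (assumed)
        "NaCl": (5000.0, 300.0),         # 5 M stock (assumed)
        "TCEP": (500.0, 1.0),            # 0.5 M stock (assumed)
    }
    for name, ml in stock_volumes(50.0, storage_buffer).items():
        print(f"{name}: {ml:.2f} mL")
```

Because phase separation is sensitive to small changes in salt and pH, it helps to record the exact stock lots and the pH of the final buffer alongside the calculated volumes.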

Sources of RNA

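Many phase separation experiments require RNA. RNA can be produced by in vitro transcription or by direct chemical synthesis. In vitro transcription is a very good source of longer RNAs, while short RNAs can be purchased directly from commercial suppliers. RNA can be fluorescently labeled to make phase separation of RNA with protein easier to detect.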

Macromolecular Crowders

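Phase separation is also very sensitive to changes in physicochemical conditions. Small differences in temperature or in protein, nucleic acid or salt concentration can lead to different results. In vitro phase separation experiments should therefore use precisely controlled buffer systems and protein concentrations.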

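Phase separation experiments often use macromolecular crowders such as PEG, dextran or Ficoll. In many cases, the amount of crowder added exceeds the level of crowding found inside cells. The mechanism by which crowders promote phase separation is still unclear, so they should be used with caution. If a crowder is used, we recommend testing several different crowders to rule out false positives caused by the crowder itself (anecdotally, some researchers have remarked in private that perhaps around 80% of proteins can phase separate in the presence of a crowder).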

Effects of small molecules on phase separation

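In an in vitro phase separation system, one can test whether LLPS changes enzyme activity and thereby study the biological significance of phase separation. Some in vitro experiments require adding high concentrations of small-molecule compounds (for example kinase inhibitors or methyltransferase inhibitors) to the phase separation system. In such cases, keep in mind that, beyond their direct effect on enzyme activity, these high doses of small molecules may also act directly on phase separation and affect enzyme activity in that way.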

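In vitro, an IDR is often sufficient to mediate phase separation. Examples of IDRs that phase separate at high concentration include those of Ddx4, LAF-1, FUS, hnRNPA1 and Whi3. These results suggest that such IDRs can drive phase separation autonomously through homotypic interactions. A growing body of evidence shows that heterotypic interactions with other regions of the same polypeptide, or with other proteins, can also drive phase separation. How theories built on simple one- or two-component systems extend to the more complex mixtures found in cells remains a pressing open question.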

Physical properties of phase-separated droplets

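The physical state of phase-separated droplets varies enormously, from liquid-like droplets to porous solids or gels, and depends on the molecular composition, time, droplet stability, quench depth and other factors. RNA can also influence the physical state of droplets, but because RNA both provides multivalent binding sites and contributes electrostatic charge, it is not yet clear whether it makes droplets more fluid or more solid. Because irreversible solid states are often regarded as pathological, the factors that regulate the physical state of condensates still need further study. When phase separation is reconstituted in vitro, several methods can characterize the physical properties of droplets. The most direct is to measure the inverse capillary velocity, which reflects surface tension; combined with passive microrheology, it allows the surface tension of the droplet to be estimated. The contact angle between the droplet surface and the coverslip also reports on the surface tension and surface chemistry of the droplet.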
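
A minimal sketch of how the inverse capillary velocity might be extracted, assuming the commonly used relation for fusing liquid droplets, τ ≈ (η/γ)·ℓ, where τ is the relaxation time of a fusion event, ℓ the characteristic droplet length, η the viscosity and γ the surface tension. The fusion data and the viscosity value below are hypothetical, and the function name is illustrative only.

```python
def inverse_capillary_velocity(lengths_um, relax_times_s):
    """Least-squares slope of relaxation time vs. length, through the origin.

    Returns eta/gamma in s/um, assuming tau = (eta / gamma) * length.
    """
    num = sum(l * t for l, t in zip(lengths_um, relax_times_s))
    den = sum(l * l for l in lengths_um)
    return num / den


# Hypothetical fusion events: droplet length scale (um) and relaxation time (s).
lengths_um = [2.0, 4.0, 6.0, 8.0]
relax_times_s = [0.24, 0.50, 0.70, 0.98]

slope = inverse_capillary_velocity(lengths_um, relax_times_s)

# With a viscosity from passive microrheology (hypothetical value, Pa*s),
# gamma = eta / slope. Pa*s divided by s/um gives Pa*um, i.e. uN/m.
eta_pa_s = 1.0
gamma_un_per_m = eta_pa_s / slope

print(f"inverse capillary velocity: {slope:.3f} s/um")
print(f"estimated surface tension: {gamma_un_per_m:.2f} uN/m")
```

Forcing the fit through the origin is a design choice that matches the assumed proportionality; a fit with a free intercept would be a reasonable alternative if small droplets deviate from it.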

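Passive microrheology characterizes the physical properties of a droplet by embedding beads in it and measuring their mean squared displacement. The method is affected by many factors, such as the composition of the droplet, the material and size of the beads, the degree of passivation of the bead surface, and drift of the microscope. When droplets are extremely viscous or even gel-like, atomic force microscopy or optical tweezers can be used to measure their internal stiffness. In addition, fluorescently labeled dextrans can be used to probe the mesh size of the polymer network inside droplets and thereby describe their physical state.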
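
The bead-tracking analysis can be sketched as follows, under the assumption that the droplet behaves as a purely viscous liquid, so that the two-dimensional mean squared displacement grows as MSD = 4·D·t and the Stokes-Einstein relation D = kT/(6πηr) applies. The trajectory, frame interval and bead radius are hypothetical, and the function names are illustrative only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def mean_squared_displacement(xs, ys, lag):
    """Time-averaged MSD of a 2D trajectory at a given lag (in frames)."""
    n = len(xs) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be positive and shorter than the trajectory")
    total = 0.0
    for i in range(n):
        total += (xs[i + lag] - xs[i]) ** 2 + (ys[i + lag] - ys[i]) ** 2
    return total / n


def diffusion_coefficient_2d(msd, lag_time_s):
    """D from MSD = 4 * D * t (purely viscous, two-dimensional tracking)."""
    return msd / (4.0 * lag_time_s)


def stokes_einstein_viscosity(d_m2_per_s, bead_radius_m, temperature_k=298.15):
    """Viscosity in Pa*s from D = k_B * T / (6 * pi * eta * r)."""
    return K_B * temperature_k / (6.0 * math.pi * d_m2_per_s * bead_radius_m)


# Hypothetical bead positions in metres, one frame every 0.1 s.
xs = [0.0, 1e-8, 3e-8, 2e-8, 5e-8, 4e-8]
ys = [0.0, -1e-8, 0.0, 2e-8, 1e-8, 3e-8]
frame_interval_s = 0.1
bead_radius_m = 50e-9  # hypothetical 100 nm diameter bead

msd = mean_squared_displacement(xs, ys, lag=1)
d = diffusion_coefficient_2d(msd, frame_interval_s)
eta = stokes_einstein_viscosity(d, bead_radius_m)
print(f"MSD = {msd:.3e} m^2, D = {d:.3e} m^2/s, eta = {eta:.3e} Pa*s")
```

Real analyses would average over many beads and lags and correct for the stage drift mentioned above; this sketch omits both.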

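Fluorescence recovery after photobleaching (FRAP) is commonly used to measure the fluidity of droplets, and recovery times differ widely between proteins and RNAs. In particular, when condensates formed by proteins are compared with those formed by RNA, the relatively rigid structure of RNA tends to give longer FRAP recovery times. Although FRAP is a very common experiment, its limitations are often overlooked. For example, the recovery time depends not only on how dilute the droplet is, but also on the size of the bleached droplet, its internal mobility and the size of the bleached region. Where conditions permit, half-bleach FRAP experiments can provide more information about mobility inside the droplet. FRAP can also be used to assess the internal homogeneity of droplets, but it cannot serve as evidence that a given structure forms by phase separation.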
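
As a simple illustration of how a recovery time is usually read off a FRAP curve, the sketch below fits a single-exponential model, I(t) = A·(1 − exp(−t/τ)), to intensities normalized so that the pre-bleach level is 1 and the post-bleach level is 0. The single-exponential form is an assumption made for the example, not a claim about any particular condensate; the data are synthetic and the function name is illustrative only.

```python
import math


def fit_frap(times_s, intensities, tau_grid):
    """Fit I(t) = A * (1 - exp(-t / tau)) by scanning tau over a grid.

    For each candidate tau the best amplitude A has a closed-form
    least-squares solution; the tau with the smallest residual wins.
    Returns (A, tau, half_time), where half_time = tau * ln 2.
    """
    best = None
    for tau in tau_grid:
        f = [1.0 - math.exp(-t / tau) for t in times_s]
        denom = sum(v * v for v in f)
        if denom == 0.0:
            continue
        amp = sum(i * v for i, v in zip(intensities, f)) / denom
        sse = sum((i - amp * v) ** 2 for i, v in zip(intensities, f))
        if best is None or sse < best[0]:
            best = (sse, amp, tau)
    if best is None:
        raise ValueError("no usable tau in the grid")
    _, amp, tau = best
    return amp, tau, tau * math.log(2.0)


# Synthetic recovery curve: mobile fraction 0.8, time constant 5 s.
times_s = [float(t) for t in range(0, 31, 2)]
intensities = [0.8 * (1.0 - math.exp(-t / 5.0)) for t in times_s]
tau_grid = [0.5 * k for k in range(1, 61)]  # candidate tau values, 0.5 to 30 s

mobile_fraction, tau, half_time = fit_frap(times_s, intensities, tau_grid)
print(f"mobile fraction = {mobile_fraction:.2f}, tau = {tau:.1f} s, "
      f"t1/2 = {half_time:.2f} s")
```

Because the fitted time constant depends on droplet size and bleach geometry, as noted above, values obtained this way are only comparable between experiments acquired under matching conditions.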

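To estimate more accurately the diffusivity of a given molecule inside a droplet, fluorescence correlation spectroscopy can be used. In addition, polarized fluorescence microscopy can detect anisotropic fibrillar or solid-like components inside droplets.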

Detecting phase separation in cells

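A current difficulty in the field is how to determine whether particular structures in cells really form by phase separation. Under specific conditions in vitro, some proteins and RNAs phase separate at sufficient concentration or in a suitable buffer. Typically, these proteins are then overexpressed in cells, large spherical structures are observed, and it is inferred that the protein still phase separates at its lower endogenous concentration, only undetectably by ordinary light microscopy. However, phase separation requires a sufficient concentration, so the effects of exogenous overexpression must be considered whenever overexpressed proteins are used to detect phase separation. Efforts should also be made to find ways other than overexpression to show that phase separation really occurs in cells.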

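The currently accepted criteria for a phase-separated structure are that it is spherical, that it can fuse, and that it recovers fluorescence after photobleaching in a FRAP experiment (Figure 4). However, FRAP is not a gold standard for demonstrating LLPS and still has many problems.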

Figure 4: Methods for detecting phase separation

Physical properties of phase-separated droplets in vivo

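When phase separation is reconstituted in vitro, there are many ways to study the physical properties of droplets: for example, time-lapse microscopy can measure the inverse capillary velocity of a droplet, and FRAP can measure its fluidity. How to probe the physical properties of droplets in vivo, however, remains a difficult problem. Existing tools such as genetically encoded nanoparticles (GEMs), fluorescence resonance energy transfer (FRET), quantitative phase microscopy (QPM), refractive index tomography and the more advanced Brillouin microscopy can give a preliminary picture of the physical properties of droplets in vivo. The biological function of droplets in vivo is another major focus, and how to alter droplet properties in order to observe their function is a key direction for future work.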

Feasibility and limitations of using polymer chemistry to guide phase separation research

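One goal of liquid-liquid phase separation research is to build a theoretical framework that can explain and predict macromolecular phase separation, and to predict the saturation concentration, triggering stimuli and droplet state from primary sequence. Flory-Huggins theory from polymer chemistry describes the chemical basis for the enthalpy-driven demixing of a homopolymer from a poor solvent, and extensions of the theory account for electrostatic forces in this process. The random phase approximation, by contrast, considers only how the sequence pattern of charged amino acids affects the assembly of heteropolymers. Simulations can also uncover more about the mechanisms of phase separation: simulations of single protein molecules can already account fairly accurately for the relationship between sequence and function, but modeling and analyzing phase separation involving hundreds or thousands of molecules remains one of the current challenges. Coarse-grained simulations of multicomponent systems have begun to explain the physical mechanism by which multilayered membraneless organelles such as the nucleolus form: protein-protein and protein-RNA interactions are determined by sequence, while mutually repulsive or attractive interactions between different cellular components can ultimately give rise to non-random multilayered structures.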
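
To make the Flory-Huggins picture concrete, here is a minimal sketch of the standard mean-field free energy of mixing per lattice site for a homopolymer of N segments in solvent, f(φ) = (φ/N)·ln φ + (1 − φ)·ln(1 − φ) + χ·φ·(1 − φ), in units of kT, together with the spinodal and critical point that follow from it. The chain length N = 100 is a hypothetical value, the function names are illustrative, and the electrostatic extensions mentioned above are not included.

```python
import math


def fh_free_energy(phi, n, chi):
    """Flory-Huggins free energy of mixing per lattice site, in units of kT."""
    return ((phi / n) * math.log(phi)
            + (1.0 - phi) * math.log(1.0 - phi)
            + chi * phi * (1.0 - phi))


def spinodal_chi(phi, n):
    """chi at which the curvature d2f/dphi2 vanishes for volume fraction phi."""
    return 0.5 * (1.0 / (n * phi) + 1.0 / (1.0 - phi))


def critical_point(n):
    """Critical volume fraction and critical chi for chain length n."""
    root_n = math.sqrt(n)
    phi_c = 1.0 / (1.0 + root_n)
    chi_c = 0.5 * (1.0 + 1.0 / root_n) ** 2
    return phi_c, chi_c


n_segments = 100  # hypothetical chain length
phi_c, chi_c = critical_point(n_segments)
print(f"N = {n_segments}: phi_c = {phi_c:.4f}, chi_c = {chi_c:.4f}")

# Above chi_c the mixture is unstable wherever chi exceeds the spinodal value.
for phi in (0.02, 0.05, 0.10, 0.30):
    print(f"phi = {phi:.2f}: spinodal chi = {spinodal_chi(phi, n_segments):.3f}")
```

The critical point shifts to lower polymer volume fraction and lower χ as N grows, which is one way this simple theory captures why long, multivalent chains demix at low concentration.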

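Computational simulations and theoretical frameworks are a powerful complement to experimental data on phase separation. Conversely, experimental descriptions of phase separation in vivo can serve as a data resource for simulations and give rise to new theories capable of describing complex multicomponent phase separation.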

What does phase separation actually mean? Studying the biological functions of phase separation

The central question in phase separation research remains its biological function. The authors summarize the functions reported so far (Figure 5):

Figure 5: Summary of the functions of phase separation

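1. LLPS can sense changes in the environment and respond to them rapidly, faster than responses that rely on transcription and translation. Studies have shown that LLPS can sense temperature and pH, and can also sense foreign DNA inside the cell (cGAS phase separation).

2. LLPS can regulate the concentration of proteins in the cell. It can store proteins at high concentration in droplets and release them into the cellular environment when the cell needs them.

3. LLPS can create high local protein concentrations, thereby activating biochemical reactions and signal transduction pathways and promoting cytoskeleton assembly.

4. LLPS can sequester proteins away from their substrates, thereby inhibiting certain biochemical reactions in the cell.

5. LLPS can direct proteins to pre-existing membraneless organelles.

6. The distinctive structures formed by LLPS may play an important role in cell morphology.

7. LLPS can mediate the formation of pore-like structures, such as the nuclear pore.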

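In recent years phase separation has become a major hot topic in the life sciences, and related papers continue to appear at a steady pace. The field will offer entirely new ways of thinking about basic questions in cell biology and about the onset and progression of disease. While this commentary was being prepared, Michael Rosen's group posted a preprint on bioRxiv entitled "Organization and Regulation of Chromatin by Liquid-Liquid Phase Separation", showing that phase separation plays an important role in chromatin structure.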

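In addition, at the biennial meeting of Chinese biologists held in Shenzhen last December (the 12th Biennial Meeting of the Chinese Biological Investigators Society), Nobel laureate Yoshinori Ohsumi presented his laboratory's latest results: binding of ATG13 to ATG17 can lead to phase separation, which reveals the physical basis for the formation of several key conjugation systems in autophagy. ATG13 mutated at key sites could no longer form phase-separated structures with ATG17. The model was validated both with purified proteins in vitro and in cells. Professors Li Yu and Pilong Li of Tsinghua University also recently published "Polyubiquitin chain-induced p62 phase separation drives autophagic cargo segregation" in Cell Research, reporting that the autophagy adaptor protein p62 can phase separate and that this mediates the substrate selectivity of autophagy. These studies will further broaden our understanding of autophagy and related fields.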

1.    C. P. Brangwynne et al., Germline P granules are liquid droplets that localize by controlled dissolution/condensation. Science 324, 1729-1732 (2009).

2.    S. F. Banani, H. O. Lee, A. A. Hyman, M. K. Rosen, Biomolecular condensates: organizers of cellular biochemistry. Nat Rev Mol Cell Biol 18, 285-298 (2017).

3.    S. Alberti et al., A User’s Guide for Phase Separation Assays with Purified Proteins. J Mol Biol 430, 4806-4820 (2018).

4.    M. Kato et al., Cell-free formation of RNA granules: low complexity sequence domains form dynamic fibers within hydrogels. Cell 149, 753-767 (2012).

5.    P. Li et al., Phase transitions in the assembly of multivalent signalling proteins. Nature 483, 336-340 (2012).

6.    M. Zeng et al., Phase Transition in Postsynaptic Densities Underlies Formation of Synaptic Complexes and Synaptic Plasticity. Cell 166, 1163-1175.e1112 (2016).

This article is from BioArt; compiled by QY and 赤贞.

To win at gene therapy, companies pick viruses with production credentials

Nature Biotechnology volume 37, pages 5–6 (2019)

Is a spontaneous chemical change on the proteins that coat an adeno-associated virus (AAV) a problem for gene therapy developers? According to a recent controversial paper from the lab of gene therapy pioneer Jim Wilson, professor of medicine and pediatrics at the University of Pennsylvania, the hitherto overlooked phenomenon of protein deamidation can affect the capsid of AAV, the vector most widely adopted in gene therapy, reducing the efficiency with which vectors enter their target cells. Because the reaction is unpredictable it may impair lot-to-lot consistency in manufacturing. “My first response to the paper was here we go again,” says Michael Linden, newly appointed CSO of Hampton, UK-based Touchlight Genetics. “Let’s see if this one explodes.”

The findings (Mol. Ther. https://www.sciencedirect.com/science/article/pii/S1525001618304544) have not been universally accepted. But at the same time, Linden adds, the paper serves to highlight the lack of standardization in gene therapy vector analytics and manufacturing. Every lab has its own way of measuring vector titer and purity. “These things have probably a much bigger effect on the potency of the vector than deamidation,” he says.

In protein science, deamidation is a well-known post-translational modification. As it can affect the potency and stability of monoclonal antibodies, it is included as a design consideration in monoclonal antibody development. But for gene therapy vectors, its potential impact has been largely ignored. “At a minimum, the study is important because it is something that has to be explored more fully,” says Mark Kay, professor of pediatrics and genetics at Stanford University. Isolating the precise contribution that deamidation may, or may not, make to vector yield and potency during production is a difficult undertaking, given the complex matrix of parameters that influences the final product. “There is a lot that is unknown about manufacturing and how that affects potency,” says Kay. “You can use the same manufacturing method and have two lots of vector that give you similar levels of titer but different biological activity.”

One significant parameter is formulation—the precise composition and concentration of the excipients used to package the vector. “This causes huge variability between different manufacturing groups,” Kay says. Formulation specialists, such as Martinsried, Germany-based Leukocare, have developed algorithms to optimize the selection of excipients in order to maximize stability and avoid issues such as aggregation and oxidation. “We are able to substantially stabilize viruses regarding their functionality,” says CEO Michael Scholl. Purification methods influence vector potency as well. The final product may also contain DNA from the bacterial plasmids used to transfect the producer cell line, as well as residual components of those cells, both of which can also influence activity.

All of these issues are beginning to take on added importance as the gene therapy sector moves into its early stages of maturation. The first therapies are now on the market, and approvals for several others are in the offing. “Manufacturing expertise is becoming 80% of the question; [clinical] efficacy is 20%,” according to a recent report from a roundtable session hosted by the investment bank Jefferies Financial Group. Gene therapies that require systemic delivery, for conditions such as hemophilia, Duchenne muscular dystrophy and spinal muscular atrophy, need far higher production efficiency than does a product for localized use, such as one that treats the eye. Luxturna (voretigene neparvovec-rzyl), Spark Therapeutics’ FDA-approved AAV2-based gene therapy for retinal dystrophy (Nat. Biotechnol. 36, 6, 2018), is administered at a dose of 1.5 × 10^11 vector genomes per eye, several orders of magnitude lower than some of the high-dose AAV-based therapies that are now in or approaching phase 3 trials. Even that metric can itself be unreliable: methods for measuring the amount of vector DNA present, such as quantitative polymerase chain reaction (PCR) or digital droplet PCR, provide only an indirect measure of the actual level of viral activity. “We dose based on genome copy [number], which is about the only assay we can reliably perform,” says Wilson. The cell-based assays in use are suboptimal, Linden notes. “They’re not reflective of the bioactivity of the virus. What you see in tissue culture cannot be extrapolated into animal models,” he says.

Notwithstanding the challenges, gene therapy manufacturing is obviously not stuck. Industrial scale-up and optimization are proceeding on all levels, from cell transfection to produce the virus vector to formulating the finished product. One shift underway is a migration from adherent cell culture to suspension cell culture, which allows more rapid expansion. Engineered insect cell lines employing baculovirus-based expression systems are growing in popularity for that reason. “There’s a preference for insect-cell-based manufacturing at the moment because once you can get it tweaked and working, you get higher yields,” Kay says. In contrast, traditional mammalian cell lines, such as HEK293, grow more poorly in suspension culture. Valoctocogene roxaparvovec, Biomarin’s AAV5-based hemophilia A gene therapy, which is in a phase 3 trial, is one high-profile example of an insect-cell-grown vector. The preference is not universally held, however. “I’m personally of the opinion that mammalian systems are better suited to make mammalian viruses,” says Linden. Alain Lamproye, CEO of the gene and cell therapy contract manufacturing organization Yposkesi, agrees. The insect-cell-based approach “is restricted to certain serotypes,” he says. It does offer cost benefits, he says, because the process dispenses with the need for bacterial plasmids to introduce the vector components to the producer cell line, but consistency can be a problem.

Yposkesi, located in Evry, France, claims a tenfold improvement in vector production using the conventional HEK293 cell process. This yield resulted from replacing polyethylenimine with an as yet undisclosed transfection agent, which improves transfection efficiency by up to fivefold, in a highly efficient HEK293 cell line. “We have identified a subpopulation which are high producers,” Lamproye says. The company, a spin-out from the not-for-profit gene therapy research organization Généthon, can achieve up to 70% vector purity without the need for cumbersome and expensive ultracentrifuges. “Ultracentrifugation is not an easily scalable manufacturing technique,” he says.

For all their challenges, viral vectors remain the most effective way of delivering large quantities of DNA to target cells. The repertoire of available capsids, particularly AAV vectors, has expanded in the past two decades—an effort spearheaded by Wilson following the death of Jesse Gelsinger in a trial of an adenovirus-based therapy for ornithine transcarbamylase deficiency, a trial Wilson led. The 11 available AAV serotypes can be further extended by pseudotyping—mixing and matching different viral genomes and capsid proteins—so the resulting constructs allow selective, if not specific, targeting of particular organs or tissue types. But what remains largely unchanged are the DNA payloads the vectors carry—for the most part they involve a transgene under the control of a strong viral promoter, as well as the sequences encoding capsid assembly functions. “The capsid [only] gets you so far,” says David Venables, CEO of Edinburgh-based Synpromics. An AAV9-based vector, for example, will efficiently deliver its payload to the central nervous system, but, he adds, it will not do so exclusively. “You’ve still got exposure elsewhere.”

Synpromics is developing promoters that allow greater control of transgene expression in terms of both location and strength. It’s not the first company to develop cell-selective promoters, nor is it the first to develop inducible switches that can dial up or down the level of gene expression required. But it is attempting to overcome the shortcomings of existing systems, which, says Venables, are “leaky” in terms of their expression profiles and which require the coexpression of additional factors, such as transcriptional activators or repressors, to work. They are neither able to shut down expression completely nor able to crank it up sufficiently when required. “The amplitude of the dial-up tends to be quite low.” The company has not yet unveiled the workings of its inducible expression switches, but it has secured commercial agreements with six of the ten leading gene therapy firms, he says. One of them is UniQure, of Amsterdam. It recently reported preclinical data in a nonhuman primate model indicating that a liver-directed gene therapy, under the control of a Synpromics-developed liver-selective promoter, resulted in an eightfold increase in gene expression compared with current approaches.

Touchlight has developed a method to obtain viral vectors using in vitro DNA amplification, which eliminates the need for plasmid transfection and avoids the packaging of unwanted bacterial DNA into viral capsids. “Part of your drug product is contaminated by antibiotic resistance genes,” says Linden. “In the traditional system, you can’t get around it.” Touchlight avoids the problem by using a dual-enzyme system, comprising a phage DNA polymerase and protelomerase. Via rolling-circle DNA replication, a circular template containing the sequence of interest is replicated into a concatemer, a long continuous sequence, which is then processed into individual ‘doggybone’ DNA (dbDNA) molecules. So called because of their shape, these double-stranded, covalently closed DNA molecules can be introduced into producer cell lines for capsid assembly and packaging using the same triple transfection method currently used for bacterial plasmids. “You do exactly the same, except you don’t package the crap,” says Linden. What’s more, the generation of dbDNA vectors is rapid. “You can make gram amounts of DNA within two weeks at scale in GMP [good manufacturing practice],” he says.

The technology can be used to produce any DNA-based medicines, including gene therapies, DNA vaccines and what Linden terms “DNA-launched” products, such as antibodies. Touchlight recently entered a collaboration with the Janssen Biotech arm of Johnson & Johnson, which is evaluating the technology for undisclosed genetic therapies in infectious disease and oncology. Linden, who, as Pfizer’s former vice president of gene therapy, helped to develop the pharma firm’s gene therapy strategy, says a dbDNA-based AAV gene therapy could reach the clinic within one to two years.

Linden expects gene therapy to progress in a hybrid fashion in which new ideas and innovations are bolted onto the existing technologies. Because they are potentially curative, AAV and other viral vectors are here to stay, he says, “and what we have to do is address the limitations.” Mark Levi, senior consultant with Parexel, regards this as being feasible, given the incentives involved. “I think you’ll see manufacturing rise to the occasion and deliver,” he says. “Where there’s money to be made it’ll get done.”

A new method for studying RNA-protein interactions: OOPS

Researchers from the University of Cambridge have developed a new method to capture protein-RNA complexes which is compatible with downstream proteomics or RNA-Seq.

The new method is based on acid guanidinium thiocyanate-phenol-chloroform (AGPC) extraction, which is commonly used to extract RNA. In AGPC extraction, RNA partitions into the aqueous phase, protein into the organic phase, DNA into the interface and organic phase, and lipids into the interface. Applying 254 nm UV crosslinking generates protein-RNA adducts which are “pulled” in both directions and thus end up at the interface. Extraction of this interface and repeated rounds of AGPC phase separation yield an interface enriched in protein-RNA adducts without free protein or RNA. Subsequent treatment of the adducts with proteinase K or RNase digests one half of the adduct, and a final round of AGPC returns the remaining RNA or protein to the aqueous or organic phase, respectively. This simple method yields much more material than current techniques to enrich RNA-protein complexes and can be used to quantitatively study either protein-bound RNA or RNA-bound protein.

In their recent Nature Biotechnology publication, Queiroz, Smith & Villanueva et al demonstrate that all long RNAs are bound by protein and sequence the RNA to show that OOPS recovers all cross-linked RNA. After proteinase K digestion, a few amino acids remain on the RNA at the site of crosslinking which in turn inhibits reverse transcription across the crosslink site. Whilst this means slightly more RNA is required to generate RNA-Seq libraries, an upshot is that the RNA-Seq read coverage profiles inform the site of crosslinking.  By comparing the profiles for crosslinked and non-crosslinked samples, protein occupancy sites can thus be detected. Since OOPS recovers all RNA, protein occupancy can be assessed across the entire transcriptome, including non-coding RNAs. The Nature Biotechnology publication is a proof of principle that OOPS can be used to quantify protein binding systematically and modifications to the RNA-Seq library preparation would be expected to yield improved resolution. This approach could help identify global changes in RNA binding between healthy and disease states, or changes in RNA binding across biological conditions.

Having demonstrated that OOPS recovers all crosslinked RNA, the researchers interrogated the bound protein to generate the first full RBPomes of human cell lines, identifying 926 putative novel RBPs. Importantly, OOPS is 100 times more efficient than current methods, thus replicate samples can be easily obtained for multiple conditions. This allowed them to assess differences in RNA binding between nocodazole arrested and released cells. Finally, since OOPS is not dependent on any RNA feature, they were able to obtain the first draft RBPome catalog for a bacterium, the model organism E. coli.

OOPS has the potential to be very useful to further our understanding of the role of RNA binding in biological processes, and to elucidate the etiology of neurological diseases known to involve perturbations in RNA binding, such as amyotrophic lateral sclerosis.

Single-cell RNA-seq—now with protein

Two new methods simultaneously measure epitope and transcriptome levels in single cells.

The molecular understanding of the cell has been greatly advanced by single-cell RNA-seq, a technique that generates a library of all transcripts (the ‘transcriptome’) in a single cell. The technique has revealed surprising heterogeneity in cell populations previously considered homogeneous, identified new and rare cell types, and extended our understanding of cellular development. The transcriptome is, however, only a proxy of the ‘proteome’, the collection of proteins in a cell that defines how the cell looks, acts, and reacts. Although the transcriptome provides valuable information, it does not necessarily reflect protein abundance in the cell. And while flow cytometry is an established strategy for profiling populations at single-cell resolution according to surface-protein levels, it cannot access the rich phenotypic information available in the full cellular transcriptome.

Now, independent efforts led by Marlon Stoeckius at the New York Genome Center (NYGC) and Vanessa Peterson at Merck have yielded approaches to measure levels of both gene and protein expression in single cells on a large scale.

Both CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing), developed by the NYGC group, and REAP-seq (RNA expression and protein sequencing assay), designed by the Merck group, use a similar approach. Proteins are detected by using antibodies conjugated to a tripartite DNA sequence that contains a primer for amplification and sequencing (PCR handle), a unique oligonucleotide that acts as an antibody barcode, and a poly(dA) sequence. The poly(dA) sequence allows for simultaneous extension of antibody-specific DNA sequences and cDNAs in the same poly(dT)-primed reaction. This generates a protein readout that is captured and sequenced along with the cell’s transcriptome. The two approaches differ in how the DNA barcode is conjugated to the antibody. While antibodies used in CITE-seq are conjugated to streptavidin that is noncovalently bound to biotinylated DNA barcodes, REAP-seq relies on covalent bonds between the antibody and aminated DNA barcode.
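
The tripartite tag design described above can be illustrated with a small sketch that assembles a tag from a PCR handle, an antibody barcode and a poly(dA) tail, then counts tags per cell from already demultiplexed reads. All sequences, antibody names and cell labels here are hypothetical, and the exact-match parsing is a simplification that ignores sequencing errors and read orientation; it is not the published analysis pipeline of either method.

```python
from collections import Counter

# Hypothetical sequences, for illustration only.
PCR_HANDLE = "ACGTACGTAC"
POLY_A_LENGTH = 30
ANTIBODY_BARCODES = {
    "CD3": "ATCACG",
    "CD19": "CGATGT",
    "CD56": "TTAGGC",
}
BARCODE_TO_ANTIBODY = {seq: name for name, seq in ANTIBODY_BARCODES.items()}
BARCODE_LENGTH = 6


def build_tag(antibody):
    """Assemble PCR handle + antibody barcode + poly(dA) for one antibody."""
    return PCR_HANDLE + ANTIBODY_BARCODES[antibody] + "A" * POLY_A_LENGTH


def count_tags(reads):
    """Count antibody tags per cell from (cell_label, sequence) pairs.

    A read is counted only if it starts with the PCR handle, carries a
    known barcode right after it, and continues into the poly(dA) tail.
    """
    counts = Counter()
    start = len(PCR_HANDLE)
    end = start + BARCODE_LENGTH
    for cell, seq in reads:
        if not seq.startswith(PCR_HANDLE):
            continue  # not an antibody-derived read, e.g. a cDNA read
        antibody = BARCODE_TO_ANTIBODY.get(seq[start:end])
        if antibody is None or not seq[end:].startswith("AAAA"):
            continue
        counts[(cell, antibody)] += 1
    return counts


reads = [
    ("cell1", build_tag("CD3")),
    ("cell1", build_tag("CD3")),
    ("cell2", build_tag("CD19")),
    ("cell2", "GGGTTTCCCAAAGGGTTTCCC"),  # unrelated read, ignored
]
counts = count_tags(reads)
for (cell, antibody), n in sorted(counts.items()):
    print(cell, antibody, n)
```

Because the tag counts and the transcript counts share the same cell label, the two tables can be joined per cell, which is what makes the protein and RNA readouts directly comparable.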

In a proof-of-principle study, Stoeckius and colleagues monitored ten surface proteins and the transcriptomes of 8,000 single cells from cord blood mononuclear cells. The CITE-seq analysis revealed cell profiles similar to those established by flow cytometry. In addition, the multimodal data from CITE-seq enhanced the phenotypic characterization of a specific type of immune cell, the natural killer cell, compared with single-cell RNA-seq alone.

REAP-seq was used to characterize the effect of a CD27 agonist on human naïve CD8+ T cells by employing 80 barcoded antibodies and monitoring the expression of more than 20,000 genes in a single workflow. The transcriptome data analysis identified several differentially expressed genes in treated versus untreated cells. But REAP-seq’s ability to quantify cell surface proteins also led researchers to determine that ICOS, an immune checkpoint protein, is increased on the surface of treated cells, regardless of the fact that this protein’s mRNA does not differ in abundance between treated and untreated cells. REAP-seq also identified a small and previously unknown cell population within the enriched naïve CD8+ lymphocytes.

While both CITE-seq and REAP-seq add to established methods for transcriptome analysis without affecting the quality of the data, the main limitation of both approaches is the quality of the antibodies used and the epitope location, which is currently restricted to the cell surface. Both research groups anticipate that the use of these tools will soon be extended to measure intracellular proteins.

References

Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

Peterson, V.M. et al. Multiplexed quantification of proteins and transcripts in single cells. Nat. Biotechnol.