
DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors.

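Article information

DOI: 10.1016/j.cels.2019.03.003
PMID: 30954475
Journal: Cell Systems
Impact factor: 7.7
JCR quartile: Q1
Year of publication: 2019
Citations: 1893
Keywords: doublet detection, machine learning, quality control, single-cell RNA sequencing
Publication type: Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
ISSN: 2405-4712
Pages: 329-337.e4
Volume (issue): 8(4)
Authors: Christopher S McGinnis, Lyndsay M Murrow, Zev J Gartner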

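Summary

This study presents DoubletFinder, a computational tool that identifies the technical artifacts known as "doublets" in single-cell RNA sequencing data using gene expression data alone. Validated on datasets with known doublet identities, DoubletFinder improves the identification of differentially expressed genes once the detected doublets are removed, and its parameter-estimation method makes it applicable to scRNA-seq datasets with diverse cell type compositions.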


Abstract

Single-cell RNA sequencing (scRNA-seq) data are commonly affected by technical artifacts known as "doublets," which limit cell throughput and lead to spurious biological conclusions. Here, we present a computational doublet detection tool, DoubletFinder, that identifies doublets using only gene expression data. DoubletFinder predicts doublets according to each real cell's proximity in gene expression space to artificial doublets created by averaging the transcriptional profile of randomly chosen cell pairs. We first use scRNA-seq datasets where the identity of doublets is known to show that DoubletFinder identifies doublets formed from transcriptionally distinct cells. When these doublets are removed, the identification of differentially expressed genes is enhanced. Second, we provide a method for estimating DoubletFinder input parameters, allowing its application across scRNA-seq datasets with diverse distributions of cell types. Lastly, we present "best practices" for DoubletFinder applications and illustrate that DoubletFinder is insensitive to an experimentally validated kidney cell type with "hybrid" expression features.


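Main research questions

  1. How well does DoubletFinder perform on different types of scRNA-seq datasets?
  2. How can the accuracy and sensitivity of DoubletFinder's doublet identification be evaluated?
  3. Besides DoubletFinder, what other methods can effectively detect doublets in scRNA-seq data?
  4. How does the choice of input parameters affect the results when applying DoubletFinder?
  5. How does DoubletFinder handle cell types with "hybrid" expression features, and are there limitations?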

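Key insights

Background and objectives

Single-cell RNA sequencing (scRNA-seq) is widely used in biomedical research, but its data are often affected by technical artifacts, especially "doublets": cases in which two cells are captured together and their combined transcriptome is read out as a single cell, distorting downstream analysis. Doublets also limit the cell throughput of scRNA-seq experiments. To ease this limit and reduce the impact of doublets, this study presents DoubletFinder, a computational tool that identifies doublets from gene expression data.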

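Main methods / materials / experimental design

The DoubletFinder workflow can be divided into five main steps:

  1. Simulate artificial doublets: generate artificial doublets by randomly selecting pairs of cells and averaging their gene expression profiles.
  2. Merge and preprocess the data: merge the real cells with the artificial doublets, then normalize and preprocess the merged data using the Seurat analysis pipeline.
  3. Reduce dimensionality: run principal component analysis (PCA) on the merged data to obtain a low-dimensional representation.
  4. Compute nearest neighbors: for each real cell, find its nearest neighbors in principal component space and compute the proportion of artificial nearest neighbors (pANN).
  5. Predict doublets: predict real doublets as the cells with the highest pANN values.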
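
The five steps above can be sketched in a few lines of NumPy. This is only an illustrative approximation, not the DoubletFinder implementation: the log-normalization and scaling stand in for the Seurat preprocessing, the neighbor search is brute force, and the function names (`make_toy_counts`, `compute_pann`, `call_doublets`), the toy data, and the default values of `pN`, `pK` and `n_pcs` are all assumptions made for the example. Here `pN` is taken to be the fraction of the merged data made up of artificial doublets and `pK` the neighborhood size as a fraction of the merged data.

```python
import numpy as np


def make_toy_counts(seed=1):
    """Two synthetic cell types plus 10 heterotypic doublets (made-up data)."""
    rng = np.random.default_rng(seed)
    n_genes = 50
    mean_a = np.where(np.arange(n_genes) < 25, 20.0, 1.0)  # type A markers
    mean_b = mean_a[::-1]                                   # type B markers
    cells_a = rng.poisson(mean_a, size=(100, n_genes))
    cells_b = rng.poisson(mean_b, size=(100, n_genes))
    doublets = (rng.poisson(mean_a, size=(10, n_genes))
                + rng.poisson(mean_b, size=(10, n_genes)))
    counts = np.vstack([cells_a, cells_b, doublets])
    is_doublet = np.arange(counts.shape[0]) >= 200
    return counts, is_doublet


def compute_pann(counts, pN=0.25, pK=0.05, n_pcs=10, seed=0):
    """Return the proportion of artificial nearest neighbors per real cell."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_real = counts.shape[0]

    # Step 1: simulate artificial doublets by averaging random cell pairs.
    n_art = int(round(n_real / (1.0 - pN) - n_real))
    first = rng.integers(0, n_real, size=n_art)
    second = rng.integers(0, n_real, size=n_art)
    artificial = (counts[first] + counts[second]) / 2.0

    # Step 2: merge, log-normalize and scale (stand-in for Seurat).
    merged = np.vstack([counts, artificial])
    lib_size = merged.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0
    logged = np.log1p(merged / lib_size * 1e4)
    centered = logged - logged.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0
    scaled = centered / std

    # Step 3: PCA via singular value decomposition.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]

    # Step 4: k nearest neighbors of each real cell in PC space.
    n_total = merged.shape[0]
    k = min(max(1, int(round(pK * n_total))), n_total - 1)
    sq = (pcs ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * pcs @ pcs.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argpartition(dist[:n_real], k - 1, axis=1)[:, :k]

    # pANN: fraction of those neighbors that are artificial doublets.
    return (neighbors >= n_real).mean(axis=1)


def call_doublets(pann, n_expected):
    """Step 5: flag the n_expected cells with the highest pANN."""
    order = np.argsort(-pann, kind="stable")
    calls = np.zeros(pann.shape[0], dtype=bool)
    calls[order[:n_expected]] = True
    return calls


if __name__ == "__main__":
    counts, is_doublet = make_toy_counts()
    pann = compute_pann(counts)
    calls = call_doublets(pann, n_expected=10)
    print("mean pANN, true doublets:", pann[is_doublet].mean())
    print("mean pANN, singlets:     ", pann[~is_doublet].mean())
    print("called doublets that are true doublets:", (calls & is_doublet).sum())
```

Averaging (rather than summing) the two profiles makes no difference after library-size normalization, which is why artificial doublets land next to real heterotypic doublets in this toy setup.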

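Key results and findings

  • Performance evaluation: on scRNA-seq datasets with known doublet identities, DoubletFinder identifies doublets formed from transcriptionally distinct cells, and removing these doublets improves the identification of differentially expressed genes.
  • Input parameter optimization: tuning the parameters (such as pN and pK) across datasets shows that pK is the main parameter affecting DoubletFinder performance.
  • Application to real data: DoubletFinder can also be applied to real datasets without known doublet labels (such as mouse kidney data), and it does not misclassify an experimentally validated kidney cell type with "hybrid" expression features as doublets.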
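
To make the role of pN and pK concrete, the sketch below converts them into absolute numbers. The summary above does not define the two parameters, so the definitions used here are assumptions: pN is treated as the fraction of the merged (real plus artificial) data made up of artificial doublets, and pK as the neighborhood size expressed as a fraction of the merged data. The helper name `doubletfinder_sizes` is hypothetical.

```python
from typing import Tuple


def doubletfinder_sizes(n_real_cells: int, pN: float, pK: float) -> Tuple[int, int]:
    """Return (number of artificial doublets, neighborhood size k).

    Assumes pN and pK are both fractions of the merged dataset,
    which is an interpretation, not something stated in the summary.
    """
    if not 0.0 < pN < 1.0:
        raise ValueError("pN must be strictly between 0 and 1")
    if pK <= 0.0:
        raise ValueError("pK must be positive")
    # Solve n_art / (n_real + n_art) = pN for n_art.
    n_artificial = round(n_real_cells / (1.0 - pN) - n_real_cells)
    n_merged = n_real_cells + n_artificial
    k = max(1, round(pK * n_merged))
    return n_artificial, k


if __name__ == "__main__":
    # Sweep pK at a fixed pN to see how quickly the neighborhood grows.
    for pK in (0.005, 0.01, 0.05, 0.1):
        n_art, k = doubletfinder_sizes(1000, pN=0.25, pK=pK)
        print(f"pK={pK}: {n_art} artificial doublets, k={k}")
```

Under these assumptions pK directly sets how many neighbors vote on each cell's pANN, which is one way to see why it would affect performance more than pN.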

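Main conclusions / significance / novelty

DoubletFinder is a computational tool that identifies doublets from gene expression data so that they can be removed, which improves the accuracy of scRNA-seq data analysis. Its effectiveness and generality give it broad potential in single-cell transcriptomics, particularly in studies of cell state transitions.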

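Limitations and future directions

  • Limited sensitivity to homotypic doublets: DoubletFinder is less able to identify homotypic doublets formed from transcriptionally similar cells, so these may go undetected; if the expected number of doublets is not adjusted accordingly, some real single cells may be misclassified as doublets.
  • Challenges in parameter tuning: how to tune the parameters precisely for datasets that differ in cell type composition and transcriptional heterogeneity needs further study.
  • Future directions: the summary recommends combining sample multiplexing techniques with DoubletFinder to achieve higher cell throughput and more accurate doublet identification. Future work could also improve DoubletFinder's ability to identify homotypic doublets.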
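
One simple way to account for the homotypic limitation, sketched here under stated assumptions, is to lower the expected doublet count by the estimated share of homotypic doublets. The sketch assumes that doublets form by random pairing and that cluster labels approximate cell types, so the probability that both cells come from the same type is the sum of squared cluster frequencies. The function names, the cluster sizes, and the 8% doublet rate are all hypothetical.

```python
from collections import Counter


def homotypic_proportion(cluster_labels):
    """Probability that two randomly paired cells share a cluster label."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def adjusted_expected_doublets(n_cells, doublet_rate, homotypic_prop):
    """Expected number of detectable (heterotypic) doublets."""
    return round(n_cells * doublet_rate * (1.0 - homotypic_prop))


if __name__ == "__main__":
    # Hypothetical dataset: 1000 cells in three clusters.
    labels = ["A"] * 500 + ["B"] * 300 + ["C"] * 200
    prop = homotypic_proportion(labels)
    n_exp = adjusted_expected_doublets(len(labels), 0.08, prop)
    print("estimated homotypic proportion:", prop)
    print("adjusted expected doublets:", n_exp)
```

The estimate depends on how well the clusters reflect true cell types, so it is best treated as a rough correction rather than a measured rate.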

