DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors.
Literature Information
| Field | Value |
|---|---|
| DOI | 10.1016/j.cels.2019.03.003 |
| PMID | 30954475 |
| Journal | Cell Systems |
| Impact Factor | 7.7 |
| JCR Quartile | Q1 |
| Publication Year | 2019 |
| Times Cited | 1893 |
| Keywords | doublet detection, machine learning, quality-control, single-cell RNA sequencing |
| Literature Type | Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S. |
| ISSN | 2405-4712 |
| Pages | 329-337.e4 |
| Volume(Issue) | 8(4) |
| Authors | Christopher S McGinnis, Lyndsay M Murrow, Zev J Gartner |
TL;DR
This study introduces DoubletFinder, a computational tool designed to detect doublets in single-cell RNA sequencing data based solely on gene expression, addressing the issue of technical artifacts that compromise cell throughput and biological interpretations. The tool effectively identifies doublets formed from transcriptionally distinct cells, enhances the detection of differentially expressed genes when doublets are removed, and provides adaptable input parameters and best practices for diverse scRNA-seq datasets.
Abstract
Single-cell RNA sequencing (scRNA-seq) data are commonly affected by technical artifacts known as "doublets," which limit cell throughput and lead to spurious biological conclusions. Here, we present a computational doublet detection tool, DoubletFinder, that identifies doublets using only gene expression data. DoubletFinder predicts doublets according to each real cell's proximity in gene expression space to artificial doublets created by averaging the transcriptional profile of randomly chosen cell pairs. We first use scRNA-seq datasets where the identity of doublets is known to show that DoubletFinder identifies doublets formed from transcriptionally distinct cells. When these doublets are removed, the identification of differentially expressed genes is enhanced. Second, we provide a method for estimating DoubletFinder input parameters, allowing its application across scRNA-seq datasets with diverse distributions of cell types. Lastly, we present "best practices" for DoubletFinder applications and illustrate that DoubletFinder is insensitive to an experimentally validated kidney cell type with "hybrid" expression features.
Primary Questions Addressed
- How does DoubletFinder compare to other doublet detection methods in terms of accuracy and computational efficiency?
- What specific challenges does DoubletFinder address when dealing with different cell type distributions in scRNA-seq datasets?
- Can the principles behind DoubletFinder be applied to other types of single-cell sequencing technologies beyond RNA sequencing?
- What are the implications of doublet detection on downstream analyses, such as differential gene expression studies?
- How does the choice of input parameters influence the performance of DoubletFinder in various experimental setups?
Key Findings
Research Background and Purpose
Single-cell RNA sequencing (scRNA-seq) has become a vital tool in understanding cellular heterogeneity. However, technical artifacts known as "doublets," in which a single droplet (and therefore a single cell barcode) captures two or more cells, can confound data interpretation. This study introduces DoubletFinder, a computational tool designed to detect doublets based solely on gene expression data, aiming to enhance the accuracy of scRNA-seq analyses.
Main Methods/Materials/Experimental Design
DoubletFinder operates through a structured workflow involving several key steps:
- Simulation of Artificial Doublets: Creates artificial doublets by averaging the gene expression profiles of randomly selected pairs of cells.
- Data Integration: Merges the simulated doublets with real scRNA-seq data processed through the Seurat pipeline.
- Dimensionality Reduction: Applies principal component analysis (PCA) to the merged dataset to establish a low-dimensional representation of the data.
- Proximity Calculation: Identifies the k nearest neighbors for each real cell in PCA space and computes the proportion of artificial nearest neighbors (pANN).
- Doublet Prediction: Classifies cells as doublets based on their pANN values, identifying those with the highest proportions as potential doublets.
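The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DoubletFinder implementation (which is an R package built on Seurat): `log1p` stands in for the Seurat preprocessing, PCA is done with a plain SVD, neighbors are found by brute force, and the function names and parameters (`n_doublets`, `k`, `n_pcs`, `n_expected`) are hypothetical.

```python
import numpy as np


def simulate_doublets(counts, n_doublets, rng):
    """Average the raw profiles of randomly chosen cell pairs (with replacement)."""
    n_cells = counts.shape[0]
    first = rng.integers(0, n_cells, size=n_doublets)
    second = rng.integers(0, n_cells, size=n_doublets)
    return (counts[first] + counts[second]) / 2.0


def pca_embed(matrix, n_pcs):
    """Project rows onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def compute_pann(counts, n_doublets, k, n_pcs=10, seed=0):
    """Return the proportion of artificial nearest neighbors for each real cell."""
    rng = np.random.default_rng(seed)
    n_real = counts.shape[0]
    artificial = simulate_doublets(counts, n_doublets, rng)
    # Assumption: log1p is a stand-in for the real normalization pipeline.
    merged = np.log1p(np.vstack([counts, artificial]))
    embedding = pca_embed(merged, n_pcs)
    # Brute-force Euclidean distances from each real cell to every merged cell.
    diffs = embedding[:n_real, None, :] - embedding[None, :, :]
    dist = np.linalg.norm(diffs, axis=2)
    dist[np.arange(n_real), np.arange(n_real)] = np.inf  # skip self-matches
    neighbors = np.argsort(dist, axis=1)[:, :k]
    is_artificial = np.arange(merged.shape[0]) >= n_real
    return is_artificial[neighbors].mean(axis=1)


def call_doublets(pann, n_expected):
    """Flag the n_expected real cells with the highest pANN as doublets."""
    calls = np.zeros(pann.shape[0], dtype=bool)
    if n_expected > 0:
        calls[np.argsort(pann)[::-1][:n_expected]] = True
    return calls
```

Here `counts` is assumed to be a cells-by-genes array of raw counts. The brute-force distance matrix is only practical for small datasets; a real analysis would use an approximate or tree-based neighbor search.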
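In short, the workflow runs in sequence: simulate artificial doublets, merge them with the real data, reduce dimensionality with PCA, compute pANN for each real cell, and threshold pANN to call doublets.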
Key Results and Findings
- Validation Against Ground Truth: DoubletFinder was benchmarked using datasets with known doublet identities (from methods like Demuxlet and Cell Hashing), demonstrating high sensitivity in detecting heterotypic doublets (formed from transcriptionally distinct cells) while being less effective for homotypic doublets (formed from similar cells).
- Impact on Differential Gene Expression: The removal of identified doublets significantly improved the performance of differential gene expression analyses, leading to the identification of more differentially expressed genes.
- Parameter Optimization: The study established a method for estimating optimal input parameters for DoubletFinder, allowing its application across various scRNA-seq datasets.
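Because only heterotypic doublets are reliably detected, one way to choose how many cells to call as doublets is to shrink the expected doublet count by an estimate of the homotypic fraction. The sketch below assumes that fraction can be approximated by the sum of squared cell-type frequencies, that is, the chance that two randomly paired cells share an annotation. The function names and the `doublet_rate` input are hypothetical; the rate depends on the platform and loading density and must be supplied by the user.

```python
from collections import Counter


def homotypic_proportion(annotations):
    """Estimate the fraction of doublets formed by two cells of the same type."""
    n_cells = len(annotations)
    type_counts = Counter(annotations)
    return sum((count / n_cells) ** 2 for count in type_counts.values())


def expected_heterotypic_doublets(annotations, doublet_rate):
    """Expected number of detectable (heterotypic) doublets in the dataset."""
    # Assumption: doublet_rate is the overall fraction of captured cells
    # that are doublets, supplied by the user for their platform.
    n_expected = doublet_rate * len(annotations)
    return round(n_expected * (1.0 - homotypic_proportion(annotations)))
```

With coarse cluster labels as `annotations`, the result can serve as the number of top-pANN cells to flag. A dataset dominated by one cell type yields a homotypic proportion close to 1, which reflects how little the method can detect in homogeneous data.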
Main Conclusions/Significance/Innovation
DoubletFinder presents a novel approach for detecting doublets in scRNA-seq data using only gene expression profiles. Removing the doublets it identifies improves the identification of differentially expressed genes and reduces the risk of spurious biological conclusions. Because the tool is insensitive to homotypic doublets, it is best used in conjunction with other methods, particularly in datasets where homotypic doublets are expected to be common.
Research Limitations and Future Directions
- Sensitivity to Data Structure: DoubletFinder performs optimally with datasets that exhibit well-resolved clusters. Its efficacy diminishes in homogeneous datasets, which poses a limitation for certain applications.
- Homotypic Doublet Detection: Future improvements could focus on enhancing the detection of homotypic doublets or developing strategies to account for their presence without introducing significant false positives.
- Broad Application: Further validation across diverse biological contexts and datasets is needed to fully establish the robustness and adaptability of DoubletFinder in real-world scenarios.
| Aspect | Details |
|---|---|
| Tool | DoubletFinder |
| Primary Function | Detects doublets in scRNA-seq data |
| Key Innovation | Scores each real cell by its proportion of artificial nearest neighbors (pANN), using only gene expression data |
| Impact on Analysis | Improves differential gene expression results |
| Limitations | Insensitive to homotypic doublets; performance varies with data structure |
| Future Directions | Enhance homotypic detection; validate across diverse datasets |
