BASiCS: Bayesian Analysis of Single-Cell Sequencing Data.
Literature Information
| Field | Value |
|---|---|
| DOI | 10.1371/journal.pcbi.1004333 |
| PMID | 26107944 |
| Journal | PLoS Computational Biology |
| Impact Factor | 3.6 |
| JCR Quartile | Q1 |
| Publication Year | 2015 |
| Times Cited | 148 |
| Keywords | single-cell RNA sequencing, Bayesian analysis, gene expression heterogeneity, technical noise, biological variance |
| Literature Type | Journal Article, Research Support, Non-U.S. Gov't |
| ISSN | 1553-734X |
| Pages | e1004333 |
| Volume (Issue) | 11(6) |
| Authors | Catalina A Vallejos, John C Marioni, Sylvia Richardson |
TL;DR
This study introduces BASiCS, a Bayesian hierarchical model that addresses the technical noise inherent in single-cell mRNA sequencing. The model estimates cell-specific normalisation constants, uses spike-in genes to quantify technical variability, and decomposes the total variability of expression counts into technical and biological components, which improves the identification of highly (or lowly) variable genes. The method is demonstrated on mouse Embryonic Stem Cells; cross-validation and enrichment of gene ontology categories among the genes classified as variable support its efficacy.
Abstract
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach.
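The variance decomposition described in points (ii) and (iii) can be illustrated with a small simulation. The sketch below is not the BASiCS implementation and does not run any Bayesian inference. It assumes a simplified Poisson-Gamma setup in which a cell-specific technical factor (variance `theta`) affects both a spike-in gene and a biological gene, while a biological factor (variance `delta`) affects only the biological gene. Cell-specific normalisation constants are left out, both genes share the same mean `mu`, and all function and parameter names are hypothetical. Under these assumptions the squared coefficient of variation is `1/mu + theta` for the spike-in gene and `1/mu + theta + delta * (1 + theta)` for the biological gene, so simple moment estimates can separate the two sources of noise.

```python
import numpy as np


def simulate_counts(n_cells, mu, theta, delta, rng):
    """Toy Poisson-Gamma counts for one spike-in gene and one biological gene."""
    # Technical factor shared by both genes within a cell (mean 1, variance theta).
    nu = rng.gamma(shape=1.0 / theta, scale=theta, size=n_cells)
    # Biological factor acting only on the biological gene (mean 1, variance delta).
    rho = rng.gamma(shape=1.0 / delta, scale=delta, size=n_cells)
    spike = rng.poisson(mu * nu)
    bio = rng.poisson(mu * nu * rho)
    return spike, bio


def cv2(x):
    """Squared coefficient of variation of a count vector."""
    return x.var() / x.mean() ** 2


def decompose(spike, bio):
    """Moment-based estimates of technical (theta) and biological (delta) variance."""
    # The spike-in gene carries only Poisson and technical noise.
    theta_hat = cv2(spike) - 1.0 / spike.mean()
    # Whatever the biological gene shows beyond that is attributed to biology.
    delta_hat = (cv2(bio) - 1.0 / bio.mean() - theta_hat) / (1.0 + theta_hat)
    return theta_hat, delta_hat


rng = np.random.default_rng(0)
spike, bio = simulate_counts(200_000, mu=50.0, theta=0.2, delta=0.5, rng=rng)
theta_hat, delta_hat = decompose(spike, bio)
print(f"theta_hat={theta_hat:.3f}, delta_hat={delta_hat:.3f}")
```

With many simulated cells the estimates should land close to the values used to generate the data. The point of the sketch is only that spike-in genes, which carry no biological variation, make the technical component identifiable; the paper instead estimates all parameters jointly within the hierarchical model.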
Primary Questions Addressed
- How does BASiCS compare to other methods for analyzing single-cell sequencing data in terms of handling technical noise?
- What are the implications of using spike-in genes for quantifying technical variability in single-cell RNA sequencing?
- Can BASiCS be applied to other types of single-cell data, such as single-cell ATAC-seq or single-cell proteomics?
- What criteria are used to determine the significance of highly variable genes identified by BASiCS?
- How does the integration of Bayesian hierarchical modeling enhance the interpretation of gene expression variability in single-cell studies?
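The detection criterion mentioned in the abstract, a tail posterior probability for a high biological variance contribution, can be sketched as follows. This is a minimal illustration and not the BASiCS code. It assumes posterior draws of each gene's biological variance fraction are already available as an array of shape (draws, genes); the variance threshold `gamma`, the evidence threshold, the example numbers, and the function name are all hypothetical.

```python
import numpy as np


def tail_posterior_probability(sigma_draws, gamma):
    """Per-gene fraction of posterior draws in which the biological
    variance contribution exceeds the threshold gamma."""
    sigma_draws = np.asarray(sigma_draws, dtype=float)
    return (sigma_draws > gamma).mean(axis=0)


# Hypothetical posterior draws: 4 draws (rows) for 2 genes (columns).
draws = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.70, 0.60],
    [0.95, 0.30],
])

probs = tail_posterior_probability(draws, gamma=0.5)
# A gene is called highly variable when the probability passes an evidence threshold.
highly_variable = probs > 0.75
print(probs, highly_variable)
```

A criterion for lowly variable genes follows the same pattern with the inequality on the draws reversed.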
Key Findings
Research Background and Objective
The study focuses on Bayesian methods for analyzing single-cell sequencing data, particularly addressing the challenges of prior specification and posterior propriety in the context of gene expression modeling. The authors aim to establish a framework that allows for the proper inference of model parameters, even when prior information is scarce.
Main Methods/Materials/Experimental Design
The authors propose a Bayesian model that combines informative and non-informative priors. Specifically, they use an improper non-informative prior for the gene-specific normalized expression rates, which is uniform on the log of these rates. This choice simplifies prior elicitation, but because the prior is improper, the posterior is not guaranteed to be proper and its propriety must be verified.
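The statement that the prior is uniform on the log scale follows from a change of variables. The short derivation below assumes the prior takes the form listed in the steps of this summary, π(µ1, ..., µq0) ∝ ∏ µi^(-1), and introduces η_i = log µ_i as notation that is not in the original.

```latex
\begin{align*}
\eta_i &= \log \mu_i
  \quad\Longrightarrow\quad
  \mu_i = e^{\eta_i}, \qquad
  \frac{d\mu_i}{d\eta_i} = e^{\eta_i} = \mu_i, \\
\pi_\eta(\eta_1, \ldots, \eta_{q_0})
  &= \pi(\mu_1, \ldots, \mu_{q_0})
     \prod_{i=1}^{q_0} \left| \frac{d\mu_i}{d\eta_i} \right|
   \;\propto\; \prod_{i=1}^{q_0} \mu_i^{-1} \, \mu_i
   \;=\; 1 .
\end{align*}
```

The density is therefore constant in each η_i, which is why it cannot be normalized and counts as improper.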
The methodological approach can be summarized as follows:
- Prior Specification: The model assumes independence among parameters and utilizes an improper prior for gene expression rates.
- Non-informative Prior for Gene Expression: The prior for normalized expression rates is defined as $\pi(\mu_1, \ldots, \mu_{q_0}) \propto \prod_{i=1}^{q_0} \mu_i^{-1}$.
- Assumption of Proper Priors for Other Parameters: Proper priors are assigned to other model parameters (e.g., Gamma distributions).
- Analysis of Simulated and Real Datasets: The model is tested on various datasets, including mouse embryonic stem cells.
- Posterior Inference: The resulting posterior distributions are analyzed.
- Verification of Posterior Propriety: The authors derive conditions under which the posterior distributions are well-defined.
- Theorem on Posterior Existence: A key theorem is presented, stating that the posterior is well-defined if each gene is expressed in at least one cell.
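The condition in the theorem is easy to check before fitting. As an intuition, in a simplified single-gene Poisson setting (an assumption made here for illustration, not the paper's full model) the likelihood behaves like µ raised to the total count near µ = 0, so multiplying by the prior µ^(-1) gives an integrable function only when the total count is at least 1. The sketch below flags genes that violate the condition; it assumes a count matrix with genes in rows and cells in columns, and the function name is hypothetical.

```python
import numpy as np


def genes_violating_condition(counts):
    """Return row indices of genes with zero counts in every cell.

    `counts` is assumed to be a (genes x cells) array of non-negative integers.
    """
    counts = np.asarray(counts)
    expressed_somewhere = (counts > 0).any(axis=1)
    return np.flatnonzero(~expressed_somewhere)


# Toy matrix: 3 genes (rows) by 3 cells (columns); the second gene is never observed.
counts = np.array([
    [3, 0, 1],
    [0, 0, 0],
    [0, 2, 0],
])

violating = genes_violating_condition(counts)
print(violating)
```

Genes returned by such a check would need to be filtered out before the improper prior is used.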
Key Results and Findings
- The use of improper priors can lead to valid posterior inferences if certain conditions are met, specifically that each gene must have a positive count in at least one observation.
- The analysis demonstrates that the choice of hyper-parameters for proper priors does not significantly affect posterior inference.
- The proposed model is validated through both simulated data and real single-cell sequencing datasets, confirming its robustness.
Main Conclusions/Significance/Innovation
The study contributes to the field of single-cell genomics by providing a Bayesian framework that accommodates the complexities of gene expression data. The innovative aspect lies in the use of improper priors while ensuring that the posterior distributions remain valid under specific conditions. This approach allows researchers to analyze large datasets without the need for extensive prior information, facilitating broader applications in genomics.
Research Limitations and Future Directions
- Limitations: The improper prior yields a valid posterior only when the theorem's condition holds, so genes with zero counts in every cell are not covered, and the condition may not be met in every dataset.
- Future Directions: Future research could explore alternative prior specifications and their impact on model performance, and could extend the framework to other types of biological data and other biological contexts.
Literature Citing This Work
- Design and computational analysis of single-cell RNA-sequencing experiments. - Rhonda Bacher;Christina Kendziorski - Genome biology (2016)
- Beyond comparisons of means: understanding changes in gene expression at the single-cell level. - Catalina A Vallejos;Sylvia Richardson;John C Marioni - Genome biology (2016)
- Reply to The contribution of cell cycle to heterogeneity in single-cell RNA-seq data. - Nature biotechnology (2016)
- Robust Inference of Cell-to-Cell Expression Variations from Single- and K-Cell Profiling. - Manikandan Narayanan;Andrew J Martins;John S Tsang - PLoS computational biology (2016)
- Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data. - Martin Barron;Jun Li - Scientific reports (2016)
- Single-Cell Transcriptomics Bioinformatics and Computational Challenges. - Olivier B Poirion;Xun Zhu;Travers Ching;Lana Garmire - Frontiers in genetics (2016)
- Revealing the vectors of cellular identity with single-cell genomics. - Allon Wagner;Aviv Regev;Nir Yosef - Nature biotechnology (2016)
- A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. - Aaron T L Lun;Davis J McCarthy;John C Marioni - F1000Research (2016)
- Intrinsic transcriptional heterogeneity in B cells controls early class switching to IgE. - Yee Ling Wu;Michael J T Stubbington;Maria Daly;Sarah A Teichmann;Cristina Rada - The Journal of experimental medicine (2017)
- Batch effects and the effective design of single-cell gene expression studies. - Po-Yuan Tung;John D Blischak;Chiaowen Joyce Hsiao;David A Knowles;Jonathan E Burnett;Jonathan K Pritchard;Yoav Gilad - Scientific reports (2017)
... (138 more citing works)
