BayesHammer: Bayesian clustering for error correction in single-cell sequencing.

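Article information

DOI: 10.1186/1471-2164-14-S1-S7
PMID: 23368723
Journal: BMC Genomics
Impact factor: 3.7
JCR quartile: Q2
Publication year: 2013
Citations: 233
Keywords: Bayesian clustering, error correction, single-cell sequencing
Publication type: Journal Article; Research Support, Non-U.S. Gov't
ISSN: 1471-2164
Pages: S7
Issue: 14 Suppl 1
Authors: Sergey I Nikolenko, Anton I Korobeynikov, Max A Alekseyev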

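One-sentence summary

The authors developed BAYESHAMMER, a new error correction tool for single-cell sequencing that introduces algorithms based on Hamming graphs and Bayesian subclustering; it also improves on existing error correction tools for multi-cell sequencing data and runs faster. The tool offers a new way to improve both the accuracy and the efficiency of single-cell sequencing data processing.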

Abstract

Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. While existing error correction tools designed for standard (multi-cell) sequencing data usually come up short in single-cell sequencing projects, algorithms actually used for single-cell error correction have been so far very simplistic. We introduce several novel algorithms based on Hamming graphs and Bayesian subclustering in our new error correction tool BAYESHAMMER. While BAYESHAMMER was designed for single-cell sequencing, we demonstrate that it also improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets. We benchmark BAYESHAMMER on both k-mer counts and actual assembly results with the SPADES genome assembler.

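Key research questions

  1. How does BayesHammer perform on different types of single-cell sequencing data? Does it have specific strengths or weaknesses?
  2. Besides BayesHammer, which emerging algorithms or tools can be used for error correction in single-cell sequencing?
  3. In the BayesHammer benchmarks, which specific k-mer count metrics demonstrate its advantage?
  4. How does BayesHammer compare with traditional error correction tools for multi-cell sequencing, and what exactly does it improve?
  5. Are there recommended best practices or usage strategies for error correction of single-cell sequencing data with BayesHammer?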

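Core insights

Background and objectives

Single-cell sequencing has made it possible to sequence the genomes of important uncultured bacteria. However, single-cell sequencing data typically has extremely non-uniform coverage, which causes existing error correction tools to perform poorly on it. The paper therefore proposes a new error correction tool, BAYESHAMMER, which is based on Bayesian clustering and aims to improve error correction for single-cell sequencing data.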

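Main methods / materials / experimental design

The BAYESHAMMER workflow is as follows:

  1. Compute k-mer statistics: extract k-mers from the input reads and compute each k-mer's number of occurrences, quality, and error probability.
  2. Build the Hamming graph: link k-mers by Hamming distance and find the connected components of the resulting graph.
  3. Bayesian subclustering: further divide each connected component of the Hamming graph with Bayesian subclustering to better identify and correct erroneous k-mers.
  4. Select reliable k-mers: choose the "reliable" k-mers according to a total quality threshold.
  5. Expand reliable k-mers: iteratively expand the set of reliable k-mers to mark other k-mers across entire reads.
  6. Correct reads: produce corrected reads by consensus voting over the reliable k-mers in each read.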
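
Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the function names, the distance threshold `tau`, and the naive all-pairs comparison are hypothetical choices made for readability, and `pick_center` simply takes the most frequent k-mer in a component as a crude stand-in for the Bayesian subclustering of step 3.

```python
from collections import Counter
from itertools import combinations


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_components(kmers, tau):
    """Connected components of the Hamming graph on `kmers`.

    Two k-mers are joined by an edge when their Hamming distance is
    at most `tau` (an assumed parameter). Uses naive all-pairs
    comparison plus union-find, which is only practical for toy inputs.
    """
    parent = {kmer: kmer for kmer in kmers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(kmers, 2):
        if hamming(a, b) <= tau:
            parent[find(a)] = find(b)

    groups = {}
    for kmer in kmers:
        groups.setdefault(find(kmer), []).append(kmer)
    return list(groups.values())


def pick_center(component, counts):
    """Simplification: treat the most frequent k-mer as the true one."""
    return max(component, key=lambda kmer: counts[kmer])


# Toy usage: the second read differs from the first in its last base.
reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
counts = count_kmers(reads, k=4)
components = hamming_components(sorted(counts), tau=1)
centers = [pick_center(c, counts) for c in components]
```

On real data the all-pairs loop is far too slow, since the number of distinct k-mers runs into the tens of millions; the sketch only shows what a connected component of the Hamming graph is.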

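Key results and findings

BAYESHAMMER outperforms existing error correction tools (such as QUAKE, HAMMER, and EULER-SR) on several datasets. Specific results:

Tool        | Running time | Number of k-mers | Error reduction
BAYESHAMMER | 57 min       | 35,862,329       | Substantial
QUAKE       | 30 min       | 58,305,738       | Small
HAMMER      | 36 min       | 28,290,788       | Moderate

  • On the single-cell E. coli dataset, BAYESHAMMER corrects errors effectively and runs relatively fast.
  • Reads corrected with BAYESHAMMER produce noticeably better assemblies than reads corrected with the other tools.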

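Main conclusions / significance / novelty

By combining Bayesian clustering with Hamming graphs, BAYESHAMMER substantially improves error correction for single-cell sequencing data. The tool is not limited to single-cell sequencing: it also works well on multi-cell sequencing data, which points to broad applicability. BAYESHAMMER gives single-cell genomics a new tool and a new method.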

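Limitations and future directions

Although BAYESHAMMER makes clear progress in correction quality, it has the following limitations:

  • Correction may still be incomplete in complex repeat regions.
  • The current model assumes that errors are independent and equally probable; a non-uniform error distribution could be introduced in the future to improve accuracy.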
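
To illustrate what the independence assumption implies, consider a simplified sketch (not the exact likelihood used in the paper). Suppose each of the k positions of a k-mer is miscalled independently with the same probability p, and a miscall is equally likely to yield any of the three other nucleotides. The probability of observing k-mer x when the true k-mer is c then depends only on their Hamming distance d(x, c). The symbols p, x, c, and d are introduced here for illustration only.

```latex
P(x \mid c) = \left(\frac{p}{3}\right)^{d(x,c)} \, (1 - p)^{\,k - d(x,c)}
```

A non-uniform error model would replace the single p with position-specific or base-specific probabilities, so the product would no longer collapse to a function of the distance alone.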

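Future research directions include:

  • Adapting the algorithm to handle contamination by human or other DNA.
  • Exploring minimizer techniques to reduce memory requirements, and handling paired-read information.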
