   Physlr-Mol: Physical Map of Linked-Reads For De-Novo Barcode To Molecule Deconvolution  
   
DOR 20.1001.2.9920068682.1399.1.1.323.7
Authors Afshinfard Amirhossein, Jackman Shaun, Coombe Lauren, Chu Justin, Wong Johnathan, Nikolic Vladimir, Warren Rene, Birol Inanc
Source Iranian Genetics - 1399 SH - Volume: 16 - 16th Iranian Genetics Congress and 4th International Iranian Genetics Congress - Conference code: 99200-68682
Abstract
Background and aim: Long-range sequence information obtained from novel genome sequencing technologies has drastically transformed our understanding of genomics. Unlike the expensive and error-prone long-read sequencing technologies of Oxford Nanopore and PacBio, linked-read technologies such as 10x Genomics (10xG) Chromium provide long-range information while using high-quality short-read platforms. They assign an identical barcode to all short reads derived from the same long DNA molecule and sequence those reads on short-read platforms, thus offering the same fidelity and cost as short reads. One main challenge in analyzing linked reads arises from barcode reuse, whereby distinct molecules are assigned the same barcode.
Methods: Here we present Physlr-Molecule, a method that deconvolutes these barcodes into their component molecules without using a reference genome. A barcode overlap graph is constructed in which each edge connects two barcodes that share minimizer k-mers. To split a barcode into molecules, we inspect each barcode's neighbourhood graph, the vertex-induced subgraph of the barcode's immediate neighbours. This neighbourhood subgraph is composed of multiple communities, one per molecule. Physlr-Molecule detects these communities in millions of subgraphs, each comprising hundreds to thousands of vertices, a setting in which state-of-the-art community-detection algorithms fail to scale. To reduce the running time of these superlinear-time algorithms, each subgraph is partitioned into chunks; communities are detected using k-clique percolation and a cosine-similarity measure, and then merged if needed.
Results: This community-detection approach reduces the running time from 8 weeks to 8 minutes for Drosophila melanogaster, and to 18 minutes for H. sapiens (HG004).
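The barcode-splitting idea in the Methods can be illustrated with a small, self-contained sketch. This is not Physlr-Molecule's implementation. It makes several assumptions:

- Each barcode has already been reduced to a set of minimizer hashes. The integers in `BARCODE_MINIMIZERS` are made-up toy data.
- Two barcodes are connected when they share at least `min_shared` minimizers, which is a hypothetical threshold.
- The vertex-induced subgraph on a barcode's immediate neighbours is split with plain k-clique percolation. Maximal cliques of size at least k are found with Bron-Kerbosch, and cliques that share k-1 vertices are joined into one community.

The chunking and cosine-similarity steps described in the abstract are omitted. The helper `split_barcode` is a hypothetical name.

```python
from itertools import combinations

# Hypothetical toy data: barcode -> set of minimizer hashes (made-up integers).
# "B0" is a reused barcode carrying minimizers from two distinct molecules.
BARCODE_MINIMIZERS = {
    "B0": set(range(1, 7)) | set(range(101, 107)),
    "B1": set(range(2, 8)),
    "B2": set(range(3, 9)),
    "B3": {1, 2, 3, 4, 5, 9},
    "B4": {101, 102, 103, 104, 105, 107},
    "B5": set(range(102, 109)),
    "B6": set(range(100, 105)),
}


def overlap_graph(minimizers, min_shared):
    """Adjacency sets: connect two barcodes sharing >= min_shared minimizers."""
    adj = {barcode: set() for barcode in minimizers}
    for u, v in combinations(minimizers, 2):
        if len(minimizers[u] & minimizers[v]) >= min_shared:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def neighbourhood(adj, barcode):
    """Vertex-induced subgraph on the barcode's immediate neighbours."""
    nbrs = adj[barcode]
    return {u: adj[u] & nbrs for u in nbrs}


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; returns every maximal clique as a set."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(adj, k):
    """k-clique percolation: join maximal cliques (size >= k) sharing >= k-1 vertices."""
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))  # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())


def split_barcode(adj, barcode, k=3):
    """Hypothetical helper: each community in the neighbourhood = one molecule."""
    return k_clique_communities(neighbourhood(adj, barcode), k)


graph = overlap_graph(BARCODE_MINIMIZERS, min_shared=3)
# Expected: two molecules, B1/B2/B3 and B4/B5/B6 (printed in arbitrary order).
for molecule in split_barcode(graph, "B0", k=3):
    print(sorted(molecule))
```

In this toy graph, the neighbourhood of `B0` consists of two disconnected triangles, so `B0` is split into two molecules. A non-reused barcode such as `B1` yields a single community.

Maximal-clique enumeration is exponential in the worst case, and the pairwise clique comparison above is quadratic in the number of cliques. This is consistent with the abstract's point that such algorithms scale poorly on neighbourhoods with thousands of vertices.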
The deconvoluted barcode set resulted in a chromosome-level assembly of the human genome (HG004 10xG dataset) with an NG50 of 59.2 Mb, compared with 38.5 Mb for Supernova, the only existing tool for linked-read assembly. It also reduced the number of misassemblies from 1,071 to 507. Three chromosomes are assembled in a single scaffold, and six are 90% assembled in only two pieces.
Conclusion: Physlr-Molecule efficiently deconvolutes the barcodes of linked reads. By building a physical map from the deconvoluted barcode set, Physlr can scaffold draft genome assemblies to chromosome-level contiguity. Physlr accepts linked reads from different technologies, such as 10xG Chromium and MGI stLFR.
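The assembly comparison above is reported as NG50. Unlike N50, this contiguity statistic is measured against an (estimated) genome size rather than the assembly's own total length, so assemblies of different total size are compared on the same scale.

Below is a minimal sketch of the standard definition. The scaffold lengths are made up and not taken from the study.

```python
def ng50(lengths, genome_size):
    """NG50: summing sequence lengths from the longest down, the length of the
    sequence at which the running total first reaches half of genome_size.
    Returns None if the whole assembly covers less than half the genome."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return None


def n50(lengths):
    """N50 is the same statistic measured against the assembly's total length."""
    return ng50(lengths, sum(lengths))


# Toy scaffold lengths in Mb (made up, not from the study).
scaffolds = [50, 30, 20, 10]
print(n50(scaffolds))        # 30: total is 110 Mb, and 50 + 30 = 80 >= 55
print(ng50(scaffolds, 200))  # 20: 50 + 30 + 20 = 100 >= 100
```

The same assembly scores lower on NG50 than on N50 when its total length falls short of the genome size. This is why NG50 is the fairer metric when comparing assemblers.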
Keywords DNA Sequencing, Genome Assembly, Scaffolding Draft Genomes, Linked Reads, Community Detection, 10x Genomics
Affiliations University of British Columbia, Canada; 10x Genomics, USA; University of British Columbia, Canada; University of British Columbia, Canada; University of British Columbia, Canada; University of British Columbia, Canada; University of British Columbia, Canada; University of British Columbia, Canada
 
     
   
  
 
 

Copyright 2023
Islamic World Science Citation Center
All Rights Reserved