   Evaluation of the evenness score in next-generation sequencing  
   
Author: Oexle, Konrad
Source: Journal of Human Genetics, 2016, Volume 61, Issue 7, Pages 627-632
Abstract: The evenness score (E) in next-generation sequencing (NGS) quantifies the homogeneity in coverage of the NGS targets. Here I clarify the mathematical description of E, which is 1 minus the integral from 0 to 1 over the cumulative distribution function F(x) of the normalized coverage x, where normalization means division by the mean, and derive a computationally more efficient formula; that is, 1 minus the integral from 0 to 1 over the probability density distribution f(x) times 1 − x. An analogous formula for empirical coverage data is provided as well as fast R command line scripts. This new formula allows for a general comparison of E with the coefficient of variation (= standard deviation σ of normalized data), which is the conventional measure of the relative width of a distribution. For symmetrical distributions, including the Gaussian, E can be predicted closely as 1 − σ²/2 ⩾ E ⩾ 1 − σ/2 with σ ⩽ 1 owing to normalization and symmetry. In the case of the log-normal distribution as a typical representative of positively skewed biological data, the analysis yields E ≈ exp(−σ*/2) with σ*² = ln(σ² + 1) up to large σ (⩽ 3), and E ≈ 1 − F(exp(−1)) for very large σ (⩾ 2.5). In the latter kind of rather uneven coverage, E can provide direct information on the fraction of well-covered targets that is not immediately delivered by the normalized σ. Otherwise, E does not appear to have major advantages over σ or over a simple score exp(−σ) based on it. Actually, exp(−σ) exploits a much larger part of its range for the evaluation of realistic NGS outputs.
Address: Center for Cardiovascular Genetics and Gene Diagnostics, Switzerland
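
The two descriptions of the evenness score given in the abstract can be compared numerically. The following is a minimal Python sketch, not the R scripts mentioned in the abstract, which are not reproduced in this record. It assumes that the empirical analogue of the second formula is 1 minus the mean shortfall (1 − x) over all targets whose normalized coverage x is below 1; this follows from integrating the empirical cumulative distribution function from 0 to 1. The function names, the grid size, and the toy coverage values are illustrative assumptions.

```python
import numpy as np


def normalize(coverage):
    """Divide per-target coverage by its mean (normalization as in the abstract)."""
    x = np.asarray(coverage, dtype=float)
    return x / x.mean()


def evenness_cdf(coverage, grid_points=20001):
    """1 minus the integral from 0 to 1 of the empirical CDF (trapezoidal rule)."""
    x = np.sort(normalize(coverage))
    grid = np.linspace(0.0, 1.0, grid_points)
    # Empirical CDF on the grid: fraction of targets with normalized coverage <= grid value.
    cdf = np.searchsorted(x, grid, side="right") / x.size
    integral = np.sum(0.5 * (cdf[1:] + cdf[:-1]) * np.diff(grid))
    return float(1.0 - integral)


def evenness_direct(coverage):
    """1 minus the mean of (1 - x), counted only where normalized coverage x < 1."""
    x = normalize(coverage)
    shortfall = np.clip(1.0 - x, 0.0, None)  # zero for targets at or above the mean
    return float(1.0 - shortfall.mean())


def cv_score(coverage):
    """exp(-sigma), with sigma the standard deviation of the normalized coverage."""
    return float(np.exp(-normalize(coverage).std()))


if __name__ == "__main__":
    toy = [10, 20, 30, 40]  # made-up coverage values, for illustration only
    print(evenness_direct(toy))  # 0.8 up to floating-point rounding
    print(evenness_cdf(toy))     # close to the value above (grid approximation)
    print(cv_score(toy))

    # Simulated log-normal coverage (an assumption, not data from the paper),
    # compared with the approximation exp(-sigma*/2) quoted in the abstract.
    rng = np.random.default_rng(0)
    sigma_star = 1.0
    sample = rng.lognormal(mean=0.0, sigma=sigma_star, size=200_000)
    print(evenness_direct(sample), np.exp(-sigma_star / 2))
```

The direct form needs a single pass over the data and no integration grid, which is one plausible reading of why the abstract calls the second formula computationally more efficient.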
 
     
   