Evaluation of the evenness score in next-generation sequencing
|
Author
|
Oexle Konrad
|
Source
|
Journal of Human Genetics - 2016 - Volume: 61 - Issue: 7 - Pages: 627-632
|
Abstract
|
The evenness score (E) in next-generation sequencing (NGS) quantifies the homogeneity in coverage of the NGS targets. Here I clarify the mathematical description of E, which is 1 minus the integral from 0 to 1 over the cumulative distribution function F(x) of the normalized coverage x, where normalization means division by the mean, and derive a computationally more efficient formula; that is, 1 minus the integral from 0 to 1 over the probability density function f(x) times (1 − x). An analogous formula for empirical coverage data is provided, as well as fast R command-line scripts. This new formula allows for a general comparison of E with the coefficient of variation (= standard deviation σ of the normalized data), which is the conventional measure of the relative width of a distribution. For symmetrical distributions, including the Gaussian, E can be predicted closely as 1 − σ²/2 ⩾ E ⩾ 1 − σ/2, with σ ⩽ 1 owing to normalization and symmetry. For the log-normal distribution, a typical representative of positively skewed biological data, the analysis yields E ≈ exp(−σ*/2) with σ*² = ln(σ² + 1) up to large σ (⩽ 3), and E ≈ 1 − F(exp(−1)) for very large σ (⩾ 2.5). In the latter kind of rather uneven coverage, E can provide direct information on the fraction of well-covered targets that is not immediately delivered by the normalized σ. Otherwise, E does not appear to have major advantages over σ or over a simple score exp(−σ) based on it. Actually, exp(−σ) exploits a much larger part of its range for the evaluation of realistic NGS outputs.
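The two descriptions of the evenness score given in the abstract can be checked against each other numerically. The sketch below is an illustration in Python with NumPy, not the author's R scripts; the function names (`evenness_score`, `evenness_score_cdf`, `normalized_sigma`) and the simulated log-normal coverage are assumptions made for this example. It assumes that the empirical analogue of "1 minus the integral from 0 to 1 over the density times (1 − x)" is one minus the average shortfall max(0, 1 − x) over all targets, and it compares that value with a direct numerical integration of the empirical cumulative distribution function.

```python
import numpy as np


def evenness_score(coverage):
    """Evenness score from per-target coverage (density-based form).

    Assumed empirical analogue: 1 - mean(max(0, 1 - x)),
    where x is the coverage divided by its mean.
    """
    cov = np.asarray(coverage, dtype=float)
    mean = cov.mean()
    if mean <= 0:
        raise ValueError("mean coverage must be positive")
    x = cov / mean
    # Only targets below the mean (x < 1) contribute a shortfall.
    return 1.0 - np.clip(1.0 - x, 0.0, None).mean()


def evenness_score_cdf(coverage, grid_points=10_000):
    """Evenness score as 1 - integral_0^1 F(x) dx (CDF-based form).

    The integral is approximated by a midpoint rule on [0, 1].
    """
    cov = np.asarray(coverage, dtype=float)
    x = np.sort(cov / cov.mean())
    t = (np.arange(grid_points) + 0.5) / grid_points
    # Empirical CDF: fraction of targets with normalized coverage <= t.
    cdf = np.searchsorted(x, t, side="right") / x.size
    return 1.0 - cdf.mean()


def normalized_sigma(coverage):
    """Coefficient of variation: standard deviation of the normalized data."""
    cov = np.asarray(coverage, dtype=float)
    return cov.std() / cov.mean()


if __name__ == "__main__":
    # Hypothetical coverage data: log-normal, i.e. positively skewed.
    rng = np.random.default_rng(0)
    coverage = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

    e_density = evenness_score(coverage)
    e_cdf = evenness_score_cdf(coverage)
    sigma = normalized_sigma(coverage)
    sigma_star = np.sqrt(np.log(sigma**2 + 1.0))

    print("E (density form):      ", e_density)
    print("E (CDF form):          ", e_cdf)
    print("normalized sigma:      ", sigma)
    print("exp(-sigma*/2) estimate:", np.exp(-sigma_star / 2.0))
    print("exp(-sigma) score:     ", np.exp(-sigma))
```

The density-based form needs a single pass over the data and no grid, which is one way to read the abstract's remark about computational efficiency. For a two-valued coverage such as `[1, 3]`, the normalized data are 0.5 and 1.5, so σ = 0.5 and E = 0.75, which sits on the lower bound 1 − σ/2 stated for symmetrical distributions.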
|
Address
|
Center for Cardiovascular Genetics and Gene Diagnostics, Switzerland