Comparison of principal component analysis (PCA) and discriminant analysis of principal components (DAPC) for studying the population structure of Akhal-Teke, Arabian and Caspian horse breeds using genomic data


Authors	Babayi Nasrin, Rafat Abbas, Moradi Mohammad Hossein, Feizi Derakhshi Mohammad Reza
Source	Iranian Journal of Animal Science Research - 1400 - Volume: 13 - Issue: 3 - Pages: 453-462
Abstract	The development of high-throughput, cost-effective genotyping methods in recent years has made it possible to assess genetic structure and the relationships among populations of a species using genomic data. Studying population structure with genome-wide markers provides valuable information about evolutionary relationships and the classification of subpopulations. The aim of this study was to compare two methods, principal component analysis (PCA) and discriminant analysis of principal components (DAPC), for examining the structure of and relationships among three horse breeds of the Middle East region: Akhal-Teke, Arabian and Caspian. For this purpose, genomic data obtained from Illumina 50K SNP BeadChip arrays in 61 samples of these breeds were used. The study was carried out in collaboration with the Equine Genetic Diversity Consortium (EGDC) project, and the code required for data analysis was written in R. The results showed that in both methods 10.8% of the variance was explained by the first two components, and both methods clustered the three populations separately. The criterion used to choose the optimal number of clusters in DAPC was the Bayesian information criterion (BIC), for which K=3 gave the best result with the lowest BIC. DAPC gave better results than PCA: it performed better in determining the optimal number of clusters (K) and gave a clearer picture of the relationships among individuals. Both methods also showed very good accuracy in assigning individuals to their own groups. Overall, the results show that although previous studies placed these three Middle Eastern populations in a single cluster of the neighbor-joining tree, with the methods used in this study the three breeds are grouped separately, and DAPC can give a better picture of interpopulation relationships in horse breeds.
Keywords	DAPC and PCA methods, population structure, Middle Eastern horse breeds, SNP markers
Address	University of Tabriz, Faculty of Agriculture, Department of Animal Science, Iran; University of Tabriz, Faculty of Agriculture, Department of Animal Science, Iran; Arak University, Faculty of Agriculture and Natural Resources, Department of Animal Science, Iran; University of Tabriz, Faculty of Engineering, Department of Computer Engineering, Iran

Comparison of principal component analysis (PCA) and discriminant analysis of principal components (DAPC) methods for analysis of population structure in Akhal-Teke, Arabian and Caspian horse breeds using genomic data

Authors	Babayi Nasrin, Rafat Abbas, Moradi Mohammad Hossein, Feizi Derakhshi Mohammad Reza
Abstract	Introduction: The development of high-throughput, cost-effective genotyping methods in recent years has made it possible to evaluate the genetic structure of, and the relationships among, populations of a species using genomic data. Genome-wide inference of population structure from genetic markers can provide valuable information about evolutionary relationships and the clustering of subpopulations for use in animal breeding programs. In large-scale studies, one question of interest is whether genetic differences exist among subgroups sampled from different geographic locations. The objective of this study was to compare principal component analysis (PCA) and discriminant analysis of principal components (DAPC) for determining population structure, and for assessing how well individuals are assigned to their true population of origin, in three horse breeds from the Middle East (Akhal-Teke, Arabian and Caspian) using genomic data. Materials and Methods: Genomic data from 61 animals, comprising Akhal-Teke (19), Arabian (24) and Caspian (18), were used to investigate the population structure of some Asian horse breeds. The data were obtained from the Equine Genetic Diversity Consortium (EGDC) project. Hair or tissue samples were collected from the animals. DNA was extracted using an optimized Puregene (Qiagen) protocol, and approximately 1 μg of DNA was used to genotype each sample. Genotyping was performed with Illumina SNP 50K BeadChip arrays, which assay 52603 SNP marker loci, according to the standard Illumina guidelines. Several quality-control steps were applied to the raw data to ensure the quality of the genotypes. Quality control was carried out in PLINK v.1.07. Samples with more than 5% missing data were excluded from the analysis. 
Then, for each SNP, the minor allele frequency (MAF) and call rate were calculated, and SNPs with a call rate < 95% and a MAF < 2% were discarded. Deviation from Hardy-Weinberg equilibrium (p < 10^-6) was tested for the remaining SNPs to identify genotyping errors. The Bonferroni correction (β=α/n) was used to address the multiple-testing problem. Principal component analysis (PCA) is a statistical technique for summarizing data from many variables into a few variables that describe as much of the variation in the data as possible. For this purpose, the variance-covariance matrix of the independent variables was first calculated and the principal components were extracted. Each new variable has an associated eigenvalue that measures the amount of variance it explains. Discriminant analysis of principal components (DAPC), in turn, is a model-independent multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. The analyses were performed with the PCA and DAPC approaches, and the analysis code was written in R v.3.4.1. Results and Discussion: Principal component analysis summarizes the overall variation among individuals, which includes both the variation between groups and the variation within groups, and shows a clear picture of the differences between the groups. The results of this study indicated that 10.8% of the variance was explained by the first two components in both the PCA and DAPC methods. Both methods assigned individuals to their true population of origin with high accuracy, and both were able to cluster the three populations separately. The Bayesian information criterion (BIC) was used to evaluate the optimal number of clusters for the DAPC method, and the results showed that K=3 was optimal, with the lowest BIC, and completely separated the three populations. 
The DAPC method separated the populations better than PCA because it increases between-group variance and reduces within-group variance. It also performed better than PCA in determining the optimal number of clusters (K) and gave a clearer picture of the relationships between individuals. These results show that DAPC can be applied in the quality control of GWAS as an alternative to PCA, because it summarizes the genetic differentiation between groups while overlooking within-group variation, and so gives a better representation of population structure. Conclusion: In general, the results of this study showed that although previous studies placed these three Middle Eastern breeds in a single cluster of the neighbor-joining tree, in this study the three breeds were grouped separately, and the DAPC method can better illustrate interpopulation relationships in horse breeds.
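The quality-control thresholds given in Materials and Methods (samples with more than 5% missing data, SNPs with call rate < 95% and MAF < 2%, and a Bonferroni-adjusted level β=α/n) can be illustrated with a minimal Python sketch. This is not the authors' PLINK or R workflow. All function names are hypothetical, genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with `None` for a missing call, and a SNP is assumed to be dropped if it fails either the call-rate or the MAF filter.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed coding: 0/1/2 copies of one allele, None for a missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of missing calls in a row (sample) or column (SNP)."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF computed from the non-missing genotype calls of one SNP."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    return min(p, 1.0 - p)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Bonferroni-adjusted significance level, beta = alpha / n."""
    return alpha / n_tests


def quality_control(
    genotypes: List[List[Genotype]],
    max_sample_missing: float = 0.05,
    min_call_rate: float = 0.95,
    min_maf: float = 0.02,
) -> Tuple[List[int], List[int]]:
    """Return indices of samples and SNPs that pass the filters.

    genotypes[i][j] is the call for sample i at SNP j. Samples are
    filtered first; SNP statistics use the retained samples only
    (an assumption of this sketch).
    """
    kept_samples = [
        i for i, row in enumerate(genotypes)
        if missing_rate(row) <= max_sample_missing
    ]
    n_snps = len(genotypes[0]) if genotypes else 0
    kept_snps = []
    for j in range(n_snps):
        column = [genotypes[i][j] for i in kept_samples]
        if not column:
            continue
        call_rate = 1.0 - missing_rate(column)
        if call_rate >= min_call_rate and minor_allele_frequency(column) >= min_maf:
            kept_snps.append(j)
    return kept_samples, kept_snps


# Toy data: 4 samples x 4 SNPs (illustrative values, not from the study).
toy = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [2, 0, 2, 0],
    [None, None, None, 0],  # 75% missing, so this sample is excluded
]
samples, snps = quality_control(toy)
print(samples, snps)
```

On the toy matrix, the last sample is removed for missingness and the two monomorphic SNPs are removed by the MAF filter. The Hardy-Weinberg test itself is left out of the sketch; `bonferroni_threshold` only shows how the adjusted level is obtained.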
Keywords