   Application of the random forest algorithm to estimate marker effects and identify candidate genes for reproductive traits in Iranian Holstein dairy cattle
   
Authors Jabbari Tourchi Jeyran, Alijani Sadegh, Rafat Abbas, Abbasi Mokhtar Ali
Source Animal Production Research - 1403 - Volume: 13 - Issue: 1 - Pages: 95-109
Abstract    Machine learning is a powerful approach for genomic studies. The aim of the present study was to use a machine learning method (random forest) for a suggestive genome scan of reproductive traits, including age at first calving (AFC), days open (DO), calving interval (CI), and daughter pregnancy rate (DPR), in Iranian Holstein cattle. The required data were obtained from the National Animal Breeding Center and Promotion of Animal Products of Iran. The genotypic data consisted of single nucleotide polymorphism (SNP) markers for 2,419 Holstein bulls. The data file contained records registered from 1360 to 1398 and covered 2,774,183 animals. Because the genomic data of the bulls differed in density, the number of markers also differed among them. To harmonize the marker sets, FImpute software was used for genotype imputation. Using the random forest algorithm, which is a supervised, regression-type algorithm, a total of 21 markers of high importance were identified for the different reproductive traits. Important candidate genes for these traits were then identified by gene ontology analysis. The MPZL1 and CD247 genes, identified on chromosome 3 in association with AFC, and the RPS6KC1 and FAM170A genes, identified in association with DPR, are important for improving the reproductive performance of dairy cattle and can be put to use. The markers and genes identified in this study can provide new information on the genetic architecture of reproductive traits for their genomic improvement and can be used in designing chips for the evaluation of reproductive traits.
Keywords random forest algorithm, genotype, dairy cow, marker, machine learning
Address University of Tabriz, Faculty of Agriculture, Department of Animal Science, Iran; University of Tabriz, Faculty of Agriculture, Department of Animal Science, Iran; University of Tabriz, Faculty of Agriculture, Department of Animal Science, Iran; Agricultural Research, Education and Extension Organization, Animal Science Research Institute of Iran, Iran
Email pmaz_abbasi@yahoo.com
 
   Application of a random forest algorithm to estimate marker effects and identify candidate genes for reproductive traits in Iranian Holstein dairy cattle
   
Authors Jabbari Tourchi J., Alijani S., Rafat A., Abbasi M. A.
Abstract    Introduction: Genome-wide association studies (GWAS) are a powerful approach for identifying genomic regions that are associated with fertility traits and explain a significant portion of their genetic variance, and for identifying the relevant causal mutations. Evaluating the association between each genotyped marker and the trait is an essential strategy in GWAS, which examine the effects of all markers while considering their possible interactions, environmental factors, and even mutual effects between markers. Recently, machine learning methods have been introduced to genomics, and their basis differs from that of the common methods of genomic evaluation. Machine learning is used to estimate the genomic breeding values of candidate animals from training data (the genotypic and phenotypic information of the reference population). One of its key advantages is the ability to analyze large data sets. Machine learning is a branch of artificial intelligence whose goal is to build machines that can extract knowledge from (learn from) their environment. A variety of machine learning methods (random forest, boosting, and deep learning) are used to model genetic variance and environmental factors, study gene networks, perform GWAS, study epistatic effects, and carry out genomic evaluation. Random forest is one of the machine learning methods that has been used successfully in various fields of science. This research was conducted to identify markers and genes related to reproductive traits, including calving interval (CI), days open (DO), daughter pregnancy rate (DPR), and age at first calving (AFC), in Iranian Holstein dairy cattle. These traits have already been investigated with the ssGBLUP method using a smaller sample size. In the present research, however, a random forest algorithm was applied to a larger number of genotyped animals to identify markers and genes related to these traits.
Materials and methods: The records used in this research were provided by the National Animal Breeding Center and Promotion of Animal Products of Iran and included AFC, DO, CI, and DPR records of the genotyped bulls' daughters. Pedigree information on 2,774,183 animals and genotypic marker information on 2,419 Holstein bulls were used. Genomic data quality control was based on the number of genotyped SNPs per animal (ACR), the number of genotyped animals per SNP (CR), Hardy-Weinberg equilibrium (HWE), and minor allele frequency (MAF). During filtering, markers with a MAF below 5% were removed first, and then samples with a genotyping rate below 90% were identified and removed. Next, markers with a genotyping rate below 95% across samples were identified and removed. Finally, SNPs that deviated from HWE (p < 10⁻⁶) were excluded from the analysis as a measure of genotyping error. Quality control was performed with PLINK 1.9 software. RanFoG software was then used in a Linux environment to run the random forest analysis.
Results and discussion: Using the random forest algorithm, a total of 21 important SNPs were identified. Important candidate genes for fertility traits were then identified by gene ontology analysis, and 62 genes were located within 250 kb of these SNPs. The most significant SNP was observed for AFC. The main SNP for AFC was ARS-BFGL-NGS-22647 (BTA3), for CI it was ARS-BFGL-NGS-114194 (BTA11), for DO it was BTA-74076-no-rs (BTA5), and for DPR it was ARS-BFGL-NGS-32553 (BTA26). Researchers who studied fertility traits in Nellore cattle using machine learning methods identified the MPZL1 and CD247 genes on chromosome 3, and these genes were associated with age at first calving. Many cell biology pathways affect reproductive performance. The CD247 gene has been reported to be related to biological pathways including cell development and function. The IFFO2 gene has been shown to play an important role in the molecular structure of cells, as well as in the mechanisms of blastocyst and embryo formation and in gestation length in cattle. A study in mice on flagellum structure and sperm maturation reported a role for the ALDH4A1 gene in sperm maturation. An association of the RPS6KC1 gene with pregnancy rate and antral follicle number in Nellore heifers has been reported. The KAT2B gene encodes a transcriptional activator that plays an essential role in regulating histone acetylation and an important role in improving carcass quality, muscle and fat development, and metabolism in native Chinese cattle. In addition, these genes play a key role in regulating biological processes and are related to cell growth, metabolism, and immune system function.
Conclusions: In line with the objectives of this research, new information on markers and candidate genes related to reproductive traits in Iranian Holstein dairy cattle was reported. The markers and candidate genes identified in the present research can be used in genomic selection to improve the reproductive traits of Holstein dairy cattle.
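The quality-control sequence described in the materials and methods (MAF below 5%, sample genotyping rate below 90%, marker genotyping rate below 95%, HWE p < 10⁻⁶) can be illustrated with a minimal pure-Python sketch. This is not the PLINK 1.9 implementation used in the study. It assumes genotypes are coded as 0, 1, or 2 allele copies with `None` for a missing call, it uses a one-degree-of-freedom chi-square approximation for the HWE test, and all function names are hypothetical.

```python
import math

def call_rate(values):
    """Fraction of non-missing genotypes in a row (animal) or column (SNP)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)

def minor_allele_freq(col):
    """Minor allele frequency from 0/1/2 genotype codes, ignoring missing calls."""
    called = [g for g in col if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def hwe_pvalue(col):
    """Chi-square (1 df) test of Hardy-Weinberg proportions.

    Assumption: a chi-square approximation is used here for brevity;
    it is not the test implemented in PLINK.
    """
    called = [g for g in col if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)
    q = 1 - p
    if p in (0.0, 1.0):  # monomorphic marker: no deviation can be tested
        return 1.0
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(k) for k in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))

def quality_control(geno, maf_min=0.05, sample_cr_min=0.90,
                    snp_cr_min=0.95, hwe_p_min=1e-6):
    """Apply the four filters in the order given in the abstract.

    geno is a list of rows (animals), each a list of genotype codes (SNPs).
    Returns the indices of the animals and SNPs that pass.
    """
    rows = list(range(len(geno)))
    snps = list(range(len(geno[0]))) if geno else []

    def column(j):
        return [geno[i][j] for i in rows]

    # 1. Remove markers with MAF below the threshold
    snps = [j for j in snps if minor_allele_freq(column(j)) >= maf_min]
    # 2. Remove animals with a low genotyping rate over the remaining markers
    rows = [i for i in rows
            if call_rate([geno[i][j] for j in snps]) >= sample_cr_min]
    # 3. Remove markers with a low genotyping rate over the remaining animals
    snps = [j for j in snps if call_rate(column(j)) >= snp_cr_min]
    # 4. Remove markers that deviate from Hardy-Weinberg equilibrium
    snps = [j for j in snps if hwe_pvalue(column(j)) >= hwe_p_min]
    return rows, snps
```

Because each filter is applied to the output of the previous one, changing the order of the steps can change which animals and markers survive.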
Keywords random forest algorithm, genotype, dairy cow, marker, machine learning
 
 

Copyright 2023
Islamic World Science Citation Center
All Rights Reserved