non-association SNPs with zero effects. We then fix the PVE at the predetermined value and simulate the residual ε from the distribution N(0, σ²_ε), where σ²_ε is determined from Σ_j Var(W_j b_j) so that the PVE equals the predetermined value. The phenotype y is generated from W, b, and ε according to Eq. 4. For Simulation Case 2, the data set is generated in the same way as in Case 1, with the only difference being that the random effects for the non-association SNPs are simulated from N(0, σ²_0), where σ²_0 is a very small number, instead of being set to zero. (A schematic sketch of this generating process is given at the end of this section.)

and the random effect variance σ²_b estimated by the HBM are close to their corresponding true values, 0.01 and 0.1, respectively. This demonstrates the good performance of our estimation method. In both simulation cases, the MLM severely underestimates σ²_b, because it divides the total genetic variance over all the SNPs rather than over just the 1% association SNPs, which results in underestimation of the genetic effects. In addition, in Simulation Case 2 the estimated PVE from the MLM is much larger than the true value, whereas the HBM gives a closer PVE estimate. The reason is that the MLM cannot distinguish the "significant" SNP effects from the "noisy" effects, because it assumes that all random effects follow the same distribution. Therefore, the σ²_g obtained by the MLM includes both "significant" and "noisy" effects and thus leads to overestimation of the PVE. We note that the simulation model in this case differs from the underlying models assumed by both our HBM and the MLM of GCTA. As the results indicate, the HBM is rather robust against such model misspecification.

Example 2

This simulation example demonstrates the performance of the HBM algorithm when the number of SNPs is large, on the order of a real GWAS. We implemented several computational optimization strategies to speed up the computation for such a large number of SNPs and to use computer memory efficiently. First, in each iteration of the HBM algorithm we need to invert a square matrix whose rank equals the number of SNPs. We therefore carry out the analysis in parallel on UNC-CH's multi-core Linux-based cluster computing server, writing scripts to distribute the computation among multiple cores/CPUs and to run multiple analyses simultaneously (a schematic sketch of this parallelization is given at the end of this section). Our study shows that parallel computing can speed up the computation by a factor of 20 on a 10-core computing node of the cluster. It takes 668.5 minutes and 158 GB of memory to finish the calculation for the simulated data set with 100,000 SNPs. For whole-genome data with even more SNPs, the memory and computational power of the server become the main bottleneck. As in Example 1, we consider the same two simulation cases. The estimation results are summarized in

Real data set results

The Framingham heart study

Genotyping for FHS participants was performed using the Affymetrix 500K GeneChip array. Genotypes on the Y chromosome are not included in our analysis. A standard quality control filter is applied to the genotype data. Individuals with 5% or more missing genotype data were excluded from the analysis. SNPs on the X chromosome, SNPs with a call rate below 99%, and SNPs with a minor allele frequency below 0.01 were also excluded from the analysis. The application of these quality control procedures resulted in 8,738 individuals with 287,525 SNPs from the original 500K genotype data. Genotype data were converted to minor allele frequencies for the analysis.
One individual of a pair is deleted if the genetic relationship is greater than a pre-specified cutoff. Schematic sketches of the steps described in this section are given below.
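The following is a minimal, hypothetical sketch in Python/NumPy of the generating process used in Simulation Cases 1 and 2. It assumes the linear model y = Wb + ε of Eq. 4 and fixes the PVE by setting σ²_ε = (1/PVE − 1) · Var(Wb), which is one consistent way to realize "fixing the PVE at the predetermined value"; the sample size, the number of SNPs, the target PVE, and all variable names are illustrative and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 2000, 10000          # individuals, SNPs (illustrative sizes)
prop_assoc = 0.01           # 1% association SNPs
sigma2_b = 0.1              # random-effect variance for association SNPs
sigma2_0 = 1e-4             # tiny variance for non-association SNPs (Case 2)
target_pve = 0.5            # predetermined PVE (illustrative)

# Genotypes coded as minor-allele counts, then column-standardized.
maf = rng.uniform(0.05, 0.5, size=p)
W = rng.binomial(2, maf, size=(n, p)).astype(float)
W = (W - W.mean(axis=0)) / W.std(axis=0)

# Case 1: association SNPs get N(0, sigma2_b) effects, the rest stay zero.
b = np.zeros(p)
assoc = rng.choice(p, size=int(prop_assoc * p), replace=False)
b[assoc] = rng.normal(0.0, np.sqrt(sigma2_b), size=assoc.size)

# Case 2 would instead draw the remaining effects from N(0, sigma2_0):
# mask = np.ones(p, bool); mask[assoc] = False
# b[mask] = rng.normal(0.0, np.sqrt(sigma2_0), size=mask.sum())

# Fix the PVE at the target value by choosing the residual variance.
genetic = W @ b
var_g = genetic.var()
sigma2_e = (1.0 / target_pve - 1.0) * var_g
y = genetic + rng.normal(0.0, np.sqrt(sigma2_e), size=n)

print("realized PVE:", var_g / (var_g + sigma2_e))
```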
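For Example 2, the bottleneck is the inversion of a square matrix whose rank equals the number of SNPs at every iteration, and the speed-up comes from running several analyses simultaneously on a multi-core node. The sketch below is a hypothetical illustration using Python's multiprocessing module; it is not the authors' cluster scripts, and the random positive-definite matrix only stands in for whatever linear system the HBM update actually involves.

```python
import numpy as np
from multiprocessing import Pool

def run_one_analysis(seed, p=1000, n_iter=5):
    """Stand-in for one HBM run: each iteration inverts a p x p matrix."""
    rng = np.random.default_rng(seed)
    est = None
    for _ in range(n_iter):
        A = rng.normal(size=(p, p))
        A = A @ A.T + p * np.eye(p)      # symmetric positive definite
        est = np.linalg.inv(A)           # the per-iteration bottleneck
    return float(est.trace())

if __name__ == "__main__":
    seeds = range(10)                    # e.g. 10 replicate analyses
    with Pool(processes=10) as pool:     # one worker per core on a 10-core node
        results = pool.map(run_one_analysis, seeds)
    print(results)
```

In practice, solving the relevant linear system with a Cholesky factorization rather than forming an explicit inverse is usually faster and uses less memory, which matters when the matrix rank is in the hundreds of thousands.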
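The quality-control filters applied to the FHS genotypes (individuals with 5% or more missing genotypes removed; SNPs with a call rate below 99% or a minor allele frequency below 0.01 removed) can be written as simple matrix operations. The sketch below assumes a genotype matrix coded 0/1/2 with NaN marking missing calls; such filtering is usually done with dedicated tools in practice, so this is only a schematic, and the function name and coding convention are assumptions.

```python
import numpy as np

def qc_filter(G, max_ind_missing=0.05, min_call_rate=0.99, min_maf=0.01):
    """G: individuals x SNPs genotype matrix (0/1/2, NaN = missing)."""
    # Drop individuals with 5% or more missing genotypes.
    ind_missing = np.isnan(G).mean(axis=1)
    G = G[ind_missing < max_ind_missing, :]

    # Drop SNPs with a call rate below 99%.
    call_rate = 1.0 - np.isnan(G).mean(axis=0)
    G = G[:, call_rate >= min_call_rate]

    # Drop SNPs with a minor allele frequency below 0.01.
    freq = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    return G[:, maf >= min_maf]
```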
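Relatedness pruning of this kind is typically based on the genetic relationship matrix (GRM) computed from standardized genotypes: for every pair whose relationship coefficient exceeds a cutoff, one member of the pair is removed. The cutoff value (0.025 here) and the greedy removal strategy below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def prune_related(G, cutoff=0.025):
    """Greedily drop one individual from each pair with GRM entry > cutoff.

    G: individuals x SNPs genotype matrix (0/1/2, no missing values);
    the cutoff value is a placeholder, not the one used in the paper.
    """
    Z = (G - G.mean(axis=0)) / G.std(axis=0)   # standardize each SNP
    grm = Z @ Z.T / Z.shape[1]                 # genetic relationship matrix
    keep = np.ones(G.shape[0], dtype=bool)
    for i in range(G.shape[0]):
        if not keep[i]:
            continue
        for j in range(i + 1, G.shape[0]):
            if keep[j] and grm[i, j] > cutoff:
                keep[j] = False                # delete one individual of the pair
    return G[keep, :]
```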
