This installation of the ALDsuite package assumes:
To install the ALDdata package issue the following commands in the terminal window after navigating to the directory where you want the source files to reside:
git clone https://github.com/johnsonra/ALDdata
cd ALDdata
R CMD INSTALL ALDdata
cd ..
git clone https://github.com/johnsonra/ALDsuite
cd ALDsuite
R CMD INSTALL ALDsuite
At this point the source directories may be deleted.
Before you train your model using HapMap data, we need to know which markers you will be using. If you have a list of polymorphic rs numbers, you can skip this step, but for this example we are going to train the model for chromosome 20 using the phased ASW data from HapMap (see Importing VCF files for instructions on importing data from a VCF file).
require(ALDdata)
data(asw20)
We also want to do a little quality control of our data:
##### drop monomorphic SNPs #####
mono <- apply(phased, 2, sum) == 0
phased <- phased[,!mono]
##### check for HWE problems #####
# each individual has two chromosome in this data set
chrA <- (1:(nrow(phased)/2)) * 2 # indices for chromosome A
chrB <- chrA - 1 # indices for chromosome B
genotypes <- phased[chrA,] + phased[chrB,]
require(hwde)
hweScore <- hwexact(apply(genotypes == 0, 2, sum), # observed homozygote 0
apply(genotypes == 1, 2, sum), # observed hets
apply(genotypes == 2, 2, sum)) # observed homozygote 1
phased <- phased[,!hweScore < 1e-4]
genotypes <- genotypes[,!hweScore < 1e-4]
Now that we have a list of SNPs to use in our model, the next step is to train the model based on HapMap data. In this example, we will use the CEU and YRI samples as surrogate parental populations. This can take a little while to run.
# setup.prior needs to reference the data in the hapmap dataset
require(ALDdata)
data(hapmap)
require(ALDsuite)
Pm.prior <- setup.prior(snps = colnames(phased), pops = c('YRI', 'CEU'),
maxpcs = 6, window = 0.1, phased = TRUE)
Now that we have trained our model, we can infer local ancestry in our data. The ancestry()
function will need at least the following arguments:
# pull information for rs numbers marking each window in Pm.prior
info <- subset(hapmap, rs %in% names(Pm.prior))
calls <- admixture(Pm.prior, haps = phased, chr = info$chr, pos = info$cM, cores = 1, iter = 1)
Your local chromosomal ancestry estimates will be given in calls$gammas
.
str(calls$gammas)
## num [1:63, 1:703, 1:2, 1:2] 1.00 4.00e-24 1.00 9.25e-29 3.90e-23 ...
## - attr(*, "dimnames")=List of 4
## ..$ : chr [1:63] "NA19909_A" "NA19908_A" "NA19901_A" "NA19900_A" ...
## ..$ : chr [1:703] "rs1418258" "rs6038405" "rs6055084" "rs373225" ...
## ..$ : chr [1:2] "Mother" "Father"
## ..$ : chr [1:2] "YRI" "CEU"