PLINK Howto
+ Installation
It is widely advisable to compile PLINK from scratch since it allows
to link it against the LAPACK library and to enable R back-end.
Depending on your OS, you will probably have to rework the included
Makefile a little bit. Here is how I did for a Mac OS X 10.6 (Snow
Leopard), 64 bits:
+ Data format
PLINK \cite{Purcell:2007} is able to process several kind of
genotyping data. Most common input file are tab or space delimited
file summairizing information relative to markers (.map file) and
allele frequency (.ped file).
Binary file can be imported in R with utilities from snpMatrix or
rtracklayer.
What to do if you have an unexpected PED file, e.g. missing
information on gender or Family ID?
--no-fid
--allow-no-sex
To bypass filters applied during file reading, use the --all
option. It is equivalent to
--mind 1 no calculation of per-individual genotyping rate
--geno 1 no calculation of for per-snp genotyping rate
--maf 0 no check of minimum allele frequency
+ Uncovering population structure
It is easy to check that an MDS allows to uncover population structure
in the HapMap data which can be downloaded from here:
http://pngu.org/~purcell/plink/res.shtml.
First, we need to compute the pairwise IBS distance matrix (which took
about 4h on an Intel Core Duo 2 2.8 GHz with 4 Go of RAM).
plink --bfile /Users/chl/nosave/hapmap_r23a/hapmap_r23a
--Z-genome --out hapmapALLr23a
Coordinates of the 270 individuals on the first four dimensions may
easily be obtained using
plink --bfile /Users/chl/nosave/hapmap_r23a/hapmap_r23a
--read-genome hapmapALLr23a.genome.gz
--cluster
--mds-plot 4
We can check that we would obtain similar results for the first two
components with a PCA. We need to compute X.X' where X dimensions are
an n x p (hence the transpose on the second X to avoid working with a
matrix of size p x p, where p > 10^5). To speed up the computations,
we only use a random sample of the 4,098,136 SNPs availbale in the
Hapmap r23a.
......................................................................
a <- read.table("plink.mds",header=T)
b <- read.table("hapmap.pop",header=F)
a <- merge(a,b,by.x="IID",by.y="V2",all.x=T)
pairs(a[,4:7],col=a$V3)
library(snpMatrix)
hm23 <- read.plink("hapmap_r23a")
hm23.5000 <- hm23[,sample(ncol(hm23),5000,rep=F)]
hm23.5000.xxt <- xxt(hm23.5000, correct.for.missing=F)
hm23.5000.ev <- eigen(hm23.5000.xxt, symmetric=T)
plot(hm23.5000.ev$values[1:10])
......................................................................
However, if we really like to work on the whole data, we need to
proceed one chromosome at a time. Assuming each autosomal chromosomes
have been stored in snpMatrix object, you can bind them together into
an smlSet (GGtools), apply the procedure described above for each
chromosome, store the resulting X.X' matrix and then add them
together.
show example
xxt <- list()
for (i in 1:22)
xxt[[i]] <- xxt(smlList(ss)[[i]], correct.for.missing=F)
+ Running a GWAS with one phenotype
+ Useful on-line references
http://bioinfo.mc.vanderbilt.edu/SZGR/
www.ncbi.nlm.nih.gov/gap
www.genome.gov/27532174
----------------------------
Wed Nov 25 21:42:05 CET 2009