Title: | Pathway Analysis Methods for Genomewide Association Data |
---|---|
Description: | Bayesian hierarchical methods for pathway analysis of genomewide association data: Normal/Bayes factors and Sparse Normal/Adaptive lasso. The Frequentist Fisher's product method is included as well. |
Authors: | Marina Evangelou |
Maintainer: | Marina Evangelou <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0 |
Built: | 2024-11-26 02:50:07 UTC |
Source: | https://github.com/cran/PAGWAS |
Bayesian hierarchical methods for pathway analysis of genomewide association data: Normal/Bayes factors and Sparse Normal/Adaptive lasso. The Frequentist Fisher's method is included as well.
Package: | PAGWAS |
Type: | Package |
Version: | 2.0 |
Date: | 2015-12-02 |
License: | GPL (>=2) |
LazyLoad: | yes |
Marina Evangelou Maintainer: Marina Evangelou <[email protected]>
Evangelou, M., Dudbridge, F., Wernisch, L. (2014). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics 30(5), 690 - 697
Evangelou, M., Rendon, A., Ouhewand, W. H., Wernisch, L., Dudbridge, F. (2012) Comparison of methods for competitive tests of pathway analysis. Plos One 7(7): e41018
Returns a data frame with L rows and M columns. L is the number of SNPs in the genotypes data frame and M is the number of tested pathways.
create.pathway.df(genotypes,snps.paths)
create.pathway.df(genotypes,snps.paths)
genotypes |
Genotype matrix, with L SNPs (columns) and N individuals (rows) |
snps.paths |
A list with entries the SNP members of each pathway. The size of the list is M |
A data frame with columns equal to the number of pathways in the pathway.snps list and rows equal to the number of tested SNPs
SNPs
, genes
, snps.to.pathways
snps.to.genes
data(SNPs) data(genes) data(pathways) data(genotypes) snps.genes <- snps.to.genes(snp.info=SNPs,gene.info=genes, distance=0) pathway.snps <- snps.to.pathways(pathways,snps.genes) P <- create.pathway.df(genotypes=genotypes,snps.paths=pathway.snps)
data(SNPs) data(genes) data(pathways) data(genotypes) snps.genes <- snps.to.genes(snp.info=SNPs,gene.info=genes, distance=0) pathway.snps <- snps.to.pathways(pathways,snps.genes) P <- create.pathway.df(genotypes=genotypes,snps.paths=pathway.snps)
Creates a pathway matrix, with rows the SNPs assigned to each pathway
create.pathway.matrix(genotypes,pathway.snps)
create.pathway.matrix(genotypes,pathway.snps)
genotypes |
Genotype matrix, with L SNPs (columns) and N individuals (rows) |
pathway.snps |
A list of the SNPs members of each pathway |
A matrix with columns equal to the number of pathways in the pathway.snps list and rows equal to the number of SNPs in the genotypes data-frame
SNPs
, genes
, snps.to.pathways
,
snps.to.genes
## Not run: data(SNPs) data(genes) data(pathways) data(genotypes) snps.genes <- snps.to.genes(snp.info=SNPs,gene.info=genes, distance=50) pathway.snps <- snps.to.pathways(pathways,snps.genes) P <- create.pathway.matrix(genotypes,pathway.snps) ## End(Not run)
## Not run: data(SNPs) data(genes) data(pathways) data(genotypes) snps.genes <- snps.to.genes(snp.info=SNPs,gene.info=genes, distance=50) pathway.snps <- snps.to.pathways(pathways,snps.genes) P <- create.pathway.matrix(genotypes,pathway.snps) ## End(Not run)
Calculates the Fisher's method p-value for a set of p-values. It returns both the p-value and the test statistic value of the Fisher's product method.
FM.chi.pvalue(x)
FM.chi.pvalue(x)
x |
A vector of p-values. These p-values can be either gene or SNP p-values of a tested pathway |
FMstatistic |
Fisher's product method test statistic |
FMpvalue |
Fisher's method p-value, computed using the exact distribution of the Fisher's method test statistic which is a Chi^2 distribution with degrees of freedom twice the size of vector x |
Evangelou M, Rendon A, Ouwehand WH, Wernisch L, Dudbridge F (2012) Comparison of Methods for Competitive Tests of Pathway Analysis. PLoS ONE 7(7): e41018. doi:10.1371/journal.pone.0041018
FM.chi.pvalue(x=c(0.05,0.1))
FM.chi.pvalue(x=c(0.05,0.1))
A data frame with 20 rows and 4 columns.
data(genes)
data(genes)
Column names:
Name
Name of gene
Start
Start position of gene on the genome
End
End position of gene on the genome
Chr
Chromosome of gene
data(genes) print(genes[1:5,])
data(genes) print(genes[1:5,])
A data frame with 75 rows (individuals) and 100 columns (SNPs). The entries of the genotype matrix are 0, 1 and 2. There are no missing values.
data(genotypes)
data(genotypes)
data(genotypes)
data(genotypes)
A list with 8 different combinations of the hyper-parameters of NBF. Possible values for the hyper-parameters a, b, s2_0 and nu_0 are given in each entry of the list
data(list.of.parameters)
data(list.of.parameters)
data(list.of.parameters) a=list.of.parameters[[1]][1] b=list.of.parameters[[1]][2] s2_0=list.of.parameters[[1]][3] nu_0=list.of.parameters[[1]][4]
data(list.of.parameters) a=list.of.parameters[[1]][1] b=list.of.parameters[[1]][2] s2_0=list.of.parameters[[1]][3] nu_0=list.of.parameters[[1]][4]
A vector of the computed Bayes factors for the tested pathways.
NBF(y, G, P, a, b, s2, nu)
NBF(y, G, P, a, b, s2, nu)
y |
Response vector of length N |
G |
Genotype matrix, with N rows and L columns (number of tested SNPs) |
P |
Pathway matrix, with L columns and M columns (number of tested pathways) |
a |
Hyper-parameter of the variance assumed for the integrated out SNP effects |
b |
Hyper-parameter of the variance assumed for the pathway effects |
s2 |
Hyper-parameter of the Inverse-Chi-squared distribution assumed for the variance of the response vector |
nu |
Hyper-parameter of the Inverse-Chi-squared distribution assumed for the variance of the response vector |
A vector of the computed Bayes factors of the same length as the number of tested pathways
Evangelou, M., Dudbridge, F., Wernisch, L. (2014). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics, 30(5), 690 - 697.
## Not run: data(genotypes) G=genotypes data(pathways) data(SNPs) data(genes) snps.genes=snps.to.genes(SNPs,genes,distance=0) snps.paths=snps.to.pathways(pathways,snps.genes) P=create.pathway.df(G,snps.paths) y=rnorm(nrow(G),mean=0,sd=10) NBF(y,G,P,a,b,s2,nu) ## End(Not run)
## Not run: data(genotypes) G=genotypes data(pathways) data(SNPs) data(genes) snps.genes=snps.to.genes(SNPs,genes,distance=0) snps.paths=snps.to.pathways(pathways,snps.genes) P=create.pathway.df(G,snps.paths) y=rnorm(nrow(G),mean=0,sd=10) NBF(y,G,P,a,b,s2,nu) ## End(Not run)
A list of two pathways. The gene members of each pathway are given.
data(pathways)
data(pathways)
data(pathways)
data(pathways)
This function returns the appropriate hyper-parameter a for the integrated out SNP effects of SNAL
return.a.SNAL(y, G, P, no.simulations, no.paths, no.snps)
return.a.SNAL(y, G, P, no.simulations, no.paths, no.snps)
y |
Response vector of length N |
G |
Genotype matrix, with N rows and L columns (L is the number of tested SNPs) |
P |
Pathway matrix, with L columns and M columns (M is the number of tested pathways) |
no.simulations |
Number of simulations to run |
no.paths |
Number of pathways assumed to be causal for the simulations |
no.snps |
Number of SNPs assumed to be causal for the simulations |
Returns the value of the hyper-parameter a that gives the highest the power of SNAL
Evangelou, M., Dudbridge, F., Wernisch, L. (2013). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics (to appear)
This function returns a data-frame of true positive rates and false positive rates of NBF for the Bayes factors thresholds chosen by the user
return.bf.NBF(y, G, P, a, b, s2, nu, no.simulations, no.paths, no.snps, BF.cutoffs)
return.bf.NBF(y, G, P, a, b, s2, nu, no.simulations, no.paths, no.snps, BF.cutoffs)
y |
Response vector of length N |
G |
Genotype matrix, with N rows and L columns (L is the number of tested SNPs) |
P |
Pathway matrix, with L columns and M columns (M is the number of tested pathways) |
a |
Hyper-parameter a of the variance assumed for the integrated out SNP effects |
b |
Hyper-parameter b of the variance assumed for the pathway effects |
s2 |
Hyper-parameter s2_0 of the Inverse-Chi-squared distribution assumed for the variance of the response vector |
nu |
Hyper-parameter nu_0 of the Inverse-Chi-squared distribution assumed for the variance of the response vector |
no.simulations |
Number of simulations to run |
no.paths |
Number of pathways assumed to be causal for the simulations |
no.snps |
Number of SNPs assumed to be causal for the simulations |
BF.cutoffs |
A vector of Bayes factors (BFs) thresholds to test |
Returns a data-frame of true positive and false positive rates for the Bayes factors (BFs) thresholds given by the user
Evangelou, M., Dudbridge, F., Wernisch, L. (2013). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics (to appear)
NBF
, return.hyperparameters.NBF
## Not run: return.bf.NBF(y,G,P,a=1e-4,b=1e-2,s2=0.25,nu=200, no.simulations=100,no.paths=10,no.snps=20,BF.cutoffs=c(0.5,0.75,0.9,1)) ## End(Not run)
## Not run: return.bf.NBF(y,G,P,a=1e-4,b=1e-2,s2=0.25,nu=200, no.simulations=100,no.paths=10,no.snps=20,BF.cutoffs=c(0.5,0.75,0.9,1)) ## End(Not run)
This function returns the four hyper-parameters a, b, s2_0, nu_0 of NBF
return.hyperparameters.NBF(y, G, P, no.simulations, no.paths, no.snps, list.parameters)
return.hyperparameters.NBF(y, G, P, no.simulations, no.paths, no.snps, list.parameters)
y |
Response vector of length N |
G |
Genotype matrix, with N rows and L columns (L is the number of tested SNPs) |
P |
Pathway matrix, with L columns and M columns (M is the number of tested pathways) |
no.simulations |
Number of simulations to run |
no.paths |
Number of pathways assumed to be causal for the simulations |
no.snps |
Number of SNPs assumed to be causal for the simulations |
list.parameters |
A list of various combinations of the four hyper-parameters, see |
Returns a list with the four hyper-parameters of NBF
a |
Hyper-parameter of the variance assumed for the integrated out SNP effects |
b |
Hyper-parameter of the variance assumed for the pathway effects |
s2_0 |
Hyper-parameter of the Inverse-Chi-squared distribution assumed for the variance of the response vector |
nu_0 |
Hyper-parameter of the Inverse-Chi-squared distribution assumed for the variance of the response vector |
Evangelou, M., Dudbridge, F., Wernisch, L. (2013). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics (to appear)
## Not run: hyper=return.hyperparameters.NBF(y,G,P,no.simulations=100, no.paths=10,no.snps=20, list.parameters=list(c(1e-3,1e-3,0.25,200),c(1e-2,1e-3,0.25,100))) ## End(Not run)
## Not run: hyper=return.hyperparameters.NBF(y,G,P,no.simulations=100, no.paths=10,no.snps=20, list.parameters=list(c(1e-3,1e-3,0.25,200),c(1e-2,1e-3,0.25,100))) ## End(Not run)
This function returns a data-frame of true positive rates and false positive rates of SNAL for the s2 thresholds chosen by the user
return.s2.SNAL(y, G, P, a, no.simulations, no.paths, no.snps, s2.cutoffs)
return.s2.SNAL(y, G, P, a, no.simulations, no.paths, no.snps, s2.cutoffs)
y |
Response vector of length N |
G |
Genotype matrix, with N rows and L columns (L is the number of tested SNPs) |
P |
Pathway matrix, with L columns and M columns (M is the number of tested pathways) |
a |
Hyper-parameter of the variance assumed for the integrated out SNP effects |
no.simulations |
Number of simulations to run |
no.paths |
Number of pathways assumed to be causal for the simulations |
no.snps |
Number of SNPs assumed to be causal for the simulations |
s2.cutoffs |
A vector of s2 thresholds to test |
Returns a data-frame of true positive and false positive rates for the s2 thresholds given by the user
Evangelou, M., Dudbridge, F., Wernisch, L. (2013). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics (to appear)
## Not run: return.s2.SNAL(y,G,P,a=1e-2,no.simulations=100, no.paths=10,no.snps=20,s2.cutoffs=c(0.5,1,2)) ## End(Not run)
## Not run: return.s2.SNAL(y,G,P,a=1e-2,no.simulations=100, no.paths=10,no.snps=20,s2.cutoffs=c(0.5,1,2)) ## End(Not run)
Computes the area under a ROC curve using the convex hull of the curve
roc.convex(sens, spec)
roc.convex(sens, spec)
sens |
Vector with the values of the recorded sensitivity (true positive rate) |
spec |
Vector with the values of the recorded specificity (1-false positive rate) |
Returns the computed area under the ROC curve
Marina Evangelou, Lorenz Wernisch
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27
Evangelou, M., Dudbridge, F., Wernisch, L. (2013). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics (to appear)
roc.convex(sens=c(0.1,0.5,1),spec=c(0.1,0.3,0.7))
roc.convex(sens=c(0.1,0.5,1),spec=c(0.1,0.3,0.7))
Sparse Normal/Adaptive lasso method applied for finding the associated pathways. The iterative algorithm suggested by Wipf and Nagarajan (2008) is applied. A vector equal to the number of tested pathways is returned, the zero entries of the vector correspond to the pathways that are not associated. The posterior estimates of the beta coefficients are also returned as they are described by Wipf and Nagarajan (2008).
SNAL(y, G, P, a, s2)
SNAL(y, G, P, a, s2)
y |
Response vector of length N |
G |
Genotype matrix, with N rows and L columns (number of tested SNPs) |
P |
Pathway matrix, with L columns and M columns (number of tested pathways) |
a |
Hyper-parameter of the variance assumed for the integrated out SNP effects |
s2 |
Variance assumed for the response variable, the tuning parameter of adaptive lasso |
gamma.star |
Estimates of gamma hyper-parameters |
ARD |
Posterior estimates of beta coefficients |
Evangelou, M., Dudbridge, F., Wernisch, L. (2014). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics, 30(5), 690 - 697.
Wipf, D. and Nagarajan, S. (2008). A new view of automatic relevance determination. Advances in Neural Information Processing Systems, 20
## Not run: data(genotypes) G=genotypes data(pathways) data(SNPs) data(genes) snps.genes=snps.to.genes(SNPs,genes,distance=0) snps.paths=snps.to.pathways(pathways,snps.genes) P=create.pathway.df(G,snps.paths) y=rnorm(nrow(G),mean=0,sd=10) SNAL(y,G,P,a,s2) ## End(Not run)
## Not run: data(genotypes) G=genotypes data(pathways) data(SNPs) data(genes) snps.genes=snps.to.genes(SNPs,genes,distance=0) snps.paths=snps.to.pathways(pathways,snps.genes) P=create.pathway.df(G,snps.paths) y=rnorm(nrow(G),mean=0,sd=10) SNAL(y,G,P,a,s2) ## End(Not run)
For more details please read SNAL.
SNAL.calculation(Y, Phi, s2)
SNAL.calculation(Y, Phi, s2)
Y |
Response vector of length N |
Phi |
Design matrix, with N rows and M columns (number of tested variables) |
s2 |
Variance assumed for the response variable, the tuning parameter of the adaptive lasso problem |
gamma.star |
Estimates of gamma hyper-parameters |
ARD |
Posterior estimates of beta coefficients |
Evangelou, M., Dudbridge, F., Wernisch, L. (2014). Two novel pathway analysis methods based on a hierarchical model. Bioinformatics, 30(5), 690 - 697
Wipf, D. and Nagarajan, S. (2008). A new view of automatic relevance determination. Advances in Neural Information Processing Systems, 20
## Not run: SNAL.calculation(Y,Phi,s2=0.5)
## Not run: SNAL.calculation(Y,Phi,s2=0.5)
A data frame with 100 rows and 3 columns.
data(SNPs)
data(SNPs)
Column names:
Name
SNP name
Position
Position of SNP on the genome
Chr
Chromosome of the SNP
data(SNPs) print(SNPs[1:5,])
data(SNPs) print(SNPs[1:5,])
Assigns SNPs to genes based on their physical distance.
snps.to.genes(snp.info, gene.info, distance)
snps.to.genes(snp.info, gene.info, distance)
snp.info |
A data frame with 3 columns with names: Name, Position and Chr that correspond to the SNP name, its position on the genome and its chromosome, respectively |
gene.info |
A data frame with 4 columns with names: Name, Start, End and Chr that correspond to the gene name, start and end positions on the genome and its chromosome, respectively |
distance |
A number that corresponds to the distance below and above the Start and End positions of the gene that all SNPs in that region should be assigned to the gene |
A list of the same size as the number of genes of the gene.info data frame. The names of the SNPs assigned to each gene are returned
data(SNPs) data(genes) snps.to.genes(snp.info=SNPs,gene.info=genes,distance=50)
data(SNPs) data(genes) snps.to.genes(snp.info=SNPs,gene.info=genes,distance=50)
Assigns SNPs to pathways, using the pathway gene members and the SNPs assigned to each gene.
snps.to.pathways(pathways,gene.snps)
snps.to.pathways(pathways,gene.snps)
pathways |
A list of pathways with their gene members |
gene.snps |
A list of genes with the SNPs assigned to them according to their physical distance on the genome |
A list of the same size as the number of pathways in the pathway list. The names of the SNPs assigned to each pathway are returned. Empty pathways are also returned.
data(SNPs) data(genes) data(pathways) snps.genes <- snps.to.genes(snp.info=SNPs,gene.info=genes, distance=50) pathway.snps <- snps.to.pathways(pathways,snps.genes)
data(SNPs) data(genes) data(pathways) snps.genes <- snps.to.genes(snp.info=SNPs,gene.info=genes, distance=50) pathway.snps <- snps.to.pathways(pathways,snps.genes)