PheWAS Pipeline

The analysis of phenotypes.

PheWAS - Phenome-Wide Association Study

PheWAS is a study design that statistically estimates the association between single-nucleotide polymorphisms and a large number of different phenotypes.

PheWAS is also the name of an R package that provides methods for creating PheWAS phenotypes, running the analysis, and visualizing the results. The package lets users translate ICD-9 codes into PheWAS case and control groups, perform analyses using these and/or other phenotypes with covariate adjustment, and plot the results.
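The translation of ICD-9 codes into case and control groups can be illustrated with a minimal Python sketch. It is not the R package's implementation: the code-to-phenotype map, the function name `build_phenotype_table`, and the `min_code_count` rule (two or more matching codes make a case, none makes a control, anything in between is excluded) are assumptions made for the example.

```python
from collections import Counter

# Hypothetical ICD-9 -> phenotype map, for illustration only.
# A real analysis would use a curated mapping, not this toy table.
ICD9_TO_PHENOTYPE = {
    "250.00": "diabetes",
    "250.02": "diabetes",
    "401.1": "hypertension",
    "401.9": "hypertension",
}


def build_phenotype_table(records, min_code_count=2):
    """Return {person_id: {phenotype: True | False | None}}.

    True  = case (at least `min_code_count` matching codes),
    False = control (no matching codes),
    None  = excluded (some codes, but fewer than `min_code_count`).
    """
    counts = {}
    for person_id, icd9_code in records:
        person_counts = counts.setdefault(person_id, Counter())
        phenotype = ICD9_TO_PHENOTYPE.get(icd9_code)
        if phenotype is not None:
            person_counts[phenotype] += 1

    phenotypes = sorted(set(ICD9_TO_PHENOTYPE.values()))
    table = {}
    for person_id, person_counts in counts.items():
        row = {}
        for phenotype in phenotypes:
            n = person_counts[phenotype]  # Counter returns 0 for missing keys
            if n >= min_code_count:
                row[phenotype] = True
            elif n == 0:
                row[phenotype] = False
            else:
                row[phenotype] = None
        table[person_id] = row
    return table


records = [
    ("p1", "250.00"), ("p1", "250.02"),  # two diabetes codes
    ("p2", "401.9"),                     # a single hypertension code
    ("p3", "V70.0"),                     # a code outside the toy map
]
phenotype_table = build_phenotype_table(records)
```

Keeping an explicit "excluded" state avoids treating people with a single, possibly incidental, code as either cases or controls.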

A typical PheWAS pipeline consists of four steps:

  1. Data import and transformation;

  2. PheWAS table creation;

  3. PheWAS analysis;

  4. Plotting.
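To make step 3 more concrete, the sketch below tests one SNP against several phenotypes with an unadjusted Pearson Chi-Square test on a 2x2 table (carrier status by case status). It is only an illustration in Python: the function name and the counts are hypothetical, and the pipeline itself may rely on regression with covariate adjustment instead.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table

                   case   control
    carrier          a       b
    non-carrier      c       d

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (a, b, c, d) for one SNP against three phenotypes.
counts_by_phenotype = {
    "diabetes": (30, 70, 10, 90),
    "hypertension": (25, 75, 25, 75),
    "asthma": (12, 88, 8, 92),
}

results = {
    phenotype: chi_square_2x2(*counts)
    for phenotype, counts in counts_by_phenotype.items()
}
for phenotype, (statistic, p_value) in results.items():
    print(f"{phenotype}: chi2={statistic:.2f}, p={p_value:.4g}")
```

Running the same test once per phenotype is what makes the analysis "phenome-wide", and it is also why the resulting p-values need a multiple-testing threshold.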

Workflow

User Journey

After curating the dataset, you will land on the configuration page, which holds the parameters and options for running the pipeline.

You will notice that the layout is the same as that of the GWAS pipeline, including the estimated time and cost.

List of parameters

  • SNPs: if you switch on the button, a file containing SNPs (.txt); if not, a comma-separated list of the phenotypes that you want to test. The phenotypes must correspond to column headers in the file given as the argument for the data parameter, and they are used to run a small GWAS test that generates a list of top SNPs. A single-nucleotide polymorphism (SNP) is a substitution of a single nucleotide at a specific position in the genome, where each variation occurs at a level of 0.5% in the population;

  • Pheno Codes: the phenotype codes used in pheno_file;

  • SNP Threshold: the criterion used to extract the top SNPs;

  • Covariates: choose which covariates take part in the analysis;

  • Additive genotypes: switch on if the genotypes are additive;

  • Significance Threshold: a vector of the significance thresholds to calculate. It can include p-value, Bonferroni, False Discovery Rate (FDR), simplem-genotype, simplem-phenotype, and simplem-product;

  • Base Alpha for significance calculation: the base alpha value applied for significance calculations;

  • Use unadjusted test: switch on to use Chi-Square and t-tests;

  • Use MASS confidence intervals: uses the confint function from the MASS package to calculate a confidence interval at the specified level.
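As a rough illustration of how the base alpha feeds into the significance thresholds, the Python sketch below computes the plain p-value, Bonferroni, and FDR cutoffs for a handful of p-values. The FDR cutoff follows the standard Benjamini-Hochberg procedure; the function names, the example p-values, and the alpha of 0.05 are assumptions for the example, and the simpleM variants are not covered.

```python
def bonferroni_threshold(p_values, alpha):
    """Per-test cutoff: alpha divided by the number of tests."""
    return alpha / len(p_values)


def fdr_threshold(p_values, alpha):
    """Benjamini-Hochberg cutoff: the largest sorted p-value p_(k) with
    p_(k) <= k / m * alpha, or 0.0 if no p-value qualifies."""
    m = len(p_values)
    cutoff = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * alpha:
            cutoff = p
    return cutoff


# Hypothetical p-values, one per phenotype tested.
p_values = [0.001, 0.008, 0.02, 0.041, 0.27, 0.6]
alpha = 0.05  # illustrative base alpha, not the pipeline default

thresholds = {
    "p-value": alpha,
    "bonferroni": bonferroni_threshold(p_values, alpha),
    "fdr": fdr_threshold(p_values, alpha),
}
significant = {
    name: [p for p in p_values if p <= cutoff]
    for name, cutoff in thresholds.items()
}
for name, hits in significant.items():
    print(f"{name}: cutoff={thresholds[name]:.4g}, significant={len(hits)}")
```

Bonferroni is the strictest of the three, the raw p-value cutoff the most permissive, and FDR usually falls in between, which is why requesting several thresholds at once can be informative.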

The recommended default parameter set is:

  • SNP Threshold: 0.5;

  • Significance Threshold: p-value;

  • Base Alpha for significance calculation: 0.5;

  • Use unadjusted test: switch off;

  • Use MASS confidence intervals: switch off.
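If you keep your settings in a script alongside the platform, the defaults above could be recorded as a plain configuration object. The Python sketch below is only a suggestion: the key names, the `validate_parameters` helper, and the assumption that both thresholds must lie in (0, 1] are hypothetical and do not reflect an actual API of the pipeline.

```python
# Key names are hypothetical; values mirror the defaults listed above.
DEFAULT_PARAMETERS = {
    "snp_threshold": 0.5,
    "significance_threshold": ["p-value"],
    "base_alpha": 0.5,
    "use_unadjusted_test": False,
    "use_mass_confidence_intervals": False,
}

ALLOWED_SIGNIFICANCE_THRESHOLDS = {
    "p-value", "bonferroni", "fdr",
    "simplem-genotype", "simplem-phenotype", "simplem-product",
}


def validate_parameters(params):
    """Return params unchanged, or raise ValueError on an invalid setting."""
    unknown = set(params["significance_threshold"]) - ALLOWED_SIGNIFICANCE_THRESHOLDS
    if unknown:
        raise ValueError(f"unknown significance threshold(s): {sorted(unknown)}")
    # Assumption: both values behave like probabilities in (0, 1].
    for key in ("snp_threshold", "base_alpha"):
        if not 0 < params[key] <= 1:
            raise ValueError(f"{key} must be in (0, 1], got {params[key]}")
    return params


# Override one default for a run while keeping the rest.
run_parameters = validate_parameters(
    {**DEFAULT_PARAMETERS, "significance_threshold": ["bonferroni", "fdr"]}
)
```

Building each run from the defaults plus explicit overrides makes it easy to see afterwards which settings differed from the recommended set.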

Results

Once you have selected the dataset, chosen the pipeline, and set all the parameters, you can start your analysis using the Run Analysis box. You will then be redirected to a page where you can see which jobs are In Progress and which are Completed, and where you can choose to carry out a new analysis.

By clicking on your JobName, you will reach a page where you can monitor all the processes involved in your analysis.

Now, selecting the Results box on the right, let's take a look at the demo results obtained using the Default Parameters Set:

A Manhattan plot is a type of scatter plot, that is, a diagram that uses Cartesian coordinates to display the values of (typically) two variables for a set of data.
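The two variables can be prepared as in the Python sketch below, which assumes the layout commonly used for PheWAS Manhattan plots: phenotypes ordered by group along the x-axis and -log10(p) on the y-axis. The function name, the result rows, and the Bonferroni line at alpha 0.05 are hypothetical and chosen only for illustration.

```python
import math


def manhattan_points(results):
    """Turn (phenotype, group, p_value) rows into plot-ready points.

    Points are ordered by group, then phenotype, so that phenotypes in the
    same group sit next to each other on the x-axis. y is -log10(p).
    """
    ordered = sorted(results, key=lambda row: (row[1], row[0]))
    return [
        {"x": index, "y": -math.log10(p_value), "label": phenotype, "group": group}
        for index, (phenotype, group, p_value) in enumerate(ordered)
    ]


# Hypothetical association results for a single SNP.
results = [
    ("hypertension", "circulatory", 0.2),
    ("type 2 diabetes", "endocrine", 0.0001),
    ("asthma", "respiratory", 0.01),
    ("heart failure", "circulatory", 0.001),
]
points = manhattan_points(results)

# Horizontal reference line: Bonferroni cutoff for an illustrative alpha of 0.05.
significance_line = -math.log10(0.05 / len(results))
above_line = [pt["label"] for pt in points if pt["y"] > significance_line]
```

Because the y-axis is on a negative log scale, smaller p-values appear higher, so the strongest associations stand out as the tallest points above the reference line.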

By clicking on the Interactive Graphs option, you can also view your results as interactive graphs.

Finally, using the Export box, you can download the results of your analysis as a .pdf file.

