Condensed Commit by Daniel Ruskin, with contributions from Joey and J…
…avon
Javon Sun authored and dbr15101 committed Apr 28, 2017
0 parents commit 458ca60
Showing 631 changed files with 23,057 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
Default/
*.o
.settings/
.project
.cproject
lambda.txt
109 changes: 109 additions & 0 deletions README.md
@@ -0,0 +1,109 @@
# Heritable Component Analysis Pipeline

This repository represents a pipeline that performs three primary functions:

1. Heritable Component Analysis

2. Heritability Estimation

3. Kinship Matrix Generation

The pipeline accepts genotypic and phenotypic data, as well as covariates, and generates a highly heritable trait. It can then estimate the heritability of that trait via the second function above. If the user already has a kinship matrix available (e.g. from GCTA), the program can accept this matrix. Alternatively, it can use genotypic data to generate the kinship matrix.

# Usage

Running the program without any options will trigger the help function, which will show all options that are available. The program takes a command (`kinship`, `h2r`, `hca`, or `score`), as well as a set of options for each command.

Generally speaking, one can follow the below guidelines when using this program:

1. Obtain the following data: phenotypic data, quantitative and discrete covariates (optional), kinship file (optional), genotypic data (required if kinship data is not present). Ensure that every file contains individual IDs and that the same IDs are used across files. An individual whose ID is missing from any one of the files is excluded from the analysis.

2. Determine the parameters for your analysis. These are specified in the “heritable component analysis” help section. There are a few options that you must consider: “numSplits” and “lambdaVecFile”. “numSplits” controls the cross-validation functionality; if this is set to 1 (default), cross-validation will not be performed. If this is set to a value of 2 or more, cross-validation will be performed (see the cross-validation section). “lambdaVecFile” must point to a file with lambda values to use during the HCA process; each line must represent one lambda value.

3. Determine if you would like to save output data to disk; if so, specify the “outDir” parameter (set this to “.” to save to the current directory). In addition, specify the “numThreads” option (default 2) to enable multi-thread functionality.

4. Run HCA with the parameters chosen, and observe the output. Analysis may take a long time depending on the size of your dataset.

Additional options are present, but are not required. Review the documentation and help output for more details.
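
As a rough illustration of guideline 1 and of the “lambdaVecFile” format in guideline 2, the sketch below parses one lambda value per line and keeps only the individual IDs that appear in every input. It is not the program's own code: the helper names `parse_lambda_lines` and `common_ids` are hypothetical, and skipping blank lines is an assumption.

```python
from typing import Iterable, List, Set


def parse_lambda_lines(lines: Iterable[str]) -> List[float]:
    """Parse one lambda value per line (assumption: blank lines are skipped)."""
    stripped = (raw.strip() for raw in lines)
    return [float(line) for line in stripped if line]


def common_ids(*id_collections: Iterable[str]) -> Set[str]:
    """Return the IDs present in every input; all other IDs are dropped."""
    sets = [set(ids) for ids in id_collections]
    return set.intersection(*sets) if sets else set()


# Hypothetical inputs standing in for a lambda file and three ID columns.
lambdas = parse_lambda_lines(["0.1\n", "1\n", "\n", "10\n"])
kept = common_ids(["A", "B", "C"], ["B", "C", "D"], ["C", "B"])
print(lambdas)       # [0.1, 1.0, 10.0]
print(sorted(kept))  # ['B', 'C']
```

Checking the ID overlap this way before a long run can show how many individuals would be dropped.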

# HCA Cross-Validation and Lambda Tuning

If “numSplits” is equal to one or is not set, the following process will be used for lambda tuning:

1. HCA will be run with each lambda value.

2. For each lambda value, heritability analysis will be run with the generated trait.

3. The lambda value that generates the most heritable trait will be saved as a result of the analysis.
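
The three steps above amount to a simple search over the lambda values. The following is a minimal sketch of that loop, not the program's implementation: `tune_lambda`, `fit_hca`, and `estimate_h2` are hypothetical names, the two callables are toy stand-ins for the real HCA and heritability routines, and keeping the first lambda on a tie is an assumption.

```python
from typing import Callable, Iterable, Tuple


def tune_lambda(
    lambdas: Iterable[float],
    fit_hca: Callable[[float], object],
    estimate_h2: Callable[[object], float],
) -> Tuple[float, float, object]:
    """Return (best_lambda, best_h2, best_weights) over all lambda values."""
    best = None
    for lam in lambdas:
        weights = fit_hca(lam)      # step 1: run HCA with this lambda
        h2 = estimate_h2(weights)   # step 2: heritability of the generated trait
        if best is None or h2 > best[1]:
            best = (lam, h2, weights)  # step 3: keep the most heritable result
    if best is None:
        raise ValueError("no lambda values supplied")
    return best


# Toy stand-ins: the "weights" are just the lambda, and heritability peaks at 1.0.
best_lambda, best_h2, _ = tune_lambda(
    [0.1, 1.0, 10.0],
    fit_hca=lambda lam: lam,
    estimate_h2=lambda w: 1.0 / (1.0 + (w - 1.0) ** 2),
)
print(best_lambda, best_h2)  # 1.0 1.0
```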

If the “numSplits” option is greater than one, the dataset will be split randomly into “numSplits” splits. The following process will then occur:

1. The code will iterate through each lambda value.

2. For each lambda value, the code will iterate through each split. On each iteration, the chosen split will be used as cross-validation data; all other data will be marked as training data. HCA will be run with the training data and the current lambda value. Once HCA has been run with all splits, the average heritability score for the current lambda value will be calculated.

3. The lambda with the highest average heritability score will be considered the best. HCA and heritability estimation will be re-performed with this lambda value, on the full data set. This will be considered the final result set.
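
The cross-validation procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the program's code: the function names are hypothetical, the callables are toy stand-ins, and the sketch assumes that heritability is scored on the held-out split, which the steps above suggest but do not state explicitly.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def make_splits(ids: Sequence[str], num_splits: int, seed: int = 0) -> List[List[str]]:
    """Randomly partition ids into num_splits roughly equal splits."""
    if num_splits < 2:
        raise ValueError("cross-validation needs at least 2 splits")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_splits] for i in range(num_splits)]


def cross_validate(
    ids: Sequence[str],
    lambdas: Sequence[float],
    num_splits: int,
    fit_hca: Callable[[List[str], float], object],
    estimate_h2: Callable[[List[str], object], float],
    seed: int = 0,
) -> Tuple[float, Dict[float, float]]:
    """Return the lambda with the highest average score, plus all averages."""
    splits = make_splits(ids, num_splits, seed)
    averages: Dict[float, float] = {}
    for lam in lambdas:
        scores = []
        for held_out in splits:
            held = set(held_out)
            training = [i for i in ids if i not in held]
            weights = fit_hca(training, lam)  # train on all other splits
            # Assumption: score the trained weights on the held-out split.
            scores.append(estimate_h2(held_out, weights))
        averages[lam] = mean(scores)
    best = max(averages, key=averages.get)
    return best, averages


ids = [f"id{i}" for i in range(10)]
best, averages = cross_validate(
    ids, [0.1, 1.0, 10.0], num_splits=5,
    fit_hca=lambda train_ids, lam: lam,
    estimate_h2=lambda held_out_ids, w: 1.0 / (1.0 + (w - 1.0) ** 2),
)
print(best)  # 1.0
```

After the best lambda is chosen, the final step is to rerun HCA and heritability estimation once on the full data set with that value.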

**Important Note on Cross-Validation Functionality**

Note that some datasets may be particularly sensitive to removing certain subjects. As specifying a numSplits value causes subjects to be removed during the training process, this may cause instability in the generated weights. If you notice unstable results with data splitting enabled, consider running the program without this functionality.

# Outputs

In addition to outputting data to the CLI, if an `outDir` parameter is specified, some data will also be saved. For all analyses, if a GRM was generated rather than supplied by the user, that GRM will be saved to "kinship.csv". The following analysis-specific data will also be saved:

## HCA

When HCA is run, the final weights will be saved to "trait_hca.csv". Individuals will be scored with these weights, and the output from the "Scoring" section will be saved. In addition, the output from the "Heritability Analysis" section will be saved for the final weights.

## Heritability Analysis

When heritability analysis is run, statistics regarding the analysis will be saved to "h2r_est.txt".

## Scoring

When scoring is run, the calculated scores will be saved to "scores.txt".

## Kinship Generation

When kinship generation is run, the kinship file will be saved to "kinship.txt".

# Special Note on Heritability Estimation

If the variance-covariance matrix is non-invertible during heritability estimation, small values will be added to the matrix diagonal. This will generally resolve the invertibility error, but may adversely affect the results. A warning is printed whenever the add-to-diagonal approach is used.
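
To make the add-to-diagonal idea concrete, here is a minimal pure-Python sketch. It is not the program's implementation: it uses a Cholesky factorization as the check (a swapped-in technique that fails on singular covariance matrices), and the starting value `jitter`, the tenfold growth, and the name `factor_with_jitter` are all assumptions.

```python
import math
import warnings
from typing import List, Tuple

Matrix = List[List[float]]


def cholesky(matrix: Matrix) -> Matrix:
    """Return lower-triangular L with L * L^T = matrix, or raise ValueError."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 1e-12:  # zero or negative pivot: not positive definite
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def factor_with_jitter(
    matrix: Matrix, jitter: float = 1e-6, max_tries: int = 10
) -> Tuple[Matrix, float]:
    """Factor the matrix, adding a growing value to the diagonal on failure."""
    added = 0.0
    for _ in range(max_tries):
        shifted = [
            [v + (added if r == c else 0.0) for c, v in enumerate(row)]
            for r, row in enumerate(matrix)
        ]
        try:
            factor = cholesky(shifted)
        except ValueError:
            added = jitter if added == 0.0 else added * 10.0
            continue
        if added > 0.0:
            warnings.warn(f"added {added:g} to the diagonal; results may be affected")
        return factor, added
    raise ValueError("matrix could not be stabilized")


# A singular 2x2 matrix needs a diagonal shift; the identity does not.
_, shift_singular = factor_with_jitter([[1.0, 1.0], [1.0, 1.0]])
_, shift_identity = factor_with_jitter([[1.0, 0.0], [0.0, 1.0]])
print(shift_singular > 0.0, shift_identity)  # True 0.0
```

The shift changes the matrix being inverted, which is why the estimates can move when the warning appears.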

# Documentation

Further documentation is available in the `docs` folder.

# Dependencies

The Linux binary should work automatically on most Linux distributions. If not, compile it for your architecture.

The OSX binary requires GCC version 6. Install it by running `brew install gcc6 --without-multilib` on your machine.

# Compiling

To compile on most Linux distributions and OSX, follow these steps:

1. Install the Armadillo matrix library (download [here](http://arma.sourceforge.net/download.html) and run `cmake . && make && sudo make install`)

2. Install the NLOpt optimization library (download [here](http://ab-initio.mit.edu/nlopt/) and run `./configure && make && sudo make install`)

3. Install the OpenBLAS library [from source](https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide). Make sure to specify `DYNAMIC_ARCH=1` when running `make` and `make install`, if you plan on using the binary across multiple architectures.

4. If on Linux, install LAPACK (e.g. `sudo apt-get install liblapack-dev`). If on Mac, run `brew install gcc6 --without-multilib` and `brew install lapack`.

5. Run `make --file Makefile_osx` or `make --file Makefile_linux`, depending on your platform.

# References

The following references were used while preparing this program:

```
Sun J, Kranzler HR, Bi J. Refining multivariate disease phenotypes for high chip heritability. BMC Medical Genomics. 2015;8(Suppl 3):S3. doi:10.1186/1755-8794-8-S3-S3.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A Tool for Genome-wide Complex Trait Analysis. American Journal of Human Genetics. 2011;88(1):76-82. doi:10.1016/j.ajhg.2010.11.011.
Yang J, Benyamin B, McEvoy BP, et al. Common SNPs explain a large proportion of heritability for human height. Nature genetics. 2010;42(7):565-569. doi:10.1038/ng.608.
```
Binary file added bin/hca-linux
Binary file not shown.
Binary file added bin/hca-osx
Binary file not shown.
120 changes: 120 additions & 0 deletions docs/html/_data_8cpp.html
@@ -0,0 +1,120 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.13"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>HCA: /Users/danielruskin/src/hca-dev/src/Data.cpp File Reference</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">HCA
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.13 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
$(function() {
initMenu('',true,false,'search.php','Search');
$(document).ready(function() { init_search(); });
});
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('_data_8cpp.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>

<div class="header">
<div class="headertitle">
<div class="title">Data.cpp File Reference</div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><code>#include &quot;<a class="el" href="_data_8h_source.html">Data.h</a>&quot;</code><br />
<code>#include &lt;unistd.h&gt;</code><br />
<code>#include &lt;sys/stat.h&gt;</code><br />
<code>#include &lt;bitset&gt;</code><br />
<code>#include &quot;<a class="el" href="_snp_kinship_8h_source.html">SnpKinship.h</a>&quot;</code><br />
<code>#include &quot;<a class="el" href="_given_kinship_8h_source.html">GivenKinship.h</a>&quot;</code><br />
</div><div class="textblock"><div class="dynheader">
Include dependency graph for Data.cpp:</div>
<div class="dyncontent">
<div class="center"><img src="_data_8cpp__incl.png" border="0" usemap="#_2_users_2danielruskin_2src_2hca-dev_2src_2_data_8cpp" alt=""/></div>
<map name="_2_users_2danielruskin_2src_2hca-dev_2src_2_data_8cpp" id="_2_users_2danielruskin_2src_2hca-dev_2src_2_data_8cpp">
<area shape="rect" id="node2" href="_data_8h.html" title="Data.h" alt="" coords="638,94,697,119"/>
<area shape="rect" id="node10" href="_given_kinship_8h.html" title="GivenKinship.h" alt="" coords="123,318,233,343"/>
<area shape="rect" id="node18" href="_snp_kinship_8h.html" title="SnpKinship.h" alt="" coords="318,318,417,343"/>
<area shape="rect" id="node9" href="_individual_data_set_8h.html" title="IndividualDataSet.h" alt="" coords="393,243,529,269"/>
<area shape="rect" id="node16" href="_option_8h.html" title="Option.h" alt="" coords="612,393,683,418"/>
<area shape="rect" id="node19" href="_scorer_8h.html" title="Scorer.h" alt="" coords="525,169,595,194"/>
<area shape="rect" id="node15" href="_individual_data_8h.html" title="IndividualData.h" alt="" coords="471,393,587,418"/>
<area shape="rect" id="node12" href="_kinship_8h.html" title="Kinship.h" alt="" coords="371,393,447,418"/>
<area shape="rect" id="node13" href="util_8h.html" title="util.h" alt="" coords="145,393,193,418"/>
</map>
</div>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="dir_d522931ffa1371640980b621734a4381.html">Users</a></li><li class="navelem"><a class="el" href="dir_60a4da173fca05b90d61823adfb03c66.html">danielruskin</a></li><li class="navelem"><a class="el" href="dir_df4e26f2ffd11e56f607d6303fce6b11.html">src</a></li><li class="navelem"><a class="el" href="dir_b41b214f44b1d1e2573f3fffd563b69b.html">hca-dev</a></li><li class="navelem"><a class="el" href="dir_8a990246551b69b640aea526aed19dbb.html">src</a></li><li class="navelem"><a class="el" href="_data_8cpp.html">Data.cpp</a></li>
<li class="footer">Generated by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
</ul>
</div>
</body>
</html>
11 changes: 11 additions & 0 deletions docs/html/_data_8cpp__incl.map
@@ -0,0 +1,11 @@
<map id="/Users/danielruskin/src/hca&#45;dev/src/Data.cpp" name="/Users/danielruskin/src/hca&#45;dev/src/Data.cpp">
<area shape="rect" id="node2" href="$_data_8h.html" title="Data.h" alt="" coords="638,94,697,119"/>
<area shape="rect" id="node10" href="$_given_kinship_8h.html" title="GivenKinship.h" alt="" coords="123,318,233,343"/>
<area shape="rect" id="node18" href="$_snp_kinship_8h.html" title="SnpKinship.h" alt="" coords="318,318,417,343"/>
<area shape="rect" id="node9" href="$_individual_data_set_8h.html" title="IndividualDataSet.h" alt="" coords="393,243,529,269"/>
<area shape="rect" id="node16" href="$_option_8h.html" title="Option.h" alt="" coords="612,393,683,418"/>
<area shape="rect" id="node19" href="$_scorer_8h.html" title="Scorer.h" alt="" coords="525,169,595,194"/>
<area shape="rect" id="node15" href="$_individual_data_8h.html" title="IndividualData.h" alt="" coords="471,393,587,418"/>
<area shape="rect" id="node12" href="$_kinship_8h.html" title="Kinship.h" alt="" coords="371,393,447,418"/>
<area shape="rect" id="node13" href="$util_8h.html" title="util.h" alt="" coords="145,393,193,418"/>
</map>
1 change: 1 addition & 0 deletions docs/html/_data_8cpp__incl.md5
@@ -0,0 +1 @@
b873e50246071bf6ccec9b6b36c9ae3c
Binary file added docs/html/_data_8cpp__incl.png