Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Condensed Commit by Daniel Ruskin, with contributions from Joey and Javon
0 parents
commit 458ca60
Showing 631 changed files with 23,057 additions and 0 deletions.
Default/
*.o
.settings/
.project
.cproject
lambda.txt
# Heritable Component Analysis Pipeline

This repository contains a pipeline that performs three primary functions:

1. Heritable Component Analysis (HCA)

2. Heritability Estimation

3. Kinship Matrix Generation

The pipeline accepts genotypic and phenotypic data, as well as covariates, and generates a highly heritable trait. It can then estimate the heritability of that trait using the second function above. If a kinship matrix is already available (e.g., from GCTA), the program can accept it as input; otherwise, it can generate the kinship matrix from the genotypic data.

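As a rough illustration of kinship generation from genotypes, the sketch below computes a genetic relationship matrix (GRM) in the standardized-genotype form described by Yang et al. (2010, 2011): for each SNP with sample allele frequency p, genotypes x (0, 1 or 2 copies of the reference allele) are centered by 2p, scaled by 2p(1 - p), and the products are averaged over SNPs. This is a sketch under stated assumptions, not this program's implementation: the matrix layout and the `computeGrm` name are hypothetical, missing genotypes are not handled, and the diagonal uses the same formula as the off-diagonal entries (GCTA itself uses a slightly different diagonal estimator).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: genotypes[j][i] is the number of copies (0, 1 or 2)
// of the reference allele carried by individual j at SNP i.
using Matrix = std::vector<std::vector<double>>;

// GRM in standardized-genotype form, averaged over polymorphic SNPs:
// A[j][k] = (1/N) * sum_i (x_ji - 2p_i)(x_ki - 2p_i) / (2p_i(1 - p_i)).
// Missing genotypes are not handled in this sketch.
Matrix computeGrm(const Matrix& genotypes) {
    const std::size_t nInd = genotypes.size();
    assert(nInd > 0);
    const std::size_t nSnp = genotypes[0].size();
    Matrix grm(nInd, std::vector<double>(nInd, 0.0));
    std::size_t usedSnps = 0;
    for (std::size_t i = 0; i < nSnp; ++i) {
        // Sample frequency p_i of the reference allele at SNP i.
        double sum = 0.0;
        for (std::size_t j = 0; j < nInd; ++j) sum += genotypes[j][i];
        const double p = sum / (2.0 * static_cast<double>(nInd));
        const double var = 2.0 * p * (1.0 - p);
        if (var <= 0.0) continue;  // monomorphic SNP: carries no information
        ++usedSnps;
        for (std::size_t j = 0; j < nInd; ++j) {
            const double dj = genotypes[j][i] - 2.0 * p;
            for (std::size_t k = j; k < nInd; ++k) {
                const double c = dj * (genotypes[k][i] - 2.0 * p) / var;
                grm[j][k] += c;
                if (k != j) grm[k][j] += c;  // keep the matrix symmetric
            }
        }
    }
    if (usedSnps > 0)
        for (auto& row : grm)
            for (double& v : row) v /= static_cast<double>(usedSnps);
    return grm;
}
```

For example, two individuals with opposite homozygous genotypes (0 and 2) at one informative SNP, plus one monomorphic SNP, get the GRM {{2, -2}, {-2, 2}}; the monomorphic SNP is skipped.
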
# Usage

Running the program without any options displays the help output, which lists all available options. The program takes a command (`kinship`, `h2r`, `hca`, or `score`), followed by the options for that command.

In general, follow these steps when using the program:

1. Obtain the following data: phenotypic data, quantitative and discrete covariates (optional), a kinship file (optional), and genotypic data (required if no kinship file is provided). Make sure individual IDs are present in every file and consistent across files. An individual whose ID is missing from any one file will be excluded from the analysis.

2. Determine the parameters for your analysis. These are described in the "heritable component analysis" section of the help output. Two options need particular attention: `numSplits` and `lambdaVecFile`. `numSplits` controls cross-validation: if it is set to 1 (the default), cross-validation is not performed; if it is set to 2 or more, cross-validation is performed (see the cross-validation section below). `lambdaVecFile` must point to a file containing the lambda values to use during HCA, with one lambda value per line.

3. Decide whether to save output data to disk; if so, set the `outDir` parameter (use `.` for the current directory). In addition, set the `numThreads` option (default 2) to control multi-threaded execution.

4. Run HCA with the chosen parameters and review the output. Analysis may take a long time on large datasets.

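Step 1 implies that only individuals present in every input file are analyzed. A minimal sketch of that intersection is shown below, assuming the IDs from each file have already been parsed into a list (the parsing step and the `commonIds` name are hypothetical). It keeps the order of the first file and drops duplicate IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the IDs that appear in every input list, in the order of the first
// list. Each inner vector stands for the IDs read from one input file
// (phenotypes, covariates, kinship or genotypes); parsing is not shown.
std::vector<std::string> commonIds(
        const std::vector<std::vector<std::string>>& idLists) {
    std::vector<std::string> result;
    if (idLists.empty()) return result;
    // One hash set per remaining file for constant-time membership tests.
    std::vector<std::unordered_set<std::string>> sets;
    for (std::size_t f = 1; f < idLists.size(); ++f)
        sets.emplace_back(idLists[f].begin(), idLists[f].end());
    std::unordered_set<std::string> seen;  // drops duplicates in the first list
    for (const std::string& id : idLists[0]) {
        bool inAll = true;
        for (const auto& s : sets) {
            if (s.count(id) == 0) {
                inAll = false;
                break;
            }
        }
        if (inAll && seen.insert(id).second) result.push_back(id);
    }
    return result;
}
```

With ID lists {A, B, C, D}, {D, B, A} and {B, D, E}, only B and D survive, in that order.
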
Additional options are available but not required; see the documentation and the help output for details.

# HCA Cross-Validation and Lambda Tuning

If `numSplits` is equal to 1 or is not set, lambda tuning works as follows:

1. HCA will be run with each lambda value.

2. For each lambda value, heritability analysis will be run on the generated trait.

3. The lambda value that produces the most heritable trait will be saved as the result of the analysis.

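The single-run tuning procedure amounts to an argmax over the values in `lambdaVecFile`. In the sketch below, `readLambdas` parses that file using the one-value-per-line format described above (skipping blank lines is an assumption), and `heritabilityFor` is a hypothetical callback standing in for "run HCA with this lambda, then estimate the heritability of the resulting trait". Neither name comes from this codebase.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses a lambda file: one lambda value per line. Blank lines are skipped
// (an assumption); std::stod throws on a malformed line.
std::vector<double> readLambdas(std::istream& in) {
    std::vector<double> lambdas;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        lambdas.push_back(std::stod(line));
    }
    return lambdas;
}

// numSplits == 1: evaluate every lambda on the full data set and keep the
// one whose HCA trait is most heritable. `heritabilityFor` is a hypothetical
// callback for "run HCA with this lambda, then estimate the heritability of
// the resulting trait". Ties keep the earlier lambda.
double selectBestLambda(const std::vector<double>& lambdas,
                        const std::function<double(double)>& heritabilityFor) {
    if (lambdas.empty()) throw std::invalid_argument("no lambda values");
    double bestLambda = lambdas.front();
    double bestH2 = -std::numeric_limits<double>::infinity();
    for (double lambda : lambdas) {
        const double h2 = heritabilityFor(lambda);
        if (h2 > bestH2) {
            bestH2 = h2;
            bestLambda = lambda;
        }
    }
    return bestLambda;
}
```

For example, with the lambdas 0.01, 0.1 and 1 and a callback that reports the highest heritability for 0.1, `selectBestLambda` returns 0.1.
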
If `numSplits` is greater than 1, the dataset will be split randomly into `numSplits` subsets. The following process will then occur:

1. The code will iterate through each lambda value.

2. For each lambda value, the code will iterate through each split. On each iteration, the current split is held out as cross-validation data and all other splits are used as training data; HCA is run on the training data with the current lambda value. Once HCA has been run for every split, the average heritability score for the current lambda value is calculated.

3. The lambda value with the highest average heritability score is considered the best. HCA and heritability estimation will then be re-run with this lambda value on the full dataset, and these results form the final result set.

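The cross-validated variant can be sketched in the same style. `makeFolds` shuffles the individuals and deals them into `numSplits` folds whose sizes differ by at most one; `crossValidateLambda` holds out each fold in turn, averages a per-fold score, and returns the best lambda, which would then be re-run on the full dataset. The description above does not say exactly how the per-fold heritability score is computed, so it is left to a hypothetical `FoldScore` callback. All names are illustrative, and `std::mt19937` stands in for whatever random source the program actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

using Indices = std::vector<std::size_t>;

// Shuffles 0..n-1 and deals the indices round-robin into numSplits folds.
std::vector<Indices> makeFolds(std::size_t n, std::size_t numSplits,
                               std::mt19937& rng) {
    if (numSplits < 2 || numSplits > n)
        throw std::invalid_argument("need 2 <= numSplits <= n");
    Indices order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<Indices> folds(numSplits);
    for (std::size_t i = 0; i < n; ++i) folds[i % numSplits].push_back(order[i]);
    return folds;
}

// Hypothetical callback: train HCA on `train` with `lambda` and return a
// heritability score associated with the held-out `validation` individuals.
using FoldScore = std::function<double(double lambda, const Indices& train,
                                       const Indices& validation)>;

// Returns the lambda with the highest mean fold score (ties keep the earlier).
double crossValidateLambda(const std::vector<double>& lambdas,
                           const std::vector<Indices>& folds,
                           const FoldScore& foldScore) {
    if (lambdas.empty() || folds.empty())
        throw std::invalid_argument("need lambdas and folds");
    double bestLambda = lambdas.front();
    double bestMean = -std::numeric_limits<double>::infinity();
    for (double lambda : lambdas) {
        double total = 0.0;
        for (std::size_t f = 0; f < folds.size(); ++f) {
            // Every fold except f forms the training set.
            Indices train;
            for (std::size_t g = 0; g < folds.size(); ++g)
                if (g != f) train.insert(train.end(), folds[g].begin(), folds[g].end());
            total += foldScore(lambda, train, folds[f]);
        }
        const double mean = total / static_cast<double>(folds.size());
        if (mean > bestMean) {
            bestMean = mean;
            bestLambda = lambda;
        }
    }
    return bestLambda;  // the caller re-runs HCA and h2r on the full data set
}
```

With 10 individuals and `numSplits` = 3, the folds hold 4, 3 and 3 individuals, and each call to the callback sees a training set and a validation set that together cover all 10.
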
**Important Note on Cross-Validation Functionality**

Some datasets may be particularly sensitive to the removal of certain subjects. Because setting `numSplits` removes subjects during training, it may make the generated weights unstable. If you notice unstable results with data splitting enabled, consider running the program without it.

# Outputs

In addition to printing results to the command line, the program saves some data to disk if the `outDir` parameter is specified. For all analyses, if a genetic relationship matrix (GRM) was generated rather than supplied, it will be saved to "kinship.csv". The following analysis-specific data will also be saved:

## HCA

When HCA is run, the final weights will be saved to "trait_hca.csv". Individuals will be scored with these weights, and the output described in the "Scoring" section will be saved. In addition, the output described in the "Heritability Analysis" section will be saved for the final weights.

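Scoring applies the final weights to each individual's phenotypes. The sketch below assumes that a score is simply the weighted sum of an individual's phenotype values, in line with the linear-combination formulation of Sun et al. (2015); whether the program standardizes phenotypes or adjusts for covariates first is not stated here, and `scoreIndividuals` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// phenotypes[j][k]: value of phenotype k for individual j.
// weights[k]: HCA weight for phenotype k (e.g. as loaded from
// "trait_hca.csv"; the file layout is not shown here).
// Returns one score per individual: sum_k weights[k] * phenotypes[j][k].
std::vector<double> scoreIndividuals(
        const std::vector<std::vector<double>>& phenotypes,
        const std::vector<double>& weights) {
    std::vector<double> scores;
    scores.reserve(phenotypes.size());
    for (const auto& row : phenotypes) {
        if (row.size() != weights.size())
            throw std::invalid_argument("phenotype/weight length mismatch");
        double s = 0.0;
        for (std::size_t k = 0; k < weights.size(); ++k) s += weights[k] * row[k];
        scores.push_back(s);
    }
    return scores;
}
```

With weights {0.5, -1}, the phenotype rows {1, 2} and {3, 4} score -1.5 and -2.5.
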
## Heritability Analysis

When heritability analysis is run, statistics regarding the analysis will be saved to "h2r_est.txt".

## Scoring

When scoring is run, the calculated scores will be saved to "scores.txt".

## Kinship Generation

When kinship generation is run, the kinship file will be saved to "kinship.txt".

# Special Note on Heritability Estimation

If the variance-covariance matrix is non-invertible during heritability estimation, small values will be added to its diagonal. This generally resolves the invertibility error but may adversely affect the results, so a warning is printed whenever this add-to-diagonal approach is used.

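A sketch of the add-to-diagonal fallback is shown below. It assumes the variance-covariance matrix is symmetric, uses a Cholesky factorization as the positive-definiteness (and hence invertibility) test, and starts from a jitter of 1e-8 that grows tenfold per attempt. The starting value, the growth schedule, the retry limit and the function names are all assumptions, not values taken from this program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cholesky-Crout on the lower triangle of a copy of `m`. Returns true if the
// symmetric matrix is positive definite (and therefore invertible).
bool isPositiveDefinite(Matrix m) {
    const std::size_t n = m.size();
    for (std::size_t j = 0; j < n; ++j) {
        double d = m[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= m[j][k] * m[j][k];
        if (!(d > 0.0)) return false;  // also catches NaN
        const double ljj = std::sqrt(d);
        m[j][j] = ljj;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = m[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= m[i][k] * m[j][k];
            m[i][j] = s / ljj;
        }
    }
    return true;
}

// Adds a growing jitter to every diagonal entry until the factorization
// succeeds. Returns the total amount added (0 if no change was needed) and
// prints a warning whenever the adjustment is used.
double addToDiagonalUntilPd(Matrix& v, double epsilon = 1e-8, int maxTries = 20) {
    double added = 0.0;
    int tries = 0;
    while (!isPositiveDefinite(v)) {
        if (tries++ == maxTries)
            throw std::runtime_error("matrix still not positive definite");
        for (std::size_t i = 0; i < v.size(); ++i) v[i][i] += epsilon;
        added += epsilon;
        epsilon *= 10.0;
    }
    if (added > 0.0)
        std::cerr << "Warning: added " << added
                  << " to the diagonal of the variance-covariance matrix\n";
    return added;
}
```

For a singular matrix such as {{1, 1}, {1, 1}}, one increment of 1e-8 is enough for the factorization to succeed, and a warning is printed; an already positive-definite matrix is left unchanged with 0 added.
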
# Documentation

Further documentation is available in the `docs` folder.

# Dependencies

The Linux binary should work out of the box on most Linux distributions. If it does not, compile it for your architecture (see "Compiling" below).

The OSX binary requires GCC version 6. Install it by running `brew install gcc6 --without-multilib`.

# Compiling

To compile on most Linux distributions and OSX, follow these steps:

1. Install the Armadillo matrix library (download [here](http://arma.sourceforge.net/download.html) and run `cmake . && make && sudo make install`).

2. Install the NLopt optimization library (download [here](http://ab-initio.mit.edu/nlopt/) and run `./configure && make && sudo make install`).

3. Install the OpenBLAS library [from source](https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide). If you plan to use the binary across multiple architectures, specify `DYNAMIC_ARCH=1` when running `make` and `make install`.

4. On Linux, install LAPACK with your package manager (e.g., `sudo apt-get install liblapack-dev` on Debian or Ubuntu). On Mac, run `brew install gcc6 --without-multilib` and `brew install lapack`.

5. Run `make --file Makefile_osx` or `make --file Makefile_linux`, depending on your platform.

# References

The following references were used while preparing this program:

```
Sun J, Kranzler HR, Bi J. Refining multivariate disease phenotypes for high chip heritability. BMC Medical Genomics. 2015;8(Suppl 3):S3. doi:10.1186/1755-8794-8-S3-S3.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A Tool for Genome-wide Complex Trait Analysis. American Journal of Human Genetics. 2011;88(1):76-82. doi:10.1016/j.ajhg.2010.11.011.
Yang J, Benyamin B, McEvoy BP, et al. Common SNPs explain a large proportion of heritability for human height. Nature Genetics. 2010;42(7):565-569. doi:10.1038/ng.608.
```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.13"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>HCA: /Users/danielruskin/src/hca-dev/src/Data.cpp File Reference</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">HCA
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.13 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
$(function() {
initMenu('',true,false,'search.php','Search');
$(document).ready(function() { init_search(); });
});
</script>
<div id="main-nav"></div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('_data_8cpp.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>

<div class="header">
<div class="headertitle">
<div class="title">Data.cpp File Reference</div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><code>#include "<a class="el" href="_data_8h_source.html">Data.h</a>"</code><br />
<code>#include &lt;unistd.h&gt;</code><br />
<code>#include &lt;sys/stat.h&gt;</code><br />
<code>#include &lt;bitset&gt;</code><br />
<code>#include "<a class="el" href="_snp_kinship_8h_source.html">SnpKinship.h</a>"</code><br />
<code>#include "<a class="el" href="_given_kinship_8h_source.html">GivenKinship.h</a>"</code><br />
</div><div class="textblock"><div class="dynheader">
Include dependency graph for Data.cpp:</div>
<div class="dyncontent">
<div class="center"><img src="_data_8cpp__incl.png" border="0" usemap="#_2_users_2danielruskin_2src_2hca-dev_2src_2_data_8cpp" alt=""/></div>
<map name="_2_users_2danielruskin_2src_2hca-dev_2src_2_data_8cpp" id="_2_users_2danielruskin_2src_2hca-dev_2src_2_data_8cpp">
<area shape="rect" id="node2" href="_data_8h.html" title="Data.h" alt="" coords="638,94,697,119"/>
<area shape="rect" id="node10" href="_given_kinship_8h.html" title="GivenKinship.h" alt="" coords="123,318,233,343"/>
<area shape="rect" id="node18" href="_snp_kinship_8h.html" title="SnpKinship.h" alt="" coords="318,318,417,343"/>
<area shape="rect" id="node9" href="_individual_data_set_8h.html" title="IndividualDataSet.h" alt="" coords="393,243,529,269"/>
<area shape="rect" id="node16" href="_option_8h.html" title="Option.h" alt="" coords="612,393,683,418"/>
<area shape="rect" id="node19" href="_scorer_8h.html" title="Scorer.h" alt="" coords="525,169,595,194"/>
<area shape="rect" id="node15" href="_individual_data_8h.html" title="IndividualData.h" alt="" coords="471,393,587,418"/>
<area shape="rect" id="node12" href="_kinship_8h.html" title="Kinship.h" alt="" coords="371,393,447,418"/>
<area shape="rect" id="node13" href="util_8h.html" title="util.h" alt="" coords="145,393,193,418"/>
</map>
</div>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="dir_d522931ffa1371640980b621734a4381.html">Users</a></li><li class="navelem"><a class="el" href="dir_60a4da173fca05b90d61823adfb03c66.html">danielruskin</a></li><li class="navelem"><a class="el" href="dir_df4e26f2ffd11e56f607d6303fce6b11.html">src</a></li><li class="navelem"><a class="el" href="dir_b41b214f44b1d1e2573f3fffd563b69b.html">hca-dev</a></li><li class="navelem"><a class="el" href="dir_8a990246551b69b640aea526aed19dbb.html">src</a></li><li class="navelem"><a class="el" href="_data_8cpp.html">Data.cpp</a></li>
<li class="footer">Generated by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
</ul>
</div>
</body>
</html>
<map id="/Users/danielruskin/src/hca-dev/src/Data.cpp" name="/Users/danielruskin/src/hca-dev/src/Data.cpp">
<area shape="rect" id="node2" href="$_data_8h.html" title="Data.h" alt="" coords="638,94,697,119"/>
<area shape="rect" id="node10" href="$_given_kinship_8h.html" title="GivenKinship.h" alt="" coords="123,318,233,343"/>
<area shape="rect" id="node18" href="$_snp_kinship_8h.html" title="SnpKinship.h" alt="" coords="318,318,417,343"/>
<area shape="rect" id="node9" href="$_individual_data_set_8h.html" title="IndividualDataSet.h" alt="" coords="393,243,529,269"/>
<area shape="rect" id="node16" href="$_option_8h.html" title="Option.h" alt="" coords="612,393,683,418"/>
<area shape="rect" id="node19" href="$_scorer_8h.html" title="Scorer.h" alt="" coords="525,169,595,194"/>
<area shape="rect" id="node15" href="$_individual_data_8h.html" title="IndividualData.h" alt="" coords="471,393,587,418"/>
<area shape="rect" id="node12" href="$_kinship_8h.html" title="Kinship.h" alt="" coords="371,393,447,418"/>
<area shape="rect" id="node13" href="$util_8h.html" title="util.h" alt="" coords="145,393,193,418"/>
</map>
b873e50246071bf6ccec9b6b36c9ae3c