diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..aa00915
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,6 @@
+Default/
+*.o
+.settings/
+.project
+.cproject
+lambda.txt
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5d8ac7b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,109 @@
+# Heritable Component Analysis Pipeline
+
+This repository contains a pipeline that performs three primary functions:
+
+1. Heritable Component Analysis
+
+2. Heritability Estimation
+
+3. Kinship Matrix Generation
+
+The pipeline accepts genotypic and phenotypic data, as well as covariates, and generates a highly heritable trait. It can then estimate the heritability of that trait via the second function above. If the user already has a kinship matrix available (e.g. from GCTA), the program can accept this matrix. Alternatively, it can use genotypic data to generate the kinship matrix.
+
+# Usage
+
+Running the program without any options will trigger the help function, which lists all available options. The program takes a command (`kinship`, `h2r`, `hca`, or `score`), as well as a set of options for each command.
+
+Generally speaking, one can follow the guidelines below when using this program:
+
+1. Obtain the following data: phenotypic data, quantitative and discrete covariates (optional), a kinship file (optional), and genotypic data (required if kinship data is not present). Ensure that individual IDs are present in, and consistent across, all of these files. If an individual ID is missing from one file but present in another, that individual will not be included in the analysis.
+
+2. Determine the parameters for your analysis. These are specified in the “heritable component analysis” help section. Two options in particular must be considered: “numSplits” and “lambdaVecFile”. “numSplits” controls the cross-validation functionality; if this is set to 1 (default), cross-validation will not be performed. If this is set to a value of 2 or more, cross-validation will be performed (see the cross-validation section). “lambdaVecFile” must point to a file with the lambda values to use during the HCA process, one lambda value per line.
+
+3. Determine whether you would like to save output data to disk; if so, specify the “outDir” parameter (set this to “.” to save to the current directory). In addition, specify the “numThreads” option (default 2) to set how many threads are used.
+
+4. Run HCA with the chosen parameters, and observe the output. Analysis may take a long time depending on the size of your dataset.
+
+Additional options are available but not required. Review the documentation and help output for more details.
+
+# HCA Cross-Validation and Lambda Tuning
+
+If “numSplits” is equal to one or is not set, the following process will be used for lambda tuning:
+
+1. HCA will be run with each lambda value.
+
+2. For each lambda value, heritability analysis will be run with the generated trait.
+
+3. The lambda value that generates the most heritable trait will be saved as the result of the analysis.
+
+If the “numSplits” option is greater than one, the dataset will be split randomly into “numSplits” splits. The following process will then occur:
+
+1. The code will iterate through each lambda value.
+
+2. For each lambda value, the code will iterate through each split. On each iteration, the chosen split will be used as cross-validation data; all other data will be marked as training data. HCA will be run with the training data and the current lambda value. Once HCA has been run with all splits, the average heritability score for the current lambda value will be calculated.
+
+3. The lambda value with the highest average heritability score will be considered the best. HCA and heritability estimation will be rerun with this lambda value on the full dataset, and this run will be considered the final result set.
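The cross-validated lambda tuning described above can be sketched in C++ as follows. This is an illustration under stated assumptions, not code from this repository: `FoldScorer`, `makeSplits`, and `tuneLambda` are hypothetical names, and the callback stands in for running HCA on the training individuals and then scoring the resulting trait's heritability. Which individuals that score is computed on is left to the callback, because the description above does not pin it down.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical callback: runs HCA on trainIdx with the given lambda and returns
// a heritability score (the held-out indices are passed in case it needs them).
using FoldScorer = std::function<double(double lambda,
                                        const std::vector<std::size_t>& trainIdx,
                                        const std::vector<std::size_t>& holdoutIdx)>;

// Randomly partitions individuals 0..n-1 into numSplits near-equal groups.
std::vector<std::vector<std::size_t>> makeSplits(std::size_t n, std::size_t numSplits,
                                                 unsigned seed) {
    if (numSplits < 2 || numSplits > n)
        throw std::invalid_argument("numSplits must be in [2, n]");
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    std::vector<std::vector<std::size_t>> splits(numSplits);
    for (std::size_t i = 0; i < n; ++i)
        splits[i % numSplits].push_back(idx[i]);  // deal shuffled indices round-robin
    return splits;
}

// Returns the lambda whose mean score over all held-out splits is highest.
double tuneLambda(const std::vector<double>& lambdas, std::size_t n,
                  std::size_t numSplits, const FoldScorer& score, unsigned seed = 42) {
    if (lambdas.empty()) throw std::invalid_argument("no lambda values");
    const auto splits = makeSplits(n, numSplits, seed);
    double bestLambda = lambdas.front();
    double bestMean = -std::numeric_limits<double>::infinity();
    for (double lambda : lambdas) {
        double total = 0.0;
        for (std::size_t s = 0; s < splits.size(); ++s) {
            // Every split except s becomes training data.
            std::vector<std::size_t> train;
            for (std::size_t t = 0; t < splits.size(); ++t)
                if (t != s) train.insert(train.end(), splits[t].begin(), splits[t].end());
            total += score(lambda, train, splits[s]);
        }
        const double mean = total / static_cast<double>(splits.size());
        if (mean > bestMean) { bestMean = mean; bestLambda = lambda; }
    }
    return bestLambda;
}
```

Dealing shuffled indices out round-robin keeps every split within one individual of the others in size; the winning lambda would then be used for the final run on the full dataset, as step 3 describes.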
+
+**Important Note on Cross-Validation Functionality**
+
+Note that some datasets may be particularly sensitive to the removal of certain subjects. Because specifying a numSplits value causes subjects to be removed during the training process, it may cause instability in the generated weights. If you notice unstable results with data splitting enabled, consider running the program without this functionality.
+
+# Outputs
+
+In addition to printing results to the CLI, if an `outDir` parameter is specified, some data will also be saved. For all analyses, if a GRM was generated (rather than pregiven), that GRM will be saved to "kinship.csv". The following analysis-specific data will also be saved:
+
+## HCA
+
+When HCA is run, the final weights will be saved to "trait_hca.csv". Individuals will be scored with these weights, and the output described in the "Scoring" section will be saved. In addition, the output described in the "Heritability Analysis" section will be saved for the final weights.
+
+## Heritability Analysis
+
+When heritability analysis is run, statistics regarding the analysis will be saved to "h2r_est.txt".
+
+## Scoring
+
+When scoring is run, the calculated scores will be saved to "scores.txt".
+
+## Kinship Generation
+
+When kinship generation is run, the kinship file will be saved to "kinship.txt".
+
+# Special Note on Heritability Estimation
+
+If the variance-covariance matrix is non-invertible during heritability estimation, small values will be added to the matrix diagonal. This generally resolves the invertibility error, but may adversely affect the results. A warning will be printed whenever the add-to-diagonal approach is used.
+
+# Documentation
+
+Further documentation is available in the `docs` folder.
+
+# Dependencies
+
+The Linux binary should work out of the box on most Linux distributions. If not, compile it for your architecture.
+
+The OSX binary requires GCC version 6. Install it by running `brew install gcc6 --without-multilib` on your machine.
+
+# Compiling
+
+To compile on most Linux distributions and OSX, follow these steps:
+
+1. Install the Armadillo matrix library (download [here](http://arma.sourceforge.net/download.html) and run `cmake . && make && sudo make install`).
+
+2. Install the NLOpt optimization library (download [here](http://ab-initio.mit.edu/nlopt/) and run `./configure && make && sudo make install`).
+
+3. Install the OpenBLAS library [from source](https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide). If you plan on using the binary across multiple architectures, specify `DYNAMIC_ARCH=1` when running `make` and `make install`.
+
+4. If on Linux, install LAPACK (e.g. `sudo apt-get install liblapack-dev`). If on Mac, run `brew install gcc6 --without-multilib` and `brew install lapack`.
+
+5. Run `make --file Makefile_osx` or `make --file Makefile_linux`, depending on your platform.
+
+# References
+
+The following references were used while preparing this program:
+
+```
+Sun J, Kranzler HR, Bi J. Refining multivariate disease phenotypes for high chip heritability. BMC Medical Genomics. 2015;8(Suppl 3):S3. doi:10.1186/1755-8794-8-S3-S3.
+
+Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A Tool for Genome-wide Complex Trait Analysis. American Journal of Human Genetics. 2011;88(1):76-82. doi:10.1016/j.ajhg.2010.11.011.
+
+Yang J, Benyamin B, McEvoy BP, et al. Common SNPs explain a large proportion of heritability for human height. Nature Genetics. 2010;42(7):565-569. doi:10.1038/ng.608.
+```
diff --git a/bin/hca-linux b/bin/hca-linux
new file mode 100755
index 0000000..a809387
Binary files /dev/null and b/bin/hca-linux differ
diff --git a/bin/hca-osx b/bin/hca-osx
new file mode 100755
index 0000000..0387985
Binary files /dev/null and b/bin/hca-osx differ
diff --git a/docs/html/_data_8cpp.html b/docs/html/_data_8cpp.html
new file mode 100644
index 0000000..d769b3a
--- /dev/null
+++ b/docs/html/_data_8cpp.html
@@ -0,0 +1,120 @@
+
+
+
+ + + + +
+ HCA
+
+ |
+
#include "Data.h"
#include <unistd.h>
#include <sys/stat.h>
#include <bitset>
#include "SnpKinship.h"
#include "GivenKinship.h"
#include <iostream>
#include <armadillo>
#include <map>
#include <set>
#include <iterator>
#include <bitset>
#include "IndividualDataSet.h"
#include "Option.h"
#include "Scorer.h"
+Classes
class | Data | A class to load all relevant data. Responsible solely for loading data - not for reconciling missing individuals.
#include "GivenKinship.h"
+Classes
class | GivenKinship | A class to load a pregiven kinship matrix.
#include "IndividualData.h"
#include <iostream>
#include <armadillo>
#include <map>
+Classes
class | IndividualData | A class to represent data associated with a given individual.
#include "IndividualDataSet.h"
#include "GivenKinship.h"
#include <armadillo>
#include <list>
#include "IndividualData.h"
#include "Option.h"
#include "SnpKinship.h"
+Classes
class | IndividualDataSet | A class to keep track of and reconcile IndividualData objects.
#include <stdint.h>
#include <iostream>
#include <armadillo>
#include <map>
#include <set>
+Classes
class | Kinship | A class used to load or generate kinship data. This class must not be used directly; only subclasses should be used.
#include <iostream>
#include <stdlib.h>
#include <stdint.h>
+Classes
class | Option | A class to load all user options.
#include "RemlH2rEst.h"
+Classes
class | RemlH2rEst | A class to perform heritability analysis on a given dataset.
#include "RemlHca.h"
#include "Option.h"
#include "Data.h"
#include "RemlH2rEst.h"
#include <nlopt.hpp>
#include <armadillo>
+Classes
class | RemlHca | A class to perform heritable component analysis on a given dataset.
#include "Scorer.h"
#include "Option.h"
#include <list>
#include <armadillo>
#include "IndividualData.h"
#include "IndividualDataSet.h"
+Classes
class | Scorer | A class to score users based on their phenotypes and a generated weight.
#include "SnpKinship.h"
+Classes
class | SnpKinship | A class to generate a kinship matrix from SNP data.
Data | A class to load all relevant data. Responsible solely for loading data - not for reconciling missing individuals |
GivenKinship | A class to load a pregiven kinship matrix |
H2rEst | A class used to perform heritability analysis |
Hca | A class used to perform heritable component analysis |
IndividualData | A class to represent data associated with a given individual |
IndividualDataSet | A class to keep track of and reconcile IndividualData objects |
Kinship | A class used to load or generate kinship data. This class must not be used directly; only subclasses should be used |
Option | A class to load all user options |
RemlH2rEst | A class to perform heritability analysis on a given dataset |
RemlHca | A class to perform heritable component analysis on a given dataset |
Scorer | A class to score users based on their phenotypes and a generated weight |
SnpKinship | A class to generate a kinship matrix from SNP data |
Util |
This is the complete list of members for Data, including all inherited members.
+Data(Option &option) | Data | |
individualDataSet (defined in Data) | Data | |
lambdaVec (defined in Data) | Data | |
load(Option &option) | Data | |
scorers (defined in Data) | Data | |
trait (defined in Data) | Data | |
writeKinship(std::string outDir) | Data |
A class to load all relevant data. Responsible solely for loading data - not for reconciling missing individuals.
+ +#include <Data.h>
+Public Member Functions | |
+ | Data (Option &option) |
A constructor. | |
void | load (Option &option) |
Loads data based on the provided option argument. | |
+void | writeKinship (std::string outDir) |
Writes the Kinship file to disk. | |
+Public Attributes | |
+IndividualDataSet | individualDataSet |
+arma::mat | trait |
+std::vector< double > | lambdaVec |
+std::vector< Scorer > | scorers |
A class to load all relevant data. Responsible solely for loading data - not for reconciling missing individuals.
+Loads data into the individualDataSet, trait, lambdaVec, and scorers fields.
+void Data::load(Option &option)
Loads data based on the provided option argument.
+Loads the following individual-specific data:
Loads the following non-individual-specific data:
This is the complete list of members for GivenKinship, including all inherited members.
+construct(std::string kinshipfile, std::string kinshipidfile) | GivenKinship | |
getGrm() | Kinship | |
getIdVec() | Kinship | |
idVec (defined in Kinship) | Kinship | protected |
Kinship() | Kinship | |
Kinship(arma::mat grm, std::map< std::string, int > ind, int nIndv, std::vector< std::string > idVec) | Kinship | |
m_grm (defined in Kinship) | Kinship | protected |
m_nIndv (defined in Kinship) | Kinship | protected |
A class to load a pregiven kinship matrix.
+ +#include <GivenKinship.h>
+Public Member Functions | |
+void | construct (std::string kinshipfile, std::string kinshipidfile) |
Constructor. Loads and parses the kinship file into a matrix. | |
Public Member Functions inherited from Kinship | |
+ | Kinship () |
Constructor. | |
+ | Kinship (arma::mat grm, std::map< std::string, int > ind, int nIndv, std::vector< std::string > idVec) |
Constructor. | |
+arma::mat & | getGrm () |
Returns the generated or parsed Genetic Relationship Matrix (GRM). | |
std::vector< std::string > & | getIdVec () |
Returns the generated or parsed individual id vector. | |
+Additional Inherited Members | |
Protected Attributes inherited from Kinship | |
+arma::mat | m_grm |
+uint32_t | m_nIndv |
+std::vector< std::string > | idVec |
A class to load a pregiven kinship matrix.
+Used if ksSrc is pregiven.
+
A class used to perform heritability analysis.
+ +#include <H2rEst.h>
A class used to perform heritability analysis.
+
A class used to perform heritable component analysis.
+ +#include <Hca.h>
A class used to perform heritable component analysis.
+
This is the complete list of members for IndividualData, including all inherited members.
+addCCovData(std::vector< double > &cCovNew) | IndividualData | |
addPedData(std::vector< double > &pedNew) | IndividualData | |
addPhenData(std::vector< double > &phenNew) | IndividualData | |
addQCovData(std::vector< double > &qCovNew) | IndividualData | |
getCCov() | IndividualData | |
getNewGrmId() const | IndividualData | |
getNumMissingGenoValues() | IndividualData | |
getPed() | IndividualData | |
getPedId() const | IndividualData | |
getPhen() | IndividualData | |
getPregivenGrmId() const | IndividualData | |
getQCov() | IndividualData | |
getStrId() const | IndividualData | |
resetPhenData() | IndividualData | |
setNewGrmId(int idNew) | IndividualData | |
setNumMissingGenoValues(int numMissingGenoValuesNew) | IndividualData | |
setPedId(int idNew) | IndividualData | |
setPregivenGrmId(int idNew) | IndividualData | |
setStrId(std::string strIdNew) | IndividualData |
A class to represent data associated with a given individual.
+ +#include <IndividualData.h>
+Public Member Functions | |
+std::vector< double > & | getPed () |
Returns PED data for this individual. | |
+std::vector< double > & | getPhen () |
Returns phenotypic data for this individual. | |
+std::vector< double > & | getQCov () |
Returns quantitative covariate data for this individual. | |
+std::vector< double > & | getCCov () |
Returns categorical covariate data for this individual. | |
+void | setStrId (std::string strIdNew) |
Sets the string ID for this individual (i.e. the ID provided in input files). | |
+std::string | getStrId () const |
Returns the string ID for this individual. | |
+void | setPregivenGrmId (int idNew) |
Sets this individual's position in the pregiven GRM. | |
+int | getPregivenGrmId () const |
Returns this individual's position in the pregiven GRM. | |
+void | setNewGrmId (int idNew) |
Sets this individual's position in the final GRM. | |
+int | getNewGrmId () const |
Returns this individual's position in the final GRM. | |
+void | setPedId (int idNew) |
Sets this individual's position in the genotypic data matrix. | |
+int | getPedId () const |
Returns this individual's position in the genotypic data matrix. | |
+void | setNumMissingGenoValues (int numMissingGenoValuesNew) |
Sets the number of missing genotypic values for this individual. | |
+int | getNumMissingGenoValues () |
Returns the number of missing genotypic values for this individual. | |
+void | addPedData (std::vector< double > &pedNew) |
Adds data from the PED file to this individual. | |
+void | resetPhenData () |
Deletes all phenotypic data on this individual. | |
+void | addPhenData (std::vector< double > &phenNew) |
Adds phenotypic data to this individual. | |
+void | addQCovData (std::vector< double > &qCovNew) |
Adds quantitative covariate data to this individual. | |
+void | addCCovData (std::vector< double > &cCovNew) |
Adds categorical covariate data to this individual. | |
A class to represent data associated with a given individual.
+IndividualData objects are generally tracked by an IndividualDataSet.
+
This is the complete list of members for IndividualDataSet, including all inherited members.
+
A class to keep track of and reconcile IndividualData objects.
+ +#include <IndividualDataSet.h>
+Public Member Functions | |
+void | addingPedData () |
Indicate that PED data will be added to this IndividualDataSet. | |
+void | addPedData (std::string id, int pedId, std::vector< double > &pedData) |
Add PED data to this IndividualDataSet, for the provided individual ID. | |
void | addingPhenData () |
Indicate that phenotypic data will be added to this IndividualDataSet. | |
+void | addPhenData (std::string id, std::vector< double > &phenData) |
Add phenotypic data to this IndividualDataSet, for the provided individual ID. | |
+void | addingQCovData () |
Indicate that quantitative covariate data will be added to this IndividualDataSet. | |
+void | addQCovData (std::string id, std::vector< double > &qCovData) |
Add quantitative covariate data to this IndividualDataSet, for the provided individual ID. | |
+void | addingCCovData () |
Indicate that categorical covariate data will be added to this IndividualDataSet. | |
+void | addCCovData (std::string id, std::vector< double > &cCovData) |
Add categorical covariate data to this IndividualDataSet, for the provided individual ID. | |
+void | addingChrData () |
Indicate that chromosome data will be added to this IndividualDataSet. | |
+void | setChrData (std::vector< int > &chrNew) |
Add chromosome data to this IndividualDataSet. | |
+void | settingGenoData () |
Indicate that genotypic data will be added to this IndividualDataSet. | |
+void | setGenoData (arma::mat &genoNew, std::vector< int > numMissingValues) |
Add genotypic data to this IndividualDataSet. | |
+void | setOption (Option &optionNew) |
Sets the Option object on this IndividualDataSet. | |
void | reconcile () |
Reconciles data by removing individuals with missing data. | |
+int | getNumSubjects () |
Returns the number of individuals currently being kept track of. | |
+bool | getCovAdded () |
Returns true if covariate data has been added. | |
+arma::mat & | getGeno () |
Returns the generated genotypic data matrix. | |
+std::vector< int > & | getChr () |
Returns the provided chromosome data. | |
+arma::mat & | getAN () |
Returns the AN matrix (only available if the matrix was generated from SNP data). | |
+arma::mat & | getGrm (int idx) |
Returns the generated genetic relationship matrix. | |
+arma::mat & | getCov (int idx) |
Returns the covariate data associated with the given index (based on split number). | |
+arma::mat & | getPhen (int idx) |
Returns the phenotypic data associated with the given index (based on split number). | |
+arma::mat & | getGrmWithoutSplit (int idx) |
Returns the GRM data with all individuals EXCEPT for those in the provided split number. | |
+arma::mat & | getCovWithoutSplit (int idx) |
Returns the covariates with all individuals EXCEPT for those in the provided split number. | |
+arma::mat & | getPhenWithoutSplit (int idx) |
Returns the phenotypic data with all individuals EXCEPT for those in the provided split number. | |
+arma::mat & | getPed () |
Returns the PED data. | |
+std::vector< std::vector< std::string > > & | getSplitPartIds () |
Returns the IDs associated with each "split". | |
+std::list< std::reference_wrapper< IndividualData > > & | getIndividualList () |
Returns the IndividualData objects in a list data structure. | |
+std::map< std::string, IndividualData > & | getIndividualMap () |
Returns the IndividualData objects in a map data structure. | |
A class to keep track of and reconcile IndividualData objects.
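The README states that an individual whose ID is missing from any input file is excluded from analysis. A minimal sketch of that ID-level rule, assuming each input has already been reduced to a set of IDs; `reconcileIds` is a hypothetical helper, and the real `reconcile()` also drops individuals with missing values and builds the final matrices, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: each element of `sources` holds the IDs found in one
// input (phenotypes, covariates, kinship IDs, genotypes, ...).
// An ID is kept only if it appears in every source.
std::vector<std::string> reconcileIds(const std::vector<std::set<std::string>>& sources) {
    std::vector<std::string> kept;
    if (sources.empty()) return kept;
    for (const std::string& id : sources.front()) {
        bool everywhere = true;
        for (std::size_t i = 1; i < sources.size() && everywhere; ++i)
            everywhere = sources[i].count(id) > 0;
        if (everywhere) kept.push_back(id);
    }
    return kept;  // follows std::set iteration order, so the result is sorted
}
```

The sorted output is a property of this sketch only; the real class may keep another order (for example, the GRM's ID order).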
+void IndividualDataSet::addingPhenData()
Indicate that phenotypic data will be added to this IndividualDataSet.
+Wipes any existing phenotypic data.
+ +void IndividualDataSet::reconcile()
Reconciles data by removing individuals with missing data.
+Once data is reconciled, also generates final matrices with all individuals that are still included.
+Generates several matrix formats, including:
This is the complete list of members for Kinship, including all inherited members.
+getGrm() | Kinship | |
getIdVec() | Kinship | |
idVec (defined in Kinship) | Kinship | protected |
Kinship() | Kinship | |
Kinship(arma::mat grm, std::map< std::string, int > ind, int nIndv, std::vector< std::string > idVec) | Kinship | |
m_grm (defined in Kinship) | Kinship | protected |
m_nIndv (defined in Kinship) | Kinship | protected |
A class used to load or generate kinship data. This class must not be used directly; only subclasses should be used.
+ +#include <Kinship.h>
+Public Member Functions | |
+ | Kinship () |
Constructor. | |
+ | Kinship (arma::mat grm, std::map< std::string, int > ind, int nIndv, std::vector< std::string > idVec) |
Constructor. | |
+arma::mat & | getGrm () |
Returns the generated or parsed Genetic Relationship Matrix (GRM). | |
std::vector< std::string > & | getIdVec () |
Returns the generated or parsed individual id vector. | |
+Protected Attributes | |
+arma::mat | m_grm |
+uint32_t | m_nIndv |
+std::vector< std::string > | idVec |
A class used to load or generate kinship data. This class must not be used directly; only subclasses should be used.
+std::vector< std::string > & Kinship::getIdVec()
Returns the generated or parsed individual id vector.
+Maintains insertion order. The first individual in idVec represents the first individual in the GRM.
+ +
This is the complete list of members for Option, including all inherited members.
+getBFilePrefix() const | Option | |
getCCovFile() const | Option | |
getCmmd() const | Option | |
getKinshipFile() const | Option | |
getKinshipIDFile() const | Option | |
getKinshipSrc() const | Option | |
getLambdaVecFile() const | Option | |
getMafCutoff() const | Option | |
getMaxIterH2r() const | Option | |
getMaxIterHca() const | Option | |
getMissingGenoCutoff() const | Option | |
getNumSplits() const | Option | |
getNumThreads() const | Option | |
getOutDir() const | Option | |
getPhenFile() const | Option | |
getQCovFile() const | Option | |
getTraitFile() const | Option | |
parse(int argc, char **argv) | Option |
A class to load all user options.
+ +#include <Option.h>
+Public Member Functions | |
+void | parse (int argc, char **argv) |
Parses all CLI options. | |
+uint16_t | getCmmd () const |
Returns the command the user specified; i.e., the analysis to run. | |
+std::string | getBFilePrefix () const |
Returns the prefix for the binary data filenames. | |
+std::string | getQCovFile () const |
Returns the quantitative covariate filename. | |
+std::string | getCCovFile () const |
Returns the categorical covariate filename. | |
+std::string | getKinshipFile () const |
Returns the kinship filename. | |
+std::string | getKinshipIDFile () const |
Returns the kinship ID filename. | |
+std::string | getPhenFile () const |
Returns the phenotype filename. | |
+std::string | getTraitFile () const |
Returns the trait filename. | |
+int | getKinshipSrc () const |
Returns the kinship source to use. | |
+double | getMafCutoff () const |
Returns the Minor Allele Frequency cutoff for GRM generation. | |
+double | getMissingGenoCutoff () const |
Returns the missing genotypic data cutoff for GRM generation. | |
+std::string | getOutDir () const |
Returns the directory for output files. | |
+int | getNumThreads () const |
Returns the number of threads to use. | |
+std::string | getLambdaVecFile () const |
Returns the lambda vector filename. | |
+int | getNumSplits () const |
Returns the number of splits to use during the cross-validation and lambda-tuning process. | |
+int | getMaxIterHca () const |
Returns the maximum number of iterations for HCA analysis (default: 200). | |
+int | getMaxIterH2r () const |
Returns the maximum number of iterations for heritability analysis (default: 200). | |
A class to load all user options.
+Note that this class does not load data: only options. The Data class is responsible for actually loading data.
+
This is the complete list of members for RemlH2rEst, including all inherited members.
+calcH2r() | RemlH2rEst | |
getFinalStats() | RemlH2rEst | |
RemlH2rEst(Option &newOption, arma::mat &newPhen, arma::mat &newGrm, arma::mat &newCov) | RemlH2rEst | |
saveOutput() | RemlH2rEst |
A class to perform heritability analysis on a given dataset.
+ +#include <RemlH2rEst.h>
+Public Member Functions | |
RemlH2rEst (Option &newOption, arma::mat &newPhen, arma::mat &newGrm, arma::mat &newCov) | |
Constructor. | |
+void | calcH2r () |
Runs heritability analysis on the data. | |
+void | saveOutput () |
Saves output to a file. | |
+std::vector< std::vector< double > > & | getFinalStats () |
Returns the final stats from the REML analysis. | |
A class to perform heritability analysis on a given dataset.
+Uses the REML algorithm.
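For reference, the standard single-component model that GCTA-style REML fits (Yang et al. 2010, 2011) is shown below, with $\mathbf{y}$ the single phenotype column, $\mathbf{X}$ the covariate matrix, and $\mathbf{A}$ the GRM. This is the textbook formulation, assumed rather than confirmed to match RemlH2rEst's exact parametrization. $\mathbf{V}$ is the variance-covariance matrix whose inversion triggers the add-to-diagonal fallback when it is singular; REML maximizes $\ell_{\mathrm{REML}}$ over $\sigma_g^2$ and $\sigma_e^2$.

$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon}, \qquad
\mathbf{g} \sim N(\mathbf{0}, \sigma_g^2\mathbf{A}), \qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma_e^2\mathbf{I})
$$

$$
\mathbf{V} = \sigma_g^2\mathbf{A} + \sigma_e^2\mathbf{I}, \qquad
\mathbf{P} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}\left(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\top\mathbf{V}^{-1}
$$

$$
\ell_{\mathrm{REML}} = -\tfrac{1}{2}\left(\log\lvert\mathbf{V}\rvert + \log\lvert\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X}\rvert + \mathbf{y}^\top\mathbf{P}\mathbf{y}\right) + \text{const}, \qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
$$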
RemlH2rEst::RemlH2rEst(Option &newOption, arma::mat &newPhen, arma::mat &newGrm, arma::mat &newCov)
Constructor.
+Raises an error if the provided phenotypic data has more than one column.
+ +
This is the complete list of members for RemlHca, including all inherited members.
+constraintFunction(const std::vector< double > &x, std::vector< double > &grad, void *data) | RemlHca | static |
getBestLambdaVal() | RemlHca | |
getBestTrainedW() | RemlHca | |
objectiveFunction(const std::vector< double > &x, std::vector< double > &grad, void *my_func_data) | RemlHca | static |
RemlHca(Option &newOption, Data &newData) | RemlHca | |
saveOutput() | RemlHca | |
train() | RemlHca |
A class to perform heritable component analysis on a given dataset.
+ +#include <RemlHca.h>
+Public Member Functions | |
+ | RemlHca (Option &newOption, Data &newData) |
Constructor. | |
+void | train () |
Runs the REML algorithm to obtain highly-heritable traits. | |
+void | saveOutput () |
Saves output to a file. | |
+double | getBestLambdaVal () |
Return the best lambda value, found by the train function. | |
+arma::mat & | getBestTrainedW () |
Returns the weights generated by the train function. | |
+Static Public Member Functions | |
+static double | objectiveFunction (const std::vector< double > &x, std::vector< double > &grad, void *my_func_data) |
Objective function for the HCA minimization process. | |
static double | constraintFunction (const std::vector< double > &x, std::vector< double > &grad, void *data) |
Constraint function for the HCA minimization process. | |
A class to perform heritable component analysis on a given dataset.
+Uses the REML algorithm.
+
+
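+static double RemlHca::constraintFunction(const std::vector< double > &x, std::vector< double > &grad, void *data)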
Constraint function for the HCA minimization process.
+Constrained to be equal to zero.
+ +
A class to score users based on their phenotypes and a generated weight.
+ +#include <Scorer.h>
+Public Member Functions | |
Scorer (Option &newOption, arma::mat &phen, arma::mat &trait) | |
Constructor. | |
+arma::mat & | getScore () |
Returns the generated scores. | |
+void | saveOutput (IndividualDataSet &individualDataSet) |
Saves output to a file. | |
A class to score users based on their phenotypes and a generated weight.
+Weights are typically generated via the HCA process.
Scorer::Scorer(Option &newOption, arma::mat &phen, arma::mat &trait)
Constructor.
+Also generates scores inline.
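Assuming the score is a linear combination of each individual's phenotypes with the HCA weights (the class description suggests this but does not spell it out), scoring reduces to a matrix-vector product; with Armadillo that would be `phen * trait` for an individuals-by-phenotypes `phen` and a column-vector `trait`. A dependency-free sketch, where `scoreIndividuals` is a hypothetical name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: phen has one row per individual and one column per
// phenotype; w holds one HCA weight per phenotype. Each score is the
// weighted sum of that individual's phenotypes (i.e. scores = phen * w).
std::vector<double> scoreIndividuals(const std::vector<std::vector<double>>& phen,
                                     const std::vector<double>& w) {
    std::vector<double> scores;
    scores.reserve(phen.size());
    for (const auto& row : phen) {
        if (row.size() != w.size())
            throw std::invalid_argument("phenotype row and weight vector differ in length");
        double s = 0.0;
        for (std::size_t j = 0; j < w.size(); ++j) s += row[j] * w[j];
        scores.push_back(s);
    }
    return scores;
}
```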
+ +
This is the complete list of members for SnpKinship, including all inherited members.
+construct(Option &option, arma::mat &ped, std::vector< int > &chr, arma::mat &newGeno) | SnpKinship | |
getAN() | SnpKinship | |
getGrm() | Kinship | |
getIdVec() | Kinship | |
idVec (defined in Kinship) | Kinship | protected |
Kinship() | Kinship | |
Kinship(arma::mat grm, std::map< std::string, int > ind, int nIndv, std::vector< std::string > idVec) | Kinship | |
m_grm (defined in Kinship) | Kinship | protected |
m_nIndv (defined in Kinship) | Kinship | protected |
A class to generate a kinship matrix from SNP data.
+ +#include <SnpKinship.h>
+Public Member Functions | |
void | construct (Option &option, arma::mat &ped, std::vector< int > &chr, arma::mat &newGeno) |
Constructor. | |
+arma::mat & | getAN () |
Returns the AN matrix, which represents the number of SNPs used to calculate the GRM (on a per-individual basis). | |
Public Member Functions inherited from Kinship | |
+ | Kinship () |
Constructor. | |
+ | Kinship (arma::mat grm, std::map< std::string, int > ind, int nIndv, std::vector< std::string > idVec) |
Constructor. | |
+arma::mat & | getGrm () |
Returns the generated or parsed Genetic Relationship Matrix (GRM). | |
std::vector< std::string > & | getIdVec () |
Returns the generated or parsed individual id vector. | |
+Additional Inherited Members | |
Protected Attributes inherited from Kinship | |
+arma::mat | m_grm |
+uint32_t | m_nIndv |
+std::vector< std::string > | idVec |
A class to generate a kinship matrix from SNP data.
+void SnpKinship::construct(Option &option, arma::mat &ped, std::vector< int > &chr, arma::mat &newGeno)
Constructor.
+Also generates GRM inline.
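The sketch below shows a GCTA-style GRM estimator (Yang et al. 2011) together with an AN-style count of the SNPs used for each pair of individuals. For individuals j and k it averages (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)) over the SNPs i called in both, where x is the allele count and p_i the allele frequency. This is an assumption about what SnpKinship computes, not its code: the genotype coding (0/1/2, negative for missing), applying `getMafCutoff()` as a per-SNP filter, and reusing the off-diagonal formula on the diagonal (GCTA uses a separate diagonal estimator) are simplifications chosen for illustration, and the missing-genotype cutoff is not modeled. `GrmResult` and `computeGrm` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct GrmResult {
    std::vector<std::vector<double>> grm;  // n x n relationship estimates
    std::vector<std::vector<double>> an;   // number of SNPs used for each pair
};

// Hypothetical GCTA-style sketch. geno[s][i] is the allele count (0, 1 or 2)
// of SNP s in individual i (each SNP row needs at least n entries); a
// negative value marks a missing call. SNPs whose minor allele frequency is
// zero or below mafCutoff are skipped.
GrmResult computeGrm(const std::vector<std::vector<double>>& geno, std::size_t n,
                     double mafCutoff) {
    const std::vector<std::vector<double>> zero(n, std::vector<double>(n, 0.0));
    GrmResult r{zero, zero};
    for (const auto& snp : geno) {
        double sum = 0.0;
        std::size_t called = 0;
        for (std::size_t i = 0; i < n; ++i)
            if (snp[i] >= 0) { sum += snp[i]; ++called; }
        if (called == 0) continue;
        const double p = sum / (2.0 * static_cast<double>(called));  // allele frequency
        const double maf = p < 0.5 ? p : 1.0 - p;
        if (maf <= 0.0 || maf < mafCutoff) continue;  // monomorphic or too rare
        const double denom = 2.0 * p * (1.0 - p);
        for (std::size_t j = 0; j < n; ++j) {
            if (snp[j] < 0) continue;
            const double zj = snp[j] - 2.0 * p;
            for (std::size_t k = j; k < n; ++k) {  // upper triangle incl. diagonal
                if (snp[k] < 0) continue;
                r.grm[j][k] += zj * (snp[k] - 2.0 * p) / denom;
                r.an[j][k] += 1.0;
            }
        }
    }
    // Average over the SNPs each pair shares, then mirror into the lower triangle.
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t k = j; k < n; ++k) {
            if (r.an[j][k] > 0.0) r.grm[j][k] /= r.an[j][k];
            r.grm[k][j] = r.grm[j][k];
            r.an[k][j] = r.an[j][k];
        }
    return r;
}
```

Counting SNPs per pair, rather than dividing by the total SNP count, keeps each entry an average over the SNPs actually observed for both individuals.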
+ +
This is the complete list of members for Util, including all inherited members.
+invertMatrix(arma::mat &m, std::string name, bool alreadyAdded=false) | Util | static |
invertMatrixSympd(arma::mat &m, std::string name, bool alreadyAdded=false) | Util | static |
parseToDouble(std::string data) | Util | static |
parseToInt(std::string data) | Util | static |
splitByDelimeter(std::string data, std::string delim) | Util | static |
+Static Public Member Functions | |
+static std::vector< std::string > | splitByDelimeter (std::string data, std::string delim) |
 Splits a string by a delimiter and returns a vector with the result. | |
+static double | parseToDouble (std::string data) |
Parses a string to a double, raising an error if the data is invalid. | |
+static int | parseToInt (std::string data) |
Parses a string to an integer, raising an error if the data is invalid. | |
+static arma::mat | invertMatrix (arma::mat &m, std::string name, bool alreadyAdded=false) |
 Inverts the matrix via arma::inv, adding values to the diagonal if inversion fails. If values are added to the diagonal, outputs a warning that includes the "name" variable. | |
+static arma::mat | invertMatrixSympd (arma::mat &m, std::string name, bool alreadyAdded=false) |
 Inverts the matrix via arma::inv_sympd, adding values to the diagonal if inversion fails. If values are added to the diagonal, outputs a warning that includes the "name" variable. | |
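The parsing helpers are documented only by their one-line descriptions. The sketch below shows one way to get the described behavior: split on every occurrence of a delimiter, and parse a double while raising an error on invalid input. `splitByDelim` and `parseDoubleStrict` are hypothetical stand-ins for `Util::splitByDelimeter` and `Util::parseToDouble`. Throwing `std::invalid_argument` is an assumed error mechanism, not necessarily the project's.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Util::splitByDelimeter: splits on every occurrence
// of delim and keeps empty fields (e.g. "a,,b" -> "a", "", "b").
std::vector<std::string> splitByDelim(const std::string& data, const std::string& delim) {
    if (delim.empty()) return {data};  // avoid an infinite loop on an empty delimiter
    std::vector<std::string> out;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = data.find(delim, start)) != std::string::npos) {
        out.push_back(data.substr(start, pos - start));
        start = pos + delim.size();
    }
    out.push_back(data.substr(start));
    return out;
}

// Hypothetical stand-in for Util::parseToDouble: rejects empty or non-numeric
// input and any trailing characters (including trailing whitespace).
double parseDoubleStrict(const std::string& data) {
    std::size_t consumed = 0;
    double value = 0.0;
    try {
        value = std::stod(data, &consumed);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        throw std::invalid_argument("not a number: '" + data + "'");
    }
    if (consumed != data.size()) {
        throw std::invalid_argument("trailing characters in: '" + data + "'");
    }
    return value;
}
```

Checking how many characters `std::stod` consumed is what makes the parse strict. On its own, `std::stod("0.5abc")` would silently return 0.5.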
Data | A class to load all relevant data. Responsible solely for loading data - not for reconciling missing individuals |
H2rEst | A class used to perform heritability analysis |
  RemlH2rEst | A class to perform heritability analysis on a given dataset |
Hca | A class used to perform heritable component analysis |
  RemlHca | A class to perform heritable component analysis on a given dataset |
IndividualData | A class to represent data associated with a given individual |
IndividualDataSet | A class to keep track of and reconcile IndividualData objects |
Kinship | A class used to load or generate kinship data. This class must not be used directly; only subclasses should be used |
  GivenKinship | A class to load a pregiven kinship matrix |
  SnpKinship | A class to generate a kinship matrix from SNP data |
Option | A class to load all user options |
Scorer | A class to score users based on their phenotypes and a generated weight |
Util |
This repository represents a pipeline that performs three primary functions:
1. Heritable Component Analysis
2. Heritability Estimation
3. Kinship Matrix Generation
+The pipeline accepts genotypic and phenotypic data, as well as covariates, and generates a highly heritable trait. It can then estimate the heritability of that trait. If the user already has a kinship matrix available (e.g. from GCTA), the program can accept this matrix; alternatively, it can generate the kinship matrix from genotypic data.
+Running the program without any options will trigger the help function, which will show all options that are available. The program takes a command (`kinship`, `h2r`, `hca`, or `score`), as well as a set of options for each command.
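Generally speaking, one can follow the guidelines below when using this program:
1. Obtain the following data: phenotypic data, quantitative and discrete covariates (optional), a kinship file (optional), and genotypic data (required if kinship data is not present). Ensure that individual IDs are present in all of these files and are common across them. If an individual ID is missing from one file but present in another, that individual will not be included in the analysis.
2. Determine the parameters for your analysis. These are specified in the “heritable component analysis” help section. Two options need particular attention: “numSplits” and “lambdaVecFile”. “numSplits” controls cross-validation: if it is set to 1 (the default), cross-validation is not performed; if it is set to 2 or more, cross-validation is performed as described below. “lambdaVecFile” must point to a file of lambda values to use during HCA, with one lambda value per line.
3. Decide whether to save output data to disk; if so, specify the “outDir” parameter (set it to “.” to save to the current directory). In addition, specify the “numThreads” option (default 2) to set the number of threads used.
4. Run HCA with the chosen parameters and observe the output. Analysis may take a long time, depending on the size of your dataset.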
+Additional options are present, but are not required. Review the documentation and help output for more details.
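The “lambdaVecFile” option expects one lambda value per line. The sketch below reads such a file from any `std::istream`, so an `std::ifstream` opened on the file can be passed directly. `readLambdas` is a hypothetical name. Skipping blank lines and rejecting any line that is not a single number are assumptions; the program's own parser may be more or less lenient.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader for a lambdaVecFile-style stream: one lambda value per line.
// Blank lines are skipped (assumption); any other non-numeric line is an error.
std::vector<double> readLambdas(std::istream& in) {
    std::vector<double> lambdas;
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        std::istringstream ss(line);
        double value;
        std::string rest;
        // Require exactly one number on the line; extra tokens are rejected.
        if (!(ss >> value) || (ss >> rest)) {
            throw std::runtime_error("invalid lambda on line " + std::to_string(lineNo));
        }
        lambdas.push_back(value);
    }
    return lambdas;
}
```

For example, a file containing the illustrative values `0.1`, `1`, and `10` on separate lines yields a three-element vector.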
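+If “numSplits” is equal to one or is not set, the following process is used for lambda tuning:
1. HCA is run with each lambda value.
2. For each lambda value, heritability analysis is run on the generated trait.
3. The lambda value that generates the most heritable trait is saved as the result of the analysis.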
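+If the “numSplits” option is greater than one, the dataset is split randomly into “numSplits” splits. The following process then occurs:
1. The code iterates through each lambda value.
2. For each lambda value, the code iterates through each split. On each iteration, the chosen split is used as cross-validation data and all other data is marked as training data. HCA is run with the training data and the current lambda value. Once HCA has been run with all splits, the average heritability score for the current lambda value is calculated.
3. The lambda with the highest average heritability score is considered the best. HCA and heritability estimation are then re-run with this lambda value on the full dataset, and this is taken as the final result set.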
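The cross-validated lambda search can be sketched as follows. This is a minimal sketch, assuming a caller-supplied function that runs HCA on the training individuals with a given lambda and returns the heritability score for the held-out split; how that score is computed is left to the callback. `FoldScorer`, `makeSplits`, `selectLambda`, and the `seed` parameter are hypothetical names, not part of this project. The final re-run on the full dataset with the chosen lambda is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical callback: run HCA on `train` with `lambda`, then return the
// heritability score for the held-out individuals.
using FoldScorer = std::function<double(double lambda,
                                        const std::vector<std::size_t>& train,
                                        const std::vector<std::size_t>& heldOut)>;

// Randomly assigns individuals 0..nIndv-1 to numSplits splits of near-equal size.
std::vector<std::vector<std::size_t>> makeSplits(std::size_t nIndv, std::size_t numSplits,
                                                 unsigned seed) {
    if (numSplits == 0 || numSplits > nIndv) throw std::invalid_argument("bad numSplits");
    std::vector<std::size_t> order(nIndv);
    std::iota(order.begin(), order.end(), 0);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<std::vector<std::size_t>> splits(numSplits);
    for (std::size_t i = 0; i < nIndv; ++i) splits[i % numSplits].push_back(order[i]);
    return splits;
}

// Returns the lambda with the highest mean held-out score across splits.
double selectLambda(const std::vector<double>& lambdas,
                    const std::vector<std::vector<std::size_t>>& splits,
                    const FoldScorer& score) {
    if (splits.size() < 2) throw std::invalid_argument("cross-validation needs >= 2 splits");
    double bestLambda = std::numeric_limits<double>::quiet_NaN();
    double bestMean = -std::numeric_limits<double>::infinity();
    for (double lambda : lambdas) {
        double total = 0.0;
        for (std::size_t s = 0; s < splits.size(); ++s) {
            // Every split except s forms the training set; split s is held out.
            std::vector<std::size_t> train;
            for (std::size_t t = 0; t < splits.size(); ++t)
                if (t != s) train.insert(train.end(), splits[t].begin(), splits[t].end());
            total += score(lambda, train, splits[s]);
        }
        const double mean = total / static_cast<double>(splits.size());
        if (mean > bestMean) { bestMean = mean; bestLambda = lambda; }
    }
    return bestLambda;
}
```

Fixing the seed makes the random split assignment reproducible across runs, so results for different lambda files can be compared on the same folds.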
+In addition to outputting data to the CLI, if an `outDir` parameter is specified, some data will also be saved. For all analyses, if a GRM was generated (rather than pregiven), that GRM will be saved to "kinship.csv". The following analysis-specific data will also be saved:
When HCA is run, the final weights will be saved to "trait_hca.csv". Individuals will be scored with these weights, and the output from the "Scoring" section will be saved. In addition, the output from the "Heritability Analysis" section will be saved for the final weights.
+When heritability analysis is run, statistics regarding the analysis will be saved to "h2r_est.txt".
+When scoring is run, the calculated scores will be saved to "scores.txt".
+When kinship generation is run, the kinship file will be saved to "kinship.txt".
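The scoring step combines each individual's phenotypes with the HCA weights. The exact computation in `Scorer` is not documented here. The sketch below assumes the simplest reading: a score is the weighted sum of an individual's phenotypes, with no standardization or covariate adjustment. `scoreIndividuals` is a hypothetical name, and the output format of "scores.txt" is not reproduced.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical scorer: score_j = sum_t phenotypes[j][t] * weights[t].
// phenotypes[j][t] is phenotype t of individual j; weights[t] is its HCA weight.
std::vector<double> scoreIndividuals(const std::vector<std::vector<double>>& phenotypes,
                                     const std::vector<double>& weights) {
    std::vector<double> scores;
    scores.reserve(phenotypes.size());
    for (const auto& row : phenotypes) {
        if (row.size() != weights.size())
            throw std::invalid_argument("phenotype/weight length mismatch");
        double s = 0.0;
        for (std::size_t t = 0; t < row.size(); ++t) s += row[t] * weights[t];
        scores.push_back(s);
    }
    return scores;
}
```

If the real scorer standardizes phenotypes first, the scores would differ by a per-phenotype scaling, but the weighted-sum structure would be the same.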
+If the variance-covariance matrix is non-invertible during heritability estimation, small values are added to its diagonal. This generally resolves the invertibility error but may adversely affect the results. A warning is printed whenever this add-to-diagonal approach is used.
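The add-to-diagonal fallback (used by `Util::invertMatrixSympd` via `arma::inv_sympd`) can be illustrated without Armadillo. The documentation does not say how large the added values are or how many retries are made. The sketch below therefore makes several assumptions: it checks positive-definiteness with a Cholesky factorization; on failure it adds `step` to the diagonal and retries, multiplying the step by ten each time; and it warns under the given name. `invertSympdWithJitter`, the 1e-6 starting value, the tenfold escalation, and the retry limit are all hypothetical choices, not the project's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cholesky factorization A = L L^T; returns false if A is not numerically positive definite.
bool cholesky(const Matrix& a, Matrix& l) {
    const std::size_t n = a.size();
    l.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= l[j][k] * l[j][k];
        if (!(d > 0.0)) return false;
        l[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= l[i][k] * l[j][k];
            l[i][j] = s / l[j][j];
        }
    }
    return true;
}

// Inverts a symmetric matrix, adding to the diagonal until Cholesky succeeds.
// Step size, escalation, and retry limit are illustrative assumptions.
Matrix invertSympdWithJitter(Matrix m, const std::string& name,
                             double step = 1e-6, int maxTries = 10) {
    const std::size_t n = m.size();
    Matrix l;
    int tries = 0;
    while (!cholesky(m, l)) {
        if (++tries > maxTries) throw std::runtime_error(name + ": matrix not invertible");
        for (std::size_t i = 0; i < n; ++i) m[i][i] += step;
        step *= 10.0;  // escalate the added value on each retry
    }
    if (tries > 0)
        std::cerr << "Warning: added values to the diagonal of " << name << "\n";
    // Solve L L^T X = I one column at a time.
    Matrix inv(n, std::vector<double>(n, 0.0));
    for (std::size_t c = 0; c < n; ++c) {
        std::vector<double> y(n, 0.0);
        for (std::size_t i = 0; i < n; ++i) {  // forward substitution: L y = e_c
            double s = (i == c) ? 1.0 : 0.0;
            for (std::size_t k = 0; k < i; ++k) s -= l[i][k] * y[k];
            y[i] = s / l[i][i];
        }
        for (std::size_t i = n; i-- > 0;) {  // back substitution: L^T x = y
            double s = y[i];
            for (std::size_t k = i + 1; k < n; ++k) s -= l[k][i] * inv[k][c];
            inv[i][c] = s / l[i][i];
        }
    }
    return inv;
}
```

The returned matrix is the inverse of the perturbed matrix, not the original one. This is why the document warns that the fallback may adversely affect the results.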
+Further documentation is available in the `docs` folder.
The Linux binary should work automatically on most Linux distributions. If not, compile it for your architecture.
+The OSX binary requires GCC version 6. Install it by running `brew install gcc6 --without-multilib` on your machine.
To compile on most Linux distributions and OSX, follow these steps:
1. `cmake . && make && sudo make install`
2. `./configure && make && sudo make install`. Set `DYNAMIC_ARCH=1` when running `make` and `make install` if you plan on using the binary across multiple architectures.
3. On Linux, e.g. `sudo apt-get install liblapack-dev`. If on Mac, run `brew install gcc6 --without-multilib` and `brew install lapack`.
4. Run `make --file Makefile_osx` or `make --file Makefile_linux`, depending on your platform.

The following references were used while preparing this program:
t |