diff --git a/README.md b/README.md
index 8b13789..a23637d 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,18 @@
 
+This repository contains two branches: MicroVI and MicroVI-retraining. The MicroVI branch contains the base code, which runs 5-fold cross-validation of MicroVI 100 times. The MicroVI-retraining branch contains the code for MicroVI with data augmentation: after the model is trained in each run, a portion of the learned latent space is sampled and used to fine-tune the trained model.
+The run.py file within each project is the main entry point. A sample shell script is provided in myjob.sh. The data for each project is in the NEW_DATASETS folder, which is subdivided into POMP (also known as M-DIET, our dataset for linear regression) and DOMA (also known as M-AGE, our dataset for classification/logistic regression).
+
+To choose which setting to run (linear regression or classification), modify the first ten lines of the main function within run.py:
+## Linear Regression - uncomment below:
+    # dataset = 'pomp'
+
+    ## Logistic Regression (classification) - uncomment below:
+    dataset = 'doma'
+
+    pct_supervised = 100 # Percent supervision; choose from: 0, 5, 10, 25, 50, or 100
+    alpha_list = [1.0] # Weight of the supervision term in the loss function; choose from: 0.1, 0.25, 0.5, 1.0
+    covariate_ablation = True # Whether to exclude covariates: True excludes them; False includes them
+
+That is, set the desired dataset, percent supervision, alpha value, and whether to include covariates.
+
+run.py will create a nested set of Master_Results and dataset folders according to the settings above. The results CSV of the 100 CV runs is stored there.
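
The settings described in the README take values from small fixed sets, so they can be checked before a long job is submitted. The following is a minimal sketch, not code from run.py: `check_settings` and `results_dir` are hypothetical helpers, and the folder layout returned by `results_dir` is an assumption for illustration only (the README says only that the folders are nested under Master_Results and the dataset name).

```python
from pathlib import Path

# Allowed values, taken from the comments in the README.
DATASETS = {"pomp": "linear regression", "doma": "classification"}
PCT_SUPERVISED = (0, 5, 10, 25, 50, 100)
ALPHAS = (0.1, 0.25, 0.5, 1.0)


def check_settings(dataset, pct_supervised, alpha_list, covariate_ablation):
    """Hypothetical helper: raise ValueError for any undocumented setting."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {sorted(DATASETS)}, got {dataset!r}")
    if pct_supervised not in PCT_SUPERVISED:
        raise ValueError(f"pct_supervised must be one of {PCT_SUPERVISED}")
    bad_alphas = [a for a in alpha_list if a not in ALPHAS]
    if bad_alphas:
        raise ValueError(f"alpha values must come from {ALPHAS}, got {bad_alphas}")
    if not isinstance(covariate_ablation, bool):
        raise ValueError("covariate_ablation must be True or False")


def results_dir(root, dataset, pct_supervised, alpha, covariate_ablation):
    """Hypothetical layout; the real folder names are defined in run.py."""
    covariates = "no_covariates" if covariate_ablation else "with_covariates"
    return (Path(root) / "Master_Results" / dataset
            / f"pct_{pct_supervised}" / f"alpha_{alpha}" / covariates)


if __name__ == "__main__":
    # Same values as the defaults shown in the README.
    check_settings("doma", 100, [1.0], True)
    print(results_dir(".", "doma", 100, 1.0, True))
```

Calling `check_settings` at the top of the main function would surface a mistyped dataset name or an unsupported alpha value immediately, rather than partway through the 100 cross-validation runs.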