Update README.md

HealthInfoLab · Apr 12, 2024 · e297b92
1 parent 1e78133
commit e297b92
Showing 1 changed file with 17 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -1 +1,18 @@
+This repository contains two branches: MicroVI and MicroVI-retraining. MicroVI contains the base code for running MicroVI with 5-fold cross-validation, repeated 100 times. MicroVI-retraining contains the code for running MicroVI with data augmentation: after each run of the model is trained, a portion of the learned latent space is sampled and used to fine-tune the trained model.
 
+The run.py file within each project is the entry point. A sample shell script is provided in myjob.sh. The data for each project is provided in the NEW_DATASETS folder, which is subdivided into POMP (aka M-DIET, the dataset used for linear regression) and DOMA (aka M-AGE, the dataset used for classification/logistic regression).
+
+To choose which setting to run (i.e., linear regression or classification), modify the first ten lines of the main function within run.py:
+    ## Linear Regression - uncomment below:
+    # dataset = 'pomp'
+
+    ## Logistic Regression (classification) - uncomment below:
+    dataset = 'doma'
+
+    pct_supervised = 100  # Percent supervision; choose from 0, 5, 10, 25, 50, or 100
+    alpha_list = [1.0]  # Weight of supervision in the loss function; choose from 0.1, 0.25, 0.5, 1.0
+    covariate_ablation = True  # True excludes covariates; False includes them
+
+That is, set the desired dataset, percent supervision, alpha value, and whether to include covariates.
+
+run.py creates a nested set of Master_Results and dataset folders according to the settings above; the results CSV for the 100 CV runs is stored there.
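
Editor's note, not part of the commit: since the README lists a fixed set of allowed values for each setting, a small pre-flight check could catch a mistyped value before a long run starts. The sketch below is only an illustration. `validate_settings` and the `ALLOWED_*` constants are hypothetical names that do not come from the repository, and nothing in the diff indicates that run.py performs such a check itself.

```python
# Hypothetical helper (not from the repository): checks the four settings
# described in the README against the values it lists as valid.
ALLOWED_DATASETS = {"pomp", "doma"}
ALLOWED_PCT_SUPERVISED = {0, 5, 10, 25, 50, 100}
ALLOWED_ALPHAS = {0.1, 0.25, 0.5, 1.0}


def validate_settings(dataset, pct_supervised, alpha_list, covariate_ablation):
    """Return a summary dict, or raise ValueError for a value the README does not list."""
    if dataset not in ALLOWED_DATASETS:
        raise ValueError(f"dataset must be one of {sorted(ALLOWED_DATASETS)}, got {dataset!r}")
    if pct_supervised not in ALLOWED_PCT_SUPERVISED:
        raise ValueError(f"pct_supervised must be one of {sorted(ALLOWED_PCT_SUPERVISED)}")
    if not alpha_list:
        raise ValueError("alpha_list must contain at least one value")
    for alpha in alpha_list:
        if alpha not in ALLOWED_ALPHAS:
            raise ValueError(f"alpha must be one of {sorted(ALLOWED_ALPHAS)}, got {alpha!r}")
    if not isinstance(covariate_ablation, bool):
        raise ValueError("covariate_ablation must be True or False")

    return {
        "dataset": dataset,
        # Per the README: POMP is the linear regression task, DOMA the classification task.
        "task": "linear regression" if dataset == "pomp" else "classification",
        "pct_supervised": pct_supervised,
        "alpha_list": list(alpha_list),
        # covariate_ablation=True means covariates are excluded.
        "covariates_included": not covariate_ablation,
    }
```

With the values shown in the diff, `validate_settings('doma', 100, [1.0], True)` would describe a classification run without covariates, while a value such as `pct_supervised = 30` would raise a `ValueError`.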