Making a GAN to generate new gene expression data from existing gene expression datasets, for the Data Mining course.

Data Wrangling & GAN

R Dependencies

When using R, I like to use RStudio. I think it's the best IDE for R, and it makes iterating on code quick and easy. RStudio has a built-in package manager that can install the packages listed here:

  • dplyr
  • ggplot2

The following two packages are installed a little differently: first install Bioconductor, then install GEOquery.

Both of these packages are used in get_gse_data.r, which can download any GEO Series (GSE) given its GSE ID and the name of its gene symbol column.
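The internals of get_gse_data.r (which is R) aren't reproduced here. The Python sketch below is only a hedged illustration of a step such a script usually needs once it knows the gene symbol column: collapsing probe-level rows into one row per gene by averaging. It assumes the expression table has already been exported with that column attached. The `collapse_to_genes` helper and the row-of-dicts layout are hypothetical, and averaging is just one of several ways to merge probes that map to the same gene.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_genes(rows, symbol_col, sample_cols):
    """Average probe-level rows that share a gene symbol (hypothetical helper).

    rows: list of dicts, one per probe (assumed in-memory layout).
    symbol_col: name of the gene symbol column; it differs between GEO
        platforms, which is why it has to be supplied by the caller.
    sample_cols: names of the expression columns, one per sample.
    """
    grouped = defaultdict(list)
    for row in rows:
        symbol = (row.get(symbol_col) or "").strip()
        if not symbol:
            continue  # skip probes with no gene annotation
        grouped[symbol].append(row)

    genes = {}
    for symbol, probes in grouped.items():
        # one averaged expression value per sample for this gene
        genes[symbol] = {s: mean(float(p[s]) for p in probes) for s in sample_cols}
    return genes
```

For example, two made-up probes annotated `GENE_A` with values 2.0 and 4.0 in sample `S1` collapse to a single `GENE_A` entry with 3.0. Probes with an empty symbol are dropped.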

Python Dependencies

I used a virtual environment created with virtualenv. If you want to use one as well, I recommend this installation tutorial.

NOTE: A virtual environment doesn't let you use matplotlib directly. You need to map it to your system's copy of matplotlib, because the graphics libraries that create window frames are closely tied to the operating system.
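If you're unsure whether a script is running inside a virtual environment, or whether matplotlib is reachable from it, a small check can help. The sketch below uses only the standard library plus an optional matplotlib import, and the helper names are hypothetical. Two general workarounds exist, though neither is necessarily what this repo relies on. You can create the environment with `virtualenv --system-site-packages` so it can see the system's packages. Or you can switch matplotlib to the non-interactive `Agg` backend, which writes plots to files instead of opening windows.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter is running inside a venv/virtualenv."""
    # venv (and virtualenv >= 20) point sys.base_prefix at the base interpreter
    base = getattr(sys, "base_prefix", sys.prefix)
    # legacy virtualenv (< 20) set sys.real_prefix instead
    return hasattr(sys, "real_prefix") or sys.prefix != base


def configure_matplotlib(headless=False):
    """Return the active backend name, or None if matplotlib can't be found."""
    if importlib.util.find_spec("matplotlib") is None:
        return None
    import matplotlib

    if headless:
        # Agg renders to image files only and needs no OS window toolkit
        matplotlib.use("Agg")
    return matplotlib.get_backend()
```

With `configure_matplotlib(headless=True)` called before importing `matplotlib.pyplot`, figures can still be produced with `plt.savefig(...)` even when no window can be opened.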

The prominent packages are:

  • numpy
  • pandas
  • scikit-learn
  • TensorFlow
  • Keras
  • keras_adversarial

The quickest way to install all the dependencies is with pip:

pip install -r requirements.txt
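After installing, a quick check can confirm that the packages listed above are visible to the interpreter you will run gan.py with. This is a sketch, and the mapping from package names to module names is an assumption (scikit-learn, for instance, is imported as `sklearn`). It only checks that each module can be found, not which version is installed.

```python
import importlib.util

# package name -> module name used in `import` statements (assumed mapping)
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
    "keras": "keras",
    "keras_adversarial": "keras_adversarial",
}


def check_dependencies(packages=PACKAGES):
    """Map each package name to True if its top-level module can be located."""
    # find_spec locates a top-level module without importing it
    return {name: importlib.util.find_spec(module) is not None
            for name, module in packages.items()}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```

Any package reported as `MISSING` was probably installed into a different environment than the one that's currently active.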

How to Run any Script

Just navigate to the folder containing the script, and run it directly.

R

If you're using RStudio, all you need to do is source the script. There is a button for that in the top right corner of the editor window. Otherwise, run it from the command line:

R CMD BATCH <name_of_script>
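If you drive the pipeline from Python, you can launch the same batch run with `subprocess`. `R CMD BATCH infile outfile` writes the script's console output to `outfile` instead of the terminal, so the sketch below passes that file explicitly, which makes its location predictable. The helper names and the choice of a `.Rout` file next to the script are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def batch_command(script, outfile=None):
    """Build the argument list for `R CMD BATCH infile outfile` (hypothetical helper)."""
    script = Path(script)
    # choose the log file ourselves so we know where R writes its output
    out = Path(outfile) if outfile else script.with_suffix(".Rout")
    return ["R", "CMD", "BATCH", str(script), str(out)], out


def run_r_script(script, outfile=None):
    """Run an R script non-interactively; return (exit code, path of the log file)."""
    cmd, out = batch_command(script, outfile)
    result = subprocess.run(cmd)
    return result.returncode, out
```

For example, `code, log = run_r_script("get_gse_data.r")` runs the script and leaves its output in `get_gse_data.Rout`. `Rscript get_gse_data.r` is another option that prints straight to the terminal.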

Python

Just run it directly from the command line. Assuming that your environment is prepared and you have all the dependencies, all you have to do is

python gan.py
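To get a feel for what GAN training does, the adversarial loop can be shrunk to a toy with one synthetic feature instead of thousands of genes. Here a linear generator G(z) = a*z + b competes against a logistic discriminator that looks at [x, x^2]. This is a NumPy sketch with hand-derived gradients, not the repo's gan.py, which presumably builds its model with Keras and keras_adversarial. The target distribution, learning rate, and step count are arbitrary assumptions, and like any GAN the toy may oscillate around the target instead of settling on it exactly.

```python
import numpy as np


def sigmoid(s):
    # clip logits so np.exp cannot overflow for large |s|
    return 1.0 / (1.0 + np.exp(-np.clip(s, -30.0, 30.0)))


def features(x):
    # discriminator sees x and x^2, so it can compare both mean and spread
    return np.stack([x, x ** 2], axis=1)


def train_toy_gan(real_mean=2.0, real_std=0.5, steps=2000, batch=128, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0            # generator parameters: G(z) = a*z + b
    w, c = np.zeros(2), 0.0    # discriminator logit: w . [x, x^2] + c
    for _ in range(steps):
        x_real = rng.normal(real_mean, real_std, batch)
        z = rng.normal(0.0, 1.0, batch)
        x_fake = a * z + b
        f_real, f_fake = features(x_real), features(x_fake)

        # discriminator step: minimise -log D(real) - log(1 - D(fake))
        d_real = sigmoid(f_real @ w + c)
        d_fake = sigmoid(f_fake @ w + c)
        grad_w = ((d_real - 1.0) @ f_real + d_fake @ f_fake) / batch
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # generator step (non-saturating loss): minimise -log D(G(z))
        d_fake = sigmoid(f_fake @ w + c)
        # dLoss/dx = (D - 1) * dlogit/dx, where dlogit/dx = w[0] + 2*w[1]*x
        g_x = (d_fake - 1.0) * (w[0] + 2.0 * w[1] * x_fake)
        a -= lr * np.mean(g_x * z)   # dx/da = z
        b -= lr * np.mean(g_x)       # dx/db = 1
    return a, b


def sample(a, b, n, seed=1):
    """Draw n synthetic values from the trained toy generator."""
    z = np.random.default_rng(seed).normal(0.0, 1.0, n)
    return a * z + b
```

After `a, b = train_toy_gan()`, comparing `sample(a, b, 1000).mean()` and `.std()` against `real_mean` and `real_std` shows how close the generator got. A gene expression GAN follows the same pattern with vectors in place of scalars and neural networks in place of the two linear maps.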