Data Wrangling & GAN
=================================

## R Dependencies

When working in R, I like to use [RStudio](https://www.rstudio.com). I think it's the best IDE for R, and it makes iterating on code quick and easy. RStudio includes a package manager that can install the packages listed here:

* dplyr
* ggplot2

The following two packages are installed a little differently. First install [Bioconductor](http://bioconductor.org/), then install [GEOquery](http://genomicsclass.github.io/book/pages/GEOquery.html).

These two packages are used by [get_gse_data.r](./mkdataset/get_gse_data.r), which can download any GEO Series (GSE) given its GSE ID and the name of its gene symbol column.
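
The exact processing inside `get_gse_data.r` isn't shown here. As a rough illustration of what a gene symbol column is typically used for, the pandas sketch below collapses probe-level rows into one row per gene by averaging. It is a hypothetical helper, not part of this repository, and the column names (`ID_REF`, `Gene Symbol`, `GSM1`, `GSM2`) are assumptions modeled on common GEO series tables.

```python
import pandas as pd


def collapse_to_genes(expr: pd.DataFrame, symbol_col: str) -> pd.DataFrame:
    """Hypothetical helper: average probe rows that share a gene symbol."""
    # Drop probes with no annotated gene symbol (missing or blank)
    annotated = expr.dropna(subset=[symbol_col])
    annotated = annotated[annotated[symbol_col].astype(str).str.strip() != ""]
    # Average every numeric sample column per gene; non-numeric columns are dropped
    return annotated.groupby(symbol_col).mean(numeric_only=True)


# Toy probe-level table; real GEO column names vary by platform
probes = pd.DataFrame({
    "ID_REF": ["p1", "p2", "p3", "p4"],
    "Gene Symbol": ["TP53", "TP53", "BRCA1", None],
    "GSM1": [1.0, 3.0, 5.0, 7.0],
    "GSM2": [2.0, 4.0, 6.0, 8.0],
})
result = collapse_to_genes(probes, "Gene Symbol")
print(result)
```

Averaging is only one convention; keeping the single probe with the highest mean expression per gene is another common choice.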


## Python Dependencies

I used a virtual environment created with [virtualenv](https://pypi.python.org/pypi/virtualenv). If you want to use one as well, I recommend [this installation tutorial](http://docs.python-guide.org/en/latest/dev/virtualenvs/).

> **NOTE**: Inside a virtual environment you can't use [matplotlib](http://matplotlib.org/faq/virtualenv_faq.html) directly. You need to map it to your system's copy of **matplotlib**, because the graphics libraries that create window frames are closely tied to the operating system.
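
If you only need to save figures (for example, training curves) rather than open interactive windows, a different workaround from the mapping described above is matplotlib's non-interactive `Agg` backend, which renders straight to image files and does not need the operating system's window system. A minimal sketch; the helper name and plot contents are illustrative, not taken from this repository:

```python
import matplotlib

# Select the file-only Agg backend before pyplot is imported
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def save_line_plot(values, path):
    """Hypothetical helper: plot a sequence and write it to an image file."""
    fig, ax = plt.subplots()
    ax.plot(values)
    ax.set_xlabel("step")
    ax.set_ylabel("value")
    fig.savefig(path)  # image format is inferred from the file extension
    plt.close(fig)     # free the figure; nothing is ever shown on screen
```

For example, `save_line_plot([1.0, 0.6, 0.4], "curve.png")` writes `curve.png` without opening a window.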

The prominent packages are:

* NumPy
* pandas
* scikit-learn
* TensorFlow
* Keras
* keras_adversarial

To install all the dependencies quickly, use [`pip`](https://pypi.python.org/pypi/pip/):

```bash
pip install -r requirements.txt
```
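
After `pip` finishes, a quick sanity check is to confirm that each prominent package can be found by the interpreter in your active environment. This small script is a sketch, not part of the repository; it assumes the usual import names (scikit-learn imports as `sklearn`) and only checks that each package is locatable, not that its version matches `requirements.txt`.

```python
import importlib.util

# Import names for the prominent packages (assumed; scikit-learn imports as sklearn)
PACKAGES = ["numpy", "pandas", "sklearn", "tensorflow", "keras", "keras_adversarial"]


def missing_packages(names):
    """Return the top-level module names the current interpreter cannot locate."""
    # find_spec returns None for a top-level name that is not installed,
    # and it does so without actually importing the module
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```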

## How to Run any Script

Just navigate to the folder containing the script, and run it directly.

### R

If you're using RStudio, all you need to do is **source** the script; there is a Source button in the top-right corner of the editor window. Otherwise, run it from the command line:

```bash
R CMD BATCH <name_of_script>
```

### Python

Run it directly from the command line. Assuming your environment is prepared and all the dependencies are installed, all you have to do is:

```bash
python gan.py
```