Data Wrangling & GAN
=================================

## R Dependencies

When using R I like to work in [RStudio](https://www.rstudio.com). I think it's the best IDE for R, and it makes iterating on code quick and easy. RStudio has a built-in package manager that can help you install the packages listed here:

* dplyr
* ggplot2

The following two packages are installed a little differently. First install [Bioconductor](http://bioconductor.org/). Then you can install [GEOquery](http://genomicsclass.github.io/book/pages/GEOquery.html).

Those two packages are used in [get_gse_data.r](./mkdataset/get_gse_data.r), which can fetch any GSE (GEO series), given the GSE ID and the name of the gene symbol column.

## Python Dependencies

I used a virtual environment created with [virtualenv](https://pypi.python.org/pypi/virtualenv). If you want to use one as well, I recommend [this installation tutorial](http://docs.python-guide.org/en/latest/dev/virtualenvs/).

> **NOTE**: Using a virtual environment doesn't allow you to use [matplotlib](http://matplotlib.org/faq/virtualenv_faq.html) directly. You need to map it to your system's copy of **matplotlib**, because the graphics libraries that create window frames are closely tied to the operating system.

The prominent packages are:

* numpy
* Pandas
* Scikit-Learn
* TensorFlow
* Keras
* keras_adversarial

To install all the dependencies quickly and easily, use [`pip`](https://pypi.python.org/pypi/pip/):

```bash
pip install -r requirements.txt
```

## How to Run any Script

Navigate to the folder containing the script and run it directly.

### R

If you're using RStudio, all you need to do is **source** the script. There is a button for that in the top right corner of the editor window. Otherwise, from the command line:

```bash
R CMD BATCH <script>.r
```

### Python

Run it directly from the command line. Assuming that your environment is prepared and you have all the dependencies, all you have to do is:

```bash
python gan.py
```
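
If you are not sure whether your environment is prepared, a quick check like the one below can help before running `gan.py`. This is only a minimal sketch, not part of the repository: the function name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumptions derived from the package list above (for example, Scikit-Learn is imported as `sklearn`). It uses only the standard library, so it runs even when nothing else is installed.

```python
import importlib.util

# Import names are assumptions based on the package list above;
# the display name and the import name can differ (Scikit-Learn -> sklearn).
REQUIRED = ["numpy", "pandas", "sklearn", "tensorflow", "keras", "keras_adversarial"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

If anything is reported as missing, activating the virtual environment and re-running `pip install -r requirements.txt` is the first thing to try.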