Just put everything together for the addendum. The README should explain mostly everything that is needed. All the data and scripts have been modified to use relative paths rather than absolute ones. Uses the best configuration of the GAN. Created a requirements.txt for installing the Python dependencies.
rjm11010 committed Dec 19, 2017
1 parent 134901f commit 97c7939
Showing 20 changed files with 972 additions and 175 deletions.
57 changes: 57 additions & 0 deletions README.md
@@ -0,0 +1,57 @@
Data Wrangling & GAN
=================================

## R Dependencies

When using R, I like to use [RStudio](https://www.rstudio.com). I think it's the best IDE for R, and it makes iterating on code quick and easy. RStudio has a package manager that can install the packages listed here:

* dplyr
* ggplot2

The following two packages are installed a little differently. First install [Bioconductor](http://bioconductor.org/), then use it to install [GEOquery](http://genomicsclass.github.io/book/pages/GEOquery.html).

Those two packages are used in [get_gse_data.r](./mkdataset/get_gse_data.r), which can fetch any GEO Series (GSE) dataset given its GSE ID and the name of the column holding the gene symbols.


## Python Dependencies

I used a virtual environment created with [virtualenv](https://pypi.python.org/pypi/virtualenv). If you want to use one as well, I recommend [this installation tutorial](http://docs.python-guide.org/en/latest/dev/virtualenvs/).

> **NOTE**: Using a virtual environment doesn't allow you to use [matplotlib](http://matplotlib.org/faq/virtualenv_faq.html) directly. You need to map it to your system's copy of **matplotlib**, because the graphics libraries that create window frames are closely tied to the operating system.

The main packages are:

* NumPy
* pandas
* scikit-learn
* TensorFlow
* Keras
* keras_adversarial

The quickest way to install all the dependencies is with [`pip`](https://pypi.python.org/pypi/pip/):

```bash
pip install -r requirements.txt
```
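
To confirm that the environment is ready before running anything, a quick import check can help. This is a minimal sketch, not part of the repository: `missing_modules` and `report` are hypothetical helpers, and the import names are assumed (scikit-learn, for example, is imported as `sklearn`). It only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Import names assumed for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "tensorflow", "keras", "keras_adversarial"]


def missing_modules(names):
    """Return the names (top-level modules only) that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=REQUIRED_MODULES):
    """Print which modules are missing and return them as a list."""
    missing = missing_modules(names)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
    return missing


if __name__ == "__main__":
    report()
```

Run it with the virtual environment activated; anything it reports as missing can be installed with the `pip` command above.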

## How to Run Any Script

Navigate to the folder containing the script and run it from there. The scripts use relative paths, so they expect to be launched from their own folder.
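
Since the scripts use relative paths, those paths are resolved against whatever directory you launch them from. If you would rather be able to start a Python script from anywhere, one option (not something the repository currently does) is to resolve data paths against the script's own location. A minimal sketch; `resolve_from` and the path in the usage comment are hypothetical:

```python
import os


def resolve_from(script_path, relative_path):
    """Resolve `relative_path` against the directory that contains `script_path`,
    instead of against the current working directory."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.normpath(os.path.join(script_dir, relative_path))


# Hypothetical usage inside a script such as gan.py:
#   data_file = resolve_from(__file__, os.path.join("..", "data", "data.csv"))
```

Anchoring on `__file__` keeps the paths relative, so the repository can still be moved around as a whole.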

### R

If you're using RStudio, all you need to do is **source** the script; there is a button for that in the top right corner of the editor window. Otherwise, from the command line:

```bash
R CMD BATCH <name_of_script>
```

### Python

Run it directly from the command line. Assuming your environment is prepared and all the dependencies are installed, all you have to do is:

```bash
python gan.py
```
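
When `gan.py` finishes, it writes `new_samples.csv`, `training_record.csv`, `generator.h5` and `discriminator.h5` into `output_base_path`, which is `'./'` (the current directory). Because matplotlib can be awkward to use from a virtual environment, one way to inspect the losses is to read `training_record.csv` back as plain numbers. This is a minimal, standard-library sketch rather than part of the repository: `load_history` is a hypothetical helper, and it assumes the layout produced by pandas' `DataFrame.to_csv` with its default unnamed index column, which is how `gan.py` saves the file. It makes no assumption about the names of the loss columns.

```python
import csv


def load_history(path):
    """Read a training-record CSV into {column_name: [float, ...]}.

    The unnamed first column that DataFrame.to_csv writes by default
    (the row index) is skipped.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Keep every named column; the index column has an empty header.
        history = {name: [] for name in reader.fieldnames if name}
        for row in reader:
            for name in history:
                history[name].append(float(row[name]))
    return history
```

For example, `load_history('training_record.csv')` returns one list per recorded metric, with one value per training epoch.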
Binary file added gan/__pycache__/gan.cpython-35.pyc
Binary file not shown.
213 changes: 113 additions & 100 deletions gan/gan.py
@@ -7,8 +7,8 @@ from sklearn.cross_validation import train_test_split
from sklearn.utils import resample
# import matplotlib.pyplot as plt
# Neural Net Building
from keras import layers
from keras.layers import Input, Dense, Dropout, InputLayer, Reshape
from keras import layers, initializers, regularizers
from keras.layers import Input, Dense, ActivityRegularization, InputLayer, Reshape, BatchNormalization, Flatten
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils.generic_utils import Progbar
@@ -70,101 +70,114 @@ all_data = pd.read_csv(os.path.join(base_path, file_name))
# Prepare Data
#---------------------------------------

just_values = all_data.iloc[:, 3:]
new_sample = permute_sample(just_values, 5)
real_dataset = make_3d_dataset(just_values, 5, 2)
print(real_dataset.shape)
print(real_dataset)

# # Prepare data
# x_train = all_data.iloc[:, np.arange(20)]
#
# # column_start_index_of_genes = 2
# # class_label_column_index = 1
# # features = all_data.iloc[:, np.arange(column_start_index_of_genes, df.shape[1])]
# # labels = all_data.iloc[:, class_label_column_index]
#
# # column_start_index_of_genes = 2
# # features = all_data.iloc[:, np.arange(20)]
# # labels = labeled_data.iloc[:, class_label_column_index]
#
# # x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.33, random_state=42)
#
#
# #################################################
# # Variables
# #################################################
#
# # Output
# output_base_path = './'
#
# #---------------------------------------
# # Network Variables
# #---------------------------------------
#
# input_dimension = x_train.shape[1] # Number of features (e.g. genes)
#
# gen_input_shape = (input_dimension,)
# discr_input_shape = (input_dimension,)
#
# epochs = 10
# batch_size = x_train.shape[0]
#
# # Build Generative model
# generative_model = Sequential()
# # generative_model.add(InputLayer(input_shape=gen_input_shape))
# generative_model.add(Dense(units=int(1.2*input_dimension), activation='relu', input_dim=input_dimension))
# generative_model.add(Dropout(rate=0.2, noise_shape=None, seed=15))
# generative_model.add(Dense(units=int(0.2*input_dimension), activation='relu'))
# generative_model.add(Dense(units=input_dimension, activation='relu'))
# generative_model.add(Reshape(discr_input_shape))
#
# # Build Discriminator model
# discriminator_model = Sequential()
# discriminator_model.add(InputLayer(input_shape=discr_input_shape))
# discriminator_model.add(Dense(units=int(1.2*input_dimension), activation='relu'))
# discriminator_model.add(Dropout(rate=0.2, noise_shape=None, seed=75))
# discriminator_model.add(Dense(units=int(0.2*input_dimension), activation='relu'))
# discriminator_model.add(Dense(units=1, activation='sigmoid'))
#
# # Build GAN
# gan = simple_gan(generative_model, discriminator_model, normal_latent_sampling((input_dimension, )))
# model = AdversarialModel(base_model=gan,
# player_params=[generative_model.trainable_weights,
# discriminator_model.trainable_weights],
# player_names=['generator', 'discriminator'])
# # Other optimizer to try AdversarialOptimizerAlternating
# model.adversarial_compile(adversarial_optimizer=AdversarialOptimizerSimultaneous(),
# player_optimizers=['adam', 'adam'], loss='binary_crossentropy')
#
# # Print Summary of Models
# generative_model.summary()
# discriminator_model.summary()
# gan.summary()
#
# # Train
# # gan_targets takes as inputs the # of samples
# training_record = model.fit(x=x_train, y=gan_targets(x_train.shape[0]), epochs=epochs,
# batch_size=batch_size)
#
# # Diplay plot of loss over training
# # plt.plot(history.history['player_0_loss'])
# # plt.plot(history.history['player_1_loss'])
# # plt.plot(history.history['loss'])
#
# # Predict (i.e. produce new samples)
# zsamples = np.random.normal(size=(1, input_dimension))
# pred = generative_model.predict(zsamples)
# print(pred)
#
# # Save new samples to file
# # new_samples = pd.DataFrame(pred)
# # new_samples.to_csv(os.path.join(output_base_path, 'new_samples.csv'))
#
# # # save training_record
# # df = pd.DataFrame(training_record.history)
# # df.to_csv(os.path.join(output_base_path, 'training_record.csv'))
# #
# # # save models
# # generator.save(os.path.join(output_base_path, 'generator.h5'))
# # discriminator.save(os.path.join(output_base_path, "discriminator.h5"))
just_values = all_data.iloc[1:, 3:]
x_train = make_3d_dataset(just_values, 300, 100)

#################################################
# Variables
#################################################

# Output
output_base_path = './'

#---------------------------------------
# Network Variables
#---------------------------------------

# Layer Input Output Shapes
gen_input_shape = (x_train.shape[1], x_train.shape[2])
discr_input_shape = (x_train.shape[1], x_train.shape[2])

gen_output_shape = discr_input_shape

# Initializer and Regularizer Variables (shared by both models)
weight_initializer = initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)
bias_initializer = initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None)
generator_regularizer = regularizers.l1(0.000000005)
discriminator_regularizer = regularizers.l2(0.0000000005)

# Training Variables
epochs = 2
batch_size = 1 # x_train.shape[0]
# Flattened sample size; the generator's final Reshape only matches this while batch_size == 1
input_size = x_train.shape[1] * x_train.shape[2] * batch_size

# Build Generative model
generative_model = Sequential()
generative_model.add(InputLayer(input_shape=gen_input_shape))
generative_model.add(Flatten())
generative_model.add(Dense(units=int(1.5*input_size),
use_bias=True,
activation='relu',
kernel_initializer=weight_initializer,
bias_initializer=bias_initializer,
kernel_regularizer=generator_regularizer))
# generative_model.add(ActivityRegularization(l1=0.02))
generative_model.add(Dense(units=int(input_size),
activation='relu',
kernel_initializer=weight_initializer,
kernel_regularizer=generator_regularizer))
generative_model.add(Reshape(gen_output_shape))

# Build Discriminator model
discriminator_model = Sequential()
discriminator_model.add(InputLayer(input_shape=discr_input_shape))
discriminator_model.add(BatchNormalization(axis=1))
discriminator_model.add(Flatten())
discriminator_model.add(Dense(units=int(1.5*input_size),
use_bias=True,
activation='relu',
kernel_initializer=weight_initializer,
bias_initializer=bias_initializer,
kernel_regularizer=discriminator_regularizer))
# discriminator_model.add(ActivityRegularization(l2=0.02))
discriminator_model.add(Dense(units=int(0.2*input_size),
activation='relu',
kernel_initializer=weight_initializer,
bias_initializer=bias_initializer,
kernel_regularizer=discriminator_regularizer))
discriminator_model.add(Dense(units=1, activation='sigmoid'))

# Build GAN
gan = simple_gan(generative_model, discriminator_model, normal_latent_sampling(gen_input_shape))
model = AdversarialModel(base_model=gan,
player_params=[generative_model.trainable_weights,
discriminator_model.trainable_weights],
player_names=['generator', 'discriminator'])
# Another optimizer to try: AdversarialOptimizerAlternating
model.adversarial_compile(adversarial_optimizer=AdversarialOptimizerSimultaneous(),
player_optimizers=['adam', 'adam'], loss='binary_crossentropy')

# Print Summary of Models
generative_model.summary()
discriminator_model.summary()
gan.summary()

# Train
# gan_targets takes the number of samples as input
training_record = model.fit(x=x_train,
y=gan_targets(x_train.shape[0]),
epochs=epochs,
batch_size=batch_size)

# Display plot of loss over training
# plt.plot(training_record.history['player_0_loss'])
# plt.plot(training_record.history['player_1_loss'])
# plt.plot(training_record.history['loss'])

# Predict (i.e. produce new samples)
zsamples = np.random.normal(size=(1, x_train.shape[1], x_train.shape[2]))
pred = generative_model.predict(zsamples)
print(pred)

# Save new samples to file
new_samples = pd.DataFrame(pred[0,:,:])
new_samples.to_csv(os.path.join(output_base_path, 'new_samples.csv'))

# save training_record
df = pd.DataFrame(training_record.history)
df.to_csv(os.path.join(output_base_path, 'training_record.csv'))

# save models
generative_model.save(os.path.join(output_base_path, 'generator.h5'))
discriminator_model.save(os.path.join(output_base_path, "discriminator.h5"))
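
All the Dense-layer widths in `gan.py` are derived from `input_size` (rows times columns of one sample, times `batch_size`). That arithmetic is easy to lose track of, so here is a small pure-Python sketch that reproduces it for a given sample shape. It does not build the Keras models, `layer_widths` is a hypothetical helper, and the `(4, 5)` sample shape in the example is made up, since `make_3d_dataset` is not shown in this diff.

```python
def layer_widths(rows, cols, batch_size=1):
    """Reproduce the Dense-layer sizes used in gan.py for samples of shape (rows, cols)."""
    input_size = rows * cols * batch_size
    # Generator: Flatten -> Dense(1.5 * input_size) -> Dense(input_size) -> Reshape
    generator = [int(1.5 * input_size), int(input_size)]
    # Discriminator: Flatten -> Dense(1.5 * input_size) -> Dense(0.2 * input_size) -> Dense(1, sigmoid)
    discriminator = [int(1.5 * input_size), int(0.2 * input_size), 1]
    # Reshape back to (rows, cols) needs exactly rows * cols units from the last generator layer.
    reshape_ok = generator[-1] == rows * cols
    return input_size, generator, discriminator, reshape_ok


print(layer_widths(4, 5))  # (20, [30, 20], [30, 4, 1], True)
```

The `reshape_ok` flag shows why `batch_size` has to stay at 1 with the current formula: the generator's last Dense layer feeds the final `Reshape` back to the sample shape, which needs exactly rows times columns units, so any larger batch size makes the two disagree.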
