MRGVAE
This repository contains our implementation of "A Deep Molecular Generative Model Based on Multi-Resolution Graph Variational Autoencoders".
Requirements
- Python (version >= 3.6)
- PyTorch (version >= 1.1.0)
- RDKit (version >= 2019.03)
- networkx (version >= 2.4)
- numpy (version >= 1.18)
We highly recommend using conda for package management.
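To confirm that an environment satisfies the versions listed above, a small check along the following lines may help. It is a minimal sketch and not part of the repository: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the import names `torch`, `rdkit`, `networkx` and `numpy` are assumed to be the usual ones for these packages.

```python
import sys

# Minimum versions copied from the requirements list.
# Keys are the assumed import names of each package.
MINIMUMS = {
    "torch": (1, 1, 0),
    "rdkit": (2019, 3),
    "networkx": (2, 4),
    "numpy": (1, 18),
}


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1).

    Keeps the leading digits of each dot-separated part and stops at
    the first part that does not start with a digit.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version tuples after zero-padding to equal length."""
    width = max(len(installed), len(minimum))
    pad = lambda v: tuple(v) + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)


def check_environment():
    """Return {name: (version string or None, ok flag)} per requirement."""
    report = {
        "python": (sys.version.split()[0], sys.version_info[:2] >= (3, 6))
    }
    for name, minimum in MINIMUMS.items():
        try:
            module = __import__(name)
        except ImportError:
            report[name] = (None, False)
            continue
        version = getattr(module, "__version__", "")
        ok = meets_minimum(parse_version(version), minimum)
        report[name] = (version, ok)
    return report
```

Calling `check_environment()` inside the conda environment returns one entry per requirement; a `False` flag marks a package that is missing or older than the listed minimum.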
Vocab Extraction
Fragments and interchangeable fragments are extracted with the following commands (replace ChEMBL_Subset4Training.txt with your own molecular dataset).
unzip data/ChEMBL_Subset4Training.zip
python vocab_extract.py --mol_data ChEMBL_Subset4Training.txt --vocab_file chembl_vocab_frag_rd2.txt --bond_file chembl_vocab_bond_rd2.txt --radius 2 --ncpu 8
Molecule Processing
Molecules are converted into three-level hierarchical graphs using the fragment vocabulary from the first step.
mkdir chembl_train
python mol_preprocess.py --mol_data ChEMBL_Subset4Training.txt --vocab chembl_vocab_frag_rd2.txt --bond chembl_vocab_bond_rd2.txt --mol_folder chembl_train/ --radius 2 --ncpu 8
Model Training
To train the model with KL regularization, use
mkdir chembl_model
python train.py --mol_folder chembl_train --vocab chembl_vocab_frag_rd2.txt --bond chembl_vocab_bond_rd2.txt --model_folder chembl_model/ --epoch 10
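The extraction, preprocessing and training commands share the same vocabulary and bond file names, so they can be generated from a single place when switching datasets. The following is a minimal sketch, not a script shipped with this repository: `build_pipeline` and `run_pipeline` are hypothetical helpers, and the naming pattern `<prefix>_vocab_frag_rd<radius>.txt` is an assumption generalized from the file names used in this README. Only flags that appear in the commands above are used.

```python
import os
import subprocess


def build_pipeline(mol_data, prefix="chembl", radius=2, ncpu=8, epoch=10):
    """Return (directories to create, commands) for the first three steps."""
    # Assumed naming pattern, generalized from the README's file names.
    vocab = f"{prefix}_vocab_frag_rd{radius}.txt"
    bond = f"{prefix}_vocab_bond_rd{radius}.txt"
    train_dir = f"{prefix}_train"
    model_dir = f"{prefix}_model"

    commands = [
        # Step 1: vocabulary extraction
        ["python", "vocab_extract.py", "--mol_data", mol_data,
         "--vocab_file", vocab, "--bond_file", bond,
         "--radius", str(radius), "--ncpu", str(ncpu)],
        # Step 2: molecule preprocessing
        ["python", "mol_preprocess.py", "--mol_data", mol_data,
         "--vocab", vocab, "--bond", bond,
         "--mol_folder", train_dir + "/",
         "--radius", str(radius), "--ncpu", str(ncpu)],
        # Step 3: model training
        ["python", "train.py", "--mol_folder", train_dir,
         "--vocab", vocab, "--bond", bond,
         "--model_folder", model_dir + "/", "--epoch", str(epoch)],
    ]
    return [train_dir, model_dir], commands


def run_pipeline(mol_data, **kwargs):
    """Create the output folders and run each step, stopping on failure."""
    folders, commands = build_pipeline(mol_data, **kwargs)
    for folder in folders:
        os.makedirs(folder, exist_ok=True)
    for command in commands:
        subprocess.run(command, check=True)
```

With the default arguments, `build_pipeline("ChEMBL_Subset4Training.txt")` reproduces the three commands shown above; `run_pipeline` would need to be called from the repository root so that the scripts are found.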
Molecule Generation
To generate molecules, use
python generator.py --vocab chembl_vocab_frag_rd2.txt --bond chembl_vocab_bond_rd2.txt --model_folder chembl_model/model.0 --fragment_num 5 --num_decode 100 --batch_size 20 --gen_file 'gen.txt'
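After generation, a quick look at how many distinct molecules were produced can be useful. The sketch below assumes, without confirmation from the repository, that gen.txt holds one SMILES string per line (taking the first whitespace-separated token of each line); `summarize_generated` is a hypothetical helper. It compares strings literally, so two different spellings of the same molecule count as distinct unless the SMILES are canonicalized first, for example with RDKit.

```python
from collections import Counter


def summarize_generated(lines):
    """Count total and distinct SMILES strings in an iterable of lines.

    Assumes one SMILES per non-empty line; only the first
    whitespace-separated token of each line is used.
    """
    smiles = [line.split()[0] for line in lines if line.strip()]
    counts = Counter(smiles)
    total = len(smiles)
    return {
        "total": total,
        "unique": len(counts),
        "unique_fraction": len(counts) / total if total else 0.0,
        "most_common": counts.most_common(3),
    }
```

A typical call would be `summarize_generated(open("gen.txt"))`.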
To generate molecules with a fixed scaffold, use
python generatorwithscaffold.py --vocab chembl_vocab_frag_rd2.txt --bond chembl_vocab_bond_rd2.txt --model_folder chembl_model/model.0 --fragment_num 5 --num_decode 100 --batch_size 20 --radius 2 --gen_file 'gen.txt' --scaffold '*[CH:1]1CCn2cncc21'
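The scaffold SMILES contains characters such as `*` and `[` that a shell would otherwise try to expand, which is why it is wrapped in single quotes above. When the command is assembled programmatically, for instance to loop over several scaffolds, the standard-library `shlex.quote` function can apply that quoting. This is a minimal sketch; `scaffold_command` is a hypothetical helper, and the fixed argument values are simply copied from the command above.

```python
import shlex


def scaffold_command(scaffold, model="chembl_model/model.0",
                     gen_file="gen.txt"):
    """Build a shell-safe command line for scaffold-constrained generation."""
    args = [
        "python", "generatorwithscaffold.py",
        "--vocab", "chembl_vocab_frag_rd2.txt",
        "--bond", "chembl_vocab_bond_rd2.txt",
        "--model_folder", model,
        "--fragment_num", "5",
        "--num_decode", "100",
        "--batch_size", "20",
        "--radius", "2",
        "--gen_file", gen_file,
        "--scaffold", scaffold,
    ]
    # Quote every argument so that '*', '[' and ']' reach the script intact.
    return " ".join(shlex.quote(arg) for arg in args)
```

Passing the argument list directly to `subprocess.run` without a shell would avoid the quoting question altogether; the string form is mainly useful for writing job scripts.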