From 848673ad5262f273d8d4b6a380d9cec22ae9bdab Mon Sep 17 00:00:00 2001
From: Subrata Saha
Date: Wed, 28 Sep 2016 11:06:22 -0400
Subject: [PATCH] Create README.md

---
 README.md | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..af13b02
--- /dev/null
+++ b/README.md
@@ -0,0 +1,89 @@
+POMP consists of two software packages: (1) POMP-DETECT, which detects candidate splice junctions, and
+(2) POMP-PRUNE, which prunes candidate junctions via a Support Vector Machine (SVM).
+
+Before you begin, make sure you have
+------------------------------------
+(1) Java 1.8.0_31 or higher.
+(2) R version 3.1.1 or higher with the "e1071" package.
+(3) The bowtie and bowtie-build executables must be in the POMP-DETECT folder. For
+    convenience, both executables are supplied with the package.
+(4) gcc version 4.6.1 or higher.
+(5) A Linux machine with multiple processors.
+
+Install and run POMP-DETECT:
+----------------------------
+>g++ -O3 contig_generator.cpp
+>javac Preprocess.java Utilities.java CallBackTest.java CallBackTask.java
+>java -Xmx10g -cp . Preprocess
+
+The output of POMP-DETECT is junctions_information.info, written to the OUT/ folder.
+
+
+Install and run POMP-PRUNE:
+----------------------------
+
+POMP-PRUNE should be installed in the OUT/ folder, which should be inside the POMP-DETECT folder.
+
+>javac Statistics.java
+>java -Xmx5g -cp . Statistics
+
+An R script named "script.r" can be found in the POMP-PRUNE directory. It is used in R for
+classification. Please change the first line of this script accordingly.
+The output of POMP-PRUNE is predicted_junctions_list_.txt
+
+------------------------------------------------------------------------------------------------
+
+Properties file (properties.prop) of POMP-DETECT
+-------------------------------------------------
+(1) INDEX FOLDER PATH WITH FILE PREAMBLE
+-This folder contains the index files that Bowtie created from the reference sequence; the
+   preamble is the prefix of the index files. If X is the folder path and chr is the preamble,
+   the parameter should be written as X/chr
+
+(2), (3) and (4) These files will be created by POMP. Please give suitable file names with paths.
+
+(5) Name of the FASTQ file containing the reads.
+
+(6) This file will be created by POMP. Please give a suitable file name with path. It contains the coverage information.
+
+(7) Bowtie will align reads within the given number of mismatches.
+
+(8) POMP will align unmapped reads within the given number of mismatches.
+
+(9) Maximum number of alignments per read within the reference.
+
+(10) Number of threads to be used by Bowtie.
+
+(11) Length of the consensus. It depends on the length of the reads. If the length of a read is |r|, the consensus length
+will be L = 2*|r| + x, where x > 0 is chosen so that L is divisible by 4. For example, if the read length is 50, then L = 104.
+
+(12) Name of the reference sequence file, without extension.
+
+(13) Pre-process must be "on" for the very first run and turned "off" afterwards. For example, the human genome has 24 chromosomes.
+  To detect genome-wide splice events, POMP first pre-processes the given reads and then continues without
+  pre-processing the data again. So, to detect splice events in chromosome 1 (the first run), pre-process must be turned on. For chromosomes 2 to 24
+  pre-process must be turned off.
+
+(14) Should be between 1.2 and 1.5.
+
+(15) Search the genome for gapped alignments with this length.
+
+(16) Should be between 10 and 15.
+
+(17) Folder for the sequences.
+
+(18) Overlap length between the reads from which representatives are built.
+
+(19) Overlap Hamming distance between reads. Should be between 1 and 3.
+
+(20) Number of threads to be used by POMP.
+
+---------------------------------------------------------------------------------------------------
+
+Properties file (properties.prop) of POMP-PRUNE
+-------------------------------------------------
+(1) Let the sampling threshold be X. Then the number of positive samples will be X times the number of negative samples.
+
+(2) Number of random samples drawn from the positive examples.
+
+(3) "On" gives high accuracy but a greatly reduced set of negative examples (recommended for large chromosomes).
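
Item (11) of the POMP-DETECT properties file defines the consensus length as L = 2*|r| + x with x > 0 and L divisible by 4. The sketch below is one way to compute that value before editing properties.prop. It is not part of POMP: the function name `consensus_length` is hypothetical, and it assumes x is the smallest positive value that works, which is consistent with the README's example (read length 50 gives L = 104) but is not stated explicitly.

```python
def consensus_length(read_length: int) -> int:
    """Return the smallest multiple of 4 strictly greater than 2 * read_length.

    Assumption (not confirmed by the README): x is the smallest value with
    x > 0 that makes L = 2 * read_length + x divisible by 4.
    """
    if read_length <= 0:
        raise ValueError("read_length must be a positive integer")
    doubled = 2 * read_length
    # doubled % 4 is in 0..3, so x is in 1..4: always positive,
    # and doubled + x is always a multiple of 4.
    x = 4 - (doubled % 4)
    return doubled + x


if __name__ == "__main__":
    # Print the consensus length for a few common read lengths.
    for r in (50, 75, 100):
        print(r, consensus_length(r))
```

Under that assumption, the value printed for a read length of 50 matches the README's own example and could be entered as parameter (11).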