
POMP consists of two software packages: (1) POMP-DETECT, which detects candidate splice junctions, and (2) POMP-PRUNE, which prunes the candidate junctions via a Support Vector Machine (SVM).

Before you begin, make sure the following requirements are met:

(1) Java 1.8.0_31 or higher.

(2) R version 3.1.1 or higher with the "e1071" package.

(3) The bowtie and bowtie-build executables must be in the POMP-DETECT folder. For convenience, these two executables are supplied with the package.

(4) gcc version 4.6.1 or above.

(5) A Linux machine with multiple processors.
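The requirements above can be checked quickly from the POMP-DETECT folder with a short shell sketch. It only reports whether each tool is present; it does not compare version numbers against the minimums listed above. The helper name `have` is made up for illustration, and the sketch assumes that `Rscript` is installed alongside R.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Tools expected on PATH: Java runtime and compiler, R, and the C++ compiler.
for tool in java javac R g++; do
  if have "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done

# bowtie and bowtie-build are expected inside the POMP-DETECT folder,
# so look in the current directory rather than on PATH.
for exe in bowtie bowtie-build; do
  if [ -x "./$exe" ]; then
    echo "found:   ./$exe"
  else
    echo "missing: ./$exe"
  fi
done

# Ask R to load e1071; a failure means the package is not installed.
if have Rscript; then
  if Rscript -e 'library(e1071)' >/dev/null 2>&1; then
    echo "found:   R package e1071"
  else
    echo "missing: R package e1071"
  fi
fi

# Report the processor count (nproc ships with GNU coreutils on Linux).
if have nproc; then
  echo "processors: $(nproc)"
fi
```

Run it from the POMP-DETECT folder so that the check for `./bowtie` and `./bowtie-build` looks in the right place.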

Install and run POMP-DETECT:

    g++ -O3 contig_generator.cpp
    javac Preprocess.java Utilities.java CallBackTest.java CallBackTask.java
    java -Xmx10g -cp . Preprocess

The output of POMP-DETECT is junctions_information.info, which is written to the OUT/ folder.

Install and run POMP-PRUNE:

POMP-PRUNE should be installed in the OUT/ folder, which should itself be inside the POMP-DETECT folder.

    javac Statistics.java
    java -Xmx5g -cp . Statistics

An R script named "script.r" can be found in the POMP-PRUNE directory. It is run in R to perform the classification. Change the first line of this script to suit your setup. The output of POMP-PRUNE will be predicted_junctions_list_<chromosome_name>.txt.

Properties file (properties.prop) of POMP-DETECT

(1) INDEX FOLDER PATH WITH FILE PREAMBLE - This folder contains the index files that Bowtie created from the reference sequence, where the preamble is the prefix of the index files. If X is the folder path and chr is the preamble, the parameter should be written as X/chr.

(2), (3) and (4) These files will be created by POMP. Give suitable file names with paths.

(5) Name of the FASTQ file containing the reads.

(6) This file will be created by POMP and contains the coverage information. Give a suitable file name with path.

(7) Bowtie will align reads within the given number of mismatches.

(8) POMP will align unmapped reads within the given number of mismatches.

(9) Maximum number of alignments per read within the reference.

(10) Number of threads to be used by Bowtie.

(11) Length of the consensus. It depends on the length of the reads. If the length of a read is |r|, the consensus length is L = 2*|r| + x, where x > 0 is chosen so that L is divisible by 4. For example, if the read length is 50, then L = 104.

(12) Name of the reference sequence file, without extension.

(13) Pre-processing must be "on" for the very first run and turned "off" afterwards. For example, the human genome has 24 chromosomes. To detect splice events genome-wide, POMP first pre-processes the given reads and then continues without pre-processing the data again. So pre-processing must be on when detecting splice events in chromosome 1, and off for chromosomes 2 to 24.

(14) Should be between 1.2 and 1.5.

(15) Search genome for gapped alignment with this length.

(16) Should be between 10 and 15.

(17) Folder for the sequences.

(18) Overlap length between reads used to build representatives.

(19) Hamming distance allowed in the overlap between reads. Should be between 1 and 3.

(20) Number of threads to be used by POMP.
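Two of the parameters above, (1) and (11), can be worked out with a small shell sketch. The folder name `index` and the file name `reference.fa` are placeholders, not names used by POMP. The `consensus_length` helper is hypothetical, and it assumes x is the smallest positive value that makes L divisible by 4, which agrees with the read-length-50 example but is not stated explicitly. The index is built with the standard `bowtie-build <reference_in> <index_base>` form, and only if the executable and the reference file are actually present.

```shell
#!/bin/sh
# Parameter (1): index folder path plus preamble, written as X/chr.
INDEX_DIR="index"          # placeholder for folder X
PREAMBLE="chr"             # preamble (prefix of the index files)
REFERENCE="reference.fa"   # placeholder reference sequence file
INDEX_PARAM="$INDEX_DIR/$PREAMBLE"
echo "parameter (1):  $INDEX_PARAM"

# Build the Bowtie index only when the supplied executable and the
# reference file exist; otherwise this step is skipped.
if [ -x ./bowtie-build ] && [ -f "$REFERENCE" ]; then
  mkdir -p "$INDEX_DIR"
  ./bowtie-build "$REFERENCE" "$INDEX_PARAM"
fi

# Parameter (11): L = 2*|r| + x with x > 0 and L divisible by 4.
# Assumption: x is the smallest positive integer that satisfies this.
consensus_length() {
  read_len=$1
  len=$(( 2 * read_len + 1 ))
  while [ $(( len % 4 )) -ne 0 ]; do
    len=$(( len + 1 ))
  done
  echo "$len"
}

echo "parameter (11): $(consensus_length 50)"   # 104 for 50 bp reads
```

Calling `consensus_length` with a different read length gives the value for other data sets under the same assumption.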

Properties file (properties.prop) of POMP-PRUNE

(1) Let the sampling threshold be X. Then the number of positive samples will be X times the number of negative samples.

(2) Number of random samples from the positive examples.

(3) "On" gives high accuracy but with a greatly reduced set of negative examples (recommended for large chromosomes).