
Centromere Transcription Factor RNAi screen
===========================================

Search for genes that affect centromere establishment.

Experiment background
---------------------

Centromeres are essential for cell division.
They recruit the proteins to which the spindle fibers attach,
and the spindle fibers segregate the sister chromatids.

Centromeric nucleosomes contain a special histone H3 variant, CENP-A,
in place of canonical histone H3.
After each cell division, new CENP-A must be deposited,
replacing histone H3, to reform the centromere.
This process is thought to require transcription.


Experiment summary
------------------

To study centromere formation,
one needs to seed the protein complex
that incorporates CENP-A into nucleosomes,
replacing canonical histone H3.
The "seed" that triggers this histone replacement
is a Drosophila-specific protein called CAL1.
CAL1 is brought to the ectopic centromere site
using the LacO-LacI tethering system:
namely, an array of 32 LacO repeats
is inserted into a region of chromosome 3L,
and a LacI-CAL1 fusion protein is recruited to that region
because LacI binds LacO.
About 25% of the time, this system creates a new centromere.

The experiment searches for proteins involved
in centromere formation by using RNAi to knock down known nuclear proteins,
including transcription factors.
When the ectopic centromere formation rate in a well drops significantly below the typical 25%,
the knocked-down gene is a candidate for involvement in centromere formation.
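
The README does not say which statistical test the R scripts apply, so the following is only an illustration of what "significantly below 25%" could mean: a one-sided exact binomial test of a well's ectopic centromere count against the 25% baseline. The function names, the per-well counts, and the `alpha` cutoff are hypothetical.

```python
from math import comb

# Baseline ectopic centromere formation rate quoted in this README.
BASELINE = 0.25


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_candidate(ectopic: int, scored: int, alpha: float = 0.01) -> bool:
    """Hypothetical per-well call: True when `ectopic` out of `scored`
    LacO sites is significantly *below* BASELINE (one-sided exact
    binomial test). `alpha` is an illustrative cutoff, not the screen's."""
    if not 0 <= ectopic <= scored:
        raise ValueError("need 0 <= ectopic <= scored")
    p_value = binom_cdf(ectopic, scored, BASELINE)
    return p_value < alpha


# A well scoring 24/100 looks typical; 8/100 is far below the baseline.
print(is_candidate(24, 100), is_candidate(8, 100))
```

With hundreds of wells tested at once, a multiple-testing correction would normally be applied before calling hits.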

The data collected are fluorescent images of multi-well plates,
where each well corresponds to a protein being knocked down.

  - A red fluorescent protein labels the ectopic centromere location
    (of the LacO repeats).
  - A green fluorescent protein labels CENP-A.
  - DAPI blue fluorescence labels nuclear DNA.
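
As an illustration only: assuming an ectopic centromere is scored when CENP-A (green) signal overlaps a LacO spot (red), a per-well formation rate could be computed from segmented objects as below. The README does not state the exact scoring criterion; objects are modeled here as hypothetical sets of `(row, col)` pixel coordinates, and CellProfiler's actual measurements may differ.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]


def ectopic_fraction(laco_spots: Iterable[Set[Pixel]], cenpa_pixels: Set[Pixel]) -> float:
    """Fraction of LacO (red) spots sharing at least one pixel with
    CENP-A (green) signal. Hypothetical scoring rule for illustration."""
    spots = list(laco_spots)
    if not spots:
        raise ValueError("no LacO spots to score")
    # A non-empty set intersection means the spot colocalizes with CENP-A.
    hits = sum(1 for spot in spots if spot & cenpa_pixels)
    return hits / len(spots)


# Four LacO spots, one of which overlaps CENP-A signal.
spots = [{(0, 0), (0, 1)}, {(5, 5)}, {(9, 9)}, {(2, 7), (3, 7)}]
cenpa = {(3, 7), (20, 20)}
print(ectopic_fraction(spots, cenpa))  # 0.25
```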

Usage
-----

All programs are run through a single `Makefile`.
Run `make help` to see a list of targets:

```
Usage: make [TARGET] ...

Targets:
all            (Default) Run full pipeline from image processing to plots.
help           Show this help.
z-projection   Generate maximum intensity projection images.
cellprofiler   Collect statistics about all images.
gui-cp         Interactively run CellProfiler.
gui-cpa        Interactively run CellProfiler Analyst.
stats          Find significant wells from cellprofiler measurements.
clean-all      Delete all output.
```

Run the `Makefile` in this directory to generate all results.

``` sh
make all
```

Depending

When tuning CellProfiler's image processing,
it is helpful to save results to a separate directory.
One can clone this repository under a different directory name
and link to the z-projected images already generated by the original clone:

``` sh
cd ..
git clone git@github.uconn.edu:MelloneLab/rnai-screen-tf.git rnai-screen-tf_20170314
cd rnai-screen-tf_20170314
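rm -rf results/z_projection
mkdir -p results
# The link lives in results/, so its relative target climbs two levels.
ln -s ../../rnai-screen-tf/results/z_projection results/z_projection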
```
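
A relative symlink target is resolved from the directory that contains the link, not from the directory where the command was run. The Python sketch below computes that relative path with `os.path.relpath`, so the link resolves correctly wherever it is placed. It assumes the link should sit at `results/z_projection` in the new clone; the helper name `link_shared_output` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def link_shared_output(original_repo: Path, new_repo: Path,
                       subdir: str = "results/z_projection") -> Path:
    """Replace new_repo/subdir with a relative symlink to original_repo/subdir."""
    target = original_repo / subdir
    link = new_repo / subdir
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.is_file():
        link.unlink()
    elif link.is_dir():
        # Mirrors `rm -rf`: only safe if the local copy is disposable.
        shutil.rmtree(link)
    # Relative to the link's own directory, not the current working directory.
    rel_target = os.path.relpath(target, start=link.parent)
    link.symlink_to(rel_target, target_is_directory=True)
    return link


# Example, run from inside the new clone:
# link_shared_output(Path("../rnai-screen-tf"), Path("."))
```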

A useful feature of `make` is that any number of variables can be overridden
on the command line in the general form `variable=value`.
For example, to use a CellProfiler installed in your home directory
with `pip install --user ...`,
specify its path as:

``` sh
make CELLPROFILER=~/.local/bin/cellprofiler
```
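
Overrides combine with any target listed by `make help`, as in `make CELLPROFILER=~/.local/bin/cellprofiler cellprofiler`. The sketch below builds and runs such a command from Python; the helper names `make_command` and `run_make` are hypothetical. Because `subprocess` does not invoke a shell, `~` is expanded explicitly with `os.path.expanduser`.

```python
import os
import subprocess
from typing import Dict, List, Sequence


def make_command(targets: Sequence[str], overrides: Dict[str, str]) -> List[str]:
    """Build a `make VAR=value ... TARGET ...` argument list.

    `~` is normally expanded by the shell; subprocess bypasses the shell,
    so paths are expanded here instead.
    """
    args = ["make"]
    args += [f"{name}={os.path.expanduser(value)}" for name, value in overrides.items()]
    args += list(targets)
    return args


def run_make(targets: Sequence[str], overrides: Dict[str, str]) -> None:
    # check=True raises CalledProcessError if make exits non-zero.
    subprocess.run(make_command(targets, overrides), check=True)


# Example (requires the Makefile in the current directory):
# run_make(["cellprofiler"], {"CELLPROFILER": "~/.local/bin/cellprofiler"})
```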

Data processing
---------------

The raw input data consists of:

  1. Images of 5 plates, with 10 sites per well
     and 3 z-slices per site ("April_14_2016.tar.xz").
  2. A spreadsheet mapping the proteins to the wells
     ("DRSC_TF_Library_Distribution.xls").
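
Joining the spreadsheet to the image files requires matching well labels, and the two sources may not pad column numbers the same way ("B7" versus "B07"). The sketch below normalizes labels before such a join. It assumes a 384-well layout (rows A to P, columns 1 to 24), which the README does not state; the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed 384-well layout: rows A-P, columns 1-24, optional zero padding.
_WELL = re.compile(r"^([A-Pa-p])0*([1-9]\d?)$")


def parse_well(label: str) -> Tuple[str, int]:
    """Split a well label such as 'B07' or 'b7' into ('B', 7)."""
    m = _WELL.match(label.strip())
    if m is None:
        raise ValueError(f"unrecognised well label: {label!r}")
    row, col = m.group(1).upper(), int(m.group(2))
    if col > 24:
        raise ValueError(f"column out of range: {label!r}")
    return row, col


def normalise_well(label: str) -> str:
    """Canonical zero-padded form, e.g. 'b7' -> 'B07'."""
    row, col = parse_well(label)
    return f"{row}{col:02d}"
```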

Below is the file listing of the "data" directory using `tree`:

```
data
├── April_14_2016 [11 entries exceeds filelimit, not opening dir]
├── April_14_2016.tar.xz
└── DRSC_TF_Library_Distribution.xls
```

The xz-compressed image archive is 19 GB,
so it is not stored in this repository;
download it from FIXME_INSERT_DOI
into the data directory.
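
Once downloaded, the archive can be unpacked with `tar -xJf data/April_14_2016.tar.xz -C data`. The equivalent Python sketch below uses the standard-library `tarfile` module and refuses members whose paths would land outside the destination. The helper name `extract_images` is hypothetical, and whether the pipeline expects the unpacked directory directly under `data/` is an assumption based on the `tree` listing.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_images(archive: Path, dest: Path) -> List[str]:
    """Unpack an xz-compressed tarball into dest and return member names."""
    dest = dest.resolve()
    with tarfile.open(archive, mode="r:xz") as tar:
        members = tar.getmembers()
        for member in members:
            # Basic path-traversal guard: every member must stay inside dest.
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest, members=members)
    return [m.name for m in members]


# Example:
# extract_images(Path("data/April_14_2016.tar.xz"), Path("data"))
```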

Broadly speaking, the data are processed as follows:

  1. A Python script creates maximum intensity z-projections of the images.
  2. CellProfiler saves image statistics to a database.
  3. R scripts save the high-confidence wells and generate plots.

CellProfiler 2 requires 2D image inputs,
so a Python script first collapses each site's 3-slice z-stack
into a single maximum intensity projection.
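
A maximum intensity projection keeps, for each pixel position, the brightest value across all z-slices. The pure-Python sketch below shows the operation on small 2D lists; it is not the repository's script, which presumably reads image files. With NumPy, the same result for a `(z, y, x)` array is `stack.max(axis=0)`.

```python
from typing import List, Sequence

Image = List[List[int]]


def max_projection(z_stack: Sequence[Image]) -> Image:
    """Keep the brightest value across z-slices at every (row, col).
    All slices must share the same shape."""
    if not z_stack:
        raise ValueError("empty z-stack")
    shape = (len(z_stack[0]), len(z_stack[0][0]))
    if any((len(s), len(s[0])) != shape for s in z_stack):
        raise ValueError("z-slices differ in shape")
    # zip(*z_stack) walks rows across slices; zip(*rows) walks pixels across slices.
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*z_stack)]


# Three 2x2 z-slices, matching the 3 z-slices per site in this screen.
stack = [
    [[1, 9], [3, 0]],
    [[4, 2], [3, 7]],
    [[0, 5], [8, 1]],
]
print(max_projection(stack))  # [[4, 9], [8, 7]]
```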

CellProfiler segments the ectopic and CENP-A centromeres and
saves the statistics into an SQLite database.
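
CellProfiler's table names depend on its ExportToDatabase settings, so listing the schema is a safe first step before querying. The sketch below opens the database read-only with the standard-library `sqlite3` module; the database path is not given in this README and must be supplied. The `sqlite3` command-line shell's `.tables` command gives the same listing.

```python
import sqlite3
from contextlib import closing
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return table names in a SQLite database, opened read-only
    so the pipeline's output cannot be modified."""
    uri = f"file:{db_path}?mode=ro"
    # closing() ensures the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```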

The R scripts read this CellProfiler-generated database for their calculations
and plots.
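
The R scripts' actual queries are not shown here, but a per-well summary of the database typically reduces to a `GROUP BY` over wells. The Python sketch below assumes a hypothetical table `spots(well TEXT, is_ectopic INTEGER)`; CellProfiler's real tables and column names will differ. The resulting (ectopic, scored) counts per well are the kind of input a test against the 25% baseline would take.

```python
import sqlite3
from typing import Dict, Tuple


def per_well_counts(conn: sqlite3.Connection) -> Dict[str, Tuple[int, int]]:
    """Map well -> (ectopic sites, scored sites) from the hypothetical
    `spots` table."""
    query = """
        SELECT well, SUM(is_ectopic), COUNT(*)
        FROM spots
        GROUP BY well
        ORDER BY well
    """
    return {well: (int(hits), int(total)) for well, hits, total in conn.execute(query)}


# Toy in-memory database standing in for the CellProfiler output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spots (well TEXT, is_ectopic INTEGER)")
conn.executemany(
    "INSERT INTO spots VALUES (?, ?)",
    [("A01", 1), ("A01", 0), ("A01", 0), ("A01", 0), ("B02", 0), ("B02", 0)],
)
print(per_well_counts(conn))  # {'A01': (1, 4), 'B02': (0, 2)}
conn.close()
```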