
Centromere Transcription Factor RNAi screen

Search for genes that affect centromere establishment.

Experiment background

Centromeres are essential for cell division. They recruit the proteins to which the spindle fibers attach, and the spindle fibers segregate the sister chromatids.

Centromeric nucleosomes contain a unique histone variant called CENP-A, which takes the place of histone H3 in the histone octamer. After each cell division, new CENP-A must replace the canonical histone H3 at the centromere to re-form it. This process is thought to require transcription.

Experiment summary

To study centromere formation, one needs to seed the protein complex that incorporates CENP-A into nucleosomes in place of the canonical histone H3. The "seed" that triggers this histone replacement is a Drosophila-specific protein called CAL1. CAL1 is brought to the ectopic centromere site using the LacO-LacI tethering system: an array of 32 LacO repeats is inserted into a region of chromosome 3L, and LacI-CAL1 is recruited to that region through the LacO-LacI affinity. About 25% of the time, this system creates a new centromere.

The experiment searches for proteins involved in centromere formation by knocking down known nuclear proteins, including transcription factors. When ectopic centromere formation in a well dips significantly below the typical 25%, the knocked-down gene may be involved in centromere formation.
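One way to make "significantly below the typical 25%" concrete is a one-sided exact binomial test per well. The sketch below is only an illustration under stated assumptions: the function names, the example counts, and the 0.01 threshold are hypothetical, and the repository's R scripts may use a different statistic entirely.

```python
from math import comb

# Baseline ectopic centromere formation rate quoted in the text above.
BASELINE_RATE = 0.25


def binom_lower_tail(k: int, n: int, p: float = BASELINE_RATE) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p).

    k: number of LacO sites that formed an ectopic centromere in a well
    n: total number of LacO sites scored in that well
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_candidate(k: int, n: int, alpha: float = 0.01) -> bool:
    """Flag a well whose formation rate is significantly below baseline.

    alpha is a hypothetical threshold; a real screen over many wells would
    also need a multiple-testing correction.
    """
    return binom_lower_tail(k, n) < alpha


if __name__ == "__main__":
    # Hypothetical counts for two wells, 200 scored sites each.
    for k in (50, 20):
        print(f"{k}/200 sites: p = {binom_lower_tail(k, 200):.3g}, "
              f"candidate = {is_candidate(k, 200)}")
```

A well at the baseline rate (50 of 200) is not flagged, while a well far below it (20 of 200) is.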

The data collected are fluorescent images of multi-well plates, where each well corresponds to a protein being knocked down.

  • A red fluorescent protein labels the ectopic centromere location (of the LacO repeats).
  • A green fluorescent protein labels CENP-A.
  • DAPI blue fluorescence labels nuclear DNA.


All programs are run from a single Makefile. Type make help to see the list of targets:

Usage: make [TARGET] ...

all            (Default) Run full pipeline from image processing to plots.
help           Show this help.
z-projection   Generate maximum intensity projection images.
cellprofiler   Collect statistics about all images.
gui-cp         Interactively run CellProfiler.
gui-cpa        Interactively run CellProfiler Analyst.
stats          Find significant wells from cellprofiler measurements.
clean-all      Delete all output.

Run the Makefile in this directory to generate all results.

make all


To tune CellProfiler's image processing, it is helpful to save results in a separate directory. One can clone this repository using a different directory name, and link to the processed image set of the original:

cd ..
git clone rnai-screen-tf rnai-screen-tf_20170314
cd rnai-screen-tf_20170314
rm -rf z_projection
ln -s ../../rnai-screen-tf/results/z_projection z_projection

A nice feature of Makefiles is that any variable can be overridden on the command line in the general form variable=value. For example, to use a cellprofiler installed to your personal directory by pip install --user ..., specify its path as:

make CELLPROFILER=~/.local/bin/cellprofiler

Data processing

The raw input data consists of:

  1. Images of 5 plates, with 10 sites per well and 3 z-slices per site ("April_14_2016.tar.xz").
  2. A spreadsheet mapping the proteins to the wells ("DRSC_TF_Library_Distribution.xls").

Below is the file listing of the "data" directory using tree:

├── April_14_2016 [11 entries exceeds filelimit, not opening dir]
├── April_14_2016.tar.xz
└── DRSC_TF_Library_Distribution.xls

The xz-compressed image archive is 19 GB, so it must be downloaded separately from FIXME_INSERT_DOI into the data directory.

Broadly speaking, the data are processed as follows:

  1. CellProfiler saves image statistics to a database.
  2. R scripts save high-confidence wells and generate plots.

CellProfiler 2 requires 2D image inputs, so a Python script first collapses the z-slices of each site into a maximum intensity projection.
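A maximum intensity projection keeps, for every pixel position, the brightest value found across the z-slices. The following is a dependency-free sketch of that idea, not the repository's actual script: the function name and the tiny example stack are hypothetical, and a real implementation would more likely load the images into NumPy arrays and call stack.max(axis=0).

```python
def max_projection(z_stack):
    """Collapse a z-stack into one 2D image by per-pixel maximum.

    z_stack: list of 2D images (lists of rows), all with the same shape,
             e.g. the 3 z-slices recorded for one site.
    """
    if not z_stack:
        raise ValueError("z_stack must contain at least one slice")
    return [
        # zip(*rows) groups the pixels at the same x across all slices.
        [max(pixels) for pixels in zip(*rows)]
        # zip(*z_stack) groups the rows at the same y across all slices.
        for rows in zip(*z_stack)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 images standing in for three z-slices of one site.
    slices = [
        [[1, 5], [0, 2]],
        [[4, 1], [3, 2]],
        [[2, 2], [9, 0]],
    ]
    print(max_projection(slices))  # [[4, 5], [9, 2]]
```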

CellProfiler segments the ectopic and CENP-A centromeres and saves the statistics to an SQLite database.

The R scripts read this CellProfiler-generated database for their calculations and plots.
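To illustrate the kind of per-well summary that can be pulled out of such a database, here is a minimal Python sketch using the standard sqlite3 module. Everything about the schema is an assumption made for the example: the table ectopic_sites, its columns plate, well, and has_cenpa, and the sample rows are hypothetical and do not reflect the tables CellProfiler actually writes or the queries the R scripts run.

```python
import sqlite3


def ectopic_fraction_per_well(conn: sqlite3.Connection) -> dict:
    """Return {(plate, well): fraction of LacO sites that carry CENP-A}."""
    query = """
        SELECT plate, well, COUNT(*) AS n_sites, SUM(has_cenpa) AS n_positive
        FROM ectopic_sites
        GROUP BY plate, well
        ORDER BY plate, well
    """
    return {
        (plate, well): n_positive / n_sites
        for plate, well, n_sites, n_positive in conn.execute(query)
    }


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical one-row-per-site table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ectopic_sites (plate TEXT, well TEXT, has_cenpa INTEGER)"
    )
    rows = [
        ("plate1", "A01", 1), ("plate1", "A01", 0),
        ("plate1", "A01", 0), ("plate1", "A01", 1),
        ("plate1", "A02", 0), ("plate1", "A02", 0),
        ("plate1", "A02", 0), ("plate1", "A02", 1),
    ]
    conn.executemany("INSERT INTO ectopic_sites VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_example_db()
    for (plate, well), fraction in ectopic_fraction_per_well(conn).items():
        print(plate, well, fraction)
    conn.close()
```

Per-well fractions like these would then be compared against the typical 25% formation rate.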
