academic/pyCRAC/README



The pyCRAC package is a collection of Python scripts for analysing
high-throughput RNA-sequencing data, especially data from molecules
crosslinked by UV to an immunoprecipitated protein of interest (i.e.
data generated by CLIP or CRAC protocols).
It can remove duplicate reads, handles directional libraries, and
reports sense and anti-sense hits.

Included is the pipeline used for the analysis of a group of CRAC data
sets.


References

Genome Biol. 2014 Jan 7;15(1):R8. doi: 10.1186/gb-2014-15-1-r8.
PAR-CLIP data indicate that Nrd1-Nab3-dependent transcription
termination regulates expression of hundreds of protein coding genes in
yeast. Webb S, Hector RD, Kudla G, Granneman S.

Nature Communications, 2017; DOI: 10.1038/s41467-017-00025-5
Kinetic CRAC uncovers a role for Nab3 in determining gene expression
profiles during stress. van Nues R, Schweikert G, de Leau E, Selega
A, Langford A, Franklin R, Iosub I, Wadsworth P, Sanguinetti G,
Granneman S.

If you want to run the test suite after installation, see README.tests.


Note on the CRAC pipelines:

Use the -h flag to get a detailed help menu.

The CRAC_pipeline_PE.py script needs to be run from the folder that
contains the fastq files.

The barcode list file should contain two tab-separated columns, in
which the first column is the barcode sequence and the second column is
the name of the experiment.
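
As an illustration of the barcode list layout, here is a minimal Python
sketch that reads such a file and checks that every line has exactly two
tab-separated columns. It is not part of pyCRAC; the function name
read_barcodes and the barcode sequences and experiment names in the
example are made up for the illustration.

```python
import csv
import io


def read_barcodes(handle):
    """Return {barcode: experiment_name} from a two-column, tab-separated file.

    Hypothetical helper for illustration only; not part of pyCRAC.
    """
    barcodes = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        # Skip completely empty lines.
        if not row or all(not field.strip() for field in row):
            continue
        if len(row) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, got {len(row)}"
            )
        barcode, experiment = row[0].strip(), row[1].strip()
        barcodes[barcode] = experiment
    return barcodes


# Made-up example content: barcode sequence, TAB, experiment name.
example = io.StringIO("NNNTAAGC\tsample_A\nNNNATTAG\tsample_B\n")
barcodes = read_barcodes(example)
print(barcodes)
```

A check like this can catch lines that were separated with spaces
instead of tabs before the file is passed to the pipeline.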

The file containing the adapter sequences should be in the fasta format.
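
For readers unfamiliar with the fasta format, the sketch below parses a
small adapter file into a dictionary of names and sequences. It is a
simplified reader written for this illustration, not the parser pyCRAC
uses, and the adapter names and sequences shown are placeholders.

```python
import io


def read_fasta(handle):
    """Return {header: sequence} from fasta-formatted text.

    Simplified, hypothetical reader for illustration; not part of pyCRAC.
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts with '>' followed by the record name.
            name = line[1:].strip()
            if not name:
                raise ValueError("fasta header without a name")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence line found before any '>' header")
        else:
            # Sequence lines may be wrapped; join them per record.
            records[name] += line
    return records


# Placeholder adapter entries; substitute the adapters used in your library.
example = io.StringIO(">adapter_3prime\nAGATCGGAAGAGC\n>adapter_5prime\nGTTCAGAG\nTTCTACAG\n")
adapters = read_fasta(example)
print(adapters)
```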

The chromosome_lengths file should contain two tab-separated columns in
which the first column has the chromosome name and the second the
chromosome length.
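
One way to produce a chromosome_lengths file is to derive it from the
genome fasta file. The following Python sketch does this under two
assumptions that should be checked against your own data: the chromosome
name is taken to be the first word of each fasta header, and the helper
names chromosome_lengths and write_lengths are invented for this
example.

```python
import io


def chromosome_lengths(fasta_handle):
    """Return {chromosome_name: length} computed from fasta-formatted text.

    Assumes the chromosome name is the first word of each '>' header.
    Hypothetical helper for illustration; not part of pyCRAC.
    """
    lengths = {}
    name = None
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("fasta header without a name")
            name = words[0]
            lengths[name] = 0
        elif line and name is not None:
            # Add the number of bases on this sequence line.
            lengths[name] += len(line)
    return lengths


def write_lengths(lengths, out_handle):
    """Write one 'name<TAB>length' line per chromosome."""
    for name, length in lengths.items():
        out_handle.write(f"{name}\t{length}\n")


# Toy genome with two very short made-up chromosomes.
genome = io.StringIO(">chrI\nACGTACGT\nACG\n>chrII\nGGCC\n")
out = io.StringIO()
write_lengths(chromosome_lengths(genome), out)
table = out.getvalue()
print(table, end="")
```

The chromosome names in this file presumably need to match those in the
other reference files given to the pipeline, so it is worth comparing
them before a run.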