Documentation
After successful installation of the package, use
$ htseq-clip -h
for a brief description of the functions available in htseq-clip. The available functions fall into the four classes described below.
Prepare annotation
annotation
Flattens a given annotation file in GFF format to BED6 format
Arguments
-g/--gff
GFF formatted annotation file, supports .gz files
-u/--geneid
Gene id attribute in GFF file (default: gene_id)
-n/--genename
Gene name attribute in GFF file (default: gene_name)
-t/--genetype
Gene type attribute in GFF file (default: gene_type)
--splitExons
This flag splits exons into components such as 5’ UTR, CDS and 3’ UTR
--unsorted
Use this flag if the GFF file is unsorted
-o/--output
Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Note
The default values for the --geneid, --genename and --genetype arguments follow the GENCODE GFF format
Usage
$ htseq-clip annotation -h
createSlidingWindows
Create sliding windows from the flattened annotation file
Arguments
-i/--input
Flattened annotation file, see annotation
-w/--windowSize
Window size in number of base pairs for the sliding window (default: 50)
-s/--windowStep
Window step size for sliding window (default: 20)
-o/--output
Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip createSlidingWindows -h
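The effect of --windowSize and --windowStep can be pictured with a short Python sketch. This is an illustration only, not htseq-clip's implementation: the function name sliding_windows, the 0-based half-open coordinates, and the choice to truncate the last window at the feature end are all assumptions.

```python
def sliding_windows(start, end, window_size=50, window_step=20):
    """Yield half-open (start, end) windows over the feature [start, end).

    Assumption: the final window is truncated at the feature end
    rather than dropped or extended.
    """
    if window_size <= 0 or window_step <= 0:
        raise ValueError("window size and step must be positive")
    pos = start
    while pos < end:
        win_end = min(pos + window_size, end)
        yield (pos, win_end)
        if win_end == end:
            # The feature end has been reached, stop here.
            break
        pos += window_step


# A 100 bp feature with the default size (50) and step (20).
for window in sliding_windows(0, 100):
    print(window)
```

With a step smaller than the window size, as in the defaults, neighbouring windows overlap, so a single crosslink site can fall into more than one window.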
mapToId
Extract the “name” column from the annotation file, map the entries to unique ids and print the result in tab separated format
Arguments
-a/--annotation
Flattened annotation file from annotation or sliding window file from createSlidingWindows
-o/--output
Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip mapToId -h
Extract crosslink sites
extract
Extract crosslink sites, insertions or deletions
Arguments
-i/--input
Input .bam file. The input bam file must be coordinate sorted and indexed
-e/--mate
For paired end sequencing, select the read/mate to extract the crosslink sites from. Accepted choices: 1, 2
    1: use the first mate in pair
    2: use the second mate in pair
-s/--site
Crosslink site choices. Accepted choices: s, i, d, m, e (default: e)
    s: start site
    i: insertion site
    d: deletion site
    m: middle site
    e: end site
-g/--offset
Number of nucleotides to offset for crosslink sites (default: 0)
--ignore
Use this flag to ignore crosslink sites outside of genome annotations
-q/--minAlignmentQuality
Minimum alignment quality (default: 10)
-m/--minReadLength
Minimum read length (default: 0)
-x/--maxReadLength
Maximum read length (default: 500)
-l/--maxReadInterval
Maximum read interval length (default: 10000)
--primary
Use this flag to consider only primary alignments of multimapped reads
-c/--cores
Number of cores to use for alignment parsing (default: 5)
-t/--tmp
Path to create and store temp files (default behavior: use parent folder from the --output parameter)
-o/--output
Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip extract -h
Note
To extract the 1st offset position of the second mate (2) start site (s) in eCLIP, use:
--mate 2 --site s --offset -1
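How --site and --offset could combine for a single read is sketched below in Python. This is a hedged illustration, not the code htseq-clip runs: the function crosslink_position is hypothetical, coordinates are assumed to be 0-based and half-open, the start site is assumed to be the 5' end of the read, and the offset is assumed to be applied along the read direction so that a negative value points upstream on either strand. Insertion and deletion sites depend on the alignment CIGAR and are left out.

```python
def crosslink_position(read_start, read_end, strand, site="e", offset=0):
    """Return the 0-based position of the chosen site for one aligned read.

    read_start and read_end are 0-based, half-open alignment coordinates.
    Assumption: the offset is counted in read direction, so a negative
    offset moves upstream of the read on both strands.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    length = read_end - read_start
    if length <= 0:
        raise ValueError("read_end must be greater than read_start")
    # Distance of the site from the 5' end of the read.
    if site == "s":
        dist = 0
    elif site == "m":
        dist = (length - 1) // 2
    elif site == "e":
        dist = length - 1
    else:
        raise ValueError("this sketch covers only sites s, m and e")
    dist += offset
    if strand == "+":
        return read_start + dist
    return read_end - 1 - dist


# Start site with offset -1, mirroring: --site s --offset -1
print(crosslink_position(100, 150, "+", site="s", offset=-1))
print(crosslink_position(100, 150, "-", site="s", offset=-1))
```

Under these assumptions the reported position lies one nucleotide upstream of the read start, on whichever strand the read aligns to.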
Count crosslink sites
count
Counts the number of crosslink/deletion/insertion sites
Arguments
-i/--input
Extracted crosslink sites, see extract
-a/--ann
Flattened annotation file, see annotation, OR sliding windows file, see createSlidingWindows
--unstranded
Crosslink site counting is strand specific by default. Use this flag for non strand specific crosslink site counting
-o/--output
Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip count -h
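The difference between the default strand specific counting and --unstranded can be illustrated with a minimal Python sketch. The function count_sites, the tuple layouts and the window names are assumptions made for this example, and the nested loop is chosen for clarity rather than speed; it does not reflect how htseq-clip is implemented.

```python
from collections import Counter


def count_sites(sites, windows, stranded=True):
    """Count crosslink sites per window.

    sites:   list of (chrom, position, strand), 0-based positions
    windows: list of (chrom, start, end, name, strand), half-open intervals
    """
    counts = Counter()
    for chrom, pos, strand in sites:
        for w_chrom, w_start, w_end, name, w_strand in windows:
            if chrom != w_chrom or not (w_start <= pos < w_end):
                continue
            if stranded and strand != w_strand:
                # Strand specific mode: skip sites on the opposite strand.
                continue
            counts[name] += 1
    return counts


sites = [("chr1", 10, "+"), ("chr1", 30, "-"), ("chr1", 60, "+")]
windows = [("chr1", 0, 50, "W1", "+"), ("chr1", 20, 70, "W2", "+")]

print(dict(count_sites(sites, windows)))                  # strand specific
print(dict(count_sites(sites, windows, stranded=False)))  # like --unstranded
```

In this toy input the site on the minus strand is ignored in strand specific mode and counted in both overlapping windows in unstranded mode.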
Helper functions
createMatrix
Create R friendly output matrix file from count function output files
Arguments
-i/--inputFolder
Folder name with output files from count function, see count
-b/--prefix
Use only files with this given file name prefix (default: None)
-e/--postfix
Use only files with this given file name postfix (default: None)
-o/--output
Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Warning
Either the --prefix or the --postfix argument must be given
Usage
$ htseq-clip createMatrix -h
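The file selection implied by --prefix and --postfix can be sketched in Python as follows. The helper select_count_files and the example file names are hypothetical, and requiring both filters to match when both are supplied is an assumption, not documented behavior.

```python
def select_count_files(file_names, prefix=None, postfix=None):
    """Return the sorted file names that match the given prefix and/or postfix."""
    if prefix is None and postfix is None:
        # Mirrors the warning above: at least one filter is required.
        raise ValueError("either prefix or postfix must be given")
    selected = []
    for name in sorted(file_names):
        if prefix is not None and not name.startswith(prefix):
            continue
        if postfix is not None and not name.endswith(postfix):
            continue
        selected.append(name)
    return selected


# Hypothetical contents of a folder with count output files.
names = ["rbp1_rep1_counts.csv.gz", "rbp1_rep2_counts.csv.gz", "notes.txt"]
print(select_count_files(names, postfix="_counts.csv.gz"))
```

Choosing a postfix that only the count output files share keeps unrelated files in the same folder out of the matrix.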
createMaxCountMatrix
Create R friendly output matrix file from the crosslink_count_position_max column in count function output files. This file can be used to filter down the output file from the createMatrix function during downstream statistical analysis.
Arguments
-i/--inputFolder
Folder name with output files from count function, see count
-b/--prefix
Use only files with this given file name prefix (default: None)
-e/--postfix
Use only files with this given file name postfix (default: None)
-o/--output
Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Warning
Either the --prefix or the --postfix argument must be given
Usage
$ htseq-clip createMaxCountMatrix -h
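The cross-references on this page (count reads the output of extract and annotation, createMatrix reads the output of count) suggest an order in which the functions could be chained. The Python sketch below only assembles and prints the command lines for one sample; it does not run them. The helper build_pipeline, all file and folder names, and the ordering itself are assumptions, while the options are those listed in the sections above.

```python
import shlex


def build_pipeline(gff, bam, out_dir, sample="sample1"):
    """Return htseq-clip command lines (as argument lists) for one sample."""
    # Hypothetical output locations, chosen for this example only.
    annotation = f"{out_dir}/annotation.bed.gz"
    windows = f"{out_dir}/windows.bed.gz"
    ids = f"{out_dir}/windows_ids.txt.gz"
    sites = f"{out_dir}/{sample}_sites.bed.gz"
    counts_dir = f"{out_dir}/counts"
    counts = f"{counts_dir}/{sample}_counts.csv.gz"
    matrix = f"{out_dir}/count_matrix.txt.gz"
    return [
        ["htseq-clip", "annotation", "--gff", gff, "--output", annotation],
        ["htseq-clip", "createSlidingWindows", "--input", annotation,
         "--windowSize", "50", "--windowStep", "20", "--output", windows],
        ["htseq-clip", "mapToId", "--annotation", windows, "--output", ids],
        # eCLIP settings taken from the note in the extract section.
        ["htseq-clip", "extract", "--input", bam, "--mate", "2",
         "--site", "s", "--offset", "-1", "--output", sites],
        ["htseq-clip", "count", "--input", sites, "--ann", windows,
         "--output", counts],
        ["htseq-clip", "createMatrix", "--inputFolder", counts_dir,
         "--postfix", "_counts.csv.gz", "--output", matrix],
    ]


commands = build_pipeline("annotation.gff3.gz", "sample1.bam", "results")
for command in commands:
    print(shlex.join(command))
```

Each argument list could be passed to subprocess.run once the paths point at real files; printing first makes it easy to check the options against the help text of each function.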