documentation
After successful installation of the package, run
$ htseq-clip -h
for a brief description of the functions available in htseq-clip. These functions fall into four classes, described below.
Prepare annotation
annotation
Flattens a given annotation file in GFF format to BED6 format
Arguments
-g/--gff: GFF formatted annotation file, supports .gz files
-u/--geneid: Gene id attribute in GFF file (default: gene_id)
-n/--genename: Gene name attribute in GFF file (default: gene_name)
-t/--genetype: Gene type attribute in GFF file (default: gene_type)
--splitExons: This flag splits exons into components such as 5’ UTR, CDS and 3’ UTR
--unsorted: Use this flag if the GFF file is unsorted
-o/--output: Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Note
The default values for the --geneid, --genename and --genetype arguments follow the GENCODE GFF format
Usage
$ htseq-clip annotation -h
createSlidingWindows
Create sliding windows from the flattened annotation file
Arguments
-i/--input: Flattened annotation file, see annotation
-w/--windowSize: Window size in number of base pairs for the sliding window (default: 50)
-s/--windowStep: Window step size for sliding window (default: 20)
-o/--output: Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip createSlidingWindows -h
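The effect of --windowSize and --windowStep can be illustrated with generic sliding-window arithmetic. The sketch below is not htseq-clip's implementation: the function name sliding_windows, the 0-based half-open coordinates, and the clipping of the last windows at the feature end are all assumptions made for illustration.

```shell
# Generic sliding-window arithmetic over a single feature.
# Assumptions (not taken from htseq-clip): 0-based half-open coordinates,
# and windows that run past the feature end are clipped to it.
sliding_windows() {
  start=$1
  end=$2
  size=$3
  step=$4
  while [ "$start" -lt "$end" ]; do
    stop=$((start + size))
    if [ "$stop" -gt "$end" ]; then
      stop=$end
    fi
    # Print window start and stop, tab separated.
    printf '%s\t%s\n' "$start" "$stop"
    start=$((start + step))
  done
}

# A hypothetical 100 bp feature with the default size (50) and step (20).
sliding_windows 0 100 50 20
```

With these inputs the function prints five windows starting at 0, 20, 40, 60 and 80. How htseq-clip itself treats the shorter windows at the end of a feature should be checked against its output.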
mapToId
Extracts the “name” column from the annotation file, maps each entry to a unique id, and prints the result in tab-separated format
Arguments
-a/--annotation: Flattened annotation file from annotation or sliding window file from createSlidingWindows
-o/--output: Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip mapToId -h
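The three annotation functions above are typically chained. The following is a minimal sketch of such a chain using only the arguments documented here; all file names are hypothetical, and the run helper with its DRY_RUN switch is an illustration device, not part of htseq-clip.

```shell
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  [ "$DRY_RUN" = 1 ] || "$@"
}

# Hypothetical file names; replace with your own.
GFF="annotation.gff3.gz"
FLAT="annotation.bed.gz"
WINDOWS="windows.txt.gz"
IDMAP="windows_id_map.txt.gz"

# 1. Flatten the GFF annotation to BED6.
run htseq-clip annotation -g "$GFF" -o "$FLAT"
# 2. Create sliding windows, spelling out the default size and step.
run htseq-clip createSlidingWindows -i "$FLAT" -w 50 -s 20 -o "$WINDOWS"
# 3. Map the "name" column of the sliding window file to unique ids.
run htseq-clip mapToId -a "$WINDOWS" -o "$IDMAP"
```

If the GFF file does not follow the GENCODE attribute names, the --geneid, --genename and --genetype arguments of annotation would need to be set accordingly.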
Extract crosslink sites
extract
Extract crosslink sites, insertions or deletions
Arguments
-i/--input: Input .bam file. The bam file must be coordinate sorted and indexed
-e/--mate: For paired-end sequencing, select the read/mate to extract the crosslink sites from, accepted choices: 1, 2
    1: use the first mate in pair
    2: use the second mate in pair
-s/--site: Crosslink site choices, accepted choices: s, i, d, m, e (default: e)
    s: start site
    i: insertion site
    d: deletion site
    m: middle site
    e: end site
-g/--offset: Number of nucleotides to offset for crosslink sites (default: 0)
--ignore: Use this flag to ignore crosslink sites outside of genome annotations
-q/--minAlignmentQuality: Minimum alignment quality (default: 10)
-m/--minReadLength: Minimum read length (default: 0)
-x/--maxReadLength: Maximum read length (default: 500)
-l/--maxReadInterval: Maximum read interval length (default: 10000)
--primary: Use this flag to consider only primary alignments of multimapped reads
-c/--cores: Number of cores to use for alignment parsing (default: 5)
-t/--tmp: Path to create and store temp files (default behavior: use parent folder from the “--output” parameter)
-o/--output: Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip extract -h
Note
To extract the 1st offset position of the second mate (2) start site (s) in eCLIP, use: --mate 2 --site s --offset -1
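As a sketch of how the eCLIP settings from the note fit into a full call: the example below first prepares a coordinate-sorted, indexed bam file and then runs extract. Using samtools for sorting and indexing is an assumption (any tool that produces a sorted, indexed bam file would do), the file names are hypothetical, and the run helper with its DRY_RUN switch is only an illustration device.

```shell
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  [ "$DRY_RUN" = 1 ] || "$@"
}

# Hypothetical file names for one paired-end eCLIP sample.
BAM="sample1.bam"
SORTED="sample1.sorted.bam"
SITES="sample1.sites.txt.gz"

# extract requires a coordinate-sorted and indexed bam file
# (samtools is an assumption here, not part of htseq-clip).
run samtools sort -o "$SORTED" "$BAM"
run samtools index "$SORTED"

# Second mate, start site, offset -1, as in the eCLIP note;
# the minimum alignment quality is spelled out at its default.
run htseq-clip extract -i "$SORTED" --mate 2 --site s --offset -1 \
  -q 10 -o "$SITES"
```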
Count crosslink sites
count
Counts the number of crosslink/deletion/insertion sites
Arguments
-i/--input: Extracted crosslink sites, see extract
-a/--ann: Flattened annotation file, see annotation, OR sliding windows file, see createSlidingWindows
--unstranded: Crosslink site counting is strand specific by default. Use this flag for non-strand-specific crosslink site counting
-o/--output: Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Usage
$ htseq-clip count -h
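When several samples are processed, count is usually run once per sample against the same annotation or sliding windows file. A minimal sketch, assuming hypothetical sample names, file names and folder layout; the run helper and its DRY_RUN switch are illustration devices, not part of htseq-clip.

```shell
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  [ "$DRY_RUN" = 1 ] || "$@"
}

# Hypothetical inputs: a sliding windows file and per-sample site files.
WINDOWS="windows.txt.gz"
SITE_DIR="sites"
COUNT_DIR="counts"

run mkdir -p "$COUNT_DIR"
for sample in exp1_rep1 exp1_rep2; do
  # Strand-specific counting is the default; add --unstranded to disable it.
  run htseq-clip count -i "$SITE_DIR/$sample.sites.txt.gz" \
    -a "$WINDOWS" -o "$COUNT_DIR/$sample.counts.txt.gz"
done
```

Writing all count files into one folder with a shared file name prefix keeps them easy to select later with the --prefix argument of the helper functions.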
Helper functions
createMatrix
Create R friendly output matrix file from count function output files
Arguments
-i/--inputFolder: Folder name with output files from count function, see count
-b/--prefix: Use only files with this file name prefix (default: None)
-e/--postfix: Use only files with this file name postfix (default: None)
-o/--output: Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Warning
Either the --prefix or the --postfix argument must be given
Usage
$ htseq-clip createMatrix -h
createMaxCountMatrix
Create R friendly output matrix file from the crosslink_count_position_max column in count function output files. This file can be used to filter down the output file from the createMatrix function during downstream statistical analysis.
Arguments
-i/--inputFolder: Folder name with output files from count function, see count
-b/--prefix: Use only files with this file name prefix (default: None)
-e/--postfix: Use only files with this file name postfix (default: None)
-o/--output: Output file name. If the file name is given with a .gz suffix, the output is gzipped. If no file name is given, output is printed to the console
Warning
Either the --prefix or the --postfix argument must be given
Usage
$ htseq-clip createMaxCountMatrix -h
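Both helper functions take the same arguments, so they can be run side by side on one folder of count files. The sketch below assumes a hypothetical folder and file name prefix, and assumes that the createMaxCountMatrix function is invoked under the name of its section heading; the run helper and its DRY_RUN switch are illustration devices, not part of htseq-clip.

```shell
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  [ "$DRY_RUN" = 1 ] || "$@"
}

# Hypothetical folder of count output files and their shared prefix.
COUNT_DIR="counts"
PREFIX="exp1_"

# Count matrix for downstream statistical analysis.
run htseq-clip createMatrix -i "$COUNT_DIR" -b "$PREFIX" \
  -o "exp1_count_matrix.txt.gz"
# Matrix of the crosslink_count_position_max column, usable for filtering.
run htseq-clip createMaxCountMatrix -i "$COUNT_DIR" -b "$PREFIX" \
  -o "exp1_max_count_matrix.txt.gz"
```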