documentation

After installing the package, run

$ htseq-clip -h

to get a brief description of the functions available in htseq-clip. These functions fall into four classes, described below.

Prepare annotation

annotation

Flattens an annotation file in GFF format into BED6 format

Arguments

  • -g/--gff GFF-formatted annotation file; gzipped (.gz) files are supported
  • -u/--geneid Gene id attribute in GFF file (default: gene_id)
  • -n/--genename Gene name attribute in GFF file (default: gene_name)
  • -t/--genetype Gene type attribute in GFF file (default: gene_type)
  • --splitExons This flag splits exons into components such as 5’ UTR, CDS and 3’ UTR
  • --unsorted Use this flag if the GFF file is unsorted
  • -o/--output Output file name. If the file name has a .gz suffix, the output is gzipped. If no file name is given, the output is printed to the console

Note

The default values for the --geneid, --genename and --genetype arguments follow the GENCODE GFF format

Usage

$ htseq-clip annotation -h
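The sketch below illustrates the coordinate change involved in going from GFF to BED6. It converts a single GFF feature line into a BED6 record and looks up gene attributes under the same default keys as --geneid, --genename and --genetype. It is a hypothetical helper, not htseq-clip's own code:

- It handles GFF3-style key=value attributes only.
- It does not flatten overlapping features or split exons.
- The "@"-joined name field is an assumed layout that may differ from the tool's real output.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # GFF3 percent-encodes reserved characters such as ';' and '='.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff_line_to_bed6(line, geneid="gene_id", genename="gene_name", genetype="gene_type"):
    """Convert one GFF line to a BED6 tuple (hypothetical helper).

    GFF coordinates are 1-based and inclusive; BED uses 0-based,
    half-open intervals, so only the start is shifted by one.
    """
    seqid, _source, feature, start, end, score, strand, _phase, attributes = (
        line.rstrip("\n").split("\t")
    )
    attrs = parse_gff3_attributes(attributes)
    # Assumed name layout: gene id, gene name, gene type and feature joined by "@".
    name = "@".join(
        [attrs.get(geneid, "."), attrs.get(genename, "."), attrs.get(genetype, "."), feature]
    )
    # BED scores are numeric; map GFF's "." placeholder to 0.
    bed_score = "0" if score == "." else score
    return (seqid, int(start) - 1, int(end), name, bed_score, strand)


# Illustrative input line with made-up gene identifiers.
record = gff_line_to_bed6(
    "chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\t"
    "ID=exon:1;gene_id=GENE0001;gene_name=EXAMPLE1;gene_type=protein_coding"
)
print(record)
# ('chr1', 11868, 12227, 'GENE0001@EXAMPLE1@protein_coding@exon', '0', '+')
```

Passing different keys to geneid, genename and genetype mirrors the reason the real flags exist: they accommodate annotations that do not use GENCODE attribute names.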

createSlidingWindows

Create sliding windows from the flattened annotation file

Arguments

  • -i/--input Flattened annotation file; see annotation
  • -w/--windowSize Size of each sliding window in base pairs (default: 50)
  • -s/--windowStep Step size, in base pairs, between the starts of consecutive windows (default: 20)
  • -o/--output Output file name. If the file name has a .gz suffix, the output is gzipped. If no file name is given, the output is printed to the console

Usage

$ htseq-clip createSlidingWindows -h
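The interplay of --windowSize and --windowStep can be sketched for a single feature interval as follows. This is a minimal illustration, not htseq-clip's implementation. In particular, clipping the last window at the feature end is an assumption, and the tool's own edge handling may differ.

```python
def sliding_windows(start, end, window_size=50, window_step=20):
    """Return (win_start, win_end) pairs covering the half-open interval [start, end).

    Hypothetical sketch: windows advance by window_step and the final
    window is clipped at the feature end (an assumed edge rule).
    """
    if window_step <= 0:
        raise ValueError("window_step must be positive")
    if end <= start:
        return []
    windows = []
    pos = start
    while True:
        win_end = min(pos + window_size, end)
        windows.append((pos, win_end))
        if win_end == end:
            break
        pos += window_step
    return windows


# A 100 bp feature with the default size (50) and step (20):
print(sliding_windows(0, 100))
# [(0, 50), (20, 70), (40, 90), (60, 100)]
```

When the step is smaller than the window size, consecutive windows overlap by window_size - window_step bp (30 bp with the defaults). As a result, a single position can fall into more than one window.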

mapToId

Extract the “name” column from the annotation file, map each entry to a unique ID, and print the result in tab-separated format

Arguments

  • -a/--annotation Flattened annotation file from annotation or sliding window file from createSlidingWindows
  • -o/--output Output file name. If the file name has a .gz suffix, the output is gzipped. If no file name is given, the output is printed to the console

Usage

$ htseq-clip mapToId -h
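The general idea of mapping the BED "name" column to IDs can be sketched as below. Two parts of this are assumptions: the input is BED6, and IDs follow an invented id1, id2, ... scheme. The IDs and columns that htseq-clip actually writes will differ.

```python
def map_names_to_ids(bed_lines):
    """Return tab-separated "id<TAB>name" rows, one per distinct BED name.

    Hypothetical sketch: IDs are assigned as id1, id2, ... in order of
    first appearance; a repeated name is emitted only once.
    """
    ids = {}
    rows = []
    for line in bed_lines:
        # Skip blank, comment and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        name = line.rstrip("\n").split("\t")[3]  # BED column 4 is "name"
        if name not in ids:
            ids[name] = f"id{len(ids) + 1}"
            rows.append(f"{ids[name]}\t{name}")
    return rows


bed = [
    "chr1\t0\t50\tGENE0001@window1\t0\t+",
    "chr1\t20\t70\tGENE0001@window2\t0\t+",
    "chr1\t0\t50\tGENE0001@window1\t0\t+",  # repeated name
]
print("\n".join(map_names_to_ids(bed)))
```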

Helper functions

createMatrix

Create an R-friendly matrix file from the output files of the count function

Arguments

  • -i/--inputFolder Folder containing the output files from the count function; see count
  • -b/--prefix Use only files whose names start with this prefix (default: None)
  • -e/--postfix Use only files whose names end with this postfix (default: None)
  • -o/--output Output file name. If the file name has a .gz suffix, the output is gzipped. If no file name is given, the output is printed to the console

Warning

Either the --prefix or the --postfix argument must be given

Usage

$ htseq-clip createMatrix -h
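A minimal sketch of two steps that createMatrix appears to involve:

1. Selecting count files by prefix and/or postfix.
2. Pivoting per-sample counts into one table with a row per window and a column per sample.

Parsing the count files themselves is left out, because their column layout is documented under count. The `unique_id` header, the sample names and the zero fill for windows missing from a sample are all assumptions made for illustration.

```python
import os


def filter_file_names(names, prefix=None, postfix=None):
    """Keep file names matching the given prefix and/or postfix.

    Mirrors the documented rule that at least one of the two must be given.
    """
    if prefix is None and postfix is None:
        raise ValueError("either prefix or postfix must be given")
    return sorted(
        n
        for n in names
        if (prefix is None or n.startswith(prefix))
        and (postfix is None or n.endswith(postfix))
    )


def list_count_files(folder, prefix=None, postfix=None):
    """Select count output files in a folder (hypothetical helper)."""
    return filter_file_names(os.listdir(folder), prefix, postfix)


def build_count_matrix(counts_by_sample):
    """Build tab-separated matrix lines: one row per window ID, one column per sample.

    counts_by_sample maps sample name -> {window_id: count}; a window
    missing from a sample is assumed to have a count of 0.
    """
    samples = sorted(counts_by_sample)
    window_ids = sorted({w for counts in counts_by_sample.values() for w in counts})
    lines = ["\t".join(["unique_id"] + samples)]
    for wid in window_ids:
        row = [str(counts_by_sample[s].get(wid, 0)) for s in samples]
        lines.append("\t".join([wid] + row))
    return lines


names = ["s1_counts.txt", "s2_counts.txt", "notes.md"]
print(filter_file_names(names, postfix="_counts.txt"))
# ['s1_counts.txt', 's2_counts.txt']
print("\n".join(build_count_matrix({"s1": {"w1": 3}, "s2": {"w1": 1, "w2": 5}})))
```

A plain tab-separated table with a header row is a shape that R readers such as `read.table` load directly. This is one plausible reading of "R friendly", not a description of the tool's exact format.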

createMaxCountMatrix

Create an R-friendly matrix file from the crosslink_count_position_max column of the count function output files. This matrix can be used to filter the output of createMatrix during downstream statistical analysis.

Arguments

  • -i/--inputFolder Folder containing the output files from the count function; see count
  • -b/--prefix Use only files whose names start with this prefix (default: None)
  • -e/--postfix Use only files whose names end with this postfix (default: None)
  • -o/--output Output file name. If the file name has a .gz suffix, the output is gzipped. If no file name is given, the output is printed to the console

Warning

Either the --prefix or the --postfix argument must be given

Usage

$ htseq-clip createMaxCountMatrix -h
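One way the max-count matrix could be used for filtering is sketched below. The sketch keeps only windows whose highest crosslink_count_position_max value across samples reaches a cutoff. The dictionary layout and the default cutoff of 3 are hypothetical choices for illustration, not values recommended by htseq-clip. Since the output is meant to be R-friendly, the real filtering would likely be done in R; Python is used here only to show the logic.

```python
def filter_by_max_count(count_matrix, max_count_matrix, min_max_count=3):
    """Keep rows of count_matrix whose window passes a max-count cutoff.

    Both arguments map window_id -> {sample: value}. A window is kept if,
    in at least one sample, its crosslink_count_position_max value is
    >= min_max_count. The cutoff is an arbitrary, hypothetical default.
    """
    passing = {
        wid
        for wid, per_sample in max_count_matrix.items()
        if max(per_sample.values(), default=0) >= min_max_count
    }
    return {wid: row for wid, row in count_matrix.items() if wid in passing}


# Toy values for illustration only.
counts = {"w1": {"s1": 12, "s2": 7}, "w2": {"s1": 2, "s2": 1}}
max_counts = {"w1": {"s1": 5, "s2": 3}, "w2": {"s1": 1, "s2": 1}}
print(filter_by_max_count(counts, max_counts))
# {'w1': {'s1': 12, 's2': 7}}
```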