We host the application selectseq so that it can be run on our online workstations, either directly or with Wine.


Quick description of selectseq:

A command-line utility to manipulate biological sequences from a FASTA or FASTQ file. Given a list of identifiers, it can extract only that subset of the sequences, or its complement (i.e., the sequences NOT in the list). It can also extract only sequence number N. Compressed sequence files are supported if they are readable by zcat.
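To make the subset, complement, and "sequence number N" behaviours concrete, here is a minimal Python sketch of the same idea. It is not selectseq's own code (selectseq is written in Perl), and the names read_fasta, select_records and nth_record are hypothetical helpers invented for this illustration. It assumes plain FASTA input and takes the ID to be the first word of the header line.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(records, wanted_ids, complement=False):
    """Keep records whose ID (assumed: first word of the header) is wanted.

    With complement=True, keep the records whose ID is NOT in wanted_ids.
    """
    wanted = set(wanted_ids)
    for header, seq in records:
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if (seq_id in wanted) != complement:
            yield header, seq


def nth_record(records, n):
    """Return record number n (1-based), or None if there are fewer records."""
    for i, rec in enumerate(records, start=1):
        if i == n:
            return rec
    return None


# Made-up example data, not taken from any real dataset.
EXAMPLE = ">seq1 first\nACGT\nAC\n>seq2\nGGGG\n>seq3 third\nTTTT\n"

subset = list(select_records(read_fasta(io.StringIO(EXAMPLE)), ["seq1", "seq3"]))
others = list(
    select_records(read_fasta(io.StringIO(EXAMPLE)), ["seq1", "seq3"], complement=True)
)
second = nth_record(read_fasta(io.StringIO(EXAMPLE)), 2)
```

Here subset holds seq1 and seq3, others holds only seq2, and second is the second record in file order regardless of its ID. Storing the wanted IDs in a set keeps each lookup cheap, which matters when the sequence file is large.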

Features:
  • collect only some sequences out of a large FASTA or FASTQ file
  • get sequence number N only, regardless of ID
  • complement mode: return all sequences that are NOT in the list of IDs
  • "matching" mode: choose which part (between | characters) of the ID should match
  • sequence names provided one per line in a text file (first word in line used, or whatever is given to the -k option)
  • the > and @ symbols are ignored if present at the beginning of IDs in the list (useful when pasting FASTA or FASTQ identifiers)
  • if only one sequence is needed, its ID can be given directly to the -l option (no need of a file)
  • add a suffix to IDs before searching (useful when IDs come from proteins that have _1 in the ID, but genes do not)
  • compressed sequence database files (-s) are supported
  • quiet mode: output only important warnings and errors
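
Several of the features above concern how a line from the ID list is turned into a search key: taking the first word, ignoring a leading > or @, picking one part between | characters, and adding a suffix. The Python sketch below shows one way these steps could fit together. It is an illustration under stated assumptions, not selectseq's implementation: normalize_id and its field and suffix parameters are hypothetical names and do not correspond to selectseq's actual options, and the order in which the steps are applied is an assumption.

```python
def normalize_id(line, field=None, suffix=""):
    """Turn one line of an ID list into a lookup key (hypothetical helper).

    field:  1-based index of the |-separated part to keep; None keeps the
            whole ID (assumed behaviour, for illustration only).
    suffix: text appended to the ID before searching.
    """
    words = line.split()
    if not words:
        return ""
    word = words[0]  # only the first word of the line is used
    if word[:1] in (">", "@"):
        word = word[1:]  # ignore one leading FASTA/FASTQ marker
    if field is not None:
        parts = word.split("|")
        if 1 <= field <= len(parts):
            word = parts[field - 1]
    return word + suffix


# Made-up identifiers for illustration.
key_part = normalize_id(">sp|P12345|NAME_HUMAN some description", field=2)
key_read = normalize_id("@read7")
key_gene = normalize_id("gene42", suffix="_1")
```

With these inputs, key_part is "P12345", key_read is "read7", and key_gene is "gene42_1". The last case mirrors the situation described in the list, where protein IDs carry a _1 that the gene IDs lack.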


Audience: Science/Research.
User interface: Command-line.
Programming Language: Perl.
Categories:
Bio-Informatics

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 -VAT number: EE102345621.