fastp is a tool designed to provide fast all-in-one preprocessing for FASTQ files. It is developed in C++ with multithreading support to achieve high performance.

Features:

Comprehensive quality profiling of the data both before and after filtering (quality curves, base contents, KMER, Q20/Q30, GC ratio, duplication, adapter contents, etc.).
Filter out bad reads (too low quality, too short, or too many N bases).
Cut low-quality bases from the 5' and 3' ends of each read by evaluating the mean quality in a sliding window (like Trimmomatic but faster).
Adapter sequences can be detected automatically, so you don't have to supply them in order to trim them.
Correct mismatched base pairs in the overlapped regions of paired-end reads when one base has high quality and the other has ultra-low quality.
Trim polyX at the 3' ends to remove unwanted polyX tailing (e.g. polyG at the 3' ends, which is commonly seen in NovaSeq/NextSeq data).
Preprocess unique molecular identifier (UMI) enabled data, shifting the UMI to the sequence name.
Report results in JSON format for further interpretation.
Visualize quality control and filtering results on a single HTML page (like FastQC but faster and more informative).
Split the output into multiple files (0001.R1.gz, 0002.R1.gz, etc.) to support parallel processing. Two modes can be used: limiting the total number of split files, or limiting the number of lines in each split file.
Support long reads (data from PacBio / Nanopore devices).
Support reading from STDIN and writing to STDOUT.
Support ultra-fast FASTQ-level deduplication.

A prebuilt binary of the latest version is available for download for Linux users.

If you find a bug or have an additional requirement for fastp, please file an issue.
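To make the sliding-window quality cutting mentioned among the features more concrete, here is a minimal Python sketch of the general idea. It is not fastp's actual C++ implementation: the function names, the window size of 4, the mean-quality threshold of 20, and the Phred+33 quality encoding are all assumptions chosen for illustration.

```python
from statistics import mean
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_by_sliding_window(
    seq: str,
    qual: str,
    window: int = 4,        # illustrative window size, not a fastp default claim
    min_mean: float = 20,   # illustrative mean-quality threshold
) -> Tuple[str, str]:
    """Trim low-quality bases from both ends of a single read.

    A window is placed at the 5' end and one base is dropped for as long as
    the mean quality inside the window stays below `min_mean`. The same is
    then done from the 3' end. Trimming stops once fewer than `window`
    bases remain.
    """
    scores = phred_scores(qual)
    start, end = 0, len(scores)

    # 5' end: drop the leading base while the leading window is poor.
    while end - start >= window and mean(scores[start:start + window]) < min_mean:
        start += 1

    # 3' end: drop the trailing base while the trailing window is poor.
    while end - start >= window and mean(scores[end - window:end]) < min_mean:
        end -= 1

    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # '#' encodes Phred 2 and 'I' encodes Phred 40 under Phred+33.
    trimmed_seq, trimmed_qual = trim_by_sliding_window("ACGTACGTACGT", "####IIIIII##")
    print(trimmed_seq, trimmed_qual)
```

Because the decision is based on a window mean rather than on single bases, a few low-quality bases can survive at the boundary when their neighbours are good. Reads that end up too short after trimming would then be handled by a separate length filter.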