Questions tagged [fastq]

FASTQ files are used in bioinformatics to store sequence information and sequencing quality scores.

FASTQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. The sequence letter and its quality score are each encoded with a single ASCII character for brevity.

[Wikipedia]
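A minimal Python sketch can illustrate the record layout and the one-character-per-base quality encoding described above. It assumes unwrapped records of exactly four lines (header starting with `@`, sequence, a `+` separator line, quality line). The format technically allows wrapped sequence lines, but this sketch does not handle them. It also assumes the Phred+33 quality offset used by Sanger and Illumina 1.8+ data, whereas older Illumina files used an offset of 64. The names `FastqRecord`, `read_fastq`, and `phred_scores` are hypothetical and come from no library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # identifier line, without the leading '@'
    sequence: str  # one letter per base
    quality: str   # one ASCII quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (assumes exactly 4 lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of input
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        # Lines are read by position, so a quality line starting with '@' is not mistaken for a header.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header.strip()!r}")
        # Every sequence letter must have exactly one quality character.
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters to Phred scores; the default offset 33 is an assumption (Phred+33)."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nII#5\n"
    for rec in read_fastq(io.StringIO(example)):
        print(rec.header, rec.sequence, phred_scores(rec.quality))
        # prints: read1 ACGT [40, 40, 2, 20]
```

The records are read four lines at a time rather than by scanning for lines that begin with `@`. This matters because `@` is itself a valid quality character, so a quality line can start with it. For real data, an established parser such as Biopython's `SeqIO.parse(handle, "fastq")` is likely the safer choice.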

257 questions
2 votes · 1 answer

Nanopore tools designed to analyze fastq file format?

I just received my first nanopore data set and was sent a fastq file. I was expecting a fast5 file, and now I'm not sure how to begin filtering the data. Most of the tools I've come across (NanoOK, poretools) deal with the fast5 format, although…
7tbear7
2 votes · 1 answer

Counting and removing characters in different lines

I have DNA sequence data in FASTQ format, which uses a 4-line format per record (@sequence-header-information, sequence, +, quality-scores). Each character in the sequence line has a corresponding character in the quality score line. All the sequences…
ecologysarah
2 votes · 2 answers

Writing a script for large text file manipulation (iterative substitution of duplicated lines), weird bugs and very slow.

I am trying to write a script which takes a directory containing text files (384 of them) and modifies duplicate lines that have a specific format in order to make them not duplicates. In particular, I have files in which some lines begin with the…
Alon Gelber
2 votes · 3 answers

Bash script to concatenate text files with specific substrings in filenames

Within a certain directory I have many directories containing a bunch of text files. I’m trying to write a script that concatenates only those files in each directory that have the string ‘R1’ in their filename into one file within that specific…
Alon Gelber
2 votes · 5 answers

Python - Checking concordance between two huge text files

So, this one has been giving me a hard time! I am working with HUGE text files, and by huge I mean 100 GB+. Specifically, they are in the fastq format. This format is used for DNA sequencing data, and consists of records of four lines, something like…
soungalo
2 votes · 2 answers

Concatenate Files In Order Linux Command

I just started learning to use the command line. Hopefully this is not a dumb question. I have the following files in my directory: L001_R1_001.fastq L002_R2_001.fastq L004_R1_001.fastq L005_R2_001.fastq L001_R2_001.fastq L003_R1_001.fastq…
user2883746
2 votes · 1 answer

Parallel sed with group capture

I have to process a big file, and have been reading about the parallel command to try to use more than one processor core when using sed, sort and so on. So I first wanted to change the first line of every four (because of naming conventions of this kind of…
2 votes · 2 answers

Peek into stream of Popen pipeline in Python

Background: Python 2.6.6 on Linux. First part of a DNA sequence analysis pipeline. I want to read a possibly gzipped file from mounted remote storage (LAN) and, if it is gzipped, gunzip it to a stream (i.e. using gunzip FILENAME -c) and if the…
1 vote · 1 answer

R list path command only returns some of the files, but not all

I'm working on analyzing some fastq files in R for 16S work. I have a previous script from someone who has successfully done this before, but when I did: path_1 <- "set to my WD" and then went to get a list of the files in the path…
Laeanna
1 vote · 1 answer

Nextflow Units file specified is not found. Please provide a valid file

I have the following Nextflow script, which runs the tool perf on all the split fastq files located in the directory mentioned below. When I run the script I get the following error: Error executing process > 'perf (29)' Caused by: Process `perf (29)`…
AishwaryaKulkarni
1 vote · 1 answer

How to align read to two SHORT reference sequences and see percentage that mapped to one or the other reference?

I have PCR-amplified fastq files of a specific target region from several samples. For each sample, I want to know the percentage of reads that align better to reference sequence #1 or #2 posted below. How should I begin to tackle this question and…
1 vote · 0 answers

fastp cannot open a file

I used fastp like this > cat test | while read id > do > name=`echo $id |awk '{print $1}'` > read1=`echo $id |awk '{print $2}'` > read2=`echo $id |awk '{print $3}'` > echo $name > echo $read1 > echo $read2 > fastp \ > …
Limbo
1 vote · 1 answer

How to produce multiple readlength.tsv files at once from multiple fastq files?

I have 16 fastq files in different directories, and I need to produce a readlength.tsv for each one separately. I have a script to produce readlength.tsv; this is the script that I should use: zcat ~/proje/project/name/fıle_fastq | paste…
pierogi
1 vote · 1 answer

How to extract unique read IDs from a fastq file?

I want to extract all the unique read IDs from a fastq file and write them to a text file. (I have done the same task for BAM files using samtools, but I don't know of any tools that handle fastq files.) For BAM files: samtools…
1 vote · 1 answer

Using printf to include both variable output and command

I am trying to get the number of reads for my fastq files, and I want the output to also include the name of each file. I've found a solution online that almost works, but I'm still not getting the right output. Example: My file…
Rachel