
I just started learning to use the command line. Hopefully this is not a dumb question.

I have the following files in my directory:

L001_R1_001.fastq 
L002_R2_001.fastq 
L004_R1_001.fastq 
L005_R2_001.fastq
L001_R2_001.fastq 
L003_R1_001.fastq 
L004_R2_001.fastq 
L006_R1_001.fastq
L002_R1_001.fastq 
L003_R2_001.fastq 
L005_R1_001.fastq 
L006_R2_001.fastq

As you can see from the filenames, R1 and R2 files are mixed together and the numbers after L00 are not in order.

I want to concatenate the files in filename order, separately for the R1 and R2 files.

If I do it manually, it will look like the following:

# for R1 files
cat L001_R1_001.fastq L002_R1_001.fastq L003_R1_001.fastq L004_R1_001.fastq L005_R1_001.fastq L006_R1_001.fastq > R1.fastq


# for R2 files
cat L001_R2_001.fastq L002_R2_001.fastq L003_R2_001.fastq L004_R2_001.fastq L005_R2_001.fastq L006_R2_001.fastq > R2.fastq

Could you please help me write a script that I can re-use later? Thank you!

Leandro Papasidero
user2883746

2 Answers

cat `ls -- *_R1_*.fastq | sort` >R1.fastq
cat `ls -- *_R2_*.fastq | sort` >R2.fastq

The | sort is not needed on most systems because ls sorts the files by name.

If the names of the files contain whitespace, then do this first:

IFS='
'
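Put together, the IFS approach above might look like the following. This is only a sketch: the scratch directory and the two file names containing spaces are made up for the demonstration, and it still assumes that no file name contains a newline or a glob character.

```shell
# Work in a scratch directory so the demo does not touch real data.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical sample files whose names contain a space.
printf 'first\n'  > 'L001 a_R1_001.fastq'
printf 'second\n' > 'L002 b_R1_001.fastq'

# Split the output of ls on newlines only, so spaces inside names survive.
old_ifs=$IFS
IFS='
'
cat `ls -- *_R1_*.fastq | sort` > R1.fastq
IFS=$old_ifs   # restore normal word splitting for later commands
```

Restoring IFS afterwards keeps the change from affecting the rest of the session or script.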
pts

Try using the wildcard character *. The shell automatically expands it to the matching file names in alphabetical order.

cat L*_R1_001.fastq > R1.fastq
cat L*_R2_001.fastq > R2.fastq
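Since the question asks for something reusable, the two commands above could be wrapped in a small function. This is a minimal sketch, not a definitive script: the name `concat_reads`, its two arguments, and the demo files are all assumptions made for illustration, and it relies on the shell expanding the glob in sorted order.

```shell
#!/bin/sh
# concat_reads DIR READ (hypothetical helper): concatenates
# DIR/L*_READ_001.fastq in glob order into DIR/READ.fastq.
concat_reads() {
    dir=$1
    read_label=$2
    # Expand the glob into the positional parameters so a no-match can be detected.
    set -- "$dir"/L*_"$read_label"_001.fastq
    if [ ! -e "$1" ]; then
        echo "no files match L*_${read_label}_001.fastq in $dir" >&2
        return 1
    fi
    cat "$@" > "$dir/$read_label.fastq"
}

# Demo on made-up files, created out of order on purpose.
demo_dir=$(mktemp -d)
for n in 3 1 2; do
    printf 'file%s-R1\n' "$n" > "$demo_dir/L00${n}_R1_001.fastq"
    printf 'file%s-R2\n' "$n" > "$demo_dir/L00${n}_R2_001.fastq"
done

concat_reads "$demo_dir" R1
concat_reads "$demo_dir" R2
```

The output names R1.fastq and R2.fastq do not match the L*_..._001.fastq pattern, so re-running the function does not feed an old result back into itself.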

EDIT:

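If the above command doesn't give the desired sorting, try overriding the locale setting using LC_ALL=C, as suggested by Fredrik Pihl (but see the comments: the shell expands L* before the prefix assignment takes effect, so LC_ALL may need to be set in the shell beforehand):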

LC_ALL=C cat L*_R1_001.fastq > R1.fastq
Community
jkshah
  • Perhaps hint at using `LC_ALL=C cat L*_R1_001.fastq > R1.fastq` so nothing funny happens due to a different locale – Fredrik Pihl Oct 15 '13 at 18:58
  • @FredrikPihl Thanks for your feedback. I have added your suggestion in ans. – jkshah Oct 15 '13 at 19:12
  • There is no guarantee that `*` sorts the filenames. Use `ls` for that. – pts Oct 15 '13 at 21:08
  • @Fredrik Pihl: `LC_ALL=C` has no effect on the sorting and matching in `*` because the shell expands `*` earlier than applying the environment variable change. – pts Oct 15 '13 at 21:09
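
One way to address the point in the last comment is to set LC_ALL inside a subshell before the glob is expanded. The sketch below assumes a shell that sorts glob results by the current locale and picks up the new value when LC_ALL is assigned (bash behaves this way). The two mixed-case file names are made up so that the effect of the C locale is visible.

```shell
# Work in a scratch directory with two hypothetical files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'B\n' > LB_R1_001.fastq
printf 'a\n' > La_R1_001.fastq

# The subshell keeps the locale change from leaking into the session.
(
    LC_ALL=C
    export LC_ALL
    # The glob is expanded after the assignment, so byte order applies:
    # uppercase sorts before lowercase, and LB_... comes before La_...
    cat L*_R1_001.fastq > R1.fastq
)
```

With the one-line prefix form, the assignment only reaches the environment of cat, which never sorts anything itself.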