I'm trying to make pileup files using samtools
from two files, File1 and File2.
I have split up File1 and File2 by chromosome, resulting in having 44 files named following the format:
chr${c}.${TISSUE}_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY
where ${c} is a number between 1 and 22, and $TISSUE is either colon or muscle--22 chromosomes for colon, and 22 for muscle. I.e.; chr1.colon_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY chr2.colon_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY
.
.
.
chr22.colon_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY
chr1.muscle_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY
.
.
.
These files consist of two columns, the first just shows the chromosome number, and the second column is a position on that chromosome. I.e.;
head chr2.colon_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY
chr2 103977
chr2 112051
chr2 126199
chr2 146288
chr2 147797
chr2 147822
chr2 148548
chr2 148525
chr2 158189
chr2 158188
For each row in the file (for example, "chr2.colon_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY"
), I need to take the position, call it 'x', from column 2, and use it to get a range of a-b
, where a=x-5
and b=x+5
. I'll then plug those values into the following script:
samtools mpileup -f [REFERENCE GENOME] File1 File2 -r chr${c}:a-b
For example, suppose I'm looking at chromosome 2, position 103977 (row 1 above). Then my script would be
samtools mpileup -f [REFERENCE GENOME] File1 File2 -r chr2:103972-103982
So basically, it's a loop within a loop within a loop. Something like,
for t in $(colon, muscle)
do
for c in $seq (1 22)
do
for item (or maybe row?) in
chr${c}.${t}_run1_en2hic_PE1.bam.sorted.bam.breaks_COL1_and_COL2_ONLY
do
awk '{print $2}' | something something something
x= position in col 2, a=x-5 b=x+5
samtools mpileup -f [REFERENCE GENOME] File1 File2 -r chr${c}:a-b
done
done
done
...
Thanks in advance. I'm brand new to working with Linux and I have essentially no computer science training.