
I use UNIX fairly infrequently, so I apologize if this seems like an easy question. I am trying to loop through subdirectories and files, generate an output from the specific files that the loop grabs, and then write that output to a file in another directory whose name will be identifiable from the input file. So far I have:

 for file in /home/sub_directory1/samples/SSTC*/ 
      do
           samtools depth -r chr9:218026635-21994999 < $file > /home/sub_directory_2/level_2/${file}_out
      done

I was hoping to generate an output from file_1_novoalign.bam in sub_directory1/samples/SSTC*/ and to send that output to /home/sub_directory_2/level_2/ as an output file called file_1_novoalign_out.bam. However, it doesn't work; it says 'bash: /home/sub_directory_2/level_2/file_1_novoalign.bam.out: No such file or directory'.

I would ideally like to strip off the '_novoalign.bam' part of the outfile name and replace it with '_out.txt'. I'm sure this will be easy for a regular Unix user, but I have searched and can't find a quick answer and don't really have time to spend ages searching. Thanks in advance; suggestions building on the code I have so far, or any alternative suggestions, are welcome.

p.s. I don't have permission to write files to the directory containing the input folders

user3062260

1 Answer


Below is an explanation that assumes filenames without spaces, to keep it simple.
When you want files, not directories, end the pattern in your for-loop with * and not */. When you only want to process files ending in _novoalign.bam, put that in the pattern as well. The easiest way to replace part of a string is sed; in a sed pattern, a dollar sign matches the end of the string. The complete script will be

OUTDIR=/home/sub_directory_2/level_2
# SSTC*/ matches each sample directory, *_novoalign.bam the files inside it
for file in /home/sub_directory1/samples/SSTC*/*_novoalign.bam; do
   echo "Debug: Inputfile including path: ${file}"
   # Drop the path, then swap the _novoalign.bam suffix for _out.txt
   OUTPUTFILE=$(basename "$file" | sed -e 's/_novoalign\.bam$/_out.txt/')
   echo "Debug: Outputfile without path: ${OUTPUTFILE}"
   samtools depth -r chr9:218026635-21994999 "${file}" > "${OUTDIR}/${OUTPUTFILE}"
done
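
To see why the original attempt produced "No such file or directory", here is a small sketch that can be run anywhere. It builds a throwaway directory tree (the names SSTC01 and SSTC02 and the second file are made up for illustration) and compares what the two patterns match, and what the output path looks like with and without the basename/sed step.

```shell
# Throwaway tree; SSTC01/SSTC02 are hypothetical sample directory names.
tmp=$(mktemp -d)
mkdir -p "$tmp/samples/SSTC01" "$tmp/samples/SSTC02"
: > "$tmp/samples/SSTC01/file_1_novoalign.bam"
: > "$tmp/samples/SSTC02/file_2_novoalign.bam"

# A pattern ending in */ matches the directories themselves.
set -- "$tmp"/samples/SSTC*/
dir_count=$#
first_dir=$1
echo "directory pattern matched $dir_count entries, first: $first_dir"

# Adding a file pattern after the slash matches the files inside them.
set -- "$tmp"/samples/SSTC*/*_novoalign.bam
file_count=$#
first_file=$1
echo "file pattern matched $file_count entries, first: $first_file"

# Appending a suffix to the full path keeps every leading directory,
# so the result points below a directory that does not exist.
bad="/home/sub_directory_2/level_2/${first_file}_out"
# Stripping the path first gives a plain file name inside the output directory.
good="/home/sub_directory_2/level_2/$(basename "$first_file" | sed -e 's/_novoalign\.bam$/_out.txt/')"
echo "bad:  $bad"
echo "good: $good"

rm -rf "$tmp"
```

The "bad" path still contains the whole input path after level_2/, which is what the shell complains about when it tries to open the redirect target.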

Note 1: You can use parameter expansion like file=${fullfile##*/} to get the filename without path, but you will forget the syntax in one hour. Easier to remember are basename and dirname, but you still have to do some processing.
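
For reference, a minimal sketch of the parameter-expansion route, which avoids calling basename and sed. The variable names and the SSTC01 directory are only illustrative.

```shell
# Hypothetical input path, used only to show the expansions.
fullfile=/home/sub_directory1/samples/SSTC01/file_1_novoalign.bam

name=${fullfile##*/}          # remove the longest prefix ending in "/"  (like basename)
dir=${fullfile%/*}            # remove the shortest suffix starting at "/" (like dirname)
stem=${name%_novoalign.bam}   # remove the literal suffix _novoalign.bam
outfile=${stem}_out.txt

echo "name:    $name"
echo "dir:     $dir"
echo "outfile: $outfile"
```

Here ## and % work on the start and the end of the value respectively, and the suffix after % is a shell pattern, not a regular expression, so the dot needs no escaping.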

Note 2: When your script first changes the directory to /home/sub_directory1/samples/SSTC (the directory with the input files), you can skip the basename call.
When all the files in the dir are to be processed, you can use the asterisk on its own.
When all files have at most one underscore, you can use cut to keep the part before it; a name like file_1_novoalign.bam has more than one, so cut -d_ -f1 would leave only file. You might want to add some error handling. When you want the STDERR from samtools in your outputfile, add 2>&1.
These will turn your script into

   OUTDIR=/home/sub_directory_2/level_2
   cd /home/sub_directory1/samples/SSTC || exit 1
   for file in *; do
       echo "Debug: Inputfile: ${file}"
       # No path in ${file} after the cd, so basename is not needed
       OUTPUTFILE="$(echo "${file}" | cut -d_ -f1)_out.txt"
       echo "Debug: Outputfile: ${OUTPUTFILE}"
       # 2>&1 after the redirect sends STDERR into the output file as well
       samtools depth -r chr9:218026635-21994999 "${file}" > "${OUTDIR}/${OUTPUTFILE}" 2>&1
   done
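
As for the error handling mentioned above, one possible shape is sketched here, not a definitive version. The function names depth_cmd and process_bams are made up for this example; depth_cmd just wraps the samtools call (passing the BAM file as an argument) so it can be replaced while trying the loop out.

```shell
# Hypothetical wrapper around the samtools call from the question.
depth_cmd() {
    samtools depth -r chr9:218026635-21994999 "$1"
}

# Hypothetical helper: first argument is the output directory,
# the remaining arguments are the input files.
process_bams() {
    outdir=$1
    shift
    failures=0
    if [ ! -d "$outdir" ] || [ ! -w "$outdir" ]; then
        echo "Error: cannot write to $outdir" >&2
        return 2
    fi
    for file in "$@"; do
        # A pattern that matches nothing is passed through literally,
        # so check that each argument really is a file.
        if [ ! -f "$file" ]; then
            echo "Warning: no such input file: $file" >&2
            failures=$((failures + 1))
            continue
        fi
        name=${file##*/}
        out="$outdir/${name%_novoalign.bam}_out.txt"
        if ! depth_cmd "$file" > "$out"; then
            echo "Error: command failed for $file" >&2
            rm -f "$out"    # do not leave a partial output file behind
            failures=$((failures + 1))
        fi
    done
    # Return success only when every file was processed.
    [ "$failures" -eq 0 ]
}

# Example call (paths from the question):
# process_bams /home/sub_directory_2/level_2 /home/sub_directory1/samples/SSTC*/*_novoalign.bam
```

The function returns non-zero when the output directory is unusable or when any file failed, so a calling script can stop or report on it.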
Walter A