I am trying to write a script that uses agrep to loop through the lines of one document and match them against another document. I believe this might need a nested loop, but I am not completely sure. The script should take one string from the template document, match it against the strings in the other document, then move to the next string in the template and match again.

If you are unable to see the images for some reason, I have included the links below as well. If you need me to explain more, just let me know. This is my first post, so I am not sure how it will be perceived or whether I used the correct terminology :)

Template agrep/highlighted- https://imgur.com/kJvySbW
Matching strings not highlighted- https://imgur.com/NHBlB2R

I have already looked on various websites regarding loops

#!/bin/bash
#agrep script
echo ${BASH_VERSION}


TemplateSpacers="/Users/kj/Documents/Research/Dr. Gage Research/Thesis/FastA files for AGREP test/Template/TA21_spacers.fasta"
MatchingSpacers="/Users/kj/Documents/Research/Dr. Gage Research/Thesis/FastA files for AGREP test/Matching/TA26_spacers.fasta"

for * in filename 

do 

agrep -3 * to file im comparing to  

#potentially may need to use nested loop but not sure 
Nazim Kerimbekov
kjustin9
  • See `Example 12` here... https://www.linuxtechi.com/linux-grep-command-with-14-different-examples/ – Mark Setchell Apr 30 '19 at 15:49
  • Thank you so much, do you know how to put those commands into a nested loop so that it cycles through all the lines in the template document? – kjustin9 Apr 30 '19 at 16:01
  • I don't really understand your question, or your diagram with lots of lines on it. I **guess** you want to search for all the strings saved in a file in a bunch of files, for which you don't need a loop, you'd use `grep -f strings.txt file1.txt file2.txt ... fileN.txt` – Mark Setchell Apr 30 '19 at 16:08
  • So one of the reasons I am using agrep is because I want the error margin to be within 3 characters of the string. – kjustin9 Apr 30 '19 at 17:05
  • So I know this is a difficult question to ask, but what I am asking is to agrep -3 "spacer1(highlighted)" against every line in the other document, then go to agrep -3 "spacer2", then agrep -3 "spacer3", agrep -3 "spacer4"... until it reaches the bottom of the document. Does that clarify it a bit? Then I want it to print all the matches it found for each round – kjustin9 Apr 30 '19 at 17:09
  • By the way these are fasta files – kjustin9 Apr 30 '19 at 17:10
  • I still don't understand your question. How many documents do you want to search in? What are the names of the documents you want to search in? – Mark Setchell Apr 30 '19 at 18:22
  • I only want to search in one other document, so I want to search the line highlighted on the left in the document on the right. Then move to the line below the highlighted one and search that one in the document on the right. If that makes sense – kjustin9 Apr 30 '19 at 19:08
  • So, you only want to search in one document, and you have one other file and you want to search for the first line from that file in the document, then the second line from that file in the same document, then the third line from that file in the same document? – Mark Setchell Apr 30 '19 at 19:13
  • Yes correct! And that’s where the idea for a loop came from – kjustin9 Apr 30 '19 at 19:20

1 Answer

Ok, I get it now, I think. This should get you started.

#!/bin/bash

document="documentToSearchIn.txt"

# Skip lines containing "spacer", then read one search string per line
grep -v spacer fileWithSearchStrings.txt | while IFS= read -r srchstr ; do
   echo "Searching for $srchstr in $document"
   echo agrep -3 "$srchstr" "$document"
done

If that looks correct, remove the echo before agrep and run again.
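
As a dry-run sketch of the same loop, the following builds a tiny strings file and prints the agrep commands that would be run, without calling agrep itself. It assumes the header lines start with `>` (the usual FASTA convention) instead of filtering on the word spacer. The function name `list_search_commands`, the sample sequences and the sample file are made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: print one agrep command per sequence line.
# Header lines (starting with ">") and empty lines are skipped.
list_search_commands() {
   local strings_file=$1 document=$2 srchstr
   # "|| [ -n ... ]" also handles a last line with no trailing newline
   while IFS= read -r srchstr || [ -n "$srchstr" ]; do
      case $srchstr in
         ''|'>'*) continue ;;
      esac
      printf 'agrep -3 %s %s\n' "$srchstr" "$document"
   done < "$strings_file"
}

# Made-up sample data, for illustration only
sample=$(mktemp)
printf '>spacer1\nACGTACGT\n>spacer2\nTTGACCA\n' > "$sample"

list_search_commands "$sample" documentToSearchIn.txt
rm -f "$sample"
```

Each printed line corresponds to one pass of the loop, so the list shows exactly which strings would be searched for.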


If, as you say in the comments, you want to store the script somewhere else, say in $HOME/bin, you can do this:

mkdir -p "$HOME/bin"

Save the script above as $HOME/bin/search. Now make it executable (only necessary one time) with:

chmod +x $HOME/bin/search

Now add $HOME/bin to your PATH. So, find the line starting:

export PATH=...

in your login profile, and change it to include the new directory:

export PATH=$PATH:$HOME/bin
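
If the login profile may be read more than once, a guard along these lines could avoid adding the directory twice. This is only a sketch, and the function name `add_to_path` is made up here.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already listed
add_to_path() {
   case ":$PATH:" in
      *":$1:"*) ;;                 # already present, nothing to do
      *) PATH="$PATH:$1" ;;        # not present, append it
   esac
}

add_to_path "$HOME/bin"
export PATH
```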

Then start a new Terminal and you should be able to just run:

search

If you want to be able to specify the name of the strings file and the document to search in, you can change the code to this:

#!/bin/bash

# Pick up parameters, if supplied
#   1st param is name of file with strings to search for
#   2nd param is name of document to search in
str=${1:-""}
doc=${2:-""}

# Ensure name of strings file is valid
while : ; do
   [ -f "$str" ] && break
   read -p "Enter strings filename:" str
done

# Ensure name of document file is valid
while : ; do
   [ -f "$doc" ] && break
   read -p "Enter document name:" doc
done

echo "Search for strings from: $str, searching in document: $doc"

# Skip lines containing "spacer", then read one search string per line
grep -v spacer "$str" | while IFS= read -r srchstr ; do
   echo "Searching for $srchstr in $doc"
   echo agrep -3 "$srchstr" "$doc"
done

Then you can run:

search path/to/file/with/strings path/to/document/to/search/in

or, if you run like this:

search

it will ask you for the 2 filenames.
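
The `${1:-""}` expansion in the script falls back to an empty string when a parameter is missing, and an empty name fails the `-f` test, which is what sends the script into the prompting loops. A small sketch of that behaviour, using a made-up function name `show_params` and made-up file names:

```shell
#!/bin/bash

# Hypothetical demo: report what the script would see for each parameter
show_params() {
   local str=${1:-""}
   local doc=${2:-""}
   echo "strings='$str' document='$doc'"
}

show_params                          # no parameters: both are empty
show_params strings.fasta            # only the first is set
show_params strings.fasta doc.fasta  # both are set
```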

Mark Setchell
  • Ok, it looks correct. What in that code makes sure it's searching one string at a time from the template document? – kjustin9 Apr 30 '19 at 19:54
  • The `read` statement reads one line at a time from the file with the search strings in it. – Mark Setchell Apr 30 '19 at 20:02
  • Ok, so I know this is a noob question, but where on my computer do I put the files so that the script can do its thing? I am using VS Code and am new to it, so I am just trying to figure it out – kjustin9 Apr 30 '19 at 20:19
  • Save the script in your login directory (`$HOME`) as `search`, then start a Terminal and make the script executable, necessary just once, with `chmod +x $HOME/search`. Now you can change directory to wherever your files are with `cd wherever/your/files/are` and run the script with `$HOME/search`. – Mark Setchell Apr 30 '19 at 20:24
  • Ok, this looks correct. I am going to test it with the echo removed. However, is there a way to take only the sequences, so that agrep does not search for "spacer1" and just searches for the sequences? – kjustin9 Apr 30 '19 at 21:14
  • I have updated the answer to ignore lines containing `spacer`. – Mark Setchell Apr 30 '19 at 21:22
  • Ok cool is there a way to make it print results in a text document? Would you just say print results.txt – kjustin9 Apr 30 '19 at 21:25
  • Save output: `your_script > your_script_output.txt`. –  Apr 30 '19 at 22:23
  • Where does that go exactly? I'm adding it to my code but am not sure which line to put it on – kjustin9 May 01 '19 at 01:39
  • When you run your script, if you want the results in a file called `$HOME/results.txt` you can either use `$HOME/search > $HOME/results.txt` or, if you always want the results in that same place, change the last line inside the script to `done > $HOME/results.txt` and simply run with `$HOME/search` – Mark Setchell May 01 '19 at 06:13
  • Now, if I wanted an interface where I can pass in each file, how would I go about doing that? It would be cleaner and easier to use than going to the directory where the files are located – kjustin9 May 01 '19 at 13:14
  • Do you mean you want to be able to run `search path/to/document` or you want to be able to run `search path/to/search/strings path/to/document` or something else? – Mark Setchell May 01 '19 at 13:21
  • I just basically want to be able to choose the searchstrings file and the file to search in – kjustin9 May 01 '19 at 13:41
  • Also, I have been running the search script in usr/local/bin. Is there a way to run it from somewhere else in my home directory? – kjustin9 May 01 '19 at 13:44
  • I have updated my answer with the two new requests. – Mark Setchell May 01 '19 at 15:31
  • You're welcome. Good luck with your project! Remember to come back and ask a new question if you get stuck. – Mark Setchell May 01 '19 at 15:53
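
The redirection suggested in the comments (`done > $HOME/results.txt`) sends everything printed inside the loop to one file. A sketch of the idea follows. It writes to a temporary file as a stand-in for `$HOME/results.txt` and uses made-up sequences with a plain echo, so it runs without agrep.

```shell
#!/bin/bash

results=$(mktemp)   # stand-in for $HOME/results.txt

# Redirecting after "done" captures the output of every iteration
printf 'ACGTACGT\nTTGACCA\n' | while IFS= read -r srchstr; do
   echo "Searching for $srchstr"
done > "$results"

cat "$results"
```

Because the redirection is attached to the whole loop, the file is opened once and every iteration's output ends up in it.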