I need to use ripgrep to find a certain pattern. This will be a string describing a chemical reaction. The output of ripgrep looks something like this:

~ rg -U --only-matching --vimgrep --replace='$1' '```smiles\n(.+)\n```'

Testing Smiles.md:5:1:OC(=O)CCC(=O)O>CCO.[H+]>CCOC(=O)CCC(=O)OCC
Another Smiles.md:5:1:CO>BrP(Br)Br>CBr

Cool! But now I need to filter these results with a Python script, so I can pipe them to Python and read from stdin. But there's a problem: how can I guarantee the delimiter? If the Python script takes everything after the third colon as the input string, how can I guarantee that the filename itself doesn't contain a colon? How can I properly separate the filename from the match when I pipe to Python?

Thanks,

oguz ismail
Thor Correia
    Pass in only one file at a time controlled by Python so you will have 100% certainty on what the name of the file being processed is - any other way will result in the problem you have described. Alternatively, scan the input filenames to validate that there are no `:` in them before processing. – metatoaster Dec 02 '20 at 00:42

1 Answer

How about adding a pre-check stage before running ripgrep, something like this:

dir="."        # assign to your target directory
badlist=()     # paths that contain a colon
for f in "$dir"/*.md; do
    if [[ $f = *:* ]]; then             # if the path contains ":"
        badlist+=("$f")                 # then add it to the badlist
    fi
done
if (( ${#badlist[@]} > 0 )); then       # if the badlist is not empty...
    echo "These file(s) contain a colon character. Rename them and run again." >&2
    printf "    %s\n" "${badlist[@]}" >&2
    exit 1                              # non-zero status signals the failure
fi

rg -U --only-matching --vimgrep --replace='$1' '```smiles\n(.+)\n```' "$dir"/*.md | python-script

The code above stops before the main ripgrep stage if any filename contains a : character. It lists the offending files so you can rename them and run the script again.
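On the Python side, the receiving script could then split each record on the first three colons only. The following is a minimal sketch, not part of the original answer: it assumes the pre-check has already ruled out colons in the paths, and the names `Match` and `parse_vimgrep` are hypothetical. Limiting the split matters because a SMILES string can itself contain a colon (for example in atom-map labels such as `[CH3:1]`).

```python
from typing import Iterable, Iterator, NamedTuple


class Match(NamedTuple):
    """One ripgrep --vimgrep record (hypothetical helper type)."""
    path: str
    line: int
    column: int
    text: str


def parse_vimgrep(lines: Iterable[str]) -> Iterator[Match]:
    """Parse `path:line:column:match` records.

    Assumption: `path` contains no colon (guaranteed by the pre-check).
    """
    for raw in lines:
        record = raw.rstrip("\n")
        if not record:
            continue  # skip blank lines
        # Split on the first three colons only; the match may contain ':'.
        parts = record.split(":", 3)
        if len(parts) != 4 or not (parts[1].isdigit() and parts[2].isdigit()):
            raise ValueError(f"unexpected record: {record!r}")
        path, line, column, text = parts
        yield Match(path, int(line), int(column), text)
```

In the piped script, this would be driven by something like `for m in parse_vimgrep(sys.stdin): ...` after `import sys`. Raising on a malformed record is a design choice here, so that a filename that slipped past the pre-check fails loudly instead of producing a wrong match string.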

tshiono