2

agrep gives the error `agrep: pattern too long (has > 32 chars)` when the pattern string contains a full stop (.), but not otherwise.

I want to compare two strings approximately, so I'm using agrep, but it gives the error `agrep: pattern too long (has > 32 chars)`. I found that the error does not occur when the pattern string contains no full stop. Why?

`echo "The quick brown fox jumped over the lazy dog." | agrep -c -4 "The quick brown fox jumped over the lazy dog."`

The expected output is 1; instead it gives the error `agrep: pattern too long (has > 32 chars)`.

It works if I remove the full stop:

$ echo "The quick brown fox jumped over the lazy dog." | agrep -c -4 "The quick brown fox jumped over the lazy dog"  
1
Manik
  • From `man agrep`: *The limit of record length can be changed by modifying the parameter Max_record in agrep.h.* – Cyrus Aug 17 '19 at 06:45
  • @Cyrus it works if there is no full stop in the pattern string, no matter how large the string is. – Manik Aug 17 '19 at 06:48
  • Maybe you want to take a look at `tre-agrep`. – Cyrus Aug 17 '19 at 06:52
  • @Cyrus can you tell me how I can use tre-agrep on a string instead of a whole file? I tried `echo "the quick brown fox jumped over the lazy dog." | tre-agrep -4 "the quick brown fox jumped over the lazy dog"` but it just echoes the string. – Manik Aug 17 '19 at 07:18
  • You mean that? `echo "The quick brown fox jumped over the lazy dog." | tre-agrep -c -4 "The quick brown fox jumped over the lazy dog."` – Cyrus Aug 17 '19 at 07:32
  • @Cyrus yes, the full stop is working now but `tre-agrep` doesn't pipe echo – Manik Aug 17 '19 at 07:44
  • It's unclear what output you want. – Cyrus Aug 17 '19 at 07:52
  • @Cyrus I want to compare two strings approximately: the command should return true (1) if they approximately match, or false (0) if they don't. Normally `agrep` or `tre-agrep` compares the pattern with lines in a file, but I want to compare it with a string. Ideally it would be something like `command -c -4 "string_to_compare" "pattern_string"`, printing 1 or 0 accordingly. – Manik Aug 17 '19 at 08:26

3 Answers

2

Approximate string matching / fuzzy string searching with two strings.

With agrep and bash:

if agrep -1 "abc" <<< "xbc" >/dev/null; then echo "match"; else echo "no match"; fi

or with tre-agrep and bash:

if tre-agrep -q -1 "abc" <<< "xbc"; then echo "match"; else echo "no match"; fi

Output in both cases:

match
Cyrus
  • Why does `if tre-agrep -q -1 "abc" <<< "xbcdefadf"; then echo "match"; else echo "no match"; fi` show a match? The `-1` limit on the number of errors doesn't seem to be applied. – Manik Aug 17 '19 at 09:45
  • With one error, `abc` is a substring of `xbcdefadf`. The match starts at position 0 and ends after 3 characters. See: `tre-agrep -1 --show-position "abc" <<< "xbcdefadf"` – Cyrus Aug 17 '19 at 10:09
  • Is there a way to compare the two strings so that the maximum number of errors applies to the whole string? I don't want to search for the string in a file, but to check whether the string (line) approximately matches the operand string, i.e. whether it is just a typo or a completely different line. For example, "the" should not match "the quick brown fox jumped over the lazy dog"; the whole sentence should be approximately the same. – Manik Aug 17 '19 at 11:00
  • I recommend starting a new question, because this goes beyond the scope of this one. – Cyrus Aug 17 '19 at 11:01
  • `agrep -x` ("the pattern must match the whole line") does what I was looking for, but it still shows the error `pattern too long (has > 32 chars)`. Can you tell me how to change the pattern length limit using agrep.h? I found the /usr/bin/agrep file but don't know what to change. – Manik Aug 17 '19 at 15:23
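
The whole-string comparison asked about in the comments above (where "the" must not match the full sentence) can also be sketched without agrep, which sidesteps the pattern-length limit entirely. Note that this swaps in a different technique: a plain Levenshtein (edit) distance in pure bash, not what agrep or tre-agrep do internally. The function names `edit_distance` and `approx_equal` are hypothetical, chosen only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, pure bash; not part of agrep or tre-agrep.

# edit_distance A B: print the Levenshtein distance between two strings.
edit_distance() {
    local a=$1 b=$2
    local la=${#a} lb=${#b}
    local i j cost del ins sub min
    local -a prev curr

    # Row 0: building b[0..j) from the empty string takes j insertions.
    for ((j = 0; j <= lb; j++)); do prev[j]=$j; done

    for ((i = 1; i <= la; i++)); do
        curr=("$i")   # column 0: deleting the first i characters of a
        for ((j = 1; j <= lb; j++)); do
            [[ ${a:i-1:1} == "${b:j-1:1}" ]] && cost=0 || cost=1
            del=$((prev[j] + 1))
            ins=$((curr[j-1] + 1))
            sub=$((prev[j-1] + cost))
            min=$del
            ((ins < min)) && min=$ins
            ((sub < min)) && min=$sub
            curr[j]=$min
        done
        prev=("${curr[@]}")
    done
    echo "${prev[lb]}"
}

# approx_equal MAX A B: succeed if the whole strings differ by at most MAX edits.
approx_equal() {
    (( $(edit_distance "$2" "$3") <= $1 ))
}
```

Used as `approx_equal 4 "$line" "$pattern" && echo 1 || echo 0`, this prints 1 or 0 in the way requested. It runs in time proportional to the product of the two string lengths, which is fine for single sentences but slow for large inputs.
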
0

The problem is that agrep treats `.` as a meta character. To avoid that, pass the option `-k`:

echo "The quick brown fox jumped over the lazy dog." | agrep -c -4 -k "The quick brown fox jumped over the lazy dog."

The man page on agrep says:

-k No symbol in the pattern is treated as a meta character.

desgua
0

The 32-character limit comes from the register width agrep is optimized for: 32 bits. See `#define WORD 32` in agrep.h.

Switching to 64 bits seems to take more work than just changing `unsigned` to `unsigned long` and doubling the constants in agrep.h.
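
To illustrate why a word width would cap the pattern length: agrep is based on the bitap (Shift-And) family of algorithms, which keep one bit of state per pattern character in a machine word. Below is a minimal exact-match sketch in bash. It assumes nothing about agrep's actual source; `shift_and` and the `WORD` variable are hypothetical names, and since bash integers are 64-bit, the 32 limit is enforced here only for illustration.

```shell
#!/usr/bin/env bash
WORD=32   # illustrative word width, mirroring the #define quoted above

# shift_and PATTERN TEXT: succeed (0) if PATTERN occurs exactly in TEXT,
# fail (1) if it does not, and return 2 if PATTERN needs more than WORD bits.
shift_and() {
    local pat=$1 text=$2
    local m=${#pat} i c code state=0 accept
    local -a mask=()

    if ((m > WORD)); then
        echo "pattern too long (has > $WORD chars)" >&2
        return 2
    fi
    ((m == 0)) && return 0

    # Bit i of mask[code] is set when pattern position i holds that character.
    for ((i = 0; i < m; i++)); do
        c=${pat:i:1}
        printf -v code '%d' "'$c"    # numeric code of the character
        mask[code]=$(( ${mask[code]:-0} | (1 << i) ))
    done
    accept=$((1 << (m - 1)))         # bit that marks a full match

    # Bit i of state is set when pat[0..i] matches the text ending here.
    for ((i = 0; i < ${#text}; i++)); do
        c=${text:i:1}
        printf -v code '%d' "'$c"
        state=$(( ((state << 1) | 1) & ${mask[code]:-0} ))
        (( state & accept )) && return 0
    done
    return 1
}
```

Because every pattern position occupies one bit, a pattern longer than the word cannot be tracked in a single state word. Approximate matching in this family keeps one such state word per allowed error, so widening the word plausibly touches every place that assumes its size.
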