
My regex should accept every palindrome with a word length of 2 to 7 letters (no whitespace). It looks like this:

^(\S?)(\S?)(\S?)\S?\3\2\1$

Can you explain what I did wrong when writing this regex and how I can fix it?

It looks fine to me except for one thing: it accepts words like poj, kip and ret. I think it's connected with the middle question mark, but I'm not convinced.

  • I cannot reproduce this: https://regex101.com/r/EXptM6/1 It seems that words like `poj`, `kip` etc. are rejected. – Andrej Kesely Nov 23 '22 at 20:51
  • In your Bash version, `\S` might not be supported; use `[^[:space:]]` instead. Also, define it in a variable, `rx='^([^[:space:]]?)([^[:space:]]?)([^[:space:]]?)[^[:space:]]?\3\2\1$'`, and then use `if [[ "$string" =~ $rx ]]; then ...` – Wiktor Stribiżew Nov 23 '22 at 20:56
  • Whoa... I got a segmentation fault from bash when trying out `re='(\S?)(\S?)(\S?)\S?\3\2\1$'` (note the missing `^` at the start). :-) – Ted Lyngmo Nov 23 '22 at 21:10
  • I get `re='^(\S?)(\S?)(\S?)\S?\3\2\1'` to work, but as soon as I add the `$` anchor, it matches those 3 OP mentioned too. – Ted Lyngmo Nov 23 '22 at 21:11
  • If you guys say so, I'd like to ask you for one more thing. Can you make a simple text file which contains the words abba, aaa, cac, abbba, abc (each word on a new line) and then run the command grep -Ec "^(\S?)(\S?)(\S?)\S?\3\2\1$" on that file? Even though some of you say that the 3-letter non-palindrome words are rejected, the output is 5 (the correct number is 4). – Member of Pepper Nov 23 '22 at 21:22
  • @MemberofPepper I've tried your command: With `grep -Ec "^(\S?)(\S?)(\S?)\S?\3\2\1$" file.txt` I'm getting 1. Using `grep (GNU grep) 3.4` – Andrej Kesely Nov 23 '22 at 21:40
  • @MemberofPepper After converting the newlines to Unix-style newlines and using `grep -Ec '^(\S)(\S?)(\S?)\S?\3\2\1$' a.txt` (note the `\S` at the beginning) I'm getting 4. – Andrej Kesely Nov 23 '22 at 21:52
  • Please update the question with the additional details and sample code attempts, along with the (wrong) output generated by the code and the (correct) expected output. – markp-fuso Nov 23 '22 at 21:55
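
The line-ending point raised in the comments can be checked directly. A carriage return counts as whitespace, so `\S` cannot match it, and a line that ends in one has an extra character between the word and the end of the line. The following is a minimal sketch, assuming a throwaway sample file created with `mktemp` (the file and the variable names are illustrative, not from the question); it counts the lines that carry a carriage return and then strips them with `tr`:

```shell
#!/bin/bash

# Hypothetical sample file with Windows-style (CRLF) line endings,
# holding the five words listed in the comment thread.
sample=$(mktemp)
printf 'abba\r\naaa\r\ncac\r\nabbba\r\nabc\r\n' > "$sample"

# Count the lines that still contain a carriage return.
crlf_lines=$(grep -c $'\r' "$sample")
echo "lines with CR before cleanup: $crlf_lines"

# Delete every carriage return so each line ends right after the word.
clean=$(mktemp)
tr -d '\r' < "$sample" > "$clean"

remaining=$(grep -c $'\r' "$clean")
echo "lines with CR after cleanup: $remaining"

rm -f "$sample" "$clean"
```

Running the palindrome pattern against the cleaned file rather than the original takes line endings out of the picture when comparing counts.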

1 Answer


If I understand this correctly, ^(\S?)(\S?)(\S?)\S?\3\2\1$ matches poj, kip and ret by capturing empty strings in the capture groups. Disclaimer: I'm not 100% sure this is the correct conclusion.

To require at least 2 characters, make the first capture group mandatory:
^(\S)(\S?)(\S?)\S?\3\2\1$.

Example:

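#!/bin/bash

# Test strings: palindromes of valid length first, then strings that should be rejected.
teststrs=('aa' 'abccba' 'abba' 'abcba' 'abcdcba' 'abcdcbaaaaa'
          'ab' 'abc cba' 'poj' 'kip' 'ret')

# The first capture group is mandatory (no '?'), so at least 2 characters are required.
re='^(\S)(\S?)(\S?)\S?\3\2\1$'

# Print only the strings that match the pattern.
for str in "${teststrs[@]}"
do
    if [[ "$str" =~ $re ]]; then
        echo "$str matches"
    fi
done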

Output:

aa matches
abccba matches
abba matches
abcba matches
abcdcba matches

You'll get a similar result for grep -E '^(\S)(\S?)(\S?)\S?\3\2\1$', given this input:

aa
abccba
abba
abcba
abcdcba
abcdcbaaaaa
ab
abca
abc cba
poj
kip
ret

You'll get this:

aa
abccba
abba
abcba
abcdcba

If you instead use your original, grep -E '^(\S?)(\S?)(\S?)\S?\3\2\1$', you'll also get the three-letter words you mentioned, but the grep --color option shows no actual match on the last four lines:

aa
abccba
abba
abcba
abcdcba
abca
poj
kip
ret
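
Since the results above depend on how a given regex engine handles backreferences, it can help to cross-check them without any regex. The following is a minimal sketch, not part of the original answer: `is_short_palindrome` is a hypothetical helper that applies the rules stated in the question (2 to 7 characters, no whitespace) by reversing the string in plain Bash and comparing.

```shell
#!/bin/bash

# Hypothetical helper (an assumption for illustration): succeeds when $1 is a
# palindrome of 2 to 7 characters containing no whitespace. Uses no regex.
is_short_palindrome() {
    local word=$1 reversed='' i
    local len=${#word}

    # Enforce the length limits stated in the question.
    (( len >= 2 && len <= 7 )) || return 1

    # Reject anything that contains whitespace.
    [[ $word == *[[:space:]]* ]] && return 1

    # Build the reversed string one character at a time.
    for (( i = len - 1; i >= 0; i-- )); do
        reversed+=${word:i:1}
    done

    # A palindrome reads the same in both directions.
    [[ $word == "$reversed" ]]
}

for str in 'aa' 'abccba' 'abcba' 'abcdcbaaaaa' 'ab' 'abca' 'abc cba' 'poj'; do
    if is_short_palindrome "$str"; then
        echo "$str is a palindrome"
    fi
done
```

Any line that a grep run reports but this check rejects (such as poj or abca) points to the engine's matching rather than to the input.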
Ted Lyngmo