I have a shell script at /www/cgi-bin/test that I can access on my network at http://192.168.1.1/cgi-bin/test.

I am attempting to parse the query string, which should look like d=domain.com, and validate it against a regular expression:

#!/bin/sh

echo "Content-type: text/html"
echo ""

domain=${QUERY_STRING#d=}

if [[ ! $domain =~ [A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,}) ]]; then
    exit
fi

echo "Validation success!"

When this didn't work, I tried using another regex which I stole from here:

if [[ ! $domain =~ \
    ^(([a-zA-Z](-?[a-zA-Z0-9])*)\.)*[a-zA-Z](-?[a-zA-Z0-9])+\.[a-zA-Z]{2,}$ \
]]; then
    exit
fi

I can't get this regex to match either. In both cases, I tried escaping the curly braces (\{2,\}) according to the Advanced Bash-Scripting Guide, but that didn't make any difference.

In case it's relevant, the platform I'm on is OpenWrt 12.09.

Edit: I just realized my shell script might not support bash's [[ ... =~ ... ]] syntax. Unfortunately OpenWrt doesn't ship with bash.

Big McLargeHuge

2 Answers

2

Your regex looks fine, but the problem appears to be in the string manipulation that populates the variable domain.

You need to replace:

domain=${QUERY_STRING#d*=}

with

domain=${QUERY_STRING#?d=}

This gives you domain.com in $domain.
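
For reference, here is a small sketch of how the three prefix-removal forms discussed in this thread behave in a POSIX shell. It assumes QUERY_STRING holds d=domain.com with no leading ? (the variable name qs and its sample value are only illustrative; in a real CGI script the web server sets QUERY_STRING).

```sh
#!/bin/sh
# Hypothetical sample value standing in for QUERY_STRING.
qs='d=domain.com'

echo "${qs#d=}"    # removes the literal prefix "d="
echo "${qs#?d=}"   # "?" is a one-character wildcard; nothing matches, value unchanged
echo "${qs#d*=}"   # removes the shortest prefix matching "d", anything, "="
```

With that sample value, the first and third lines print domain.com, while the second prints d=domain.com unchanged. The ?d= form only strips something when the value really starts with an extra character such as ?.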

UPDATE: You have the wrong shebang, #!/bin/sh.

You need to have:

#!/bin/bash

UPDATE 2: You can do this in sh (non bash) to get your desired value:

QUERY_STRING='d=domain.com'
domain=`echo "$QUERY_STRING" | awk -F'd=' '{print $2}'`
echo "$domain"
domain.com

To validate you can use egrep:

echo "$domain" | egrep -q '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})$'
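
As a rough sketch, an anchored form of this check can be wrapped in a function so the exit status drives an if. The function name is_valid_domain is made up for illustration, and grep -E is used here as the equivalent of egrep; this assumes the grep on your system supports -E and -q.

```sh
#!/bin/sh
# is_valid_domain is a hypothetical helper, not part of the original answer.
# It succeeds (exit status 0) only if the whole argument matches the pattern.
is_valid_domain() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$'
}

for candidate in domain.com google.c; do
    if is_valid_domain "$candidate"; then
        echo "$candidate: valid"
    else
        echo "$candidate: rejected"
    fi
done
```

This should report domain.com as valid and google.c as rejected, since the last label needs at least two letters.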
anubhava
  • Wait, I don't see any difference. – thefourtheye May 17 '14 at 06:13
  • My mistake, bad copy/paste. Fixed it now – anubhava May 17 '14 at 06:15
  • Actually no, the revised code populates the variable with `d=domain.com`, which is not what I want. – Big McLargeHuge May 17 '14 at 06:22
  • That is not correct. `#?d=` will strip out `?d=` part and will leave `domain.com` as the variable's value. – anubhava May 17 '14 at 06:23
  • Run this command to test: `bash -c 'QUERY_STRING='?d=domain.com'; echo ${QUERY_STRING#?d=}'` – anubhava May 17 '14 at 06:24
  • I can verify the test command produces the desirable output, but for some reason it doesn't in the shell script. I replaced everything after `echo ""` with `echo ${QUERY_STRING#?d=}`, which printed `d=domain.com` on the page. However `echo ${QUERY_STRING#d*=}` printed `domain.com` without the `d=`. – Big McLargeHuge May 17 '14 at 06:36
  • Because you have `#!/bin/sh` shebang. You need to have `#!/bin/bash` – anubhava May 17 '14 at 06:39
  • OpenWrt doesn't ship with bash. It's all busybox. I guess that means I can't use those bash inline replacements, huh? Also my title is misleading. – Big McLargeHuge May 17 '14 at 06:40
  • @anubhava the `QUERY_STRING` doesn't contain the `?` mark. The definition of `QUERY_STRING` is the part of the url *after* the `?` mark. – janos May 17 '14 at 06:46
  • @janos: I just went by this line from OP `I am attempting to parse the query string, which should look like ?d=domain.com` – anubhava May 17 '14 at 06:48
  • My mistake. Still, the regex is not matching even after fixing the replacement. – Big McLargeHuge May 17 '14 at 06:52
1

If you don't have bash, and/or you cannot replace the shebang with #!/bin/bash, then the [[ expression might not work, or the pattern substitution with ${QUERY_STRING#pattern} might not work.

In that case, you can use awk to kill two birds with one stone:

if ! echo "$QUERY_STRING" | awk '$0 !~ /^d=[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})$/ {exit 1}'; then
    exit 1
fi

Or, if the pattern substitution works and only the regex match doesn't, you can use expr instead of awk:

domain=${QUERY_STRING#d=}
if ! expr "$domain" : '[A-Za-z0-9-]\{1,\}\(\.[A-Za-z0-9-]\{1,\}\)*\(\.[A-Za-z]\{2,\}\)$' >/dev/null; then
    exit 1
fi

In both cases, I used a slightly stricter pattern for the d= in the QUERY_STRING. In both cases, be careful to end the pattern with a $, otherwise things like domain.com- would pass.
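
To illustrate the point about the trailing $, here is a minimal sketch comparing an unanchored and an anchored version of the same pattern. It swaps in grep -E rather than awk or expr purely for brevity, and the helper name matches and the variables loose and strict are invented for this example.

```sh
#!/bin/sh
# matches is a hypothetical helper: succeeds if the string ($2)
# matches the extended regular expression ($1).
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

loose='[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})'
strict="^${loose}\$"   # same pattern, anchored at both ends

for s in domain.com domain.com-; do
    if matches "$loose" "$s"; then echo "loose accepts $s"; fi
    if matches "$strict" "$s"; then echo "strict accepts $s"; fi
done
```

Both patterns accept domain.com, but only the unanchored one accepts domain.com-, because it is satisfied by a match anywhere inside the string.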

janos
  • Now that I've pulled all my hair out I can confirm the pattern substitution does indeed work while the `[[ ... =~ ... ]]` syntax does not. Your solution using `expr` did the trick, but please escape the plus signs. (So... much... ESCAPE.) Also thanks for the tip about ending with `$` - otherwise `google.c` passed. – Big McLargeHuge May 17 '14 at 08:05
  • In fact `+` and `\+` don't work on my system, so I used `\{1,\}` instead which works, and I just forgot it at one place, thanks for pointing out! In any case I recommend the `awk` solution instead of `expr`, if you have `awk` – janos May 17 '14 at 08:08