
I have a string in Bash that may or may not start with any number of leading spaces, e.g.:

"  foo bar baz"
" foo bar baz"
"foo bar baz"

I want to delete the first instance of "foo" from the string, along with any leading spaces (there may not be any).

Based on the advice from this question, I have tried the following:

str=" foo bar baz"
regex="[[:space:]]*foo"
echo "${str#$regex}"
echo "${str#[[:space:]]*foo}"

If str has one or more leading spaces, it returns the result I want, which is _bar baz (underscore = leading space). If the string has no leading spaces, it does nothing and returns foo bar baz. Both echo commands give the same results.

My understanding is that using * after [[:space:]] should match zero or more instances of [[:space:]], not one or more. What am I missing or doing wrong here?

EDITS

@Raman: I've tried the following, and they don't work either:

echo "${str#[[:space:]]?foo}"
echo "${str#?([[:space:]])foo}"
echo "${str#*([[:space:]])foo}"

None of the three deletes 'foo', whether or not there is a leading space. The only version that kind of works is the one I posted with the asterisk: it deletes 'foo' when there is a leading space, but not when there isn't.

Cyrus
Lou
  • @RamanSailopal The [docs at GNU](https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html) say that `?` matches zero or one occurrence, and `*` matches zero or more occurrences. I tried it anyway and it didn't work - will update the question. – Lou Dec 01 '20 at 10:30
  • `and they also don't work:` enable extglob... – KamilCuk Dec 01 '20 at 10:35
  • What's wrong with just `${str#*foo}`? – oguz ismail Dec 01 '20 at 11:13
  • @oguzismail in the case `str=oguzfoo`, I guess OP doesn't want a match. – gniourf_gniourf Dec 01 '20 at 15:31
  • @Lou: just to make sure this isn't an XY problem: are you trying to split your string at spaces? If that's the case, you should instead use: `read -ra ary -d '' < <(printf '%s\0' "$str")`, and you'll have the tokens in the array `ary`. – gniourf_gniourf Dec 01 '20 at 15:35
  • @Lou, alright, good luck then! – gniourf_gniourf Dec 01 '20 at 16:50
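
For the word-splitting route suggested in the comment above, here is a minimal sketch. `read -a` splits on the default IFS (space, tab, newline), so leading whitespace disappears and each word lands in its own element of `ary`. The variable `rest` and the "drop the first word" step are my own illustration, not part of the comment.

```bash
#!/usr/bin/env bash
str="  foo bar baz"

# -r: no backslash processing; -a: store the words in an array;
# -d '': read up to the NUL byte printf appends, so a newline inside $str
# does not end the read early (it is treated as a word separator instead)
read -ra ary -d '' < <(printf '%s\0' "$str")

printf '%s\n' "${ary[@]}"   # foo, bar and baz on separate lines

# Hypothetical follow-up: drop the first word and rejoin the rest with single spaces
rest="${ary[*]:1}"
echo "[$rest]"              # [bar baz]
```

Note that rejoining collapses runs of whitespace between words, which is only acceptable if the original spacing does not matter.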

3 Answers


The best thing to do is to use parameter expansions (with extended globs) as follows:

# Make sure extglob is enabled
shopt -s extglob

str=" foo bar baz"
echo "${str##*([[:space:]])}"

This uses the extended glob *([[:space:]]), and the ## parameter expansion (greedy match).
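
A quick check of this trim on the question's three sample strings, plus one starting with a tab (an extra input I added as an assumption): `##` with `*([[:space:]])` strips every leading whitespace character but leaves `foo` in place.

```bash
#!/usr/bin/env bash
shopt -s extglob   # *(...) is an extended glob and must be enabled first

results=()
for s in "  foo bar baz" " foo bar baz" "foo bar baz" $'\tfoo bar baz'; do
    # ## removes the longest prefix made only of whitespace characters
    results+=("${s##*([[:space:]])}")
done
printf '[%s]\n' "${results[@]}"   # prints [foo bar baz] four times
```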

Edit: since your pattern ends with foo, you don't need the greedy match:

echo "${str#*([[:space:]])foo}"

is enough.
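
Applied to the same three inputs, plus `oguzfoo` from the comments (where `foo` is not at the start, so nothing should be removed), a sketch could look like this; `strip_leading_foo` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: remove optional leading whitespace followed by "foo"
strip_leading_foo() {
    # # takes the shortest prefix matching: zero or more whitespace chars, then "foo"
    printf '%s\n' "${1#*([[:space:]])foo}"
}

for s in "  foo bar baz" " foo bar baz" "foo bar baz" "oguzfoo"; do
    printf '[%s]\n' "$(strip_leading_foo "$s")"
done
# [ bar baz]
# [ bar baz]
# [ bar baz]
# [oguzfoo]
```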

Note: you can put foo in a variable too, but be careful, as you'll have to quote it:

pattern=foo
echo "${str#*([[:space:]])"$pattern"}"

will work. You have to quote it in case the expansion of pattern contains glob characters, for example when pattern="foo[1]".
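
To see why the quotes matter, here is a sketch using the `pattern="foo[1]"` case; the input strings are my own assumptions. Quoted, `$pattern` is matched as literal text. Unquoted, `[1]` becomes a bracket expression that matches only the character `1`.

```bash
#!/usr/bin/env bash
shopt -s extglob   # needed for the *([[:space:]]) part of the pattern

str="foo[1] bar"
pattern="foo[1]"

# Quoted: "foo[1]" is matched as literal text, so it is removed
quoted="${str#*([[:space:]])"$pattern"}"

# Unquoted: [1] is a bracket expression, so the prefix would have to be "foo1";
# there is none, and str comes back unchanged
unquoted="${str#*([[:space:]])$pattern}"

# ...whereas a string that really starts with "foo1" IS matched by the unquoted form
other="foo1 bar"
unquoted_other="${other#*([[:space:]])$pattern}"

printf '[%s]\n' "$quoted" "$unquoted" "$unquoted_other"
# [ bar]
# [foo[1] bar]
# [ bar]
```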

gniourf_gniourf
  • Thanks, this works! Why do you use the longest match instead of the shortest match out of interest? – Lou Dec 01 '20 at 10:55
  • @Lou: Otherwise it will only remove the first space. But now I realize you also have `foo` in your pattern, so `echo "${str#*([[:space:]])foo}"` would be enough. I've edited the answer (and also added a remark about putting the pattern in a variable). – gniourf_gniourf Dec 01 '20 at 11:09
  • Okay, I now seem to be having a different problem. My Bash code will only remove the substring in a variable when it's unquoted. E.g. if I have a `str="foo bar"`, and a `re="*([[:space:]])"foo"*([[:space:]])"`, then if I `echo "${str#$re}"`, it will happily delete `foo` and leave `_bar` (with leading space). But if I do `echo "${str#"$re"}"` with quotes around the regex expression, it won't. What's the deal with quoting Regex variables? I feel like I've missed something else here. – Lou Dec 08 '20 at 11:14
  • @Lou: 1. You don't need to quote the token `foo` in `re`; this is enough: `re="*([[:space:]])foo*([[:space:]])"`. 2. The pattern _must not be quoted!_ Quotes are there to _prevent_ interpretation of the pattern as a pattern! Hence `echo "${str#$re}"` (without quotes around `$re`) is correct. – gniourf_gniourf Dec 08 '20 at 11:31
  • @Lou: so you only want to quote when there's a variable that might contain a pattern that you don't want to be interpreted as a pattern (e.g., something obtained from user input). Btw, do you want to remove that leading space? If yes, you need to use greedy match with `"${str##$re}"`. And yes, quotes are a bit awkward to grok at the beginning… – gniourf_gniourf Dec 08 '20 at 11:33
  • In this case I actually want to preserve the leading space, so that works great, but it's good to know. Thanks for the clarification! I think I got confused when you said that you have to quote the pattern - looking back I realise you meant that the pattern variable should be quoted when instantiated, but not when used in the parameter expansion. Is that right? – Lou Dec 08 '20 at 12:12
  • @Lou: not quite :) The pattern variable should be quoted _in the parameter expansion **only if you don't want it to be interpreted as a pattern.**_ Here's an extremely simple example you can try: `str="foo bar"; pattern="f*"`. (The fact we're using quotes here is irrelevant.) Without quotes, `echo "${str#$pattern}"` gives `oo bar`, because `str` matches the _pattern_ `f*`. But with quotes, `echo "${str#"$pattern"}"` gives `foo bar`, since `str` doesn't start with the _verbatim content_ of `pattern`, which is `f*`. – gniourf_gniourf Dec 08 '20 at 12:20
  • Ah that makes sense! Thanks for explaining it so clearly :) – Lou Dec 08 '20 at 12:35
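
The points from this comment thread can be checked in one short script. This is a sketch using the `re` from Lou's comment (with the inner quotes dropped, as suggested) and the `pattern="f*"` example: an unquoted variable is treated as a pattern, a quoted one as literal text, and `#` versus `##` decides whether the trailing `*([[:space:]])` also swallows the space after `foo`.

```bash
#!/usr/bin/env bash
shopt -s extglob

str="foo bar"
re="*([[:space:]])foo*([[:space:]])"
pattern="f*"

short="${str#$re}"            # shortest matching prefix "foo"  -> " bar"
long="${str##$re}"            # longest matching prefix "foo "  -> "bar"
literal="${str#"$re"}"        # quoted: looks for the literal text of $re -> unchanged

glob="${str#$pattern}"        # f* as a pattern, shortest match "f" -> "oo bar"
verbatim="${str#"$pattern"}"  # the literal text "f*" is not a prefix -> "foo bar"

printf '[%s]\n' "$short" "$long" "$literal" "$glob" "$verbatim"
# [ bar]
# [bar]
# [foo bar]
# [oo bar]
# [foo bar]
```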

My understanding is that using * after [[:space:]] should match zero or more instances of [[:space:]], not one or more

That's wrong.

What am I missing

A glob is not a regex. In a regex, * matches zero or more occurrences of the preceding character or group. In a glob, * matches any string, including the empty one. It's the same as in filename expansion: think of ls [[:space:]]*foo.
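
The difference can be seen side by side in a small sketch (the two test strings are my own). In the glob `[[:space:]]*foo`, the `*` matches any characters, so the glob needs exactly one leading whitespace character and then skips over anything up to `foo`. In the regex `^[[:space:]]*foo`, the `*` repeats only the whitespace class.

```bash
#!/usr/bin/env bash
a="foo bar baz"
b=" xyz foo bar"

# Glob: exactly one whitespace character, then ANY characters, then "foo"
echo "[${a#[[:space:]]*foo}]"   # [foo bar baz]  (no leading whitespace, so no match)
echo "[${b#[[:space:]]*foo}]"   # [ bar]         (the * swallowed "xyz ")

# Regex: zero or more whitespace characters, then "foo", anchored at the start
[[ $a =~ ^[[:space:]]*foo ]] && echo "a matches"
[[ $b =~ ^[[:space:]]*foo ]] || echo "b does not match"
```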

You can use extended bash glob and do:

shopt -s extglob
str=' foo bar baz'
echo "${str#*([[:space:]])foo}"

To do anything more complicated, actually use a regex.

str=' foo bar baz';
[[ $str =~ ^[[:space:]]*foo(.*) ]];
echo "${BASH_REMATCH[1]}"
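
One caveat with the regex version: `BASH_REMATCH` is only meaningful when `=~` succeeds. A sketch that checks the match first and otherwise returns the input unchanged (the function name `remove_foo` is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip leading whitespace plus "foo", or return the input as-is
remove_foo() {
    local s=$1
    # ^ anchors at the start; [[:space:]]* is zero or more whitespace (regex semantics)
    if [[ $s =~ ^[[:space:]]*foo(.*) ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # everything after "foo"
    else
        printf '%s\n' "$s"                   # no leading "foo": leave it alone
    fi
}

echo "[$(remove_foo "  foo bar baz")]"   # [ bar baz]
echo "[$(remove_foo "foo bar baz")]"     # [ bar baz]
echo "[$(remove_foo "oguzfoo")]"         # [oguzfoo]
```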

KamilCuk
  • Ah brilliant! I did not know about the difference between regexes and globs before. Now it works, and the string trims correctly. Cheers :). – Lou Dec 01 '20 at 10:53
  • Out of interest though, why get the longest match from the start `##`? The shortest match `#` also works. – Lou Dec 01 '20 at 10:55
  • Och, I think you wanted to remove the spaces behind `foo` too. – KamilCuk Dec 01 '20 at 14:48

If what you want is a real regex match, you should be using a real regex match:

$: [[ "$str" =~ [[:space:]]*(.*) ]]
$: echo "[${BASH_REMATCH[1]}]"
[foo  bar       baz]

A more pedestrian approach would be to skip the quotes.

$: echo "[$str]"
[ foo bar baz]
$: new=$( echo $str )
$: echo "[$new]"
[foo bar baz]

Be aware that this opens you up to all sorts of messes in more complex situations. It breaks if you want to preserve more than a single consecutive space between values, or a tab instead of a space, etc.

$: str=' foo  bar'$'\t''baz';
$: echo "[$str]"
[ foo  bar      baz]
$: new=$( echo $str )
$: echo "[$new]"
[foo bar baz]

It can cause other sorts of havoc too, but it's good to know the trick for the cases when it's appropriate.
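
One example of that other havoc: an unquoted `$str` also undergoes pathname expansion, so a `*` in the string turns into file names. Here is a sketch run in a throwaway directory (the file names and the string are made up for illustration). `set -f` inside the command substitution turns globbing off for that subshell only.

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1   # empty scratch directory
touch a.txt b.txt

str=' foo * bar'
globbed=$( echo $str )        # * expands to the files in the directory
safe=$( set -f; echo $str )   # globbing disabled: * stays literal

echo "[$globbed]"   # [foo a.txt b.txt bar]
echo "[$safe]"      # [foo * bar]
```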

Paul Hodges
  • This isn't what I'm trying to do - I only want to trim leading spaces from the front of the string plus a given word, not from all words in the string. – Lou Dec 01 '20 at 16:12
  • Which is why I warned about it. The first match solution is the better approach. – Paul Hodges Dec 01 '20 at 20:55