
Suppose I have something like this:

echo "bLah BLaH blAH" | sed -r 's/([a-zA-Z ]+)/\L&/; s/\b[a-z]/\u&/g'

This is quite a typical use of sed: turning a "crazy-case" string into mixed case (first letter uppercase, remaining letters lowercase).

However, this always affects the WHOLE string. If, for instance, I want to parse "crazy" MP3 filenames in various flavors ($tracknr - $artist - $title vs. $artist - $tracknr - $title), things get much more complicated, because titles are sometimes in foreign languages like French, and mixed case just looks BUTT-UGLY in French or Italian. That's why I only want to process the string up to some delimiter, e.g. space-dash-space.

Hence, I'd like to chain several 's/.../.../' expressions to do things step by step. However, it would be nice to have a way to "store" subexpressions from PREVIOUS expressions, so that I can use preserved sub-matches as the source for the next sed replace expression.

If you think that works out of the box anyway, you're wrong. You simply CANNOT use the '\1' syntax in the second expression (after the semicolon) to refer to a subexpression of the previous expression. Of course it works once you define a subexpression in the second expression itself, but let's set that case aside here. In my case, '\1' is simply unknown to the parser, and you'll get the error

sed: -e expression #1, char (xx): invalid reference \1 on `s' command's RHS

Is there anything implemented to perform that sort of thing?

ghoti
syntaxerror

4 Answers


The Problem

You want to uppercase the first letter in each word.

Your Question Makes Your Life Harder Than Necessary

You can store text in the hold space or use sequential and nested expressions to perform multiple operations on a matching pattern. You might even be able to pull some shenanigans with the hold space to re-process lines. Past a certain level of complexity, though, the real question isn't "Can language X do this?" but rather "What language is optimized for this?"
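A minimal sketch of the hold-space approach, assuming GNU sed (for `\L`, `\u`, `\b` and `\n` in the replacement) and assuming the delimiter is the first space-dash-space. The function name `titlecase_first_field` is hypothetical. The line is split at the delimiter, a copy is saved with `h`, the first field is case-fixed, and `G` brings the saved copy back so the untouched remainder can be reattached:

```shell
#!/bin/sh
# Hypothetical helper: title-case only the text before the first " - ".
# Assumes GNU sed for \L, \u, \b and \n in the replacement.
titlecase_first_field() {
  sed '
    # Lines without the delimiter are printed unchanged.
    / - /!b
    # Split at the first delimiter: "field1\nrest".
    s/ - /\n/
    # Save the split line in the hold space.
    h
    # Keep field1 only, lowercase it, then uppercase each word initial.
    s/\n.*//
    s/.*/\L&/
    s/\b[a-z]/\u&/g
    # Append the saved copy: "Field1\nfield1\nrest".
    G
    # Drop the old field1 (between the two newlines), restore the delimiter.
    s/\n.*\n/ - /
  '
}

echo 'bLah BLaH - lE tITRE' | titlecase_first_field   # Blah Blah - lE tITRE
```

Only the first field is touched, so a French or Italian title after the delimiter keeps its original casing.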

If you want to do heavy text-munging with Perl's own regex engine (the one PCRE emulates) and track subexpressions through complex logic, Perl is a better option. Any Turing-complete language will do, but one of the backronyms for Perl is "Pathologically Eclectic Rubbish Lister" for a reason.

The Easy GNU sed Solution

You don't need all the complexity you're asking for. Some basic GNU sed extensions will do what you want.

echo "bLah BLaH blAH" |
sed -r 's/(\b[a-zA-Z ]+\b)/\L&/g; s/\b[a-zA-Z ]/\u&/g'

This produces the desired output of uppercasing the first character of each word:

Blah Blah Blah

Todd A. Jacobs

Assuming @CodeGnome got it right, and what you want is

You want to uppercase the first letter in each word.

you can use this alternative (which still relies on GNU extensions; see \L and \U):

sed 's;\(.\)\([^ ]*\) \?;\U\1\L\2 ;g'

your example:

$ echo "bLah BLaH blAH" | sed 's;\(.\)\([^ ]*\) \?;\U\1\L\2 ;g'
Blah Blah Blah

If you're OK with solutions other than sed, you can use awk and avoid GNU-isms altogether (thanks to dualbus on IRC):

awk '{for(i=1;i<=NF;i++){$i=toupper(substr($i,1,1))tolower(substr($i,2))}}1'

example:

$ echo "bLah BLaH blAH" | awk '{for(i=1;i<=NF;i++){$i=toupper(substr($i,1,1))tolower(substr($i,2))}}1'
Blah Blah Blah
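Since the question really wants to stop at a " - " delimiter, the same awk idea can be restricted to the first field by using the delimiter as the field separator. A sketch assuming POSIX awk (the function name is hypothetical). Note that a line without the delimiter has a single field and is therefore title-cased as a whole, and runs of spaces inside the first field collapse to one:

```shell
#!/bin/sh
# Hypothetical helper: title-case each word of the first " - "-separated
# field, leaving the other fields untouched. POSIX awk only.
titlecase_first_field_awk() {
  awk 'BEGIN { FS = OFS = " - " }
  {
    n = split($1, w, " ")          # words of the first field
    s = ""; sep = ""
    for (i = 1; i <= n; i++) {
      s = s sep toupper(substr(w[i], 1, 1)) tolower(substr(w[i], 2))
      sep = " "
    }
    $1 = s                         # rebuilds $0, rejoining fields with OFS
    print
  }'
}

echo 'bLah BLaH - lE tITRE' | titlecase_first_field_awk   # Blah Blah - lE tITRE
```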
c00kiemon5ter

A Perl one-liner approach ;)

echo "bLah BLaH blAH" |
    perl -ne '@_ = map { ucfirst lc } split; print join(" ", @_), $/'
Blah Blah Blah

That should work on any Unix-like system with Perl installed, I guess =)

Here it is, decomposed:

perl           # the Perl interpreter
-n             # assume a "while (<>) { ... }" loop around the program
-e             # one line of program (several -e's allowed, omit programfile)
@_             # an array (normally a sub's argument list), reused here as scratch space
=              # assignment
map            # takes a list, applies the block to each element, returns the new list
{ ucfirst lc } # per word: lowercase it, then uppercase its first letter
split          # without arguments, splits $_ (the current line, via -n) on whitespace
;              # end of the first statement
print          # print the list that follows
join(" ", @_)  # join the list elements with single spaces
$/             # input record separator, a newline by default (see perldoc perlvar)
Gilles Quénot
Yes, thanks, but I think I'm going to post another thread in the Perl section to get this done with "rename" (the command, not the perlfunc). It will contain a question about upper/lowercasing a subexpression. – syntaxerror Jun 12 '12 at 22:16

Or in awk, without the overhead of regexps:

[ghoti@pc ~]$ echo "bLah BLaH blAH" | awk 'BEGIN{RS=" ";ORS=RS} {print toupper(substr($0,1,1)) tolower(substr($0,2))}'
Blah Blah Blah
ghoti