I'm looking to take a string of the form:

Article Title Of Unknown Length by Author Name: some.url

And simply change it to:

ARTICLE TITLE OF UNKNOWN LENGTH by Author Name: some.url

I have tried various options that will successfully pick out the article title, such as

(^.*?by)

and will replace it with whatever I want. e.g. if I say

replace [(^.*?by)] with [test title]

the above becomes:

test title by Author Name: some.url

However, whenever I try to replace it with \U$1 it makes the whole string uppercase rather than just what matches the query.

What am I doing wrong? I am a complete regex noob by the way, I only started an hour ago, but any help would be hugely appreciated.

2 Answers

my $string = 'Article Title Of Unknown Length by Author Name: some.url';
# Upper-case everything before " by "; \E ends the \U conversion
$string =~ s/^(.*)(?= by )/\U$1\E/gi;
print "$string\n"; # ARTICLE TITLE OF UNKNOWN LENGTH by Author Name: some.url

EDIT: a breakdown of the substitution:

/           search for
^           at start of string
(.*)        match and capture a group of 0+ (`*`) of any character (`.`)
(?= by )    followed by literal " by " (`?=` is a positive lookahead, so " by " is not part of the match)
/           replace with
\U          start upper-casing
$1          the first captured group
\E          stop upper-casing
/           options
g           search globally
i           case insensitive
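
To make the role of the lookahead concrete, here is a small Perl sketch comparing it with the question's `(^.*?by)` pattern, which captures "by" itself. The sub names `upcase_title` and `upcase_through_by` are made up for illustration, and this only shows Perl's behaviour; yahoo-pipes may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case everything before " by ".
# The lookahead (?= by ) checks for " by " without making it part of the match.
sub upcase_title {
    my ($s) = @_;
    $s =~ s/^(.*)(?= by )/\U$1\E/;
    return $s;
}

# Hypothetical helper: same idea, but with the question's pattern,
# where "by" is inside the captured group and so gets upper-cased too.
sub upcase_through_by {
    my ($s) = @_;
    $s =~ s/(^.*?by)/\U$1\E/;
    return $s;
}

my $in = 'Article Title Of Unknown Length by Author Name: some.url';
print upcase_title($in), "\n";      # ARTICLE TITLE OF UNKNOWN LENGTH by Author Name: some.url
print upcase_through_by($in), "\n"; # ARTICLE TITLE OF UNKNOWN LENGTH BY Author Name: some.url
```

In both cases only the matched part is rewritten; the rest of the string is left alone.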
guido
  • Could you break that down a little for me? Sorry, I'm not a total code-phobe but I can't really follow what's going on there. – user1422622 May 29 '12 at 14:10

Note how guido put spaces around "by" so that the regex still works on input like "Messing With Abby by Abby: some.url". You may want to replace those spaces with \s if your input can contain tabs and the like. I'm not sure how yahoo-pipes works, but most likely replace [^(.*)(?= by )] with [\U$1] will do the trick. Failing that, the pattern ^(.*)( by )(.*)$ matches the whole input, broken into three parts, so it should be easy to reconstruct what you need from there, e.g. replace [^(.*)( by )(.*)$] with [\U$1\E by $3] or some such. (\E ends the upper-casing in Perl-style replacements; whether yahoo-pipes honours it is an assumption.)
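
For reference, this is roughly how the three-part version behaves in Perl, with \s+ in place of the literal spaces. It is only a sketch: the sub name `upcase_title_parts` is hypothetical, and I have not checked it against yahoo-pipes.

```perl
use strict;
use warnings;

# Hypothetical helper: split the input into title / separator / rest
# and upper-case only the title.
sub upcase_title_parts {
    my ($s) = @_;
    # \s+by\s+ tolerates tabs or repeated spaces around "by";
    # \E stops the upper-casing before $2 and $3 are appended.
    $s =~ s/^(.*)(\s+by\s+)(.*)$/\U$1\E$2$3/;
    return $s;
}

print upcase_title_parts('Messing With Abby by Abby: some.url'), "\n";
# MESSING WITH ABBY by Abby: some.url
print upcase_title_parts("Some Title\tby\tSomeone: some.url"), "\n";
```

Without the \E, Perl would keep upper-casing through $2 and $3, which matches the "whole thing in caps" symptom.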

Eugene Ryabtsev
  • This sort of works. When I break the string into those 3 bits and replace it with $1$2$3, it reconstructs fine. If I then use $1$2\\U$3, everything after the 'by' comes out in caps, but with \\U$1$2$3 the whole thing comes out in caps. I'm sure guido's answer is great, but as a complete novice I find it too hard to decode without a quick breakdown. – user1422622 May 29 '12 at 14:02
  • guido's solution is written for Perl, not yahoo-pipes, but it includes the regex that should work. It should translate to yahoo-pipes as `replace [^(.*)(?= by )] with [\\U$1]`. This matches (and therefore replaces) only part of the string, so it should work if yahoo-pipes supports positive lookahead. – Eugene Ryabtsev May 30 '12 at 08:02