
I'm trying to use a regular expression to select all of each word except the first character, much as @mahdaeng wanted to do here. The solution offered to his question was to use \B[a-z]. This works fine, except when a word contains some form of punctuation, such as "Jack's" and "merry-go-round". Is there a way to select the entire word including any contained punctuation? (Not including outside punctuation such as "? , ." etc.)

Nathan Arthur

3 Answers


If you can enumerate the acceptable in-word punctuation, you could just expand upon the answer you linked:

\B[a-zA-Z'-]+
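
As a quick check, here is a minimal sketch of that pattern in Python (the question doesn't name a language, so the choice of Python and the sample sentence are assumptions made for illustration):

```python
import re

# Pattern from the answer above: letters plus the enumerated in-word
# punctuation (apostrophe and hyphen), starting at a position that is
# NOT a word boundary, i.e. after the first character of each word.
PATTERN = re.compile(r"\B[a-zA-Z'-]+")

# Sample sentence, assumed for illustration.
text = "Jack's merry-go-round revolves way too fast!"

tails = PATTERN.findall(text)
print(tails)
```

On this input the trailing "!" is left out because it is not in the character class. Two caveats: a one-letter word such as "a" produces no match, and a stray hyphen or apostrophe surrounded by whitespace can match on its own, since \B also holds between two non-word characters.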
Rusty Fausak

A regex really isn't necessary here, since you can just split your text on whitespace and deal with each word accordingly. Since you don't mention an underlying language, here's an implementation in Perl:

use strict;
use warnings;

$_ = "Jack's merry-go-round revolves way too fast!";
my @words = split /\s+/;    # split $_ on runs of whitespace
foreach my $word (@words)
{
  my $stripped_word = substr($word, 1);    # drop the first character
  $stripped_word =~ s/[^a-z]+$//i;         # strip all trailing punctuation, not just one character
  print "$stripped_word\n";
}

The output is:

ack's
erry-go-round
evolves
ay
oo
ast
BoltClock
\B[^\s]+

(where [^\s] matches any character that is not whitespace, the same as \S) should get you what you want, assuming the words are whitespace-delimited. If they're also punctuation-delimited, you might need to enumerate the punctuation:

\B[^\s,.?!]+
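
To see how the two patterns differ, here is a small sketch in Python (the language and the sample sentence are assumptions; neither comes from the question):

```python
import re

# Sample sentence, assumed for illustration.
text = "Jack's merry-go-round revolves way too fast!"

# Whitespace-delimited: each match runs to the next whitespace,
# so trailing punctuation stays attached to the word.
loose = re.findall(r"\B[^\s]+", text)

# Also punctuation-delimited: the enumerated marks , . ? ! end the match.
strict = re.findall(r"\B[^\s,.?!]+", text)

print(loose)
print(strict)
```

With this input the first list ends in "ast!" and the second in "ast". In-word punctuation such as the apostrophe and hyphens survives in both cases because neither character is excluded.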
David Moles
  • Wow! That's awesome! I'll have to remember this method. In my case, however, I think it would probably stay less hairy to delineate what is allowed instead of what isn't. – Nathan Arthur Feb 29 '12 at 01:13