7

I've almost got the answer here, but I'm missing something and I hope someone here can help me out.

I need a regular expression that will match all but the first letter in each word in a sentence. Then I need to replace the matched letters with the correct number of asterisks. For example, if I have the following sentence:

There is an enormous apple tree in my backyard.

I need to get this result:

T**** i* a* e******* a**** t*** i* m* b*******.

I have managed to come up with an expression that almost does that:

(?<=(\b[A-Za-z]))([a-z]+)

Using the example sentence above, that expression gives me:

T* i* a* e* a* t* i* m* b*.

How do I get the right number of asterisks?

Thank you.

mahdaeng
  • 1
    Do you need to use regular expressions for any particular reason? Depending on the programming language you're writing in, you can use a substring with replacement to get the same effect – jerluc Jan 25 '11 at 05:54

4 Answers

17

Try this:

\B[a-z]

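\B is the opposite of \b: it matches at any position that is not a word boundary. So \B[a-z] matches a lowercase letter that comes directly after another word character, which is every letter except the first one in each word.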

Your regex replaces the whole tail of the word, [a-z]+, with a single asterisk. You should replace the letters one by one instead. If you want to keep the lookbehind approach, match a single letter and check that it has a word behind it (which is a little pointless, since you might as well check for a single letter before it, (?<=[A-Za-z])[a-z]):

(?<=\b[A-Za-z]+)[a-z]

(note that the last regex has a variable length lookbehind, which isn't implemented in most regex flavors)
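To make the difference concrete, here is a minimal JavaScript sketch comparing the two approaches on the sentence from the question. The function names are made up for illustration, and the first pattern uses a fixed-length lookbehind, which assumes an engine with ES2018 lookbehind support (for example a recent Node.js).

```javascript
// Sketch only: function names are illustrative, not part of the answer above.
const sentence = "There is an enormous apple tree in my backyard.";

// Matches the whole tail of each word in one go,
// so every tail collapses into ONE asterisk.
function maskTailAtOnce(text) {
  return text.replace(/(?<=\b[A-Za-z])[a-z]+/g, "*");
}

// Matches one lowercase letter at a time (any letter not at a word boundary),
// so every letter gets its own asterisk.
function maskPerLetter(text) {
  return text.replace(/\B[a-z]/g, "*");
}

console.log(maskTailAtOnce(sentence)); // T* i* a* e* a* t* i* m* b*.
console.log(maskPerLetter(sentence));  // T**** i* a* e******* a**** t*** i* m* b*******.
```

The only change between the two is the quantifier: dropping the + turns one match per word into one match per letter.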

Kobi
  • 2
    The shortest regex here is probably `\B\w`, but `\w` adds upper case letters and underscores. – Kobi Jan 25 '11 at 06:09
  • 2
    `(?<=\b[A-Za-z]+)` won't work in any flavor but .NET and JGSoft. You had it right the first time. – Alan Moore Jan 25 '11 at 06:46
  • @Alan - good point. I've added a warning on that. Either way, I did say it was rather pointless `:)` – Kobi Jan 25 '11 at 06:55
  • And (depending on the regex flavor and whether you're planning on matching other characters than unaccented letters between a and z) you might want to use `\p{L}` or `[^\W\d_]` instead of `[a-z]`. – Tim Pietzcker Jan 25 '11 at 07:25
  • How would that work when you had something like "Jack's toothbrush"? – Nathan Arthur Feb 29 '12 at 00:37
  • 1
    @NathanArthur - That is a good question... The underlying question is even more difficult: What is a word? The pattern above assumes a word is made of alphanumeric characters, which is wrong. In fact, I do not believe I can solve that problem reliably with a simple pattern - there are just too many edge cases. Still, as for your question: In .Net, you can add an apostrophe to the pattern above: `(?<=\b[A-Za-z']+)[a-z]`. On other flavors I think `(?<=\B|\b')[a-z]` can work. Either way, it requires some thinking. – Kobi Feb 29 '12 at 07:15
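For readers who want to try the apostrophe case from the comments, here is a small JavaScript sketch. It assumes an engine with lookbehind support (ES2018), and the function names are hypothetical. It only covers this one example and does not settle the broader "what is a word?" problem raised above.

```javascript
// Sketch only: function names are illustrative.

// The plain pattern treats the apostrophe as a word break,
// so the "s" after it counts as the first letter of a new word.
function maskPlain(text) {
  return text.replace(/\B[a-z]/g, "*");
}

// The variant from the comment: also mask a letter that follows
// an apostrophe which itself sits at the end of a word.
function maskWithApostrophe(text) {
  return text.replace(/(?<=\B|\b')[a-z]/g, "*");
}

console.log(maskPlain("Jack's toothbrush"));          // J***'s t*********
console.log(maskWithApostrophe("Jack's toothbrush")); // J***'* t*********
```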
2

You can try this

\B\w

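This matches every character except the first one of each word, so each match can be replaced with an asterisk. Note that \w also covers upper case letters, digits and underscores.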

from this ==Hello==World== into ==H****==W****==
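A quick JavaScript sketch of how \B\w behaves compared with \B[a-z]. The function names and the second test string are made up for illustration.

```javascript
// Sketch only: function names and the "USB_2 port" input are illustrative.

// \w covers letters, digits and underscores.
function maskWordChars(text) {
  return text.replace(/\B\w/g, "*");
}

// [a-z] covers lowercase ASCII letters only.
function maskLowercase(text) {
  return text.replace(/\B[a-z]/g, "*");
}

console.log(maskWordChars("==Hello==World==")); // ==H****==W****==
console.log(maskWordChars("USB_2 port"));       // U**** p***
console.log(maskLowercase("USB_2 port"));       // USB_2 p***
```

Which one is right depends on whether upper case letters, digits and underscores inside a word should be hidden as well.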

LW001
nulled
0

This is an old question. Adding an answer since the others don't seem to solve this problem completely or clearly. The simplest regular expression that handles this is /\B[a-z]/g. The 'g' is the global flag, so the single-character match is repeated throughout the string and every matched letter is replaced by its own asterisk.

var string = "There is an enormous apple tree in my backyard.";
var answer = string.replace(/\B[a-z]/g, "*");

// Runnable version: assumes jQuery and the two divs below are loaded first.
var string = "There is an enormous apple tree in my backyard.";
$("#stringDiv").text(string);

var answer = string.replace(/\B[a-z]/g, "*");
$("#answerDiv").text(answer);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="stringDiv"></div>
<div id="answerDiv"></div>
DavidR
0

Try this possibly:

(\w{1})\w*
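The pattern above matches a whole word at once, so a plain string replacement would still produce a single asterisk per word. One way it could be made to work, sketched here in JavaScript, is to capture the tail as well and build the asterisks in a replacement callback. The second capture group and the function name are assumptions added for illustration and are not part of the answer.

```javascript
// Sketch only: the second capture group and the function name are assumptions.
function maskWithCallback(text) {
  // (\w) captures the first character of a word, (\w*) captures the rest.
  return text.replace(/(\w)(\w*)/g, function (match, first, rest) {
    // Keep the first character, emit one asterisk per remaining character.
    return first + "*".repeat(rest.length);
  });
}

console.log(maskWithCallback("There is an enormous apple tree in my backyard."));
// T**** i* a* e******* a**** t*** i* m* b*******.
```

This needs no lookbehind or \B, at the cost of relying on a callback, which not every regex replacement API offers.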
jerluc