13

I am looking for a regex that matches the first word in a sentence, excluding punctuation and whitespace. For example: "This" in "This is a sentence." and "First" in "First, I would like to say \"Hello!\""

This doesn't work:

"""([A-Z].*?(?=^[A-Za-z]))""".r
hippietrail
princess of persia
  • 3
    What flavour of regex is this? – Andrew Savinykh Feb 08 '13 at 06:41
  • Can the words have numbers in them? – endy Feb 08 '13 at 06:41
  • `([a-z]+)`, case-insensitive, should be sufficient for "non-tricky" English. However, it will quickly fail for non-Latin characters, so update it to [use Unicode character classes](http://stackoverflow.com/a/5005122/166390) as appropriate! Note that this assumes an NFA regex (like Ruby :D) which will "match the first thing it can", but that works in our favor here, as there is no need for anchors or otherwise complex look-arounds. –  Feb 08 '13 at 06:57
  • Start of a sentence or start of a string, like in your examples? What about, e.g., "It's not a good idea!" or "Fürchterlichéß Beispiel." (just an example!)? – stema Feb 08 '13 at 08:54

5 Answers

14
(?:^|(?:[.!?]\s))(\w+)

This will match the first word in every sentence.

http://rubular.com/r/rJtPbvUEwx
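
As a rough sketch of how this pattern could be used from Scala (the `.r` suffix in the question suggests Scala, but that is an assumption), the prefix `(?:^|(?:[.!?]\s))` is a non-capturing group that matches either the start of the input or a sentence terminator followed by one whitespace character, and `(\w+)` captures the word itself as group 1. The object and method names below are hypothetical. Note that `\w` in `java.util.regex` covers only ASCII letters, digits and underscore by default.

```scala
object SentenceStarts {
  // (?:^|(?:[.!?]\s))  non-capturing prefix: start of input, or one of . ! ?
  //                    followed by a single whitespace character
  // (\w+)              the only capturing group: the first word (group 1)
  private val FirstWords = """(?:^|(?:[.!?]\s))(\w+)""".r

  // Returns the first word of every sentence found in `text`.
  def firstWords(text: String): List[String] =
    FirstWords.findAllMatchIn(text).map(_.group(1)).toList
}
```

For example, `SentenceStarts.firstWords("This is a sentence. Here is another! And a third?")` yields `List("This", "Here", "And")`, while `"123 This doesnt work"` yields `List("123")` because digits count as word characters.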

endy
  • 1
    "123 This doesnt work" as it will return "123" instead of "This" – konyak Mar 29 '13 at 19:09
  • That is because that is the first word, as the OP asked. If you want it to match the first dictionary word, then you should be looking somewhere other than regex. – endy Mar 30 '13 at 17:13
  • 1
    Would you kindly explain what everything before (\w+) does, please? – Nubarke Oct 14 '15 at 11:44
5

This is an old thread, but people might need this like I did. None of the above works if your sentence starts with one or more spaces. I did this to get the first (non-empty) word in the sentence:

(?<=^[\s"']*)(\w+)

Explanation:

(?<=^[\s"']*) is a positive lookbehind that requires the start of the string followed by zero or more whitespace or punctuation characters (here, quotes; you can add more between the brackets), without including them in the match.
(\w+) is the actual match of the word, which will be returned.

The following words in the sentence are not matched as they do not satisfy the lookbehind.
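
Some engines (Python's `re`, for example) reject lookbehinds of unbounded length, so the pattern above may not compile everywhere. A possible alternative, sketched here in Scala under the assumption that a capture group is acceptable, is to consume the same prefix instead of looking behind it and then read group 1. The names `FirstWordAfterPrefix` and `firstWord` are hypothetical.

```scala
object FirstWordAfterPrefix {
  // ^[\s"']*  start of input, then any run of whitespace or quote characters
  //           (consumed by the match rather than checked in a lookbehind)
  // (\w+)     the first word, available as group 1
  private val Prefixed = """^[\s"']*(\w+)""".r

  // Returns the first word, or None if the input does not start with
  // optional whitespace/quotes followed by a word character.
  def firstWord(s: String): Option[String] =
    Prefixed.findFirstMatchIn(s).map(_.group(1))
}
```

The trade-off is that the overall match now includes the leading whitespace and quotes, so callers have to use the capture group rather than the whole match.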

Ibrahim Mezouar
4

You can use either of these regexes: ^[^\s]+ or ^[^ ]+ (the second treats only the space character as a separator).
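
A small Scala sketch of the first pattern shows what it does and does not strip. The object and method names are hypothetical. Because the pattern takes everything up to the first whitespace character, punctuation attached to the word is kept, and an input with leading whitespace produces no match at all.

```scala
object FirstToken {
  // ^[^\s]+ : one or more non-whitespace characters at the start of the input
  private val Token = """^[^\s]+""".r

  // Returns the leading run of non-whitespace characters, if any.
  def firstToken(s: String): Option[String] = Token.findFirstIn(s)
}
```

With this sketch, `FirstToken.firstToken("This is a sentence.")` gives `Some("This")`, but `FirstToken.firstToken("First, I would")` gives `Some("First,")` with the comma included.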

Keon-Woong Moon
3

You can use this regex: ^\s*([a-zA-Z0-9]+)

The first word is available in capture group 1.
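
Reading the captured group might look like the following in Scala. This is only a sketch, and `LeadingWord` and `firstWord` are hypothetical names.

```scala
object LeadingWord {
  // ^\s*            skip any leading whitespace
  // ([a-zA-Z0-9]+)  capture the first run of ASCII letters and digits (group 1)
  private val Pattern = """^\s*([a-zA-Z0-9]+)""".r

  // Returns group 1 of the first match, or None if the input does not
  // start with optional whitespace followed by a letter or digit.
  def firstWord(s: String): Option[String] =
    Pattern.findFirstMatchIn(s).map(_.group(1))
}
```

Note that an input beginning with a quote or other punctuation, such as `"\"Hello!\""`, yields `None`, because only whitespace is skipped before the word.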

eyossi
2
[a-z]+

This should be enough, as it will match the first run of a-z characters (assuming the match is case-insensitive).

If that doesn't work, you could try [a-z]+\b, or even ^[a-z]+\b, but the last one assumes that the string starts with the word.
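
In Scala, the case-insensitive assumption can be expressed with the inline `(?i)` flag. The following is a minimal sketch with hypothetical names. Since the pattern is unanchored, the first match skips over any leading digits, whitespace or punctuation.

```scala
object FirstLetters {
  // (?i) makes [a-z] match upper-case ASCII letters as well
  private val Letters = """(?i)[a-z]+""".r

  // Returns the first run of ASCII letters anywhere in the input.
  def firstWord(s: String): Option[String] = Letters.findFirstIn(s)
}
```

For instance, `FirstLetters.firstWord("123 This doesnt work")` returns `Some("This")`, and `FirstLetters.firstWord("\"Hello!\"")` returns `Some("Hello")`. An apostrophe splits the word, so `"It's not"` gives `Some("It")`.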

Explosion Pills