I'm building on a regular expression I found that works well for my use case. Its purpose is to match what I consider valid hashtags (I know there are a ton of hashtag regex posts on SO, but this question is specific).

Here's the regex I'm using:

/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/g

The only problem is that I can't figure out how to require that the second character of the match (the first being the # itself) is a letter. I only want the first character after the # to be a-z or A-Z: no digits and no non-alphanumeric characters.

Any help is much appreciated; I'm very much a novice when it comes to regular expressions.

silencedogood

3 Answers

As I mentioned in the comments, you can replace [a-zA-Z0-9_]{1,20} with [a-zA-Z][a-zA-Z0-9_]{0,19} so that the first character is guaranteed to be a letter and then followed by 0 to 19 word characters (alphanumeric or underscore).
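
To make the effect of that substitution concrete, here is a minimal sketch that runs the question's original pattern and the modified one against the same text. The `findTags` helper and the sample string are hypothetical, made up for illustration only.

```javascript
// Pattern from the question, unchanged.
const original = /(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/g;
// Same pattern with [a-zA-Z0-9_]{1,20} swapped for [a-zA-Z][a-zA-Z0-9_]{0,19}.
const letterFirst = /(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{0,19})(\b|\r)/g;

// Hypothetical helper: every full match in `text`, or [] when there is none.
function findTags(pattern, text) {
  return text.match(pattern) || [];
}

const sample = "#1abc #abc1 #_abc #123";

console.log(findTags(original, sample));    // [ '#1abc', '#abc1', '#_abc' ]
console.log(findTags(letterFirst, sample)); // [ '#abc1' ]
```

The original pattern only rejects tags made up entirely of digits and underscores (#123), which is why #1abc and #_abc slip through until the first character is pinned to a letter.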

However, there are other unnecessary parts in your pattern. It appears that all you need is something like this:

/(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g

Breakdown of (?:^|\B):

(?:         # Start of a non-capturing group (don't use a capturing group unless needed).
    ^       # Beginning of the string/line.
    |       # Alternation (OR).
    \B      # The opposite of `\b`. In other words, it makes sure that 
            # the `#` is not preceded by a word character.
)           # End of the non-capturing group.
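
As a rough illustration of what that group allows and rejects, the sketch below tries the simplified pattern on a few inputs that differ only in the character before the #. The `findTags` helper and the input strings are assumptions for the example, not part of the answer.

```javascript
const hashtag = /(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g;

// Hypothetical helper: every full match in `text`, or [] when there is none.
function findTags(text) {
  return text.match(hashtag) || [];
}

// '#' at the very start of the string.
console.log(findTags("#start of the string")); // [ '#start' ]
// '#' preceded by a space (a non-word character), so \B holds.
console.log(findTags("a space, then #tag"));   // [ '#tag' ]
// '#' preceded by punctuation, also a non-word character.
console.log(findTags("punctuation:#tag"));     // [ '#tag' ]
// '#' preceded by a letter: there is a word boundary here, so \B fails.
console.log(findTags("word#tag"));             // []
// '_' counts as a word character too, so this is rejected as well.
console.log(findTags("snake_#tag"));           // []
```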

Note: You may also replace [a-zA-Z0-9_] with \w.
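
A quick check of that equivalence, assuming JavaScript, where \w is ASCII-only unless the i and u flags are combined. In engines where \w is Unicode-aware the two forms are not interchangeable. The sample string is made up for illustration.

```javascript
const explicit = /(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g;
const shorthand = /(?:^|\B)#[a-zA-Z]\w{0,19}\b/g;

const sample = "#release_2 and #v2_final, but not #2fast";

const explicitTags = sample.match(explicit) || [];
const shorthandTags = sample.match(shorthand) || [];

console.log(explicitTags);  // [ '#release_2', '#v2_final' ]
console.log(shorthandTags); // [ '#release_2', '#v2_final' ]
```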


  • 1
    Frustratingly simple. I really need to get better at these things. Thanks. A breakdown of this `(?:^|\B)` would be really helpful. I'm guessing it's an `or` statement, saying the beginning of a line, or some kind of boundary... – silencedogood Oct 22 '19 at 16:12

The following should work:

(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{0,19})(\b|\r)

If you only want to accept hashtags with two or more characters after the #, replace {0,19} with {1,19}.
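
The difference between the two quantifiers shows up only on one-character tags. A small sketch, with the g flag added so all matches are returned and a made-up sample string:

```javascript
// {0,19}: a single letter after '#' is enough.
const oneOrMore = /(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{0,19})(\b|\r)/g;
// {1,19}: the letter must be followed by at least one more word character.
const twoOrMore = /(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{1,19})(\b|\r)/g;

const sample = "#a #ab #abc";

const withSingle = sample.match(oneOrMore) || [];
const withoutSingle = sample.match(twoOrMore) || [];

console.log(withSingle);    // [ '#a', '#ab', '#abc' ]
console.log(withoutSingle); // [ '#ab', '#abc' ]
```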

Matt Croak

In your pattern, (?![0-9_]+\b) is a negative lookahead. It only asserts that what is directly to the right of the # is not a run of digits and underscores ending at a word boundary. It does not require the first character to be a-z or A-Z, so a tag such as #1abc or #_abc still passes.

If you want to keep [a-zA-Z0-9_]{1,20} as it is, use a positive lookahead, (?=[a-zA-Z]), in place of the negative one to assert that the character directly to the right of the # is an upper- or lowercase a-z.

(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b
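
Here is a short sketch of the lookahead version in use, with the g flag added to collect every match. The sample strings are invented for the example. Because the lookahead consumes nothing, {1,20} still counts the first letter, so the 20-character limit on the tag body is unchanged.

```javascript
const withLookahead = /(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b/g;

const sample = "Ship it #v2 #2v #_v #ok_go";
const tags = sample.match(withLookahead) || [];
console.log(tags); // [ '#v2', '#ok_go' ]

// Length limit: 20 characters after '#' match, 21 do not.
const twenty = "#" + "a".repeat(20);
const twentyOne = "#" + "a".repeat(21);
const twentyMatches = twenty.match(withLookahead) || [];
const twentyOneMatches = twentyOne.match(withLookahead) || [];
console.log(twentyMatches.length);    // 1
console.log(twentyOneMatches.length); // 0
```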

The fourth bird