I am building a tweet classification model and am having trouble finding a regex pattern that does what I need. What I want the pattern to pick up:
- Any hashtags used in the tweet, but without the hash mark (example: #omg becomes just omg)
- Any mentions used in the tweet, but without the @ symbol (example: @username becomes just username)
- No numbers and no words containing numbers (this is the most difficult part for me)
- Other than that, I just want all words returned
Thank you in advance if you can help.
Currently I am using this pattern: **r"(?u)\b\w\w+\b"**, but it fails to exclude numbers, because \w also matches digits.
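For reference, here is a minimal sketch of one pattern that might meet the requirements above. It replaces \w with the character class [^\W\d] (any word character that is not a digit), so a token containing a digit can never satisfy both \b boundaries. The function name extract_words and the sample tweet are made up for illustration. I am also assuming that keeping the two-character minimum from my current pattern is acceptable, and that underscores inside a word are fine.

```python
import re

# Assumption: same shape as the current pattern, but each character must be a
# word character that is NOT a digit. Because \b requires a word boundary on
# both sides, tokens such as "python3", "b2b" or "2023" cannot match at all.
WORD_PATTERN = re.compile(r"(?u)\b[^\W\d]{2,}\b")


def extract_words(tweet: str) -> list[str]:
    """Return digit-free words of 2+ characters; '#' and '@' are dropped
    because they are not word characters and so fall outside the match."""
    return WORD_PATTERN.findall(tweet)


# Hypothetical sample tweet, for illustration only.
sample = "OMG #omg @username loves python3 and 2023 b2b stuff"
print(extract_words(sample))
```

Note that [^\W\d] still allows underscores, so a mention such as @user_name would come back as user_name.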