I need a little help with a regex to use as a Pattern in Linkify.

I'm trying to extract #hashtags and @mentions from a string, so I need to find the words in the string that start with # or @ (and end at the next whitespace, of course), using a single regex.

Inside the word, I need to allow every possible character in any language (within limits :) ).

Thank you.

EDIT

When I said every possible character, I was wrong: I still need to follow the same rules as Twitter, so, for example, characters like - are not allowed.

shaithana

3 Answers

If you want Twitter's rules, why not use the library from the people who know those rules better than anyone else: Twitter themselves? :-)

If you use Gradle, just add implementation 'com.twitter:twitter-text:1.12.1' to the dependencies block of your build.gradle (older Gradle versions use compile instead of implementation).

Or, for Maven, add this to your pom.xml:

<dependencies>
  <dependency>
    <groupId>com.twitter</groupId>
    <artifactId>twitter-text</artifactId>
    <version>1.12.1</version>
  </dependency>
</dependencies>

Then in your code you can call the Twitter library like this:

import com.twitter.Extractor;

public class Main {
    public static void main(String[] args) {
        Extractor extractor = new Extractor();
        String text = "extracting hashtags and mentions in #java using @twitter library from @github";

        System.out.println("#hashtags:");
        for (String hashtag : extractor.extractHashtags(text)) {
            System.out.println(hashtag);
        }

        System.out.println();
        System.out.println("@mentions:");
        for (String mention : extractor.extractMentionedScreennames(text)) {
            System.out.println(mention);
        }
    }
}
Helder Pereira

UPDATE

After seeing that you want to identify hashtags according to Twitter's rules, and after reading the question "Actual Twitter format for hashtags? Not your regex, not his code-- the actual one?", try this pattern:

"^[@#]\\w+|(?<=\\s)[@#]\\w+"

It matches words that start with "@" or "#" and are either at the beginning of the input or preceded by a whitespace character.

Code Sample:

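import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String string = "#hashtags and @mentions";
        // "@" or "#" plus word characters, at the start of the input or after whitespace
        Matcher matcher = Pattern.compile("^[@#]\\w+|(?<=\\s)[@#]\\w+").matcher(string);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}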

Results:

#hashtags
@mentions
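One caveat for the "any language" requirement: by default Java's \w is ASCII-only ([a-zA-Z_0-9]), so the pattern above cuts "#café" down to "#caf" and skips a tag written entirely in, say, Japanese. A minimal sketch, assuming a standard JVM on Java 7 or later, compiles the same pattern with Pattern.UNICODE_CHARACTER_CLASS so that \w follows Unicode rules; the class and method names are only illustrative. Since - is still not a word character, #foo-bar stops at #foo, in line with the question's edit. Android's regex engine is a different implementation, so it is worth testing the behavior there too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeTagDemo {
    // Same pattern as above, but UNICODE_CHARACTER_CLASS makes \w (and \s)
    // follow Unicode rules instead of the ASCII-only default.
    private static final Pattern TAG = Pattern.compile(
            "^[@#]\\w+|(?<=\\s)[@#]\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns every hashtag/mention found in the text, in order.
    static List<String> extractTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(text);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file pure ASCII:
        // "#caf\u00e9" is an accented word, "@\u65e5\u672c\u8a9e" is Japanese.
        // Expected: both of those tags in full, then "#foo" (the "-" ends the tag).
        System.out.println(extractTags("#caf\u00e9 and @\u65e5\u672c\u8a9e but #foo-bar"));
    }
}
```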
Shar1er80

Try this regex (use \\ instead of \ in Java):

/(#\S+)|(@\S+)/g

or

/([#@]\S+)/g
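In Java the alternation form is handy for telling the two kinds apart, since group 1 is set for a hashtag and group 2 for a mention. The sketch below is illustrative (the class and method names are not from the answer). Keep in mind that \S+ runs all the way to the next whitespace, so hyphens and trailing punctuation stay in the match, which is looser than the Twitter rules in the question's edit, and it also fires inside words such as me@example.com.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitTagsDemo {
    // (#\S+)|(@\S+) with doubled backslashes: group 1 holds a hashtag,
    // group 2 holds a mention.
    private static final Pattern TAG_OR_MENTION = Pattern.compile("(#\\S+)|(@\\S+)");

    // Collects the text of the given group (1 = hashtags, 2 = mentions).
    static List<String> collect(String text, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG_OR_MENTION.matcher(text);
        // Java has no /g flag; calling find() repeatedly visits every match.
        while (m.find()) {
            // group(n) returns null when the other alternative matched
            String value = m.group(group);
            if (value != null) {
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "see #foo-bar, @john. and #java";
        System.out.println("hashtags: " + collect(text, 1)); // hashtags: [#foo-bar,, #java]
        System.out.println("mentions: " + collect(text, 2)); // mentions: [@john.]
    }
}
```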

You can also use this one with \1 substitution:

/.*?([#@]\S+)[^#@]*/g

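A Java sketch of the substitution idea, with hypothetical names: each match (filler before the tag, the tag, filler up to the next # or @) is replaced by just the tag. In Java's replacement syntax the group is written $1 rather than \1. One thing to watch: if the text contains no # or @ at all, nothing matches and replaceAll returns the input unchanged.

```java
import java.util.regex.Pattern;

class KeepTagsDemo {
    // .*? skips the text before a tag, group 1 captures the tag, and [^#@]*
    // swallows everything up to the next # or @. DOTALL lets .*? cross newlines.
    private static final Pattern KEEP_TAGS =
            Pattern.compile(".*?([#@]\\S+)[^#@]*", Pattern.DOTALL);

    // Replaces each match with its group 1 plus a space, then trims the end.
    static String keepOnlyTags(String text) {
        return KEEP_TAGS.matcher(text).replaceAll("$1 ").trim();
    }

    public static void main(String[] args) {
        // prints "#java @twitter"
        System.out.println(keepOnlyTags("extracting hashtags in #java using @twitter"));
    }
}
```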

And if you want to strip the # and @, use this:

/.*?[#@](\S+)[^#@]*/g

or

/.*?[#@](\S+)[^#@\-]*/g

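import java.util.regex.Pattern;
// In a Java string literal every regex backslash must be doubled
String rgx = ".*?[#@](\\S+)[^#@\\-]*";
Pattern pattern = Pattern.compile(rgx, Pattern.DOTALL);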
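Using the compiled pattern is then a matter of looping over find() and reading group 1. A sketch with an illustrative class name, reusing the same expression: note that the \- only affects the filler after a tag, because \S+ still captures hyphens, so #foo-bar comes back as foo-bar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBodyDemo {
    // The sigil-stripping expression: group 1 is the tag without # or @.
    private static final Pattern TAG_BODY =
            Pattern.compile(".*?[#@](\\S+)[^#@\\-]*", Pattern.DOTALL);

    // Returns the body of every hashtag/mention, in order.
    static List<String> tagBodies(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = TAG_BODY.matcher(text);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        // prints [java, twitter, github]
        System.out.println(tagBodies(
                "extracting hashtags and mentions in #java using @twitter library from @github"));
    }
}
```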
shA.t