I am trying to write a regular expression in Java that matches words and hyphenated words. So far I have:
Pattern p1 = Pattern.compile("\\w+(?:-\\w+)",Pattern.CASE_INSENSITIVE);
Pattern p2 = Pattern.compile("[a-zA-Z0-9]+",Pattern.CASE_INSENSITIVE);
Pattern p3 = Pattern.compile("(?<=\\s)[\\w]+-$",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
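For the plain and hyphenated cases, note that the group `(?:-\\w+)` in p1 has no quantifier. That means p1 only matches words with exactly one hyphenated part, and p2 never matches a hyphen at all. The sketch below merges the two by adding a `*`, which makes the hyphenated part optional and repeatable. It assumes a "word" is letters only, since the expected result below leaves out `_` and `123`, which `\\w` and `[a-zA-Z0-9]` would match. `CASE_INSENSITIVE` is dropped because the character class already lists both cases. The class and method names are hypothetical, just for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyphenWords {
    // One or more letters, then zero or more "-letters" parts.
    // The trailing * is what lets plain words and multi-hyphen words both match.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:-[A-Za-z]+)*");

    // Hypothetical helper: collects every match in the order it appears.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("go-to well-known-fact plain 123 _ -"));
        // prints [go-to, well-known-fact, plain]
    }
}
```

A lone `-`, `_` or `123` is skipped because every match has to start with at least one letter.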
This is my test case:
Programs Dsfasdf. Programs Programs Dsfasdf. Dsfasdf. as is wow woah! woah. woah? okay. he said, "hi." aasdfa. wsdfalsdjf. go-to go- to asdfasdf.. , : ; " ' ( ) ? ! - / \ @ # $ % & ^ ~ ` * [ ] { } + _ 123
Any help would be awesome.
My expected result is to match all of the words, i.e.:
Programs Dsfasdf Programs Programs Dsfasdf Dsfasdf as is wow woah woah woah okay he said hi aasdfa wsdfalsdjf go-to go-to asdfasdf
The part I'm struggling with is matching a word that is split across two lines, with a hyphen at the line break, as one word, e.g.:
go- to
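One way to handle the split word is sketched below. p3 anchors on `$`, which without `Pattern.MULTILINE` only matches at the very end of the input, so it can't see a line break in the middle of the text. The sketch instead lets the hyphen be followed by optional whitespace (`\\s*` covers spaces, tabs and line breaks) before the next part. It then strips that whitespace from each match, so "go-" followed by a newline and "to" comes back as "go-to". This assumes the hyphen is always attached to the first half, as in "go- to". It also joins "go- to" when both halves are on the same line, which is what the expected result above does. Because the hyphen is kept, it can't tell a real compound from a word that was only hyphenated to wrap. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitWords {
    // Letters, then zero or more parts of: "-", optional whitespace
    // (spaces, tabs, line breaks), more letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:-\\s*[A-Za-z]+)*");

    // Hypothetical helper: finds words and rejoins any that were split after a hyphen.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // Remove the whitespace inside the match: "go-\nto" becomes "go-to".
            words.add(m.group().replaceAll("\\s+", ""));
        }
        return words;
    }

    public static void main(String[] args) {
        // The test case from the question, with "go-" and "to" on separate lines.
        String text = "Programs Dsfasdf. Programs Programs Dsfasdf. Dsfasdf. as is wow woah! woah. woah? "
                + "okay. he said, \"hi.\" aasdfa. wsdfalsdjf. go-to go-\nto asdfasdf.. , : ; \" ' ( ) "
                + "? ! - / \\ @ # $ % & ^ ~ ` * [ ] { } + _ 123";
        System.out.println(String.join(" ", findWords(text)));
        // prints: Programs Dsfasdf Programs Programs Dsfasdf Dsfasdf as is wow woah woah woah okay he said hi aasdfa wsdfalsdjf go-to go-to asdfasdf
    }
}
```

If only a real line break should join the halves, `-[ \\t]*\\R[ \\t]*` in place of `-\\s*` is one option (`\\R` needs Java 8+). It would then no longer join the same-line "go- to" in the test case.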