So, I am trying to read a text document (.txt) into a Java project using a BufferedReader, edit it, and output it. The issue I'm having is that I can't get the punctuation to be recognized. The document reads:

hello hello.hello,hello/hello?

This is a test of the different cases I want to be able to handle. Instead, I get:

hello hello hello hello hello

Any suggestions? (I will provide sections of code if needed for an answer.) I was thinking about using a delimiter, but I can't figure out how that would work in this context (or whether it is even possible with a BufferedReader).

BTW, I am reading and editing this document character by character, checking each character against several arrays to decide whether to include it, if that helps.

2 Answers

You can read the whole file into a String by calling readLine in a loop (not recommended for large files).
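A minimal sketch of that read loop, in case it helps. The class and method names (`ReadWholeFile`, `readAll`) are made up for illustration, and a `StringReader` stands in for the real file so the example runs on its own; in your project you would wrap a `FileReader` for your .txt file instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadWholeFile {
    // Hypothetical helper: collects every line from the reader into one String.
    // readLine() strips the line terminator, so a '\n' is appended after each line.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: StringReader replaces new FileReader("yourFile.txt") for the demo.
        String sample = "hello hello.hello,hello/hello?";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.print(readAll(reader));
        }
    }
}
```

Note that nothing in this loop drops punctuation; the returned String still contains every character from the input.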

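Then, on the resulting string (called yourString here):

yourString.split("[\\s.,/?]")

This splits the string on whitespace and on each of the listed punctuation characters. Note that the backslash has to be doubled inside a Java string literal, and that ? is included because your sample ends with one.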

Or, if you want to split on all non-letters:

yourString.split("[^A-Za-z]")

This will give you an array of strings, which is rather easy to work with. Then you just write the result back out at the end.
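Here is a small runnable sketch of both splits applied to the sample line from the question. The class and method names are hypothetical, and the first pattern assumes `?` is added to the character class so the trailing question mark is treated as a delimiter too.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on whitespace and the explicitly listed punctuation characters.
    static String[] splitOnListed(String text) {
        return text.split("[\\s.,/?]");
    }

    // Splits on every character that is not an ASCII letter.
    static String[] splitOnNonLetters(String text) {
        return text.split("[^A-Za-z]");
    }

    public static void main(String[] args) {
        String text = "hello hello.hello,hello/hello?";
        // Both calls print [hello, hello, hello, hello, hello]
        System.out.println(Arrays.toString(splitOnListed(text)));
        System.out.println(Arrays.toString(splitOnNonLetters(text)));
    }
}
```

Two things to keep in mind: split throws the delimiters away, so the punctuation itself is not in the array, and two delimiters in a row (for example ", ") produce an empty string between them. Adding + after the character class, as in "[^A-Za-z]+", avoids the empty strings in the middle.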

Cruncher

If you want to handle all punctuation marks and whitespace, I'd suggest using a Lucene tokenizer. A sample implementation is given in "How to use a Lucene Analyzer to tokenize a String?". But this depends on what your requirements are: if it is just commas and whitespace, a regex will do the job.

Karthik