
Hoping someone might know what's going on here.

A little background
I am doing some simple cleaning of scraped text from a website. Part of it entails removing all the spaces from the text.

The Issue

replaceAll("\\s+", "")

This simply does not remove some of the spaces. My best guess is that some of the "spaces" are actually a different character, but I am not sure which. If I manually delete the "space" between two words or characters, type a regular space in its place, and re-run replaceAll(), it works fine.
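One way to check that guess is to print the code point of every character in a suspect string. The snippet below is only a minimal sketch: the class name `CodePointDump`, the helper `dump`, and the sample string (with a no-break space, U+00A0, standing in for the mystery character) are all made up for illustration and are not taken from the real data.

```java
public class CodePointDump {
    // Hypothetical helper: lists every code point of the input as "U+XXXX",
    // separated by single spaces, so look-alike "spaces" become visible.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: a no-break space sits between the colon and the letter.
        String sample = "ANSWER:\u00A0C";
        System.out.println(dump(sample));
    }
}
```

Anything in the output other than U+0020, U+0009, U+000A, U+000B, U+000C, or U+000D is a character that the default `\\s` will not match.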

To make this easier to diagnose, I will directly paste some of the strings below that have this so-called "space" in them:

ANSWER: C
ANSWER: A
ANSWER: TRUE
Dylanrr
  • EDIT: So it looks like Stack Overflow reformatted the "spaces" in my examples, because they work when I copy and paste them. Honestly, that just makes me more confused. – Dylanrr Oct 15 '19 at 17:06
  • Might be because there were some unusual whitespace characters on that site. By default, `\\s` in a Java regex does not match every Unicode whitespace character. For example, the good old "space" (`\\u0020`) will be replaced, but a "thin space" (`\\u2009`) will not. – qutax Oct 15 '19 at 17:19
  • So how might you go about removing all types of spaces? It's a very large data set and I can't know every type used. – Dylanrr Oct 15 '19 at 17:22
  • Maybe this will work: https://stackoverflow.com/a/4731164/10551549 – qutax Oct 15 '19 at 17:27
  • Works like a charm, thanks. – Dylanrr Oct 15 '19 at 17:51
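
For anyone landing here later: the linked answer is not reproduced in this thread, so the following is only a sketch of one way to strip every kind of Unicode whitespace, not necessarily the approach that answer describes. It relies on `Pattern.UNICODE_CHARACTER_CLASS`, which makes `\\s` follow the Unicode White_Space property instead of the small default set. The class name `WhitespaceStripper`, the method `stripAll`, and the sample string are hypothetical.

```java
import java.util.regex.Pattern;

public class WhitespaceStripper {
    // With UNICODE_CHARACTER_CLASS, \s matches Unicode whitespace such as
    // U+00A0 (no-break space), U+2009 (thin space), and U+3000 (ideographic space).
    private static final Pattern UNICODE_SPACE =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: removes every run of Unicode whitespace from the text.
    static String stripAll(String text) {
        return UNICODE_SPACE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumed sample: a thin space sits between the colon and the letter.
        String sample = "ANSWER:\u2009C";
        // The default \s leaves the thin space in place.
        System.out.println(sample.replaceAll("\\s+", ""));
        // The Unicode-aware pattern removes it.
        System.out.println(stripAll(sample));
    }
}
```

The same flag can be written inline as `(?U)` at the start of the regex, so `replaceAll("(?U)\\s+", "")` should behave the same way. Note that zero-width characters such as U+200B are not classified as whitespace by Unicode, so they would need to be added to the pattern explicitly if they turn up in the data.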

0 Answers