
I am using R for web scraping, and I have a script that copies the text of a link and then uses that string to follow the link with RSelenium. This fails for one particular string when I run the script, but it works fine when I replace the saved string with a manually typed one. A little digging into the HTML of the page reveals that one of the space characters is actually an &nbsp; (non-breaking space) character, which is why the string fails to match. How do I replace &nbsp; with a normal space in R? I have tried using the stringr library and the str_replace command as follows:

var1 <- str_replace(var1, pattern = "&nbsp;", " ")

But this does not appear to work. Is there anything I am obviously doing wrong? And is there a way to get R to display a string with all the weird formatting characters visible?

iProcrastinate
  • Your code works for me. – Avinash Raj Apr 30 '15 at 16:52
  • If it doesn't work, you should provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and be clear on what exactly "doesn't work" means. – MrFlick Apr 30 '15 at 17:08
  • Thanks for the input. I have realised that I should have used str_replace_all() rather than str_replace(). MrFlick - I could not create a reproducible example because the page I was scraping required a login, and I could not find a way to copy the string into my post and show the actual &nbsp; character, since the print() function represents it as a normal space. I would be grateful if you knew of a way to do this for future reference. – iProcrastinate May 01 '15 at 09:40
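
On making the hidden character visible: one possible approach, sketched here with base R only, is to look at the Unicode code points instead of the printed string. The sample string `x` is made up for illustration and stands in for the scraped link text.

```r
# Made-up sample: a non-breaking space (U+00A0) sits between the two words.
x <- "hello\u00A0world"

# print() can show U+00A0 as an ordinary space, so inspect code points instead.
code_points <- utf8ToInt(x)

# A non-breaking space is code point 160; an ordinary space is 32.
nbsp_positions <- which(code_points == 160L)

# Label every character with its code point in U+XXXX form for easy reading.
labelled <- sprintf("U+%04X", code_points)
print(labelled)
print(nbsp_positions)
```

Any position reported in `nbsp_positions` marks a character that looks like a space on screen but will not match a typed space.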

1 Answer


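You need to match the character itself, \u00A0 (the non-breaking space), rather than the HTML entity, and replace it with a normal space: var1 <- gsub("\u00A0", " ", var1, fixed = TRUE)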
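
As a minimal sketch of that replacement, assuming the scraped text contains literal U+00A0 characters (the sample value of `var1` below is invented for illustration):

```r
# Invented sample standing in for the scraped link text; it contains two
# non-breaking spaces (U+00A0) that look like ordinary spaces when printed.
var1 <- "Annual\u00A0report\u00A02015"

# gsub() replaces every occurrence; fixed = TRUE treats the pattern literally.
cleaned <- gsub("\u00A0", " ", var1, fixed = TRUE)

# Check that no non-breaking space (code point 160) is left behind.
remaining <- sum(utf8ToInt(cleaned) == 160L)
print(cleaned)
print(remaining)
```

With stringr, `str_replace_all(var1, "\u00A0", " ")` should behave the same way. `str_replace()` changes only the first match, which would leave the second non-breaking space in a string like this one.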

KristofMols
langeleppel