-1

I have scraped a webpage using Beautiful Soup. I'm trying to get rid of a '\n' character, which isn't removed no matter what I try.

My effort so far:

wr=str(loc[i-1]).strip()
wr=wr.replace(r"\[|'u|\\n","")
print(wr)

Output:

[u'\nWong; Voon Hon (Singapore, SG
Kandasamy; Ravi (Singapore, SG
Narasimalu; Srikanth (Singapore, SG
Larsen; Gerner (Hinnerup, DK
Abeyasekera; Tusitha (Aarhus N, DK

How do I eliminate the [u'\n? What am I doing wrong?

The full code is here.

FlyingAura

2 Answers

1

You need to escape the backslash (write it as "\\n") so that it matches the literal characters \ and n rather than a newline character:

rep=["[","u'","\\n"]
for r in rep:
    wr=wr.replace(r,"")

This is the same as @cricket_007's answer. However, the second part of his answer does not work for me. To my knowledge, str.replace() does not support this kind of regular expression pattern; it only matches literal text.
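To illustrate the difference, here is a minimal sketch. The sample string is hard-coded to mimic the first line of the output in the question (an assumption, since the full code is not shown), and the names `unchanged` and `cleaned` are made up for the example.

```python
# Sample text mimicking the question's output: a literal backslash followed
# by "n", not a real newline. The value is assumed for illustration.
wr = "[u'\\nWong; Voon Hon (Singapore, SG"

# str.replace() searches for this exact text, so the regex-style
# pattern from the question never matches and nothing is removed.
unchanged = wr.replace(r"\[|'u|\\n", "")

# Removing each literal piece in turn does work.
cleaned = wr
for piece in ["[", "u'", "\\n"]:
    cleaned = cleaned.replace(piece, "")

print(unchanged)
print(cleaned)
```

One caveat of this approach: each piece is removed everywhere in the string, so a name that happens to contain u' would also be altered.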

mpurg
0

You need to escape the backslash or use a raw string. Otherwise, it's a newline character, not a literal \n.

Also, I don't think Beautiful Soup is outputting unicode strings. What you see is the string representation in Python, as in u'blah'.
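The effect of the string representation can be sketched as follows. This assumes that loc[i-1] is a list holding a single string, which is a guess because the full code is not shown; `item`, `as_text` and `text` are hypothetical names. Under Python 3 the u prefix is not printed, but the escaped \n appears in the same way.

```python
# Assumed shape of loc[i-1]: a list containing one string that starts
# with a real newline character.
item = [u"\nWong; Voon Hon (Singapore, SG"]

# str() on a list uses the repr of each element, so the brackets, the
# quotes and an escaped "\n" (backslash + n) become part of the text.
as_text = str(item)

# Taking the element itself keeps the real newline, which strip() removes.
text = item[0].strip()

print(as_text)
print(text)
```

If that assumption holds, indexing into the list before calling strip() avoids the cleanup step altogether.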

And you shouldn't need a list of elements to remove. With re.sub() (str.replace() only matches literal text), the expression can be

r"\[|u'|\\n"
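A single alternation pattern like this needs the re module, since str.replace() treats its argument as literal text. The following is a minimal sketch under the same assumption as the question's output: the text contains a literal backslash followed by n. The sample string and the names `pattern` and `cleaned` are illustrative. Note that the pattern here is written with u' and \\n so that it matches that sample.

```python
import re

# Sample text mimicking the question's output (assumed for illustration):
# it contains a literal backslash followed by "n", not a real newline.
wr = "[u'\\nWong; Voon Hon (Singapore, SG"

# In a raw-string regex, \[ matches "[", u' matches itself, and \\n
# matches a backslash followed by "n".
pattern = r"\[|u'|\\n"
cleaned = re.sub(pattern, "", wr)

# By contrast, \n in a regex matches a real newline character.
no_newline = re.sub(r"\n", "", "a\nb")

print(cleaned)
print(no_newline)
```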
OneCricketeer