
How do I remove the characters below from tweets in an R data frame using regex?

அனà¯à®ªà¯à®®à¯ பாசமà¯à®®à¯ நிறைநà¯à®¤ இஸà¯à®²à®¾à®®à®¿à®¯ சகோதர சகோதரிகள௠கà¯à®•௠ரமà¯à®œà®¾à®©à¯ நலà¯à®µà®¾à®´à¯à®¤à¯à®¤à¯à®•à¯à®•ள௠…

Thanks in advance. :)

Sree51

1 Answer


The answer goes out to Rushabh. You can use iconv, which converts strings from one encoding to another and replaces characters that cannot be converted with the value given in the sub argument:

foo <- "அனà¯à®ªà¯à®®à¯ பாசமà¯à®®à¯ நிறைநà¯à®¤ இஸà¯à®²à®¾à®®à®¿à®¯ சகோதர சகோதரிகள௠கà¯à®•௠ரமà¯à®œà®¾à®©à¯ நலà¯à®µà®¾à®´à¯à®¤à¯à®¤à¯à®•à¯à®•ள௠…"
iconv(foo, from = "UTF-8", to = "ASCII", sub = "")

Output:

[1] "aaaaaaa aaasaaaa aaaaaaa aaaaaaaa asaaaa asaaaaaaaa aaaa aaaaaaa aaaaaaaaaaaaaaaa a"
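
Since the question asks about tweets stored in a data frame, here is a minimal sketch of applying the same idea to a whole column. The data frame `tweets` and the column names `text`, `clean_iconv` and `clean_regex` are made up for illustration, and the exact result of iconv can vary between platforms, so treat this as a starting point rather than a definitive solution. The second approach uses gsub with a character class that matches anything outside the ASCII range, which is the regex variant the question asked for.

```r
# Hypothetical example data; the column name `text` is an assumption.
tweets <- data.frame(
  text = c("caf\u00e9 time", "plain ascii", "\u0b85\u0ba9 hello"),
  stringsAsFactors = FALSE
)

# iconv is vectorised, so it can be applied to the whole column at once.
# Characters with no ASCII equivalent are replaced by the empty string.
tweets$clean_iconv <- iconv(tweets$text, from = "UTF-8", to = "ASCII", sub = "")

# Regex alternative: delete every character outside the ASCII range.
# perl = TRUE makes the \x escapes part of the pattern itself.
tweets$clean_regex <- gsub("[^\\x01-\\x7F]", "", tweets$text, perl = TRUE)

print(tweets[, c("clean_iconv", "clean_regex")])
```

Either approach can leave stray spaces where words were removed; trimws() may help tidy those up afterwards.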
Artem