
Is there a Python module (perhaps in NLTK) for removing internet/chat slang such as "lol" and "brb"? If not, can someone provide a CSV file containing a large list of such slang?

The website http://www.netlingo.com/acronyms.php lists these acronyms, but I cannot find a CSV file of them to use in my program.

Rkz
  • Thanks for the tip on acceptance; I was not paying attention to it until now. About my question: I have used Beautiful Soup for parsing XML content, but now I am just looking for a list of acronyms. Does Beautiful Soup have a module containing such a list/dictionary of acronyms? I doubt it. – Rkz Dec 14 '11 at 10:04
  • You misunderstood me: you should use BS to turn the HTML page on the linked site into a CSV file. :) – mac Dec 14 '11 at 10:08
  • You might also want to check out the `acronyms` file that comes with the `wtf` utility in some Unix distributions. I found one version online: http://svn.dslinux.org/viewvc/dslinux/branches/bsdgames_branch/user/games/bsdgames/wtf/acronyms?revision=565&view=markup&pathrev=565 – Ferdinand Beyer Dec 14 '11 at 10:56
  • Oh, that was definitely a worthwhile acronym file. Thanks. – Rkz Dec 28 '11 at 04:59
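
The two suggestions in the comments (use an acronyms file, and end up with a CSV) could be combined roughly as follows. This is only a sketch: it assumes the acronyms file has one entry per line in the form acronym, whitespace, meaning, with `#` starting a comment line. The function names `parse_acronyms` and `write_csv` and the sample lines are made up for illustration; they are not taken from the real file.

```python
import csv
import io


def parse_acronyms(lines):
    """Parse 'ACRONYM<whitespace>meaning' lines into a dict (assumed format)."""
    acronyms = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            acronyms[parts[0].lower()] = parts[1].strip()
    return acronyms


def write_csv(acronyms, fileobj):
    """Write the acronym -> meaning mapping as a two-column CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["acronym", "meaning"])
    for key, meaning in sorted(acronyms.items()):
        writer.writerow([key, meaning])


# Hypothetical sample lines standing in for the downloaded file.
sample = [
    "# sample data, not the real file",
    "AFAIK\tas far as I know",
    "BRB\tbe right back",
    "",
]
acronyms = parse_acronyms(sample)
buffer = io.StringIO()
write_csv(acronyms, buffer)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("acronyms.csv", "w", newline="")` to `write_csv`.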

2 Answers


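Code to scrape http://www.netlingo.com/acronyms.php:

import json

import requests
from bs4 import BeautifulSoup

resp = requests.get("http://www.netlingo.com/acronyms.php")
resp.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(resp.text, "html.parser")

slangdict = {}
# Each entry is an <li> holding an <a> (the acronym) followed by its meaning.
for div in soup.find_all('div', attrs={'class': 'list_box3'}):
    for li in div.find_all('li'):
        for a in li.find_all('a'):
            key = a.text.strip()
            if not key:
                continue  # skip empty links; splitting on "" raises ValueError
            # Everything after the acronym in the <li> text is its meaning.
            slangdict[key] = li.text.split(key, 1)[1].strip()

# Save the acronym -> meaning mapping as JSON.
with open('myslang.json', 'w') as f:
    json.dump(slangdict, f, indent=2)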
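
Once such a dictionary exists, removing the slang from a piece of text could look something like the sketch below. The function name `remove_slang` and the two-entry `slang` dictionary are made up for illustration; in practice the dictionary would be loaded from `myslang.json` with `json.load`. Matching is done per whitespace-separated word, ignoring case and surrounding punctuation, which is a simplification and will miss multi-word entries.

```python
import string


def remove_slang(text, slang):
    """Drop every word whose lowercase, punctuation-stripped form is a slang key."""
    known = {term.lower() for term in slang}
    kept = []
    for word in text.split():
        core = word.strip(string.punctuation).lower()
        if core not in known:
            kept.append(word)
    return " ".join(kept)


# Hypothetical stand-in for the scraped dictionary.
slang = {"lol": "laughing out loud", "brb": "be right back"}
cleaned = remove_slang("ok brb, that was funny lol", slang)
print(cleaned)
```

Replacing `kept.append(word)` logic with a lookup of the meaning would expand the slang instead of deleting it.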
bones.felipe
CKM
2
cyborg
  • That is definitely a huge collection of jargon, although filtering out the needed terms is a bit of a pain. Thanks. – Rkz Dec 28 '11 at 05:05