I have a dataframe with a column that has many acronyms in it.
I would like to (a) list all acronyms found in each cell in a new adjacent column and (b) produce a set of all unique acronyms found (no duplicates).
My plan is to use pyspellchecker to find any word that it reports as misspelled and treat that word as an acronym.
I know this method will also flag non-acronyms that are simply misspelled words, but I can't think of any other way to do it. Assuming that all acronyms are in uppercase would be the obvious alternative, but unfortunately that is not the case in my dataset.
For example, I have:
Column 1 |
---|
I worked for the NBA |
I worked at the CIA |
I am seeing a pt |
CIA and NBA are both cool places to work |
I also worked at NSA catedslf |
Desired output:
Column 1 | Column 2 |
---|---|
I worked for the NBA | NBA |
I worked at the CIA | CIA |
I am seeing a pt | pt |
CIA and NBA are both cool places to work | CIA,NBA |
I also worked at NSA catedslf | NSA,catedslf |
and the set of unique values found:
{NBA, CIA, pt, NSA, catedslf}
I threw catedslf in there just to show that it's okay if I also catch misspelled words (I know it's unavoidable).
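Here is a rough sketch of the approach, not a tested solution. The helper names (`find_unknown_words`, `tag_rows`, `demo`) are made up for illustration, and I am assuming that a word counts as a letters-only token. The dictionary check is passed in as a function so the helpers do not depend on any particular spell checker. One caveat I am not sure about: if the dictionary shipped with the installed pyspellchecker version already contains a common acronym, that acronym would not be flagged.

```python
import re

# Assumption: a "word" is a run of ASCII letters; digits and punctuation are ignored.
WORD_RE = re.compile(r"[A-Za-z]+")


def find_unknown_words(text, is_known):
    """Return the words in `text` rejected by `is_known`, in order, without repeats."""
    found = []
    for word in WORD_RE.findall(text):
        if not is_known(word) and word not in found:
            found.append(word)
    return found


def tag_rows(rows, is_known):
    """Return (per_row, unique): one list of hits per row, plus the set of all hits."""
    per_row = [find_unknown_words(text, is_known) for text in rows]
    unique = set().union(*per_row)
    return per_row, unique


def demo():
    # Third-party imports are kept local so the helpers above work without them.
    import pandas as pd
    from spellchecker import SpellChecker  # pip install pyspellchecker

    spell = SpellChecker()

    def is_known(word):
        # unknown() returns the subset of the given words missing from the
        # dictionary; an empty result means the word was recognised.
        # It is used only as a yes/no test so the original casing is kept.
        return not spell.unknown([word])

    df = pd.DataFrame({"Column 1": [
        "I worked for the NBA",
        "I worked at the CIA",
        "I am seeing a pt",
        "CIA and NBA are both cool places to work",
        "I also worked at NSA catedslf",
    ]})

    per_row, unique = tag_rows(df["Column 1"], is_known)
    df["Column 2"] = [",".join(words) for words in per_row]
    print(df)
    print(unique)
```

Calling `demo()` builds the example dataframe and prints both results. What actually ends up in `Column 2` depends entirely on which words the dictionary recognises.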