Is it possible to split strings in a dataframe column based on a list of words?
For example, there is a dataframe with a column `Company`. Each record includes the company name, a legal form, and sometimes additional information after the legal form, such as 'electronics'.
| Company |
|---|
| XYZ ltd electronics |
| ABC ABC inc iron |
| AB XY Z inc |
| CD EF GHI JK llc incident |
I have a list of 1500 legal forms for companies worldwide (inc, ltd, ...). I would like to split the strings in the dataframe column based on this list of legal forms. Here is an excerpt of the list:
['gmbh', 'ltd', 'inc', 'srl', 'spa', 'co', 'sa', 'ag', 'kg', 'ab', 'spol', 'sasu', 'sas', 'pvt', 'sarl', 'gmbh & co kg', 'llc', 'ilc', 'corp', 'ltda', 'coltd', 'se', 'as', 'sp zoo', 'plc', 'pvtltd', 'og', 'gen']
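A minimal sketch of matching a single string against such a list, assuming a regex-based approach with Python's standard `re` module (the helper `build_legal_form_pattern` and the group names are hypothetical, not from the question). The forms are joined into one alternation sorted longest first, so a multi-word form like 'gmbh & co kg' is tried before 'gmbh'; the form must be preceded by whitespace and followed by whitespace or the end of the string, so 'inc' inside 'incident' is not matched; and the company group must be non-empty, so a leading word like 'AB' is never read as the legal form.

```python
import re

# Excerpt of the legal-form list from the question (the full list has 1500 entries).
LEGAL_FORMS = ['gmbh', 'ltd', 'inc', 'srl', 'spa', 'co', 'sa', 'ag', 'kg', 'ab',
               'spol', 'sasu', 'sas', 'pvt', 'sarl', 'gmbh & co kg', 'llc', 'ilc',
               'corp', 'ltda', 'coltd', 'se', 'as', 'sp zoo', 'plc', 'pvtltd', 'og', 'gen']


def build_legal_form_pattern(forms):
    """Compile a regex for 'name <legal form> [addition]' (hypothetical helper)."""
    # Longest forms first, so 'gmbh & co kg' is tried before 'gmbh' at the same position.
    alternation = "|".join(re.escape(f) for f in sorted(forms, key=len, reverse=True))
    return re.compile(
        r"^(?P<company>.+?)"                  # shortest non-empty name before a legal form
        rf"\s+(?P<legal_form>{alternation})"  # legal form, preceded by whitespace
        r"(?:\s+(?P<addition>.*))?$"          # whitespace + extra text, or end of string
    )


pattern = build_legal_form_pattern(LEGAL_FORMS)
for value in ["XYZ ltd electronics", "ABC ABC inc iron",
              "AB XY Z inc", "CD EF GHI JK llc incident"]:
    m = pattern.match(value)
    # groupdict() gives None for 'addition' when nothing follows the legal form.
    print(m.groupdict() if m else None)
```

Because the company group is lazy, the split happens at the first listed word that qualifies; a company name that itself contains a listed word as a separate token would be split too early, so this is a heuristic rather than a guaranteed parse.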
Each value should be separated into the text before the legal form, the legal form itself, and the text after it, each in a new column. This is the desired output:
| Company | Legal form | Addition |
|---|---|---|
| XYZ | ltd | electronics |
| ABC ABC | inc | iron |
| AB XY Z | inc | |
| CD EF GHI JK | llc | incident |
Note that "inc" appears in the middle of a value, at the end of a value, and also as part of another word ("incident") in the example company names.
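One possible way to produce the table above, sketched with pandas `Series.str.extract` under the assumption that a single regex is acceptable: the legal forms are joined into one alternation (longest first, so 'gmbh & co kg' wins over 'gmbh'), the form must be preceded by whitespace and followed by whitespace or end of string (so 'inc' in 'incident' is skipped), and each named group becomes a column. Group names cannot contain spaces, so `Legal_form` is renamed afterwards. Matching here is case-sensitive; `str.extract` also accepts `flags=re.IGNORECASE` if the real data mixes cases, at the cost of more accidental matches on short forms such as 'sa' or 'as'. With all 1500 forms in the alternation the regex may be slow on very large frames.

```python
import re

import pandas as pd

# Excerpt of the legal-form list from the question.
LEGAL_FORMS = ['gmbh', 'ltd', 'inc', 'srl', 'spa', 'co', 'sa', 'ag', 'kg', 'ab',
               'spol', 'sasu', 'sas', 'pvt', 'sarl', 'gmbh & co kg', 'llc', 'ilc',
               'corp', 'ltda', 'coltd', 'se', 'as', 'sp zoo', 'plc', 'pvtltd', 'og', 'gen']

df = pd.DataFrame({"Company": ["XYZ ltd electronics", "ABC ABC inc iron",
                               "AB XY Z inc", "CD EF GHI JK llc incident"]})

# Longest first so multi-word forms are tried before their prefixes.
alternation = "|".join(re.escape(f) for f in sorted(LEGAL_FORMS, key=len, reverse=True))
# Named groups become the column names of the extracted DataFrame.
pattern = rf"^(?P<Company>.+?)\s+(?P<Legal_form>{alternation})(?:\s+(?P<Addition>.*))?$"

out = df["Company"].str.extract(pattern)
# Rows with no recognised legal form come back as all-NaN; keep the original text there.
out["Company"] = out["Company"].fillna(df["Company"])
# Blank out missing additions to match the desired output.
out = out.rename(columns={"Legal_form": "Legal form"}).fillna("")
print(out)
```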