I have a database with a column in a Pandas dataframe that contains a string that has two substrings (a string & a number) I extract. I do it with str.split(), but not in a very pythonic manner because split, for me at least, returns the first item of the split to the database. And I of course can assign the result to a separate dataframe complete the splits and bring it back.
I am sure there is a better way to do this, but as of yet I have not found it.
Here is an illustrative dataframe:
df = pd.DataFrame ({'d': {71: '2022-01-03', 72: '2022-01-04', 73: '2022-01-06', 74: '2022-01-07', 75: '2022-01-07'},
'e': {71: 'MDU(US5526901096) Cash Dividend USD 0.2175 per Share (Ordinary Dividend)',
72: 'KMB(US4943681035) Cash Dividend USD 1.14 per Share (Ordinary Dividend)',
73: 'IRM(US46284V1017) Cash Dividend USD 0.6185 per Share (Ordinary Dividend)',
74: 'MRK(US58933Y1055) Cash Dividend USD 0.69 per Share (Ordinary Dividend)',
75: 'PEP(US7134481081) Cash Dividend USD 1.075 per Share (Ordinary Dividend)'},
'f': {71: '108.75', 72: '1368', 73: '556.65', 74: '345', 75: '537.5'}})
df
Basically from this dataframe I want to extract the stock symbol and the dividend amount without losing the other columns of 'd' and 'f'.
From the string in column 'e' I extract the symbol of the stock at the beginningof the string and then the current dividend from the middle.
df_to_process['e'] = df_to_process['e'].str.split('\(', expand=True)
provides me with the stock symbol.
To get the dividend amount I copy column 'e' data in the dataframe to a column I create, for purposes here 'gg'. I then split it. Since the dividend amount is in the middle of my split, the column I created gets the number I want.
df_to_process[['aa','gg','cc']] = df_to_process['gg'].str.split('USD|per', expand=True)
or
y = df['gg'].str.split('USD|per', expand=True)
0 1 2
71 MDU(US5526901096) Cash Dividend 0.2175 Share (Ordinary Dividend)
72 KMB(US4943681035) Cash Dividend 1.14 Share (Ordinary Dividend)
73 IRM(US46284V1017) Cash Dividend 0.6185 Share (Ordinary Dividend)
74 MRK(US58933Y1055) Cash Dividend 0.69 Share (Ordinary Dividend)
75 PEP(US7134481081) Cash Dividend 1.075 Share (Ordinary Dividend)
I am not very familiar with Regex and am turning to that next. But is there a way to make this work with split?