I have data that looks like this
import pandas as pd
current_df = pd.DataFrame({ 'Col1':['Something (Something', 'Else) words', 'x', 'y', 'something (another', 'word) blah'],
'Col2':['(some value)', '', 'some value', 'a', 'some value',''],
'Col3':['Something (Something', 'Else) words', '(x)', 'y', 'something (another', 'word) blah'],
'Col4':['some value', '', 'some value', 'a', 'some value',''],
'Col5':['some value', '', 'some value', 'a', 'some value','']})
Col1 Col2 Col3 Col4 Col5
0 Something (Something (some value) Something (Something some value some value
1 Else) words Else) words
2 x some value (x) some value some value
3 y a y a a
4 something (another some value something (another some value some value
5 word) blah
I scraped this from a PDF and in some cases, there was a weird formatting issue where something would carry over to another line. It is always in the same two columns, and there is nothing else in those columns, just as I have shown. Is there a way that I could merge those values with the row above? Desired output is below.
In the example, the data I want to join is split by ( in one row and ) in the next. This is true for about 99% of my data, and I was thinking about trying to leverage that. But if there is a cleaner way to just merge the cells up, please let me know.
goal_df = pd.DataFrame({ 'Col1':['Something (Something Else) words', 'x', 'y', 'something (another word) blah'],
'Col2':['(some value)', 'some value', 'a', 'some value'],
'Col3':['Something (Something Else) words', '(x)', 'y', 'something (another word) blah'],
'Col4':['some value', 'some value', 'a', 'some value'],
'Col5':['some value', 'some value', 'a', 'some value'],})
Col1 Col2 Col3 Col4 Col5
0 Something (Something Else) words (some value) Something (Something Else) words some value some value
1 x some value (x) some value some value
2 y a y a a
3 something (another word) blah some value something (another word) blah some value some value