BeautifulSoup Python to Dataframe

Question

I'm trying to convert scraped data into a pandas DataFrame (table). The data is retrieved with BeautifulSoup from different tags (a, span, div).

    datalist = []
    for ul in soup_level1.find('ul', {'class': "fix3"}):
        # Trajectory text from the link inside div.topb
        divjt = ul.find('div', {'class': "topb"})
        a = divjt.find('a')
        trajectory = a.text.strip()
        divloc = ul.find('div', {'class': "under"})
        d = divloc.find('div')
        # Object text from span.blk
        sp = ul.find('span', {'class': "blk"})
        object = sp.text.strip()
        # Time from the first span.f1, or an empty string if there is none
        try:
            sas = ul.find_all('span', {'class': "f1"})
            timex = sas[0].text
        except IndexError:
            timex = ''
        datalist.append([trajectory, object, timex])
        headers = ['Traj', 'Object', 'Time']
        A = [trajectory]
        B = [object]
        C = [timex]
        datac = A + B + C
        # Build a DataFrame from the three values and print it
        df = pd.DataFrame(datac)

        print(df)

The result I am getting right now is

                0
    0   BRD - TWD
    1         MER
    2  11/10/2018
                0
    0   SFX - NYT
    1         MER
    2  10/05/2016
                0
    0   GER - BEN
    1         MER
    2  05/06/2016

I would like to "dump" those results into a single, proper DataFrame in which each record is one row, so that it can be exported to Excel like this:

0  BRD - TWD    MER    11/10/2018
1  SFX - NYT    MER    10/05/2016
2  GER - BEN    MER    05/06/2016

Thank you!

Please share the web link you are parsing so we can see the tag layout; without that it will be difficult to help. – nandneo, Oct 10 '18 at 10:09

score 0 · Answer 1 · answered Oct 10 '18 at 10:12

If you want the data in Excel, write it in CSV format instead. A CSV file can be opened in Excel or LibreOffice to get the required result.

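const fs = require('fs');
const os = require('os');

// Join the values into one comma-separated row and append it to the file
const row = value1 + "," + value2 + "," + value3;
fs.appendFile('file_name.csv', row + os.EOL, function (err) {
  if (err) throw err;
});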

This is how I did it in JavaScript.
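
For readers working in Python, as in the question, the same one-row-per-record idea can be sketched with the standard csv module. This is only a rough equivalent, not the answerer's code: the rows are sample values from the question, and the in-memory buffer is an assumption made to keep the sketch self-contained.

```python
import csv
import io

# Sample rows standing in for the scraped values (taken from the question)
rows = [
    ["BRD - TWD", "MER", "11/10/2018"],
    ["SFX - NYT", "MER", "10/05/2016"],
]

# An in-memory buffer keeps the sketch self-contained; to write to disk,
# use open('file_name.csv', 'a', newline='') instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
for row in rows:
    writer.writerow(row)  # one CSV line per scraped record

csv_text = buffer.getvalue()
print(csv_text)
```

The csv module also takes care of quoting values that themselves contain commas, which plain string concatenation does not.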

score 0 · Answer 2 · answered Oct 10 '18 at 11:07


Try using zip instead of datac=A+B+C, like this:

zip(A, B, C)
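
As a small illustration of why zip helps: A + B + C concatenates the three lists into one flat list, whereas zip pairs up the i-th elements into one tuple per record. The sample values come from the question's expected output, and the lists are assumed to hold one entry per scraped item.

```python
# Sample column lists, assuming one entry per scraped item
A = ["BRD - TWD", "SFX - NYT", "GER - BEN"]
B = ["MER", "MER", "MER"]
C = ["11/10/2018", "10/05/2016", "05/06/2016"]

flat = A + B + C           # one flat list of 9 values
rows = list(zip(A, B, C))  # 3 tuples, one per record

for row in rows:
    print(row)
```

Passing rows to pd.DataFrame then yields one table row per tuple.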

Zheka Koval


Thanks Zheka, this solved part of the issue once I wrapped it as df = pd.DataFrame(list(zip(A,B,C))). But I am still missing the row numbering: each row has 0 in front of it, while I am expecting a running count. – ThinkPad Oct 10 '18 at 11:15
@ThinkPad try https://stackoverflow.com/questions/20167930/start-index-at-1-for-pandas-dataframe. It might help you. – Zheka Koval Mar 08 '20 at 05:57
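
The repeated 0 most likely comes from building a new one-row DataFrame on every pass through the loop, so each frame starts its own index at 0. A minimal sketch, assuming the rows are collected first and the DataFrame is built once afterwards; the column names reuse the question's headers list, and shifting the index to start at 1 is optional.

```python
import pandas as pd

# Rows collected inside the scraping loop (sample values from the question)
rows = [
    ("BRD - TWD", "MER", "11/10/2018"),
    ("SFX - NYT", "MER", "10/05/2016"),
    ("GER - BEN", "MER", "05/06/2016"),
]

# Build the DataFrame once, after the loop, so the index runs 0, 1, 2
df = pd.DataFrame(rows, columns=["Traj", "Object", "Time"])

# Optional: number the rows from 1 instead of 0
df.index = df.index + 1

print(df)
```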

score 0 · Answer 3 · answered Oct 10 '18 at 12:03

I found a solution - just needed to append the values, and then exporting to csv is easy-peasy.

    A.append(trajectory)
    B.append(object)
    C.append(timex)

    test_df = pd.DataFrame({'Col1': A,
                           'Col2': B,
                           'Col3': C})
    test_df.to_csv('file_name.csv')
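
For context, this only works if A, B and C are created once before the loop and the DataFrame is built after it. A sketch of that structure follows, with a hypothetical list of tuples (scraped) standing in for the BeautifulSoup loop. The index=False argument is an optional addition that leaves the row numbers out of the output, and to_csv is called without a path here only so that it returns the CSV text and the example stays self-contained.

```python
import pandas as pd

# Hypothetical stand-in for the values extracted on each loop iteration
scraped = [
    ("BRD - TWD", "MER", "11/10/2018"),
    ("SFX - NYT", "MER", "10/05/2016"),
    ("GER - BEN", "MER", "05/06/2016"),
]

A, B, C = [], [], []  # created once, before the loop
for trajectory, obj, timex in scraped:
    A.append(trajectory)
    B.append(obj)
    C.append(timex)

# Built once, after the loop has finished
test_df = pd.DataFrame({'Col1': A, 'Col2': B, 'Col3': C})

# With no path, to_csv returns the CSV text; pass 'file_name.csv' to write a file
csv_text = test_df.to_csv(index=False)
print(csv_text)
```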
