This is a follow-up to this post. I have managed to extract the data and write it to a file that I open in Excel. This is my current code:

import datetime

import pandas as pd
import requests
from bs4 import BeautifulSoup

url_list = ["https://www.dell.com/community/Inspiron-Desktops/7700-AIO-failed-on-start-up/m-p/8303065#M36274",
            "https://www.dell.com/community/Inspiron-Desktops/Inspiron-7700-AIO-win-10/m-p/8303066#M36275"]
for i in url_list:
    result = requests.get(i)
    soup = BeautifulSoup(result.text, "html.parser")

    date = '11-16-2022'
    comments = []
    
    # Fiscal week: ISO week number of the target date
    date_object = datetime.datetime.strptime(date, '%m-%d-%Y').date()
    year, week_num, day_of_week = date_object.isocalendar()
    
    comments_section = soup.find('div', {'class':'lia-component-message-list-detail-with-inline-editors'})
    comments_body = comments_section.find_all('div', {'class':'lia-linear-display-message-view'})

    for comment in comments_body:
        if date in comment.find('span',{'class':'local-date'}).text:
            comments.append({
                'FW': week_num,
                'Date': comment.find('span',{'class':'local-date'}).text.strip('\u200e'),
                'Board': soup.find_all('li', {'class': 'lia-breadcrumb-node crumb'})[1].text.strip(),
                'Sub-board':soup.find('a', {'class': 'lia-link-navigation crumb-board lia-breadcrumb-board lia-breadcrumb-forum'}).text,
                'Title of Post': soup.find('div', {'class':'lia-message-subject'}).text.strip(),
                'Main Message':  soup.find('div', {'class':'lia-message-body'}).text.strip(),
                'Post Comment': comment.find('div',{'class':'lia-message-body-content'}).text.strip(),
                'Post Time' : comment.find('span',{'class':'local-time'}).text,
                'Username': comment.find('a',{'class':'lia-user-name-link'}).text,
                'URL' : str(i)                           
            })
        
   
    df1 = pd.DataFrame(comments)
    print(df1)

    with open('output.csv', 'a', newline = '') as f:
        df1.to_csv(f, mode='a', header=f.tell()==0, index = False)

This gives me 10 columns in my sheet. However, I don't want 'Main Message' as a separate column. I want it in the 'Post Comment' column itself (provided it was posted on the given date). How do I do this, given that the two are found under different tags?
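
To make the goal concrete, here is a minimal sketch of the result I am after, written against plain dicts rather than the live page. The idea is to put the main message and each reply into the same record shape first, so the tag each one came from no longer matters, and only then apply the date filter. The function name `merge_main_into_comments`, the `'Date'`/`'Text'` keys, and the sample texts are all made up for illustration; in the real script those values would come from the `soup.find(...)` and `comment.find(...)` lookups above.

```python
def merge_main_into_comments(main_post, replies, target_date):
    """Return one row per message (main post or reply) posted on target_date.

    main_post and replies use a hypothetical shape: dicts with 'Date' and
    'Text' keys, assumed to be filled from the scraping lookups beforehand.
    """
    rows = []
    # Treat the main post as just another message, placed first.
    for message in [main_post, *replies]:
        # Remove the left-to-right mark (U+200E) and surrounding whitespace.
        date_text = message['Date'].strip('\u200e').strip()
        if target_date in date_text:
            rows.append({
                'Date': date_text,
                'Post Comment': message['Text'].strip(),
            })
    return rows


# Made-up sample data standing in for the scraped values.
main_post = {'Date': '\u200e11-16-2022', 'Text': ' PC fails on start up '}
replies = [
    {'Date': '\u200e11-16-2022', 'Text': 'Try a BIOS reset'},
    {'Date': '\u200e11-17-2022', 'Text': 'Did that help?'},
]
rows = merge_main_into_comments(main_post, replies, '11-16-2022')
```

With this shape there is no 'Main Message' key at all: the main post only shows up as a 'Post Comment' row when its own date matches, and `rows` could be passed to `pd.DataFrame(rows)` the same way `comments` is above. The remaining columns (FW, Board, URL and so on) are left out here to keep the sketch short.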

NApStor
