
I'm new to web scraping and would prefer to use Python. Does anyone have ideas for the easiest way to scrape job descriptions and write them to an Excel file? Which scraper would you use?

Bhargav Rao

2 Answers


It depends. For a dynamic website (one that builds its content with JavaScript), Selenium is a good fit, because it automates browser actions. Beautiful Soup is another option: it doesn't automate anything, it only parses HTML you have already downloaded and pulls data out of it. In my opinion, Beautiful Soup is easier to learn; one basic introduction will be all you need. As for the Excel file, there are several libraries you could use, and the choice is mostly a matter of preference.

However, for your project I would go with Beautiful Soup.

As for learning, YouTube is a great place to find tutorials, and there are several for both tools. It's also easy to find help on this site when you run into issues with either one.

To give you a hint as to the general structure of your program, I would suggest something like this:

First Step: open an Excel file; this file stays open for the whole run

Second Step: the web scraper locates the HTML tag that holds the job descriptions

Third Step: use a for loop to cycle through each job description within this tag

Fourth Step: for each job description, retrieve the data and write it to the Excel sheet

Fifth Step: once you're done, save and close the Excel sheet
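The five steps above could be sketched roughly like this. It is only a minimal illustration, assuming Beautiful Soup for parsing and openpyxl for the Excel side (the answer doesn't name an Excel library, so that choice is mine). The sample HTML, the `job` and `job-description` class names, the helper names, and the `jobs.xlsx` filename are all made up; a real job site will use different markup. Note that openpyxl builds the workbook in memory and writes it on `save()`, so "open" and "close" here mean create and save.

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Hypothetical markup standing in for a downloaded job-listing page.
SAMPLE_HTML = """
<div class="job"><h2>Data Analyst</h2><p class="job-description">Build reports in SQL.</p></div>
<div class="job"><h2>Web Developer</h2><p class="job-description">Maintain the company site.</p></div>
"""


def extract_jobs(html):
    """Return a list of (title, description) tuples found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    # Steps 2 and 3: locate each job container and loop over them.
    for card in soup.find_all("div", class_="job"):
        title = card.find("h2").get_text(strip=True)
        description = card.find("p", class_="job-description").get_text(strip=True)
        jobs.append((title, description))
    return jobs


def save_jobs(jobs, path):
    """Write the jobs to an .xlsx file, one row per job."""
    workbook = Workbook()          # Step 1: create the workbook
    sheet = workbook.active
    sheet.append(["Title", "Description"])
    for row in jobs:               # Step 4: one row per job
        sheet.append(row)
    workbook.save(path)            # Step 5: write the file to disk


jobs = extract_jobs(SAMPLE_HTML)
save_jobs(jobs, "jobs.xlsx")
```

For a live site you would replace `SAMPLE_HTML` with the text of a `requests.get(...)` response and adjust the tag and class names to match the page.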

eagleman21

Libraries I personally use: here

This is generally the boilerplate code most people probably use to start web scraping:

import requests
from bs4 import BeautifulSoup
import re
from pprint import pprint

from os.path import dirname, join
current_dir = dirname(__file__)
print(current_dir)

code = 0

# Placeholder URL; requests needs the scheme (http:// or https://)
url_loop = "https://test.com"

r = requests.get(url_loop)

error = "The page cannot be displayed because an internal server error has occurred."

soup = BeautifulSoup(r.text, 'html.parser')
  • requests is how you send HTTP requests
  • bs4 (Beautiful Soup 4) is how you parse the page and extract specific info, such as all h1 tags
  • pprint just formats the result nicely
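To make the h1 example concrete, here is a small sketch of pulling every h1 out of a page with Beautiful Soup and printing the result with pprint. The HTML string is invented for illustration; in practice it would be `r.text` from the request above.

```python
from pprint import pprint

from bs4 import BeautifulSoup

# Made-up page content; normally this would come from requests.get(...).text
html = "<html><body><h1>Jobs</h1><p>intro</p><h1>Contact</h1></body></html>"

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; get_text extracts the visible text.
headings = [tag.get_text(strip=True) for tag in soup.find_all("h1")]
pprint(headings)
```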

As for using the collected data in Excel: Here

Good luck!

  • I'm not able to use my account any further without "correcting" my question. Any suggestions to get upvotes? – Jonathan Ambriz Oct 22 '20 at 22:20
  • @JonathanAmbriz your question isn't getting upvotes because it shows no effort from your side. No research effort, no failed attempt, nothing. We like to see what people have tried before we go about giving advice – Sabito stands with Ukraine Nov 08 '20 at 23:53