Using this page as an example:

https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/

How would one hypothetically scrape the text answer from behind the flashcard? It's hidden right now, but when you click on it, it rotates and shows the answer.

What I have so far looks like this, but I'm fairly sure it isn't selecting the right element:

import requests
from bs4 import BeautifulSoup


def find_quizlet_flashcard_answer(quizlet_url):

    # desktop user-agent
    USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
    # mobile user-agent (defined, but not used below)
    MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"
    headers = {"user-agent": USER_AGENT}

    resp = requests.get(quizlet_url, headers=headers)

    # stays None if the request fails or nothing matches
    result = None
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        # attempted selector: every <div> with aria-hidden="true"
        inner_divs = soup.find_all("div", {"aria-hidden": "true"})
        for g in inner_divs:
            result = g.text
            print(result)
    # text of the last matching <div>, if any
    return result
wildcat89

1 Answer

To get all questions and answers you can use this example:

import requests
from bs4 import BeautifulSoup


url = 'https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

for i, (question, answer) in enumerate(zip(soup.select('a.SetPageTerm-wordText'), soup.select('a.SetPageTerm-definitionText')), 1):
    print('QUESTION {}'.format(i))
    print()
    print(question.get_text(strip=True, separator='\n'))
    print()
    print('ANSWER:')
    print(answer.get_text(strip=True, separator='\n'))
    print('-' * 160)

Prints:

QUESTION 1

Which of the following are true regarding IPv4?
a. 32-bit address
b. 128-bit address
c. Consists of a network ID and MAC address
d. Consists of a host ID and MAC address

ANSWER:
a. 32-bit address
----------------------------------------------------------------------------------------------------------------------------------------------------------------
QUESTION 2

How many bits does a standard IPv6 unicast address use to represent the network ID?
a. 32
b. 64
c. 128
d. 10

ANSWER:
b. 64
----------------------------------------------------------------------------------------------------------------------------------------------------------------
QUESTION 3

Which of the following Windows PowerShell commands performs a DNS name query for www.contoso.com?
a. ping www.contoso.com
b. dnsquery www.contoso.com
c. resolve-DNSName -Name www.contoso.com
d. resolve-DNSquery www.comcast.net

ANSWER:
c. resolve-DNSName -Name www.contoso.com
----------------------------------------------------------------------------------------------------------------------------------------------------------------


...and so on.
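
The reason this works without clicking anything is that the answer text is already in the HTML the server sends; the card flip only changes what is displayed. Here is a minimal offline sketch of the same selector logic, assuming the two class names used above are still current. The sample markup and the `extract_cards` helper are made up for illustration and are not Quizlet's real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. Only the two class names come from the
# answer above; the surrounding structure is an assumption.
SAMPLE_HTML = """
<div class="SetPageTerm">
  <a class="SetPageTerm-wordText"><span>What does DNS stand for?</span></a>
  <a class="SetPageTerm-definitionText"><span>Domain Name System</span></a>
</div>
<div class="SetPageTerm">
  <a class="SetPageTerm-wordText"><span>How many bits are in an IPv4 address?</span></a>
  <a class="SetPageTerm-definitionText"><span>32</span></a>
</div>
"""


def extract_cards(html):
    """Return a list of (question, answer) text pairs found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    questions = soup.select("a.SetPageTerm-wordText")
    answers = soup.select("a.SetPageTerm-definitionText")
    # zip() pairs the n-th question with the n-th answer and stops at the
    # shorter list, so a missing answer silently drops the trailing cards.
    return [
        (q.get_text(strip=True, separator="\n"), a.get_text(strip=True, separator="\n"))
        for q, a in zip(questions, answers)
    ]


cards = extract_cards(SAMPLE_HTML)
for question, answer in cards:
    print(question, "->", answer)
```

If the site renames those classes, both `select` calls return empty lists and nothing is printed, so an empty result is worth checking for before assuming the page has no cards.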
Andrej Kesely
  • Wicked! Thanks a lot! – wildcat89 Sep 21 '20 at 23:24
  • @AndrejKesely Can I ask how you are overcoming the CAPTCHA? I keep getting denied by it. – zbush548 Oct 02 '20 at 15:05
  • @zbush548 When I run the script, I don't get a CAPTCHA page. Maybe you are making too many requests in too short a time (try putting `time.sleep` between the requests). Also, try different `User-Agent`s and/or headers. – Andrej Kesely Oct 02 '20 at 15:47
  • @AndrejKesely That's really strange. I don't think it's the result of too many requests, because I get the CAPTCHA on the first try. I'll try different headers! Thank you. By the way, could it be because I am using Google Colab? – zbush548 Oct 02 '20 at 17:15
  • @zbush548 It could be... the server could ban some IP ranges (maybe Google Colab too) – Andrej Kesely Oct 02 '20 at 18:04
  • Yeah, I can't get it to work for me either. It may be that I'm running it from an online IDE though, so hopefully it's just a banned IP. Great answer though! – Ironkey Oct 10 '20 at 02:52
  • @AndrejKesely Great! It worked in my local environment, but it does not work inside AWS Lambda, where I get the following message: `The security system for this website has been triggered. Completing the challenge below proves you are a human and gives you temporary access.` – Awolad Hossain Jan 27 '22 at 12:07
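
For anyone hitting the challenge page from the comments, here is a rough sketch of the `time.sleep` and alternate `User-Agent` suggestion. It only retries and detects the challenge text quoted above; it will not get around a banned IP range, which may well be the cause on Colab or Lambda. `fetch_html`, its parameters, and the retry policy are hypothetical, and the `get` argument exists only so the function can be tried with a stub instead of a live request.

```python
import itertools
import time

# Marker text taken from the challenge message quoted in the comments.
CHALLENGE_MARKER = "The security system for this website has been triggered"

# Pool of User-Agent strings to rotate through (both appear in this thread).
USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0",
]


def fetch_html(url, get=None, attempts=3, delay=5.0):
    """Return the page HTML, or None if every attempt was blocked."""
    if get is None:
        import requests  # imported lazily so a stub can be passed in instead
        get = requests.get
    agents = itertools.cycle(USER_AGENTS)
    for attempt in range(attempts):
        if attempt:
            time.sleep(delay)  # pause before each retry, not before the first try
        resp = get(url, headers={"User-Agent": next(agents)}, timeout=30)
        # Treat a non-200 status or the challenge text as "blocked".
        if resp.status_code == 200 and CHALLENGE_MARKER not in resp.text:
            return resp.text
    return None
```

Calling `fetch_html(url)` and passing the result to `BeautifulSoup` would replace the single `requests.get` call in the answer; a `None` return means every attempt was blocked.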