
For instance, on this page: https://www.amazon.com/Lexani-LXUHP-207-All-Season-Radial-Tire-245/dp/B07FFH8F9V/

I right-click, choose "Inspect", and find the element I'm interested in:

<span id="productTitle" class="a-size-large product-title-word-break">        Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W       </span>

Here's the deal: I want to copy the entire element, tags and attributes included, not just the title text "Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W". Can someone tell me how to do this in BeautifulSoup or rvest?

I am learning Python and R. I tried to dig into it but couldn't get the raw HTML of the element.

plntx
  • What have you tried? This is straightforward in both Python and R, and in fact it requires (slightly) *more* effort to obtain just the text than the entire tag, so I am confused as to what exactly the issue is. – Konrad Rudolph Nov 02 '22 at 10:04

2 Answers


Amazon will likely serve a CAPTCHA page, but if you get past it, you can get what you want with:

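import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.com/Lexani-LXUHP-207-All-Season-Radial-Tire-245/dp/B07FFH8F9V/'
soup = BeautifulSoup(requests.get(url).text, 'lxml')
# find() returns a Tag (or None if the id is absent); str(the_entire_thing) gives its full HTML
the_entire_thing = soup.find(id='productTitle')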
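
To see the difference between the whole element and just its text without touching the network, here is a minimal sketch that parses the span from the question as a static string. The helper name `outer_html_by_id` and the placeholder page are hypothetical, and the built-in `html.parser` is used only to avoid the `lxml` dependency; it assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup


def outer_html_by_id(html, element_id):
    """Return the full markup of the element with this id, or None if it is absent."""
    tag = BeautifulSoup(html, "html.parser").find(id=element_id)
    return None if tag is None else str(tag)


# The span copied from the question, used here as a static sample.
sample = (
    '<span id="productTitle" class="a-size-large product-title-word-break">'
    "        Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W       "
    "</span>"
)

# Whole element: tag name, attributes, and inner text.
outer_html = outer_html_by_id(sample, "productTitle")

# Text only, for comparison; this is the extra step, not the default.
text_only = (
    BeautifulSoup(sample, "html.parser")
    .find(id="productTitle")
    .get_text(strip=True)
)

# A page that lacks the element (for example a CAPTCHA page) yields None.
missing = outer_html_by_id("<p>placeholder page without the element</p>", "productTitle")

print(outer_html)
print(text_only)
print(missing)
```

Checking for `None` before calling `str()` matters with live pages: if the request is answered with something other than the product page, `find()` returns `None` and `str(None)` would silently give the string "None".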
Dmitriy Neledva

In R you can just convert the node to a character vector:

library(rvest)
html <- minimal_html('<span id="productTitle" class="a-size-large product-title-word-break">        Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W       </span>')
html_node <- html_element(html, "#productTitle") 
as.character(html_node)
#> [1] "<span id=\"productTitle\" class=\"a-size-large product-title-word-break\">        Lexani LXUHP-207 Performance Radial Tire - 245/45R18 100W       </span>"

Created on 2022-11-02 with reprex v2.0.2

margusl