I am trying to extract the number of comments on a YouTube video and have tried several methods.

My code:

from selenium import webdriver
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time

DRIVER_PATH = "<your chromedriver path>"
wd = webdriver.Chrome(executable_path=DRIVER_PATH)

url = 'https://www.youtube.com/watch?v=5qzKTbnhyhc'

wd.get(url)
wait = WebDriverWait(wd, 100)

time.sleep(40)
v_title = wd.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print("title Is ")
print(v_title)

comments_xpath = '//h2[@id="count"]/yt-formatted-string/span[1]'
v_comm_cnt = wait.until(EC.visibility_of_element_located((By.XPATH, comments_xpath)))
#wd.find_element_by_xpath(comments_xpath)
print(len(v_comm_cnt))

I get the following error:

selenium.common.exceptions.TimeoutException: Message: 

I get the correct value for the title but not for the comment count. Can anyone please tell me what is wrong with my code?

Please note that the comment count XPath, //h2[@id="count"]/yt-formatted-string/span[1], points to the correct place when I search for it in Inspect Element.

user1768029

1 Answer

Updated answer
Well, it was tricky!
There are several issues here:

  1. Some JavaScript on this page keeps Selenium's driver.get() waiting until the page-load timeout, even though the page already looks loaded. To get around that I used the eager page load strategy.
  2. The page contains several blocks of markup for the same areas, and sometimes one of them is used (visible) and sometimes the other. This makes working with element locators difficult. So here I wait for the title element from one of those blocks to become visible. If it does, I extract the text from it; otherwise I wait for the second element to become visible (it appears immediately) and extract the text from that.
  3. There are several ways to scroll the page, and not all of them worked here. I found one that works and does not scroll too far.
    The code below works; I ran it several times.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


options = Options()
options.add_argument("--start-maximized")
# Eager strategy: driver.get() returns once the DOM is ready instead of
# waiting for every resource on the page to finish loading.
options.page_load_strategy = "eager"

# Raw string so the backslashes in the Windows path are not read as escapes.
s = Service(r'C:\webdrivers\chromedriver.exe')

driver = webdriver.Chrome(options=options, service=s)
url = 'https://www.youtube.com/watch?v=5qzKTbnhyhc'
driver.get(url)
driver.maximize_window()
wait = WebDriverWait(driver, 10)

# The page has two alternative title blocks; only one of them is visible.
title_xpath = "//div[@class='style-scope ytd-video-primary-info-renderer']/h1"
alternative_title = "//*[@id='title']/h1"
v_title = ""
try:
    v_title = wait.until(EC.visibility_of_element_located((By.XPATH, title_xpath))).text
except TimeoutException:
    v_title = wait.until(EC.visibility_of_element_located((By.XPATH, alternative_title))).text

print("Title is " + v_title)
comments_xpath = "//div[@id='title']//*[@id='count']//span[1]"

# The comment counter is only rendered after the page has been scrolled down a little.
driver.execute_script("window.scrollBy(0, arguments[0]);", 600)
try:
    wait.until(EC.visibility_of_element_located((By.XPATH, comments_xpath)))
except TimeoutException:
    # Not visible within the timeout; still try to read the element directly.
    pass
v_comm_cnt = driver.find_element(By.XPATH, comments_xpath).text
print("Video has " + v_comm_cnt + " comments")

The output is:

Title is Music for when you are stressed  Chil lofi | Music to Relax, Drive, Study, Chill
Video has 834 comments

Process finished with exit code 0
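
As a follow-up to the code above, here is a minimal sketch of two small helpers. Both `first_text` and `parse_count` are hypothetical names that are not part of Selenium or of the original answer. The first generalizes the try/except fallback between the two title locators to any list of XPaths; the second turns the counter text into an integer, assuming the counter is rendered as plain digits with optional comma separators (abbreviated forms such as "1.2K" are rejected rather than guessed).

```python
from typing import Callable, Iterable, Optional


def first_text(get_text: Callable[[str], str], xpaths: Iterable[str]) -> Optional[str]:
    """Return the text for the first XPath on which get_text succeeds.

    get_text is any callable that raises an exception when the element is
    missing or not visible (for example a wrapper around a Selenium wait).
    Returns None if every XPath fails.
    """
    for xpath in xpaths:
        try:
            return get_text(xpath)
        except Exception:
            continue  # this block of the page is not in use, try the next one
    return None


def parse_count(text: str) -> int:
    """Convert a counter string such as '834' or '1,234' to an int.

    Assumes plain ASCII digits with optional comma separators; anything
    else raises ValueError.
    """
    cleaned = text.strip().replace(",", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"unexpected counter text: {text!r}")
    return int(cleaned)


# Possible usage with the names from the code above (driver, wait, By, EC):
# get_text = lambda xp: wait.until(
#     EC.visibility_of_element_located((By.XPATH, xp))).text
# v_title = first_text(get_text, [title_xpath, alternative_title])
# comment_count = parse_count(v_comm_cnt)
```

Passing the lookup in as a callable keeps the helper independent of the browser session, so it can be checked with a simple stand-in function before being wired to a real driver.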
Prophet
  • Thanks for your quick reply. I am still getting the same error - "selenium.common.exceptions.TimeoutException: Message". I tried increasing the webdriver wait time and also the time.sleep, but it still threw the error. Have you tried your code at your end? – user1768029 Aug 31 '22 at 18:50
  • I did not actually try this code; I have no Python on my PC today. If it does not work, try scrolling the entire page height or scrolling to the comments counter element. I think scrolling should work here. On the initially opened page `'//h2[@id="count"]/yt-formatted-string/span[1]'` does not exist, but after scrolling it appears. I checked that manually. – Prophet Aug 31 '22 at 18:53
  • Well, it was interesting, so I installed Selenium with Python and did that! It was not so simple :) – Prophet Aug 31 '22 at 21:50
  • Thanks a lot. This is perfect! It resolved the error. – user1768029 Sep 01 '22 at 01:50
  • It was my pleasure. Some challenge is good :) – Prophet Sep 01 '22 at 06:58
  • Would you like to answer this Selenium question? https://stackoverflow.com/questions/73574973/selenium-return-random-field-value-as-missing – user1768029 Sep 01 '22 at 20:13
  • As you can see, I commented there. That question will probably be closed since it is missing basic details. I'm sorry. It is not me; other users will do that. I did not vote to close it. – Prophet Sep 01 '22 at 21:15
  • Thanks again for your reply. I wanted to ask a generic question: is it okay to raise the wait up to 250? I have now added my code to the question along with other details. If you are okay with the current details, could you please vote to reopen? Thanks again for all your time. – user1768029 Sep 01 '22 at 22:12
  • It is better to ask a new question. That is much simpler. A new question appears at the top of the list, while reopening needs 3 votes and the question will sit far from the top, so people will not see it. – Prophet Sep 01 '22 at 22:16
  • Thank you for your advice. I have deleted the previous one and added a new one. https://stackoverflow.com/questions/73576011/selenium-return-random-field-value-as-missing. Please take a look when you have some time. – user1768029 Sep 01 '22 at 22:31
  • Sure. But I also need to sleep sometimes :) – Prophet Sep 02 '22 at 07:05
  • Hey, thanks for all your help. I will be deleting that post. I have resolved the issue. My design was not correct. – user1768029 Sep 02 '22 at 08:25
  • Ah, OK. I saw that post but still did not have time to debug it – Prophet Sep 02 '22 at 08:32
  • Your suggestions really helped me a lot. It was a great help! – user1768029 Sep 02 '22 at 08:35
  • You are really welcome. I like challenges :) – Prophet Sep 02 '22 at 08:37