
I'm trying to create a script to download subtitles from one specific website. Please read the comments in the code. Here's the code:

import requests
from bs4 import BeautifulSoup

count = 0
usearch = input("Movie Name? : ")
search_url = "https://www.yifysubtitles.com/search?q="+usearch
base_url = "https://www.yifysubtitles.com"
print(search_url)
resp = requests.get(search_url)
soup = BeautifulSoup(resp.content, 'lxml')
for link in soup.find_all("div",{"class": "media-body"}):       #Get the exact class:'media-body'
    imdb = link.find('a')['href']                               #Find the link in that class, which is the exact link we want
    movie_url = base_url+imdb                                   #Merge the result with base string to navigate to the movie page
    print("Movie URL : {}".format(movie_url))                   #Print the URL just to check.. :p

    next_page = requests.get(movie_url)                         #Soup number 2 begins here, after navigating to the movie page
    soup2 = BeautifulSoup(next_page.content,'lxml')
    #print(soup2.prettify())
    for links in soup2.find_all("tr",{"class": "high-rating"}): #Navigate to subtitle options with class as high-rating
        for flags in links.find("td", {"class": "flag-cell"}):  #Look for all the flags of subtitles with high-ratings
            if flags.text == "English":                         #If flag is set to English then get the download link
                print("After if : {}".format(links))
                for dlink in links.find("td",{"class": "download-cell"}):   #Once English check is done, navigate to the download class "download-cell" where the download href exists
                    half_dlink = dlink.find('a')['href']                    #STUCK HERE!!!HERE'S THE PROBLEM!!! SOS!!! HELP!!!
                    download = base_url + half_dlink
                    print(download)

I'm getting the following error :

 File "C:/Users/PycharmProjects/WhatsApp_API/SubtitleDownloader.py", line 24, in <module>
    for x in dlink.find("a"):
TypeError: 'NoneType' object is not iterable
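For context, this error simply means a lookup returned `None` and the code then tried to iterate it. A minimal standalone reproduction (not tied to the site):

```python
# find() returns None when no matching tag exists; iterating None raises TypeError.
value = None
try:
    for item in value:
        pass
except TypeError as e:
    print(e)  # 'NoneType' object is not iterable
```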
  • You should build some fallbacks: what if there is no English subtitle, or no subtitle with the highest rating. See also https://stackoverflow.com/questions/3887381/typeerror-nonetype-object-is-not-iterable-in-python – wasmachien Mar 20 '18 at 09:56
  • I checked out your link, but the thing is `dlink.find('a')['href']` should return a download link to a movie's subtitle. As for the fallbacks, once I get this script working properly in its rudimentary form, I'll modify it. Thanks for the reply! – Aakash Hirve Mar 20 '18 at 10:02
  • Did you try changing this line `for dlink in links.find("td",{"class": "download-cell"}):` to this `for dlink in links.find_all("td",{"class": "download-cell"}):`? – Abdullah Ahmed Ghaznavi Mar 20 '18 at 10:16
  • @Abdullah Ahmed Ghaznavi Thanks! It worked. But what's the difference between "find()" and "find_all()" methods in the context of my code? – Aakash Hirve Mar 20 '18 at 10:21
  • Welcome! :) The difference is that `find_all()` returns a list of all matches, while `find()` returns just the first match. In your case you were running a loop over a single element rather than a list. – Abdullah Ahmed Ghaznavi Mar 20 '18 at 10:27
  • And since it works, I will make an answer of it so that it might help others! – Abdullah Ahmed Ghaznavi Mar 20 '18 at 10:29

2 Answers


Just change this line:

for dlink in links.find("td",{"class": "download-cell"}):

to this:

for dlink in links.find_all("td",{"class": "download-cell"}):

because you were running a loop over a single element rather than a list.

Note: find_all() returns a list of all matching elements, while find() returns only the first match (or None if nothing matches).
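To illustrate why the original loop misbehaved (using a tiny hand-written snippet rather than the subtitle site): looping over the single Tag that find() returns iterates that tag's *children*, whereas find_all() gives a list of matching Tags.

```python
from bs4 import BeautifulSoup

html = "<td class='download-cell'><a href='/sub1'>dl</a></td>"

# Looping over find()'s single Tag iterates its children (here, the <a> tag):
cell = BeautifulSoup(html, "html.parser").find("td", {"class": "download-cell"})
children = list(cell)
print(children[0].name)  # a

# find_all() returns a list of matching Tags, so the loop sees the <td> itself:
cells = BeautifulSoup(html, "html.parser").find_all("td", {"class": "download-cell"})
print(cells[0].name)        # td
print(cells[0].a["href"])   # /sub1
```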

Hope this helps you! :)

Abdullah Ahmed Ghaznavi

Have a look at the documentation of find_all() and find().

find_all():

The find_all() method looks through a tag’s descendants and retrieves all descendants that match your filters.

find():

The find_all() method scans the entire document looking for results, but sometimes you only want to find one result. If you know a document only has one <body> tag, it’s a waste of time to scan the entire document looking for more. Rather than passing in limit=1 every time you call find_all, you can use the find() method.
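That equivalence can be checked directly on a tiny hand-written document (not the subtitle site): find() returns the same Tag object as the first element of find_all() with limit=1.

```python
from bs4 import BeautifulSoup

doc = "<body><p>first</p><p>second</p></body>"
soup = BeautifulSoup(doc, "html.parser")

first = soup.find("p")
limited = soup.find_all("p", limit=1)

print(first.text)           # first
print(first is limited[0])  # True - find() is find_all(..., limit=1)[0]
```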

So, you don't need to loop over the find() function to get the tags. You need to make the following changes in your code (removed the unnecessary for loops):

...
# Previous code is the same

soup2 = BeautifulSoup(next_page.content,'lxml')
for links in soup2.find_all("tr",{"class": "high-rating"}):
    if links.find("td", {"class": "flag-cell"}).text == "English":
        print("After if : {}".format(links))
        half_dlink = links.find('td', {'class': 'download-cell'}).a['href']
        download = base_url + half_dlink
        print(download)
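If you also want the fallbacks suggested in the comments (rows with no English flag, or no download link), a sketch of the same loop with None checks might look like this. The sample markup below is hypothetical, just to make the snippet self-contained; on the real site you would use the soup2 from your own request:

```python
from bs4 import BeautifulSoup

base_url = "https://www.yifysubtitles.com"
sample = """
<table>
  <tr class="high-rating">
    <td class="flag-cell">English</td>
    <td class="download-cell"><a href="/subtitle/demo">download</a></td>
  </tr>
  <tr class="high-rating">
    <td class="flag-cell">French</td>
  </tr>
</table>
"""
soup2 = BeautifulSoup(sample, "html.parser")

for row in soup2.find_all("tr", {"class": "high-rating"}):
    flag = row.find("td", {"class": "flag-cell"})
    if flag is None or flag.text.strip() != "English":
        continue  # no flag cell, or not an English subtitle
    cell = row.find("td", {"class": "download-cell"})
    if cell is None or cell.a is None:
        continue  # row has no download link
    print(base_url + cell.a["href"])
```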
Keyur Potdar