
I'm using the Google Books API to get details about books from their ISBNs.

ISBN (International Standard Book Number) is a numeric commercial book identifier that is intended to be unique.

When I call the API with different ISBNs, the responses don't always have the same structure, because some books are missing certain fields:

requests.get("https://www.googleapis.com/books/v1/volumes?q=isbn:8180315339").json()
requests.get("https://www.googleapis.com/books/v1/volumes?q=isbn:938733077X").json()

The two responses contain different numbers of fields.
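To see exactly which fields differ between responses, one option is to compare each item's volumeInfo keys against the fields the loop expects. This is a minimal sketch assuming the JSON has already been parsed; EXPECTED_FIELDS and missing_fields are hypothetical names, and the two sample responses are made up, not real API output.

```python
# Fields the loop below reads from item["volumeInfo"] (hypothetical constant)
EXPECTED_FIELDS = (
    "title", "subtitle", "authors", "publisher", "publishedDate",
    "description", "pageCount", "categories", "imageLinks", "language",
)


def missing_fields(response):
    """For each item in a parsed volumes response, list the expected volumeInfo keys it lacks."""
    report = []
    # .get("items", []) guards against a response that has no "items" key at all
    for item in response.get("items", []):
        info = item.get("volumeInfo", {})
        report.append([key for key in EXPECTED_FIELDS if key not in info])
    return report


# Made-up responses for illustration only
full = {"items": [{"volumeInfo": {key: "x" for key in EXPECTED_FIELDS}}]}
partial = {"items": [{"volumeInfo": {"title": "Some Title", "language": "en"}}]}

print(missing_fields(full))     # [[]]
print(missing_fields(partial))
```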

I can use try/except to catch the KeyError, but continue then moves on to the next iteration of the loop (i.e. calls the API with the next ISBN), and whatever was already extracted for the current book is lost. How do I save the information that is available and put np.nan in the DataFrame where data is missing?


import pandas as pd
import requests

results = []
data = requests.get("https://www.googleapis.com/books/v1/volumes?q=isbn:938733077X").json()
# Loop through the items in the "items" field of the JSON data
for item in data['items']:
    # Extract the relevant fields from the item
    try:
        title = item['volumeInfo']['title']
        subtitle = item['volumeInfo']['subtitle']
        authors = item['volumeInfo']['authors']
        publisher = item['volumeInfo']['publisher']
        published_date = item['volumeInfo']['publishedDate']
        description = item['volumeInfo']['description']
        pageCount = item['volumeInfo']['pageCount']
        category = item['volumeInfo']['categories']
        imageS = item['volumeInfo']['imageLinks']['smallThumbnail']
        imageM = item['volumeInfo']['imageLinks']['thumbnail']
        language = item['volumeInfo']['language']
        textSnippet = item['searchInfo']['textSnippet']
    except KeyError:
        # A single missing key skips the whole item, discarding the fields already read
        continue
    # Add the fields to the results list as a tuple
    results.append((title, subtitle, authors, publisher, published_date, description, pageCount, category, imageS, imageM, language, textSnippet))

# Create a DataFrame from the results list
df_ = pd.DataFrame(results, columns=['Title', 'Sub Title', 'Authors', 'Publisher', 'Published Date', 'Description', 'Page Count', 'Category', 'imageS', 'imageM', 'Language', 'Text'])
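One way to keep whatever is present and fill the rest with NaN is to look up each column through its full key path and fall back to a default on the first missing key. This is a minimal sketch, assuming items is the parsed "items" list of one response; FIELD_PATHS, get_path and build_rows are hypothetical names. It uses math.nan, which is a float NaN just like np.nan, so pandas treats it as missing.

```python
import math

# Each column from the question mapped to its key path inside an item (hypothetical mapping)
FIELD_PATHS = {
    "Title": ("volumeInfo", "title"),
    "Sub Title": ("volumeInfo", "subtitle"),
    "Authors": ("volumeInfo", "authors"),
    "Publisher": ("volumeInfo", "publisher"),
    "Published Date": ("volumeInfo", "publishedDate"),
    "Description": ("volumeInfo", "description"),
    "Page Count": ("volumeInfo", "pageCount"),
    "Category": ("volumeInfo", "categories"),
    "imageS": ("volumeInfo", "imageLinks", "smallThumbnail"),
    "imageM": ("volumeInfo", "imageLinks", "thumbnail"),
    "Language": ("volumeInfo", "language"),
    "Text": ("searchInfo", "textSnippet"),
}


def get_path(obj, path, default=math.nan):
    """Follow a sequence of keys into nested dicts; return default as soon as one is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def build_rows(items):
    """One tuple per item, in FIELD_PATHS order, with NaN wherever a field is missing."""
    return [tuple(get_path(item, path) for path in FIELD_PATHS.values()) for item in items]
```

With pandas installed, the rows can then go straight into a DataFrame, e.g. pd.DataFrame(build_rows(data["items"]), columns=list(FIELD_PATHS)). No item is skipped, so a book with only a title still gets a row.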

2 Answers


Try using dict.get, which returns a default value instead of raising KeyError:

title = item.get('volumeInfo', dict()).get('title')  # if either key is missing, you get None instead of a KeyError
Nick
  • Your version is broken. If `volumeInfo` doesn't exist you will be calling `get` on `None`. Change it to: `item.get('volumeInfo', dict()).get('title')`. Really though, it would be better to capture `volumeInfo` first and only do the rest if it isn't `None` – OneMadGypsy Jan 07 '23 at 18:53
  • This will not work if the key volumeInfo is not present: 'NoneType' object has no attribute 'get'. – satyamdalai Jan 07 '23 at 18:55
  • changed accordingly – Nick Jan 07 '23 at 18:56
  • That will work, but according to the way the OP is doing things they will be calling `get` on the same thing over and over. What about: `if vi := item.get('volumeInfo'):` and then `vi.get('theField')` for all the necessary fields. – OneMadGypsy Jan 07 '23 at 18:59
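The walrus-operator suggestion in the last comment can be sketched as follows: fetch volumeInfo once, and only if it is present read every field with dict.get, using NaN as the default. This is a minimal sketch assuming Python 3.8+ (for :=); FIELDS and rows_from_items are hypothetical names, and the field list covers only the flat volumeInfo keys.

```python
import math

# Flat volumeInfo keys to read (hypothetical selection)
FIELDS = ("title", "subtitle", "authors", "publisher", "publishedDate",
          "description", "pageCount", "categories", "language")


def rows_from_items(items):
    """Build one tuple per item that has a non-empty volumeInfo; missing fields become NaN."""
    rows = []
    for item in items:
        # Look up volumeInfo once; an absent or empty volumeInfo skips the item
        if vi := item.get("volumeInfo"):
            rows.append(tuple(vi.get(field, math.nan) for field in FIELDS))
    return rows
```

Note that an empty volumeInfo dict is falsy, so it is skipped along with a missing one.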

First try to get item['volumeInfo'], and process the item only if that succeeds. Using operator.itemgetter also makes the code much more compact.

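from operator import itemgetter


# Note: itemgetter has no default, so these lookups still raise KeyError
# if any one of the listed keys is missing
extract = itemgetter("title",
                     "subtitle",
                     "authors",
                     "publisher",
                     "publishedDate",
                     "description",
                     "pageCount",
                     "categories",
                     "imageLinks",
                     "language")
get_thumbnails = itemgetter("smallThumbnail", "thumbnail")

results = []
for item in data["items"]:
    # Skip items that have no volumeInfo at all
    try:
        info = item["volumeInfo"]
    except KeyError:
        continue

    t = extract(info)
    # Replace the imageLinks dict (t[8]) with its two thumbnail URLs;
    # textSnippet lives under searchInfo, not volumeInfo
    results.append(t[:8] + get_thumbnails(t[8]) + t[9:] + (item["searchInfo"]["textSnippet"],))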
chepner
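operator.itemgetter has no default argument, so extract(info) in the answer above still raises KeyError as soon as any one of the listed keys is absent, which is exactly the case the question asks about. A minimal sketch of an itemgetter-like helper for dicts that substitutes NaN instead; itemgetter_default is a hypothetical name, not part of the operator module, and the info dict is made up.

```python
import math
from operator import itemgetter


def itemgetter_default(*keys, default=math.nan):
    """Hypothetical itemgetter-like helper for dicts: missing keys become `default`.

    Unlike operator.itemgetter, it always returns a tuple, even for a single key.
    """
    def getter(mapping):
        return tuple(mapping.get(key, default) for key in keys)
    return getter


info = {"title": "T", "imageLinks": {"thumbnail": "t.jpg"}}

# The standard itemgetter raises on a missing key
try:
    itemgetter("title", "subtitle")(info)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'subtitle'

extract = itemgetter_default("title", "subtitle", "imageLinks", "language")
get_thumbnails = itemgetter_default("smallThumbnail", "thumbnail")

t = extract(info)
# imageLinks itself may be missing (NaN), so fall back to an empty dict
links = t[2] if isinstance(t[2], dict) else {}
row = t[:2] + get_thumbnails(links) + t[3:]
print(row)
```

The trade-off is that, unlike operator.itemgetter, this helper only works on mappings, not on sequences indexed by position.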