
I am trying to retrieve abstracts via Scopus Abstract Retrieval. I have a file with 3590 EIDs.

import pandas as pd
import numpy as np

file = pd.read_excel(r'C:\Users\Amanda\Desktop\Superset.xlsx', sheet_name='Sheet1')

from pybliometrics.scopus import AbstractRetrieval
for i, row in file.iterrows():
  q = row['EID']
  ab = AbstractRetrieval(q,view='META_ABS')
  file.at[i,"Abstract"] = ab.description
  print(str(i) + ' ' + ab.description)
  print(str(''))

I get a ValueError.

In response to the ValueError, I altered the code to catch it:

from pybliometrics.scopus import AbstractRetrieval
error_index_valueerror = {}

for i, row in file.iterrows():
  q = row['EID']
  try:
    ab = AbstractRetrieval(q,view='META_ABS')
    file.at[i,"Abstract"] = ab.description
    print(str(i) + ' ' + ab.description)
    print(str(''))
  except ValueError:
    print(f"{i} Value Error")
    error_index_valueerror[i] = row['Title']
    continue

When I trialed this code with 10-15 entries, it worked well and I retrieved all the abstracts. However, when I ran the actual file with 3590 EIDs, the output was a series of 10-12 ValueErrors, after which a TypeError surfaced: can only concatenate str (not "NoneType") to str.


I am not sure how to tackle this problem moving forward. Any advice on this matter would be greatly appreciated!

(Side note: when I switch to view='FULL', as recommended by the documentation, I still get the same outcome.)

Apples


Without the EIDs to check, it is hard to pinpoint the precise cause. However, I'm 99% certain that your problem is missing abstracts in the .description property. A single record without an abstract is enough: .description then returns None, and str(i) + ' ' + ab.description tries to concatenate a string with None. That's exactly what the TypeError says.
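The TypeError can be reproduced without Scopus at all. In this sketch a plain None stands in for a missing ab.description (an assumption about what pybliometrics returns for a record without an abstract, consistent with the error message); describe is a hypothetical helper showing that an f-string tolerates None where + does not:

```python
def describe(i, description):
    """Build the log line the original loop printed, tolerating None."""
    # f-strings call str() on every value, so None becomes the text "None"
    # instead of raising TypeError the way str + None does.
    return f"{i} {description}"


missing = None  # stands in for an empty ab.description

try:
    str(0) + ' ' + missing  # the concatenation from the question's loop
    message = "no error"
except TypeError as err:
    message = str(err)

print(message)               # can only concatenate str (not "NoneType") to str
print(describe(0, missing))  # 0 None
```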

Thus your problem has nothing to do with pybliometrics or Scopus, but with the way the code is built.

Try this instead:

import pandas as pd
from pybliometrics.scopus import AbstractRetrieval


def parse_abstract(eid):
    """Retrieve the abstract of a document by its EID."""
    ab = AbstractRetrieval(eid, view='META_ABS')
    # .description may be empty; fall back to .abstract
    return ab.description or ab.abstract


FNAME = r'C:\Users\Amanda\Desktop\Superset.xlsx'
df = pd.read_excel(FNAME, sheet_name='Sheet1')
# Call parse_abstract once per EID and store the results in a new column
df["abstract"] = df["EID"].apply(parse_abstract)

Instead of assigning values one by one in a loop, which is slow and error-prone, I use pandas' .apply() method.
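If some EIDs also raise ValueError (as in the question), the function passed to .apply() can catch it and return None, so one bad record neither stops the run nor gets silently lost. This is a hypothetical sketch: safe_abstract, fake_fetch, failed and the sample EIDs are illustrative names and made-up data, and the fetch parameter is injected so the example runs without a Scopus key. In real use fetch would be something like lambda eid: AbstractRetrieval(eid, view='META_ABS').

```python
from types import SimpleNamespace

import pandas as pd


def safe_abstract(eid, fetch, failed):
    """Return the abstract for one EID, or None if retrieval fails.

    fetch:  callable taking an EID and returning an object with
            .description and .abstract (hypothetical injection point).
    failed: dict that collects EIDs whose retrieval raised ValueError.
    """
    try:
        ab = fetch(eid)
    except ValueError as err:
        failed[eid] = str(err)  # remember the failure instead of crashing
        return None
    # Fall back to .abstract when .description is empty or None
    return ab.description or ab.abstract


def fake_fetch(eid):
    """Hypothetical stand-in for AbstractRetrieval, for offline testing only."""
    if eid == "bad-eid":
        raise ValueError("invalid EID")
    if eid == "2-s2.0-no-abstract":
        return SimpleNamespace(description=None, abstract=None)
    return SimpleNamespace(description=f"Abstract of {eid}", abstract=None)


df = pd.DataFrame({"EID": ["2-s2.0-1", "bad-eid", "2-s2.0-no-abstract"]})
failed = {}
# args= passes extra positional arguments after each EID
df["abstract"] = df["EID"].apply(safe_abstract, args=(fake_fetch, failed))
print(df)
print(failed)
```

After the run, failed tells you which EIDs to inspect by hand, much like error_index_valueerror in the question.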

Also note that I write ab.description or ab.abstract. The documentation at https://pybliometrics.readthedocs.io/en/stable/classes/AbstractRetrieval.html states that both should yield the same text but either can be empty. With this expression, if ab.description is empty (i.e., falsy), ab.abstract is used instead.
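The fallback relies on Python's or returning its first truthy operand, or the last operand if neither is truthy. A quick check with stand-in objects (SimpleNamespace instead of real AbstractRetrieval results; pick_abstract is a hypothetical name) shows the three cases. Note that the result can still be None when both fields are empty, so downstream code should not assume a string:

```python
from types import SimpleNamespace


def pick_abstract(ab):
    """Prefer .description, fall back to .abstract (mirrors the answer)."""
    return ab.description or ab.abstract


both = SimpleNamespace(description="Text A", abstract="Text A")
only_abstract = SimpleNamespace(description=None, abstract="Text B")
neither = SimpleNamespace(description="", abstract=None)

print(pick_abstract(both))           # Text A
print(pick_abstract(only_abstract))  # Text B
print(pick_abstract(neither))        # None
```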

MERose
  • Thank you for pointing out the error. Now, I understand why I get those error messages. Appreciate your help with the more robust code. – Apples Feb 27 '22 at 12:00