
I would like to run a Wikidata query over many values that are listed in a column of a CSV file on my computer. How can I load the values from the CSV file into the Wikidata query automatically, without copying them in manually?

So far I have worked with the Wikidata query in Visual Studio Code.

This is the query I made for one person:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?VIAF { "2467372" }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}

I want to automatically load many values from the column of my CSV file into the curly brackets of the VALUES clause in the query above.

MrLibre
  • Are such values stored in a single column? Are they just item identifiers, or do they also have a `wd:` prefix? By "automatic" do you mean in a programmatic way, or are you also OK with semi-automatic ways like a notepad find & replace tool? Please update your question with this information. – logi-kal Jan 23 '23 at 15:23
  • Yes, such values are stored in a single column in the CSV file. They are just item identifiers like Q49227399, without a `wd:` prefix. I would like an automatic way. I know a semi-automatic way would be possible with OpenRefine, but I would like an automatic process. – MrLibre Jan 23 '23 at 16:05
  • What programming language do you want to use to automate this process? What have you already tried? – logi-kal Jan 23 '23 at 16:26
  • Nothing so far, I am new to this topic. My preferred programming language would be Python, but I am still learning. – MrLibre Jan 23 '23 at 16:51
  • Your starting query doesn't run. Can you please update it to a working [example](https://stackoverflow.com/help/minimal-reproducible-example)? Also, the example entity, Q49227399, appears to be a place in Norway, which is unlikely to have either a VIAF ID or a birth location. – Tom Morris Jan 23 '23 at 19:58
  • Thanks (even though that's a completely different query!) – Tom Morris Jan 23 '23 at 23:46

2 Answers


So, say you have a file my_file.csv with the following content:

2467372
63468347
12447

First of all, import a Python library for reading files (such as `fileinput`).

Then declare the pattern that you want to use for your query, using `%s` as a placeholder for the identifiers.
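If Python's `%`-formatting is new to you, a quick interpreter session shows what the substitution will do (toy values in place of the real query):

>>> 'VALUES ?VIAF { %s }' % '"2467372" "63468347"'
'VALUES ?VIAF { "2467372" "63468347" }'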

Now, build a list of identifiers as follows:

identifiers = ['"' + line.strip() + '"' for line in fileinput.input(files='my_file.csv')]

Finally, join the list using a space character as a separator and pass the resulting string to your query pattern:

query = query_pattern % ' '.join(identifiers)

This is the final code:

import fileinput

filename = 'my_file.csv'

# The %s placeholder will be replaced by the space-separated identifiers
query_pattern = '''SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?VIAF { %s }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}'''

# Wrap each identifier in double quotes: VIAF IDs are string literals in SPARQL
identifiers = ['"' + line.strip() + '"' for line in fileinput.input(files=filename)]
query = query_pattern % ' '.join(identifiers)
print(query)

Executing it, you'll get:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?VIAF { "2467372" "63468347" "12447" }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
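If you then want to actually execute the generated query rather than just print it, here is a minimal untested sketch using the `requests` library (an assumption, not part of the original answer). It reuses the `query` string built above; the User-Agent string is a placeholder that you should replace with something identifying your script, as Wikimedia asks:

import requests

# Send the query string built above to the Wikidata Query Service
response = requests.post(
    'https://query.wikidata.org/sparql',
    data={'query': query},
    headers={'Accept': 'application/sparql-results+json',
             'User-Agent': 'viaf-lookup-example/0.1 (me@example.org)'},  # placeholder
)
response.raise_for_status()
for row in response.json()['results']['bindings']:
    print(row['VIAF']['value'], row['AuthorLabel']['value'], row['birthLocation']['value'])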
logi-kal

First, I feel compelled to point out that if you don't already know a programming language, OpenRefine can do this for you in a few clicks.

Having said that, here's a basic Python program that does what you literally asked for, reading a set of VIAF IDs and inserting them into your query:

import csv


def expand_query(ids):
    # Quote each VIAF ID: they are string literals in the SPARQL query
    values = ' '.join(f'"{viaf_id}"' for viaf_id in ids)
    query = """
    SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation ?birthLocationLabel WHERE {
        VALUES ?VIAF { """ + values + """ }
        ?Author wdt:P214 ?VIAF.
        OPTIONAL { ?Author wdt:P19 ?birthLocation. }
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en". }
    }
    """
    return query


def main():
    with open('../data/authors.csv', "rt") as csvfile:
        csvreader = csv.DictReader(csvfile, dialect=csv.excel)
        ids = [row["viaf"] for row in csvreader]
        print(expand_query(ids))


if __name__ == "__main__":
    main()

It expects a CSV file with a column called `viaf` and ignores all other columns, e.g.:

name,viaf
Douglas Adams,113230702
William Shakespeare,96994048
Bertolt Brecht,2467372

I've tweaked the query slightly to:

  • always output a row even if the birth location isn't available
  • output the label for the birth location
  • add English as an additional fallback language for labels

This assumes that you've got a small enough set of identifiers to fit into a single query, but you can extend it (a rough sketch follows this list) to:

  • read identifiers in batches of a convenient size
  • use SPARQLWrapper to send each query to the Wikidata SPARQL endpoint and parse the results
  • write the results to a different CSV file in chunks as they're received
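Here is a rough, untested sketch of those three extensions, assuming SPARQLWrapper is installed (`pip install sparqlwrapper`) and reusing `expand_query` from above; the batch size, output columns, and User-Agent string are placeholders:

import csv
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://query.wikidata.org/sparql"
BATCH_SIZE = 100  # placeholder: pick whatever the endpoint handles comfortably


def fetch_all(ids, outfilename):
    # Wikimedia endpoints expect a descriptive User-Agent (placeholder below)
    sparql = SPARQLWrapper(ENDPOINT, agent="viaf-lookup-example/0.1 (me@example.org)")
    sparql.setReturnFormat(JSON)
    with open(outfilename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["viaf", "author", "authorLabel", "birthLocationLabel"])
        # Query the endpoint one batch of identifiers at a time
        for start in range(0, len(ids), BATCH_SIZE):
            sparql.setQuery(expand_query(ids[start:start + BATCH_SIZE]))
            results = sparql.query().convert()
            # Each binding maps variable names to {"type": ..., "value": ...} dicts
            for binding in results["results"]["bindings"]:
                writer.writerow([
                    binding.get("VIAF", {}).get("value", ""),
                    binding.get("Author", {}).get("value", ""),
                    binding.get("AuthorLabel", {}).get("value", ""),
                    binding.get("birthLocationLabel", {}).get("value", ""),
                ])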
Tom Morris