
I would like to run a Wikidata query over many values that are listed in a column of a CSV file on my computer. How can I load the values from the CSV file into the Wikidata query automatically, without copying them in manually?

So far I have worked with the Wikidata query in Visual Studio Code.

This is the query I made for one person:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?VIAF { "2467372" }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}

I want to automatically load many values from the column of my CSV file into the curly brackets of the VALUES clause in the query above.

MrLibre
  • Are such values stored in a single column? Are they just item identifiers, or do they also have a `wd:` prefix? By "automatic" do you mean in a programmatic way, or are you also OK with semi-automatic ways like a notepad find & replace tool? Please update your question with this information. – logi-kal Jan 23 '23 at 15:23
  • Yes, such values are stored in a single column in the CSV file. They are just item identifiers like Q49227399, without a `wd:` prefix. I would like an automatic way. I know a semi-automatic way would be possible with OpenRefine, but I would like an automatic process. – MrLibre Jan 23 '23 at 16:05
  • What programming language do you want to use to automate this process? What have you already tried? – logi-kal Jan 23 '23 at 16:26
  • Nothing so far, I am new to this topic. My preferred programming language would be Python, but I am still learning. – MrLibre Jan 23 '23 at 16:51
  • Your starting query doesn't run. Can you please update it to a working [example](https://stackoverflow.com/help/minimal-reproducible-example)? Also, the example entity, Q49227399, appears to be a place in Norway, which is unlikely to have either a VIAF ID or a birth location. – Tom Morris Jan 23 '23 at 19:58
  • Thanks (even though that's a completely different query!) – Tom Morris Jan 23 '23 at 23:46

2 Answers


So, say you have a file my_file.csv with the following content:

2467372
63468347
12447

First of all, import a Python library for reading files (such as `fileinput`).

Then declare the pattern that you want to use for your query, using `%s` as a placeholder for the identifiers.
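If Python's `%`-formatting is new to you, a quick interpreter session shows what the substitution will do (toy values in place of the real query):

>>> 'VALUES ?VIAF { %s }' % '"2467372" "63468347"'
'VALUES ?VIAF { "2467372" "63468347" }'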

Now, build a list of identifiers as follows:

identifiers = ['"' + line.strip() + '"' for line in fileinput.input(files='my_file.csv')]

Finally, join the list using a space character as a separator and pass the resulting string to your query pattern:

query = query_pattern % ' '.join(identifiers)

This is the final code:

import fileinput

filename = 'my_file.csv'

# The %s placeholder will be replaced by the space-separated identifiers
query_pattern = '''SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?VIAF { %s }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}'''

# Wrap each identifier in double quotes: VIAF IDs are string literals in SPARQL
identifiers = ['"' + line.strip() + '"' for line in fileinput.input(files=filename)]
query = query_pattern % ' '.join(identifiers)
print(query)

Executing it, you'll get:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?VIAF { "2467372" "63468347" "12447" }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
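If you then want to actually execute the generated query rather than just print it, here is a minimal untested sketch using the `requests` library (an assumption, not part of the original answer). It reuses the `query` string built above; the User-Agent string is a placeholder that you should replace with something identifying your script, as Wikimedia asks:

import requests

# Send the query string built above to the Wikidata Query Service
response = requests.post(
    'https://query.wikidata.org/sparql',
    data={'query': query},
    headers={'Accept': 'application/sparql-results+json',
             'User-Agent': 'viaf-lookup-example/0.1 (me@example.org)'},  # placeholder
)
response.raise_for_status()
for row in response.json()['results']['bindings']:
    print(row['VIAF']['value'], row['AuthorLabel']['value'], row['birthLocation']['value'])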
logi-kal

First, I feel compelled to point out that if you don't already know a programming language, OpenRefine can do this for you in a few clicks.

Having said that, here's a basic Python program that does what you literally asked for, reading a set of VIAF IDs and inserting them into your query:

import csv


def expand_query(ids):
    # Quote each VIAF ID: they are string literals in the SPARQL query
    values = ' '.join(f'"{viaf_id}"' for viaf_id in ids)
    query = """
    SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation ?birthLocationLabel WHERE {
        VALUES ?VIAF { """ + values + """ }
        ?Author wdt:P214 ?VIAF.
        OPTIONAL { ?Author wdt:P19 ?birthLocation. }
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en". }
    }
    """
    return query


def main():
    with open('../data/authors.csv', "rt") as csvfile:
        csvreader = csv.DictReader(csvfile, dialect=csv.excel)
        ids = [row["viaf"] for row in csvreader]
        print(expand_query(ids))


if __name__ == "__main__":
    main()

It expects a CSV file with a column called `viaf` and ignores all other columns, e.g.:

name,viaf
Douglas Adams,113230702
William Shakespeare,96994048
Bertolt Brecht,2467372

I've tweaked the query slightly to:

  • always output a row even if the birth location isn't available
  • output the label for the birth location
  • add English as an additional fallback language for labels

This assumes that you've got a small enough set of identifiers to fit into a single query, but you can extend it (a rough sketch follows this list) to:

  • read identifiers in batches of a convenient size
  • use SPARQLWrapper to send each query to the Wikidata SPARQL endpoint and parse the results
  • write the results to a different CSV file in chunks as they're received
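Here is a rough, untested sketch of those three extensions, assuming SPARQLWrapper is installed (`pip install sparqlwrapper`) and reusing `expand_query` from above; the batch size, output columns, and User-Agent string are placeholders:

import csv
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://query.wikidata.org/sparql"
BATCH_SIZE = 100  # placeholder: pick whatever the endpoint handles comfortably


def fetch_all(ids, outfilename):
    # Wikimedia endpoints expect a descriptive User-Agent (placeholder below)
    sparql = SPARQLWrapper(ENDPOINT, agent="viaf-lookup-example/0.1 (me@example.org)")
    sparql.setReturnFormat(JSON)
    with open(outfilename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["viaf", "author", "authorLabel", "birthLocationLabel"])
        # Query the endpoint one batch of identifiers at a time
        for start in range(0, len(ids), BATCH_SIZE):
            sparql.setQuery(expand_query(ids[start:start + BATCH_SIZE]))
            results = sparql.query().convert()
            # Each binding maps variable names to {"type": ..., "value": ...} dicts
            for binding in results["results"]["bindings"]:
                writer.writerow([
                    binding.get("VIAF", {}).get("value", ""),
                    binding.get("Author", {}).get("value", ""),
                    binding.get("AuthorLabel", {}).get("value", ""),
                    binding.get("birthLocationLabel", {}).get("value", ""),
                ])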
Tom Morris