I found a query that retrieves all Wikidata properties together with their id, label, description and aliases:

PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?p ?pt ?pLabel ?d ?aliases WHERE {
  {
    SELECT ?p ?pt ?d
              (GROUP_CONCAT(DISTINCT ?alias; separator="|") as ?aliases)
    WHERE {
      ?p wikibase:propertyType ?pt .
      OPTIONAL {?p skos:altLabel ?alias FILTER (LANG (?alias) = "en")}
      OPTIONAL {?p schema:description ?d FILTER (LANG (?d) = "en") .}
    } GROUP BY ?p ?pt ?d
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
  }
}

and a query that counts how often each property is used by items pointing to Q46 through a statement:

SELECT ?property ?count
WHERE {
  SELECT ?property (COUNT(?item) AS ?count)
  WHERE {
    ?item ?statement wd:Q46 . # items pointing to Q46 through a statement
    ?property wikibase:statementProperty ?statement . # property used for that statement
  } GROUP BY ?property # count usage for each property pointing to that entity
} ORDER BY DESC(?count) # show in descending order of uses

I would like to combine them without depending on Q46, but I don't know exactly how.

Flaviu
  • something like this maybe? -- `SELECT ?property ?propertyLabel ?propertyAltLabel ?propertyDescription ?count WHERE { {SELECT ?property (COUNT(?item) AS ?count) WHERE { ?item ?statement wd:Q46 . ?property wikibase:statementProperty ?statement . } GROUP BY ?property } SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } ORDER BY DESC(?count) ` – UninformedUser Apr 09 '21 at 07:48
  • @UninformedUser Well, the query should not depend on Q46 - this is a key issue. All existing statements should be taken into account. – Flaviu Apr 09 '21 at 08:00
  • and what is the final result then? the number of statements per property? I'm not sure where the issue is right now? I mean, if you do not want to depend on `wd:Q46`, why are you not replacing it by a variable? Clearly, this query will likely lead to a timeout for obvious reasons - it's a public shared service and there are what, 9000 properties or something? and a pattern `?s ?p ?o` is the worst-case scenario, as no database index can be used. – UninformedUser Apr 09 '21 at 08:47
  • In that case, I'd suggest getting all properties first, then performing multiple queries with maybe 50 or 100 properties per query given in a `VALUES` clause. So a small client side Python script would be my way to go. – UninformedUser Apr 09 '21 at 08:48
  • @UninformedUser I need just property id, type, label, description and aliases as it is in the first query. Maybe count could be added but nothing about statements and items. – Flaviu Apr 09 '21 at 09:55
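
The batching approach suggested in the comments above could be sketched roughly as follows. This is only an illustration, not a tested solution: the helper names (`chunks`, `build_count_query`, `run_query`) and the User-Agent string are made up for this example, and the count query simply replaces `wd:Q46` in the second query with a variable. Individual batches may still time out for heavily used properties.

```python
import json
import urllib.parse
import urllib.request
from itertools import islice

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service

PREFIXES = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
"""


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def build_count_query(property_ids):
    """Build one SPARQL query counting statements for the given property ids."""
    values = " ".join(f"wd:{pid}" for pid in property_ids)
    return PREFIXES + f"""SELECT ?property (COUNT(?item) AS ?count) WHERE {{
  VALUES ?property {{ {values} }}
  ?property wikibase:statementProperty ?statement .
  ?item ?statement ?value .
}} GROUP BY ?property"""


def run_query(query):
    """Send a query to the endpoint and return the result bindings (needs network)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; use one that identifies your own script.
    request = urllib.request.Request(url, headers={"User-Agent": "property-count-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def count_all(property_ids, batch_size=50):
    """Count statements per property, one batch of ids per request."""
    counts = {}
    for batch in chunks(property_ids, batch_size):
        for row in run_query(build_count_query(batch)):
            pid = row["property"]["value"].rsplit("/", 1)[-1]
            counts[pid] = int(row["count"]["value"])
    return counts
```

The list of property ids would come from the first query (the `?p` column), and the resulting counts can then be joined with the label, description and alias columns on the client side.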

1 Answer

Such a SPARQL query takes too long to run and ends in an execution timeout. The alternatives are:

  1. Develop & use an application that
  • reads the bzip2 dump archive as described at point 1
  • imports the parsed JSON data into an SQL database
  • performs SQL queries on your own database to extract the data of interest
  1. Another way, involving less development effort, is to:
  • extract the Wikidata JSON dump archive (~65 GiB), which yields a ~1.4 TB JSON file
  • develop a small application that parses this kind of JSON file with an event-driven parser
  • parse the JSON, extracting the data of interest
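
As a rough illustration of the dump-based route, a minimal Python sketch could look like the one below. It swaps the event-driven parser for a simpler line-by-line reader, which assumes the dump keeps its usual layout of one entity per line inside a JSON array, and it reads the bzip2 archive directly instead of extracting it first. The function names and the file name passed to `scan_dump` are placeholders chosen for this example, and the count is the number of main statements per property.

```python
import bz2
import json
from collections import Counter


def iter_entities(lines):
    """Yield one entity dict per dump line, skipping the array brackets."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def english(entity, field):
    """Return the English value of `labels` or `descriptions`, or ''."""
    return ((entity.get(field) or {}).get("en") or {}).get("value", "")


def scan(lines):
    """Collect property metadata and per-property statement counts."""
    properties, usage = {}, Counter()
    for entity in iter_entities(lines):
        if entity.get("type") == "property":
            aliases = (entity.get("aliases") or {}).get("en", [])
            properties[entity["id"]] = {
                "type": entity.get("datatype", ""),
                "label": english(entity, "labels"),
                "description": english(entity, "descriptions"),
                "aliases": "|".join(a["value"] for a in aliases),
            }
        # Every entity (item or property) can carry statements.
        for pid, statements in (entity.get("claims") or {}).items():
            usage[pid] += len(statements)
    return properties, usage


def scan_dump(path):
    """Stream a bzip2-compressed dump from `path` without extracting it."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return scan(dump)
```

Calling `scan_dump("latest-all.json.bz2")` (a placeholder path) would return the property table and the usage counter in one pass. Only the property metadata and the counters stay in memory, so the extracted ~1.4 TB file never has to exist on disk, although a full pass over the archive still takes a long time.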
Flaviu
  • PS: I extracted those properties myself and posted them at https://gist.github.com/cflaviu/e0bf0678f2ede69732b7131894f71031 – Flaviu Apr 13 '21 at 10:15