
I am currently using the MediaWiki API (through the SPARQL SERVICE wikibase:mwapi) to search for entities on Wikidata, using the wbsearchentities action.

However, I have noticed that the search results are not very good. For example, searching for charlton or heston does not return Charlton Heston among the first 10 results, which I would expect. The standard search, query?list=search, works better.

My question is: what algorithm does wbsearchentities use, and why does it not work as well as the standard query?list=search?

My current understanding is that wbsearchentities only searches labels, while query?list=search is a full-text search, but in my opinion that alone should not explain such a discrepancy in the results.
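For reference, the two calls being compared can also be written as plain MediaWiki API requests. This is only a minimal sketch: the helper names `entity_search_url` and `fulltext_search_url` are made up for illustration, and it only builds the request URLs without sending them.

```python
from urllib.parse import urlencode

# Public Wikidata API entry point.
API = "https://www.wikidata.org/w/api.php"


def entity_search_url(term, language="en", limit=10):
    """Build a URL for action=wbsearchentities (the entity search)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def fulltext_search_url(term, limit=10):
    """Build a URL for action=query&list=search (the full-text search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


print(entity_search_url("charlton"))
print(fulltext_search_url("charlton"))
```

Opening both URLs in a browser makes it easy to compare the two result lists side by side for the same search term.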

Cheers!

mhham
  • from the [API description](https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities): *Searches for entities using labels and aliases. Returns a label and description for the entity in the user language if possible. Returns details of the matched term. The matched term text is also present in the aliases key if different from the display label.* – UninformedUser Dec 31 '18 at 11:52
  • Why do you expect "Charlton Heston" to be returned among the first 10? Check this query: `SELECT * WHERE { SERVICE wikibase:mwapi { bd:serviceParam wikibase:api "EntitySearch" . bd:serviceParam wikibase:endpoint "www.wikidata.org" . bd:serviceParam mwapi:search "charlton" . bd:serviceParam mwapi:language "en" . ?item wikibase:apiOutputItem mwapi:item . ?num wikibase:apiOrdinal true . } } ORDER BY ASC(?num) LIMIT 20` - there are so many items in Wikidata named "Charlton", it's obvious that those results will be returned w.r.t. string similarity – UninformedUser Dec 31 '18 at 12:02
  • Thanks for your comment. The main idea is to have relevant search results. When searching in the search bar on the Wikidata website, I usually get quite good results, and I believe they come from `query?list=search`. I am just wondering why `wbsearchentities` does not work as well, and ultimately whether it is useful at all compared to the standard query (maybe for performance?) – mhham Dec 31 '18 at 12:07
  • ok. the same question was asked and answered [here](https://stackoverflow.com/questions/37170179/wikidata-api-wbsearchentities-why-are-results-not-the-same-in-python-than-in-wi). hope this helps or at least answers the question? – UninformedUser Dec 31 '18 at 12:10
  • by the way, when I enter "Charlton" in the textbox at the top right of the Wikidata start page, I still don't get "Charlton Heston" as one of the first 10 auto-suggested items – UninformedUser Dec 31 '18 at 12:12
  • Note, if you really want a "better" fulltext search, string similarity alone is clearly not enough. Some measure of popularity like e.g. pagerank is needed. And I'm sure, that's not done in Wikidata – UninformedUser Dec 31 '18 at 12:14
  • OK, that's interesting! I should then probably be using the standard MediaWiki search API and not `wbsearchentities`. Also, your remark concerning PageRank is really spot on, as I recently stumbled upon [this](https://hal.archives-ouvertes.fr/hal-01905724) – mhham Dec 31 '18 at 13:55
  • yeah, I know those guys and their work. They also published some datasets, though indeed outdated now: http://people.aifb.kit.edu/ath/ But, it's pretty simple to recompute one of those measures. Nevertheless, you would have to load it into your own triple store. Either in combination with the Wikidata dump, or just the pagerank dataset alone which means one local query for the entity lookup + a remote request to the Wikidata endpoint to gather further data – UninformedUser Dec 31 '18 at 15:29
  • Yes, I am considering working on a local Wikidata dump, to make all of this faster. But that is another topic! – mhham Dec 31 '18 at 15:58
  • `SELECT ?item ?itemLabel ?num { SERVICE wikibase:mwapi { bd:serviceParam wikibase:api "Search" . bd:serviceParam wikibase:endpoint "www.wikidata.org" . bd:serviceParam mwapi:srsearch "charlton" . ?item wikibase:apiOutputItem "@title" . bd:serviceParam wikibase:limit 20 . ?num wikibase:apiOrdinal true . } SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } } ORDER BY ?num` – Stanislav Kralin Jan 04 '19 at 08:10
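The popularity idea raised in the comments above (string matching alone is not enough, so combine it with a measure such as PageRank) can be sketched roughly as follows. Everything here is an assumption made for illustration: the `rerank` function, the linear blend, the `weight` parameter, and the placeholder item ids and scores are hypothetical and do not describe how Wikidata ranks results.

```python
def rerank(candidates, popularity, weight=0.5):
    """Re-order search hits by blending search position with popularity.

    candidates: item ids in the order the search API returned them.
    popularity: dict mapping item id -> score in [0, 1], for example a
                normalised PageRank loaded from a separate dataset.
    weight:     share of the final score given to popularity.
    """
    n = len(candidates)

    def score(pair):
        rank, item = pair
        # 1.0 for the first hit, decreasing linearly with position.
        position_score = 1.0 - rank / n
        return (1 - weight) * position_score + weight * popularity.get(item, 0.0)

    ranked = sorted(enumerate(candidates), key=score, reverse=True)
    return [item for _, item in ranked]


# Placeholder ids and scores, purely hypothetical.
hits = ["item_a", "item_b", "item_c"]
scores = {"item_c": 0.9, "item_a": 0.1}
print(rerank(hits, scores))  # ['item_c', 'item_a', 'item_b']
```

With `weight=0` the original search order is kept, so the parameter controls how strongly popularity is allowed to override the string-based ranking.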

0 Answers