How to remove & reason for pseudo duplicate entries in govdata.de SPARQL-Query

Question

I have the following very simple query. It can be tested at: https://www.govdata.de/web/guest/sparql-assistent After looking at the data, I saw pseudo-duplicate entries that I don't understand.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT *
WHERE {
    ?dataset a dcat:Dataset .
    ?dataset dct:title ?title .
    ?dataset dct:description ?description .
}

An example from the data is the title "100m-Höhenlinien Wuppertal 2015" (you can search for it after running the query in the SPARQL assistant). There are two entries: "100m-Höhenlinien Wuppertal 2015"@de and "100m-Höhenlinien Wuppertal 2015".

What is the difference? What does @de mean, and where does it come from? I don't know which field it could be. I checked some fields in the documentation of the data: https://www.dcat-ap.de/def/dcatde/2.0/spec/ But no luck so far.

In the web portal search (https://www.govdata.de/) you can only see one entry, so I am a bit confused. There are many datasets where the title is doubled, but most of those are also doubled in the normal portal.

More of a second question: in the web portal I also don't know how to tell which 'dataset' value corresponds to which URL. A good example: https://www.govdata.de/web/guest/suchen/-/details/gerichte8e8c3 https://www.govdata.de/web/guest/suchen/-/details/gerichte-2021-07-13 --> There are even more entries, and they also have different metadata. But I don't know how to get from 'dataset' to one of the URLs stated above.

I tried looking at different metadata fields and expected to find the details there. I am new to SPARQL, so I think it is primarily a knowledge problem.

`@de` is a language tag. RDF lets string literals carry a language tag, so triples can provide the same value in multiple languages. You can avoid your issue by either requiring a language tag (and filtering for specific languages) or using only the lexical form: `PREFIX dcat: <http://www.w3.org/ns/dcat#> PREFIX dct: <http://purl.org/dc/terms/> SELECT DISTINCT * WHERE { ?dataset a dcat:Dataset . ?dataset dct:title ?title . filter(lang(?title) = 'de') ?dataset dct:description ?description . filter(lang(?description) = 'de') }` - note this would only return German literals — UninformedUser, Mar 24 '23 at 12:02
Or you do `SELECT DISTINCT ?dataset (str(?title_) as ?title) (str(?description_) as ?description) WHERE { ?dataset a dcat:Dataset . ?dataset dct:title ?title_ . filter(lang(?title_) = 'de') ?dataset dct:description ?description_ . }` - indeed, if a dataset provides titles in multiple languages, you should filter for specific languages — UninformedUser, Mar 24 '23 at 12:06
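To see where the near-identical rows come from, it can help to list each title next to its language tag. The following is a minimal sketch that has not been tested against the govdata.de endpoint; the variable name `?titleLang` is an illustrative choice, and the title string is the example from the question.

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

# List every title of the matching dataset(s) together with its language tag.
# lang() returns the empty string "" for literals that carry no tag,
# so "..."@de and the untagged "..." show up as two distinct rows.
SELECT ?dataset ?title (lang(?title) AS ?titleLang)
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
  # str() compares only the lexical form and ignores the language tag.
  FILTER(str(?title) = "100m-Höhenlinien Wuppertal 2015")
}
```

Note also that the pattern in the question joins every title of a dataset with every description, so a dataset with two titles and two descriptions yields four rows, which `DISTINCT` does not collapse because the literals differ.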
Your second question, as it’s not closely related to your first one, should be its own question post. — Stefan - brox IT-Solutions, Mar 24 '23 at 12:57
@UninformedUser: Thanks. It works, but it's not really what I am looking for, as the language tag is not used everywhere. So I could accept this or think about more complex solutions. I still don't understand the duplicates. For example, the title "Dichlordiphenyldichlorethan (p,p) im Meerwasser 2020" has three German titles with a language tag, but only one result in the web portal. One of them has an English description without a language tag. And there is an English title, "dichlorodiphenyldichloroethane (p,p) in sea water 2020", with a German description and the language tag 'de'. — Matteo, Mar 25 '23 at 23:21
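Because the language tag is not used consistently, one possible approach is to prefer the German-tagged literal and fall back to the untagged one, returning a single row per dataset. This is only a sketch under that assumption and has not been run against the govdata.de endpoint; all variable names (`?titleDe`, `?titlePlain`, `?descDe`, `?descPlain`, `?t`, `?d`) are illustrative.

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

# One row per dataset: German-tagged value if present, otherwise the untagged one.
SELECT ?dataset (SAMPLE(?t) AS ?title) (SAMPLE(?d) AS ?description)
WHERE {
  ?dataset a dcat:Dataset .
  OPTIONAL { ?dataset dct:title ?titleDe . FILTER(lang(?titleDe) = "de") }
  OPTIONAL { ?dataset dct:title ?titlePlain . FILTER(lang(?titlePlain) = "") }
  OPTIONAL { ?dataset dct:description ?descDe . FILTER(lang(?descDe) = "de") }
  OPTIONAL { ?dataset dct:description ?descPlain . FILTER(lang(?descPlain) = "") }
  # COALESCE takes the first bound value; str() drops the language tag.
  BIND(str(COALESCE(?titleDe, ?titlePlain)) AS ?t)
  BIND(str(COALESCE(?descDe, ?descPlain)) AS ?d)
}
GROUP BY ?dataset
```

`SAMPLE` picks an arbitrary value when a dataset has several German titles, so which one is returned is not guaranteed. Datasets whose titles exist only in another language (for example only `@en`) come back with an unbound title, and the four `OPTIONAL` blocks may be slow on a large endpoint, so adding a `LIMIT` while experimenting is advisable.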

