I have recently been exploring linked data and I keep running into one issue after another. To avoid the performance lag of querying external endpoints, I want to store the data dumps locally.
However, most of the datasets I come across have problems. A frequent one is poor URI quality, e.g. this error when importing into Jena's TDB:
Bad character in IRI (space): <http://bio2rdf.org/genecards:BCR/ABL[space]...>
How do I deal with such issues? Is there a way to clean these data dumps, or at least remove the triples that cause them?
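One possible approach, sketched here rather than offered as a Jena feature: pre-filter the dump before loading it. This assumes the dump is line-based N-Triples (or N-Quads), i.e. one statement per line, optionally gzipped; it will not work on multi-line Turtle. The script checks every `<...>` term against the characters the N-Triples IRIREF rule forbids (control characters and space, plus `<>"{}|^`` ` and a backslash that is not a `\u`/`\U` escape), skipping over string literals so that `<` or `>` inside a literal is not mistaken for an IRI. The file name `clean_nt.py` and the helper names are hypothetical.

```python
import gzip
import re
import sys

# Characters forbidden inside <...> by the N-Triples IRIREF production:
# U+0000-U+0020 (controls and space) and <>"{}|^`, plus a backslash
# unless it starts a \uXXXX or \UXXXXXXXX escape.
_BAD_IRI_CHAR = re.compile(
    r'[\x00-\x20<>"{}|^`]|\\(?!u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})'
)


def iris_in_line(line):
    """Yield the text between < and > of each IRI term, skipping literals."""
    i, n = 0, len(line)
    while i < n:
        c = line[i]
        if c == '"':
            # Skip a string literal, stepping over backslash escapes like \"
            i += 1
            while i < n and line[i] != '"':
                i += 2 if line[i] == "\\" else 1
            i += 1
        elif c == "<":
            end = line.find(">", i + 1)
            if end == -1:  # unterminated IRI: report the rest of the line
                yield line[i + 1:]
                return
            yield line[i + 1:end]
            i = end + 1
        else:
            i += 1


def is_clean(line):
    """True if every IRI on this N-Triples line is syntactically valid."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):  # blank lines and comments
        return True
    return not any(_BAD_IRI_CHAR.search(iri) for iri in iris_in_line(stripped))


def filter_ntriples(lines):
    """Split lines into (kept, rejected) lists."""
    kept, rejected = [], []
    for line in lines:
        (kept if is_clean(line) else rejected).append(line)
    return kept, rejected


def main(path):
    # Clean lines go to stdout, rejected lines to stderr.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            (sys.stdout if is_clean(line) else sys.stderr).write(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python clean_nt.py dump.nt.gz > clean.nt 2> rejected.nt`, then loading `clean.nt` with `tdbloader`. Dropping lines is the conservative choice; repairing them instead (e.g. percent-encoding the space as `%20`) keeps the data but silently changes the IRI, so it may no longer match the same resource in other datasets.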