Is there a way to break down large triple files (in any format) so that they can be parsed? I am currently attempting to load a LOD data dump into a local triplestore using Python (rdflib).
Also, is there a more scalable alternative to rdflib?

RDangol
  • Given an N-Triples file without blank nodes, you could simply split it line-wise. If there are blank nodes in the data, the triple store must keep the blank node identifiers from the document during loading. I don't know how RDFLib handles this, because the scope of a blank node is defined as the document. – UninformedUser Oct 26 '17 at 14:28
  • What does "better" mean? There are plenty of triple stores, many of them with HTTP support for querying. Some of those triple stores provide optimized bulk loading. But again, I'm not that familiar with RDFLib; maybe it also provides such a feature. – UninformedUser Oct 26 '17 at 14:30
  • @AKSW I mean better in terms of scalability. If I were to work with millions of triples, what would be a suitable RDF library/framework? – RDangol Oct 26 '17 at 14:55
  • @AKSW What about files in other formats? – RDangol Oct 27 '17 at 07:17
  • N-Triples is the most obvious format for splitting: each line denotes a single triple, so the file can be split easily. Other formats have to be parsed and probably kept in memory to ensure completeness. E.g. a Turtle statement can span several lines, and RDF/XML is even more complicated. If you want to split, please use N-Triples. – UninformedUser Oct 27 '17 at 08:20
  • RDF4J and Apache Jena are two popular Java-based RDF/SPARQL libraries. Any such library is limited by the memory it can use when it keeps the data in memory. Otherwise, a native triple store can be used. – UninformedUser Oct 27 '17 at 08:21
  • @AKSW Thank you. I tried converting the files to N-Triples format and then splitting them into smaller files. So far, it seems to work. – RDangol Oct 28 '17 at 04:00
  • @RDangol I'm facing a similar issue. Did your N-Triples file have any blank nodes? Did you find a workaround for that? – Niveditha Karmegam Mar 04 '19 at 11:58
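
The line-wise splitting suggested in the comments could be sketched roughly as follows. This is only a minimal illustration, not something posted in the thread: the function name `split_ntriples`, the `chunk_NNNNN.nt` file naming, and the default chunk size are all assumptions made for the example. It assumes the input is already N-Triples (one triple per line) and uses only the Python standard library.

```python
from pathlib import Path


def split_ntriples(src, out_dir, lines_per_chunk=1_000_000):
    """Split an N-Triples file into files of at most `lines_per_chunk` lines.

    The input is streamed line by line, so the whole dump is never held
    in memory. Returns the list of chunk paths that were written.
    (Hypothetical helper: name and file naming are illustrative only.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    count = 0
    try:
        with open(src, encoding="utf-8") as infile:
            for line in infile:
                # Start a new chunk file when none is open or the current one is full.
                if out is None or count >= lines_per_chunk:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):05d}.nt"
                    out = open(path, "w", encoding="utf-8")
                    chunk_paths.append(path)
                    count = 0
                out.write(line)
                count += 1
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each resulting chunk is itself a valid N-Triples file and can be loaded separately. Note the caveat raised above: blank node labels such as `_:b0` are scoped to a document, so if the dump contains blank nodes, a store that loads the chunks as separate documents may treat the same label in two chunks as two different nodes.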

0 Answers