I want to create a large list of person names, so I downloaded an N-Triples file from the YAGO 4 knowledge base. I used the Python library RDFLib with this code:

from rdflib.graph import Graph
g = Graph()
g.parse("yago_data.nt", format="nt")

But I only receive this error: ParseError("Failed to eat %s at %s" % (pattern.pattern, self.line))

How can I load the content of the N-Triples file? Or is there a way to convert it to another format, e.g., an XML or TXT file? OS: Win10

Help is much appreciated!

joey11235
  • If it doesn't fit into memory, use a proper triple store and query for person names with SPARQL. – UninformedUser Jul 14 '22 at 05:59
  • That said, are you sure it is a memory issue and not just a syntax issue? – UninformedUser Jul 14 '22 at 06:00
  • Also, an .nt file is just a text file, so what would be the point of a txt file? You can always read the .nt file as a stream in plain Python and process it line by line. – UninformedUser Jul 14 '22 at 06:01
  • Also, which file exactly are you using? The `ntx` file needs an RDF-star capable parser, which rdflib currently is not (there is an open ticket, but it isn't done yet). – UninformedUser Jul 14 '22 at 06:07
  • I'm working with a .nt file. Could you recommend a Python streaming parser that can handle .nt files? – joey11235 Jul 14 '22 at 11:23
  • Why do you need a parser that can handle .nt files? I would start with a simple line-based approach to filter the person triples. Even then, you'd need another pass to get the labels for the person IDs. So looping over the dataset twice and keeping only the relevant data should do the trick. – UninformedUser Jul 15 '22 at 17:04
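
The two-pass, line-based approach suggested in the comments could be sketched roughly as follows. This is only a minimal sketch, not a full N-Triples parser. It assumes that persons are typed directly as `<http://schema.org/Person>` and that names are stored as `rdfs:label` literals in the same file; both are assumptions about the data, and the function names are made up for illustration. Lines that do not look like a triple are skipped rather than raising an error, and escape sequences inside literals are not decoded.

```python
import re

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Assumption: persons are typed directly with the schema.org Person class.
PERSON = "<http://schema.org/Person>"
# Assumption: names are stored as rdfs:label literals.
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Matches a literal with an optional language tag or datatype, e.g. "Ada"@en.
LITERAL = re.compile(
    r'^"(?P<text>.*)"(?:@(?P<lang>[A-Za-z0-9-]+)|\^\^<[^>]*>)?$'
)


def split_triple(line):
    """Split one N-Triples line into (subject, predicate, object), or None."""
    line = line.strip()
    if not line or line.startswith("#") or not line.endswith("."):
        return None
    body = line[:-1].rstrip()
    # Subject and predicate never contain whitespace; the object may.
    parts = body.split(None, 2)
    if len(parts) != 3:
        return None
    return parts[0], parts[1], parts[2]


def person_ids(lines):
    """First pass: collect the subjects that are typed as persons."""
    ids = set()
    for line in lines:
        triple = split_triple(line)
        if triple and triple[1] == RDF_TYPE and triple[2] == PERSON:
            ids.add(triple[0])
    return ids


def person_labels(lines, ids, lang="en"):
    """Second pass: yield (subject, label) for known persons in one language."""
    for line in lines:
        triple = split_triple(line)
        if not triple or triple[1] != RDFS_LABEL or triple[0] not in ids:
            continue
        match = LITERAL.match(triple[2])
        if match and match.group("lang") == lang:
            yield triple[0], match.group("text")


def person_names_from_file(path, lang="en"):
    """Read the file twice, line by line, so it is never loaded whole."""
    with open(path, encoding="utf-8") as handle:
        ids = person_ids(handle)
    with open(path, encoding="utf-8") as handle:
        return sorted({name for _, name in person_labels(handle, ids, lang)})
```

With the file from the question, this would be called as `person_names_from_file("yago_data.nt")`. Only the set of person IDs is held in memory between the two passes, which is what keeps the approach workable for large dumps. If types and labels turn out to live in separate files, the two passes can simply be pointed at different paths.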

0 Answers