How to open and parse a large N-Triples file with Python?

Question

I want to create a large list of person names, so I downloaded an N-Triples file from the YAGO 4 knowledge base. I used the Python library RDFLib and this code:

from rdflib.graph import Graph
g = Graph()
g.parse("yago_data.nt", format="nt")

But I only receive this error: ParseError("Failed to eat %s at %s" % (pattern.pattern, self.line))

How can I load the content of the N-Triples file? Or is there a way to convert it to, e.g., an XML or TXT file? OS: Windows 10

Help is much appreciated!

If it doesn't fit into memory, use a proper triple store and query for person names with SPARQL. — UninformedUser, Jul 14 '22 at 05:59
That said, are you sure it is a memory issue and not just a syntax issue? — UninformedUser, Jul 14 '22 at 06:00
Also, an .nt file is just a text file, so what would be the point of a txt file? You can always read the .nt file as a stream with standard Python and process it line by line. — UninformedUser, Jul 14 '22 at 06:01
Also, which file exactly are you using? The `ntx` file needs an RDF-star capable parser, which rdflib currently is not (there is an open ticket, but it isn't done yet). — UninformedUser, Jul 14 '22 at 06:07
I'm working with a .nt file. Could you recommend a Python streaming parser which can handle .nt files? — joey11235, Jul 14 '22 at 11:23
Why do you need a parser that can handle .nt files? I would start with a simple line-based approach to filter the person triples. Even then, you'd need another pass to get the labels for the person IDs. So looping over the dataset twice and keeping only the relevant data should do the trick. — UninformedUser, Jul 15 '22 at 17:04
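
The line-based, two-pass idea from the comments could look roughly like the sketch below. It is only a sketch under assumptions, not a full N-Triples parser: it assumes one triple per line, that persons are typed as `<http://schema.org/Person>` via `rdf:type`, and that names are stored as `rdfs:label` literals with a language tag. Check a few lines of your own file first and adjust the constants. The function names (`split_triple`, `collect_person_ids`, `collect_names`) are made up for this example.

```python
import re

# Assumptions about the data; verify against a few lines of your file.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
PERSON = "<http://schema.org/Person>"

# Matches a literal such as "Ada Lovelace"@en at the start of the object term.
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:@([A-Za-z0-9-]+))?')


def split_triple(line):
    """Split one N-Triples line into (subject, predicate, object) strings.

    Returns None for blank lines, comments, and lines that do not look
    like 'subject predicate object .'.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Subject and predicate contain no whitespace; the object may.
    parts = line.split(None, 2)
    if len(parts) != 3 or not parts[2].endswith("."):
        return None
    subject, predicate, rest = parts
    return subject, predicate, rest[:-1].rstrip()


def iter_triples(path):
    """Yield triples one line at a time, so the file is never held in memory."""
    # Explicit UTF-8: the Windows default encoding is usually not UTF-8.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            triple = split_triple(line)
            if triple is not None:
                yield triple


def collect_person_ids(path):
    """First pass: collect the subjects that are typed as a person."""
    return {s for s, p, o in iter_triples(path) if p == RDF_TYPE and o == PERSON}


def collect_names(path, person_ids, lang="en"):
    """Second pass: map each person ID to its label in the given language."""
    names = {}
    for s, p, o in iter_triples(path):
        if p == RDFS_LABEL and s in person_ids:
            match = LITERAL.match(o)
            if match and match.group(2) == lang:
                # Escape sequences such as \uXXXX are left undecoded here.
                names[s] = match.group(1)
    return names


# Usage (file name taken from the question):
# ids = collect_person_ids("yago_data.nt")
# names = collect_names("yago_data.nt", ids)
```

Only the set of person IDs and the resulting name dictionary are kept in memory. If the types and labels live in separate files, pass the respective file to each pass.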
