How to show data from an RDF archive in Scala with Flink

Question

I am looking for a way to load and print data from the .n3 files inside a .tar.gz archive in Scala. Or should I extract the archive first?

If you want to download the file, it is available at http://wiki.knoesis.org/index.php/LinkedSensorData

Could anyone describe how I can print the data from this archive on the screen using Scala?

Please clarify what you're trying to achieve with Flink! And then you might have a look at the [SANSA project](http://sansa-stack.net/). – UninformedUser, Jun 22 '18 at 05:51
I have to implement a reasoning operator on top of Flink: just take RDF data and run a simple query in parallel. – renataleb, Jun 22 '18 at 09:30
What is a reasoning operator? SANSA already provides reasoning via Apache Flink. – UninformedUser, Jun 22 '18 at 09:34
Say, it is a function that can get information from the RDF data using predicates. – renataleb, Jun 22 '18 at 09:39
And what is the concrete question now? SANSA is a framework to process RDF data in Apache Spark and Apache Flink and uses Apache Jena. Apache Jena is a general RDF processing framework. – UninformedUser, Jun 23 '18 at 06:53

score 2 · Accepted Answer · answered Jun 21 '18 at 12:00

The files that you are dealing with are large. I therefore suggest you import them into an RDF store of some sort rather than trying to parse them yourself. You can use GraphDB, Blazegraph, Virtuoso, and the list goes on. A search for RDF stores should turn up many other options. You can then query the RDF store with SPARQL (which plays the role for RDF stores that SQL plays for relational databases).

For a Scala library that can access RDF data, see this related SO question, though it does not look promising. I would suggest you look at Apache Jena, a Java library that can be called from Scala.
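As a rough illustration of the Jena route, here is a minimal sketch that streams through the .tar.gz without extracting it, parses every .n3 entry into one in-memory Jena model, and prints the statements that mention a keyword. It assumes Apache Jena and Apache Commons Compress are on the classpath and Java 9+ (for `readAllBytes`). The object name `PrintN3Archive`, the helper names, and the "temperature" keyword filter are hypothetical, chosen only for this example. As far as I know, Jena reads "N3" with its Turtle parser, so files using N3-only features (rules, formulae) would not parse this way.

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream, FileInputStream}
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream
import org.apache.jena.rdf.model.{Model, ModelFactory}

object PrintN3Archive {

  // Parse one N3 document (given as bytes) into the supplied Jena model.
  def readN3Into(model: Model, bytes: Array[Byte]): Model = {
    model.read(new ByteArrayInputStream(bytes), null, "N3")
    model
  }

  // Walk the .tar.gz entry by entry and load every .n3 file into one model.
  def loadArchive(path: String): Model = {
    val model = ModelFactory.createDefaultModel()
    val tar = new TarArchiveInputStream(
      new GzipCompressorInputStream(
        new BufferedInputStream(new FileInputStream(path))))
    try {
      var entry = tar.getNextEntry()
      while (entry != null) {
        if (!entry.isDirectory && entry.getName.endsWith(".n3")) {
          // readAllBytes stops at the end of the current tar entry
          readN3Into(model, tar.readAllBytes())
        }
        entry = tar.getNextEntry()
      }
    } finally tar.close()
    model
  }

  // Statements whose textual form contains the keyword (case-insensitive).
  // This is a crude, hypothetical filter, not a vocabulary-aware query.
  def matching(model: Model, keyword: String): List[String] = {
    val needle = keyword.toLowerCase
    val out = List.newBuilder[String]
    val it = model.listStatements()
    while (it.hasNext) {
      val text = it.nextStatement().toString
      if (text.toLowerCase.contains(needle)) out += text
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to the downloaded .tar.gz archive
    val model = loadArchive(args(0))
    matching(model, "temperature").foreach(println)
  }
}
```

Loading everything into a single model is only reasonable because the sensor descriptions are small; for the much larger observation data, parsing entry by entry and discarding each model after use would probably be the safer choice.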

You may also want to look at the DBpedia Extraction Framework, which extracts data from Wikipedia and stores it as RDF using Scala. It is certainly not exactly what you are trying to do, but it could give you insight into the tools they used for generating RDF from Scala and into related issues.

Henriette Harmse

Thank you for the answer! My task is to try a reasoning operator in Flink (streaming). I have fewer than 10k files, 12.6 MiB in total, so they fit in memory easily. I only need to load the data from the N3 format and perform some actions on it, without SPARQL or other tools. For example, show only the temperature data from LinkedSensorData. I can also connect to a cluster with a huge amount of memory. The work is for educational/research purposes only. – renataleb Jun 21 '18 at 13:59
Ah, OK! I looked at the observation data and not the sensor data! Sure, then you can load it directly using, say, Jena. – Henriette Harmse Jun 21 '18 at 14:04
I am sorry for my newbie questions; it is really hard to get started with functional programming and reasoning. Is it possible to combine Jena and Apache Flink? I tried to google it, but could not find any useful information. – renataleb Jun 21 '18 at 20:09
I accepted your answer as it contains a link to the Jena library. Thank you. – renataleb Jun 24 '18 at 16:47
