I am currently evaluating GraphDB EE.

I intend to migrate my big data from Cassandra to GraphDB, but I read in the docs that it can contain 2^40 entities = 2,000B entities. I have a few questions about this:

  1. Is there a way to extend this to unlimited entities?

  2. I want to use many repositories to manage my data. Is there a way to connect them and use them as a single repo?

  3. Is there a way to search on multiple entities and on multiple properties (already indexed in Elasticsearch) per entity?

  4. Do I need to create an ES connector for all properties per entity to get the best performance?

Ananth

1 Answer

David, please see the quick answers below.

  1. Is there a way to extend this to unlimited entities?

2^40 is roughly 1T (about 1.1 trillion) entities. Do you really need more than this?

Entities in GraphDB are the nodes in the graph: URIs, literals, and blank nodes. On average, you would have multiple edges/statements per node (say 5x).
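
To make the scale concrete, here is the arithmetic behind those figures as a small Python sketch. The 5x ratio is only the rough estimate from the paragraph above, not a documented limit.

```python
# Capacity arithmetic for the 2^40 entity limit discussed above.
ENTITY_LIMIT = 2 ** 40

# "say 5x" statements per node is a rough estimate, not a measured value.
STATEMENTS_PER_ENTITY = 5

approx_statements = ENTITY_LIMIT * STATEMENTS_PER_ENTITY

print(f"entities:   {ENTITY_LIMIT:,}")       # 1,099,511,627,776
print(f"statements: {approx_statements:,}")  # 5,497,558,138,880
```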

  2. I want to use many repositories to manage my data. Is there a way to connect them and use them as a single repo?

Yes, please see the so-called internal federation, which allows you to federate efficiently within a SPARQL query, across repositories in one and the same GraphDB instance.
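
As a rough sketch of what such a query might look like, the Python helper below builds a SPARQL string that joins local data with a second repository. It assumes the `SERVICE <repository:NAME>` form used by GraphDB's internal federation. The `ex:` namespace, the property names, and the repository name are hypothetical placeholders, so check the exact syntax against the documentation for your GraphDB version.

```python
def federated_query(other_repo: str) -> str:
    """Build a SPARQL query that joins local triples with another
    repository in the same GraphDB instance (internal federation)."""
    # ex:worksIn and ex:name are made-up properties for illustration.
    return f"""
PREFIX ex: <http://example.org/>
SELECT ?person ?orgName WHERE {{
  ?person ex:worksIn ?org .
  SERVICE <repository:{other_repo}> {{
    ?org ex:name ?orgName .
  }}
}}
""".strip()


if __name__ == "__main__":
    # "organisations" is a hypothetical repository ID.
    print(federated_query("organisations"))
```

The query would be run against one repository, and the `SERVICE` block pulls matching patterns from the other one.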

  3. Is there a way to search on multiple entities and on multiple properties (already indexed in Elasticsearch) per entity?

I am not sure I understand your question. You can definitely embed multiple FTS queries in a single SPARQL query. Those FTS queries can search for different entities using different fields. You can read more on this here.
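
The following Python sketch shows one way several FTS blocks could be combined into a single SPARQL query. The namespaces and the `query` / `entities` predicates are written from memory of the GraphDB Elasticsearch connector vocabulary and should be treated as assumptions to verify. The connector names and query strings are hypothetical.

```python
# Assumed connector namespaces; verify against your GraphDB version.
ES_NS = "http://www.ontotext.com/connectors/elasticsearch#"
ES_INST_NS = "http://www.ontotext.com/connectors/elasticsearch/instance#"


def multi_fts_query(searches):
    """searches: list of (connector_name, query_string) pairs.
    Returns one SPARQL query with one FTS block per pair."""
    blocks = []
    for i, (connector, text) in enumerate(searches, start=1):
        # Escape backslashes and double quotes for a SPARQL string literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append(
            f"  ?search{i} a inst:{connector} ;\n"
            f'      con:query "{escaped}" ;\n'
            f"      con:entities ?entity{i} ."
        )
    select_vars = " ".join(f"?entity{i}" for i in range(1, len(searches) + 1))
    return (
        f"PREFIX con: <{ES_NS}>\n"
        f"PREFIX inst: <{ES_INST_NS}>\n"
        f"SELECT {select_vars} WHERE {{\n" + "\n".join(blocks) + "\n}"
    )


if __name__ == "__main__":
    # Hypothetical connectors, each indexing different fields.
    print(multi_fts_query([("people_index", "name:david"),
                           ("org_index", "label:acme")]))
```

Two unrelated search blocks return every combination of hits, so a real query would normally connect `?entity1` and `?entity2` through an ordinary graph pattern.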

  4. Do I need to create an ES connector for all properties per entity to get the best performance?

You can have multiple indices for one and the same repo. The best way to boost performance is to have specific indices (on specific properties/fields with specific filters) for those queries which are most critical for you.

Stanislav Kralin
  • Thanks for replying to my question. Yes, because my data is extremely large and may grow even larger in the future, so I need to be careful about performance and data storage. If I have 1B entities, how long would a SPARQL query over them take (I expect <3s)? If there are too many entities, does it affect inference? Thanks – david backa Mar 09 '20 at 07:51
  • David, yes, most queries should finish in less than 1 sec. Query performance does not depend much on repository size. Even in a repo with 1M entities, you can get a slow query if it implies a full table scan or Cartesian products. Inference is performed upon update, so it doesn't affect queries. If you get into really big volumes of data, you can consider off-loading some of it to MongoDB and using the GraphDB-Mongo connector, which allows you to query the data in Mongo via SPARQL. Please consider writing to support@ontotext.com to describe your case and get a recommendation – Atanas Kiryakov Mar 09 '20 at 10:28
  • Thanks for answering my question. If I update my ontology (CRUD operations on things, object properties, data properties), will my data still be fine, or will I face problems? – david backa Mar 10 '20 at 06:33
  • Major changes in the ontology may trigger inference or retraction of inferred statements, which on a big repository can take a while. I.e. the corresponding update transaction will take some time, depending on the nature of the change. This is the price to pay for materialization-based reasoning. On the other hand, this is the only option for reasoning on big datasets. Backward-chaining is known not to work well, and there is a fundamental reason: it makes query optimization practically impossible due to the lack of stats for the selectivity of each of the query patterns. Your data will not be harmed – Atanas Kiryakov Mar 10 '20 at 08:13
  • I created a repo with the OWL2-RL ruleset and then imported an OWL 2 ontology with a SWRL rule (it works with the HermiT reasoner), but GraphDB can't infer the fact derived from that SWRL rule. I don't know why. – david backa Mar 11 '20 at 07:47
  • David, can you please provide a reference to this ontology? GraphDB does not support SWRL, because it is a superset of OWL 2 DL. And we don't believe DL logic can be efficiently supported on top of big volumes of data in a database engine. It requires satisfiability checking, which is too expensive to do at big scale. – Atanas Kiryakov Mar 11 '20 at 15:53
  • My ontology is about a social network. Just simple SWRL rules: if two people (?p1, ?p2) workInOrg the same Org (?org), then colleagueOf(?p1, ?p2); and if they are colleagues and one person workInOrg an Org, then the other person also works in that Org. I also built OWL axioms, but they don't take effect in GraphDB. Another thing I tried was writing a custom rule, but GraphDB only allows choosing one ruleset (in my case "owl2-rl"), and my custom rule only takes effect when combined with the OWL2-RL ruleset. Is there a way to combine rulesets? – david backa Mar 12 '20 at 02:33
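
To make the first rule in the last comment concrete, here is a minimal Python sketch of what materializing it means: every pair of distinct people who work in the same organisation yields a `colleagueOf` fact. This only illustrates the rule's semantics on an in-memory set. It is not GraphDB rule syntax, and the sample names are made up.

```python
from itertools import permutations


def infer_colleagues(works_in):
    """works_in: set of (person, org) pairs.
    Returns the set of inferred (p1, p2) colleagueOf pairs."""
    # Group people by the organisation they work in.
    members_by_org = {}
    for person, org in works_in:
        members_by_org.setdefault(org, set()).add(person)

    # Every ordered pair of distinct members of one org are colleagues.
    inferred = set()
    for members in members_by_org.values():
        inferred.update(permutations(sorted(members), 2))
    return inferred


if __name__ == "__main__":
    facts = {("alice", "acme"), ("bob", "acme"), ("carol", "initech")}
    print(sorted(infer_colleagues(facts)))
    # [('alice', 'bob'), ('bob', 'alice')]
```

A materializing engine computes such facts at update time, which is why large ontology changes can make the update transaction slow while queries stay fast.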