What are the possible ways to version your RDF store in GitLab?
- The question can't be answered. An RDF store is a database; nobody stores databases in Git-based versioning platforms. – UninformedUser Dec 15 '20 at 18:33
- I think you can do that by creating a database dump and then posting it as a commit. – Arihant Godha Dec 15 '20 at 18:43
- Depending on the type of "dump" ... what would be the point of putting it into GitLab? I mean sure, you can put RDF data files into GitHub. Clearly, getting the nice Git features like changes and diffs out of it only works when triples/quads keep their positions in the file after the RDF data has been edited. Otherwise, every diff would show a completely new file. For managing RDF data via Git I can recommend https://github.com/AKSW/QuitStore – UninformedUser Dec 15 '20 at 19:10
- The intent is to track all the RDF data changes in Git. I'm not sure exactly how to achieve it, or whether it's worth it. – ammo Dec 16 '20 at 19:50
- You can run a task that periodically exports the database to Turtle and then commits the result if there are changes. The export just has to be sufficiently stable: an unchanged database should produce the same text, and small changes to the data should yield small changes to the textual representation. As an example, sorted N-Triples with no blank nodes will be stable enough, but it is also too verbose for manual browsing. It might be sufficient for diffs though. Just be aware of blank nodes, since they could have different names in N-Triples (or the names might completely change after any update). – IS4 Dec 18 '20 at 14:39
1 Answer
As @UninformedUser mentioned already, the QuitStore was developed with this motivation. It generates commits on SPARQL Update requests and also implements merge operations on the data. To store the data in the repository, it maintains a canonical representation of it. This representation makes it possible to view diffs of the data and also works quite well with Git's pack files to reduce space.
A good start for maintaining a fairly stable representation of your triples is to use the N-Triples serialization, sort the triples, and remove duplicates. This can be seen in the update-job or the orkg-dump (update.sh). It boils down to:
LC_ALL=C rapper -i <your input serialization> -o n-triples <your file> | sort -u > dump.nt
Setting the locale with LC_ALL=C is important to maintain the same order across execution environments.
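As a rough sketch of how the sorted dump could be combined with the "commit only if there are changes" idea from the comments, the two steps can be wrapped in small shell functions. The function names `canonicalize_nt` and `commit_if_changed`, the commit message, and the file name `dump.nt` are illustrative assumptions, not part of QuitStore or the scripts mentioned above, and the export command itself is left as a placeholder.

```shell
#!/bin/sh
# Sketch only: function names, commit message and file names are illustrative.

# Read N-Triples on stdin, write them sorted and de-duplicated to stdout.
# LC_ALL=C forces byte-wise ordering, so the result does not depend on
# the locale of the machine that runs the job.
canonicalize_nt() {
    LC_ALL=C sort -u
}

# Stage the dump and commit only when its content actually changed.
# Assumes the current directory is inside a Git working tree.
commit_if_changed() {
    dump_file="$1"
    git add -- "$dump_file"
    if git diff --cached --quiet -- "$dump_file"; then
        echo "no changes in $dump_file"
    else
        git commit -m "Update RDF dump" -- "$dump_file"
    fi
}

# Example usage (the export command is a placeholder):
#   <your export command producing N-Triples> | canonicalize_nt > dump.nt
#   commit_if_changed dump.nt
```

Because identical input always yields byte-identical output, an unchanged store produces no commit, and a single added or removed triple shows up as a one-line diff. This only holds as long as the data contains no blank nodes whose labels change between exports.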

white_gecko