
I'm going to use Xodus to store time-series data (100-500 million rows are inserted daily).

I saw that Xodus creates and deletes a lot of .xd files in the background. I have read about the log-structured design, but I don't clearly understand whether a file is created on each transaction commit. Does each file represent a snapshot of the whole database? Is there any way to disable transactions (I don't need them)?

Can I get any performance benefit by sharding my data across different stores? I could store every metric in a separate store instead of using one store with a multikey. For now I'm creating a separate store for each day.

user12384512

1 Answer


The .xd files don't correspond to individual transactions. The files are ordered, so together they can be thought of as one infinite log of records. Each transaction writes its changes plus some meta information that makes it possible to retrieve and search the saved data. Every .xd file has a maximum size; when that size is reached, a new file is created.
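
To make the relationship between transactions and files concrete, here is a minimal toy model in plain Java. It is not Xodus code: the class `SegmentedLog`, the `COMMIT` marker and the size limit of 20 characters are all made up for illustration, and the rollover rule (start a new segment when a record would not fit) is a simplifying assumption. It only shows that a segment can hold several transactions and that one transaction can span two segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy model of an append-only log split into size-limited segments.
 * Segments stand in for .xd files; this is NOT how Xodus is implemented.
 */
class SegmentedLog {
    private final int maxSegmentSize;                 // hypothetical per-file limit, in characters
    private final List<List<String>> segments = new ArrayList<>();
    private int tailSize = 0;                         // space used in the newest segment

    SegmentedLog(int maxSegmentSize) {
        if (maxSegmentSize <= 0) {
            throw new IllegalArgumentException("maxSegmentSize must be positive");
        }
        this.maxSegmentSize = maxSegmentSize;
        segments.add(new ArrayList<>());
    }

    /** Appends one record to the newest segment, rolling over first if it would not fit. */
    private void append(String record) {
        if (tailSize > 0 && tailSize + record.length() > maxSegmentSize) {
            segments.add(new ArrayList<>());          // older segments are never written again
            tailSize = 0;
        }
        segments.get(segments.size() - 1).add(record);
        tailSize += record.length();
    }

    /** Writes one transaction: its changes, then a marker standing in for the meta information. */
    void commit(List<String> changes) {
        for (String change : changes) {
            append(change);
        }
        append("COMMIT");
    }

    int segmentCount() {
        return segments.size();
    }

    List<String> segment(int index) {
        return Collections.unmodifiableList(segments.get(index));
    }

    public static void main(String[] args) {
        SegmentedLog log = new SegmentedLog(20);
        log.commit(List.of("aaaaa", "bbbbb"));        // fits entirely in the first segment
        log.commit(List.of("ccc", "ddddd"));          // starts in the first segment, ends in the second
        for (int i = 0; i < log.segmentCount(); i++) {
            System.out.println("segment " + i + ": " + log.segment(i));
        }
    }
}
```

In this sketch the second commit begins in the first segment and finishes in the second, so neither segment is a snapshot of anything; only the log as a whole is meaningful.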

It is not possible to disable transactions.

Basically, sharding your data between different stores gives better performance; at the very least, the smaller the stores are, the faster and more smoothly GC works in the background. The way you shard your data determines the way you can retrieve it. If data in different shards is completely decoupled, then it is even better to store the shards in different environments rather than in stores of a single environment. This also isolates the shards physically, not only logically.
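
One possible layout for the case described in the question (one environment per metric, one store per day) can be sketched as a small routing helper. Everything here is an assumption made for illustration: the class `ShardRouter`, the directory naming, the `day-` store-name prefix and the choice of UTC day boundaries are hypothetical, and the Xodus calls themselves are deliberately left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper that maps a data point to an environment directory and a store name. */
class ShardRouter {
    private final Path root;

    ShardRouter(Path root) {
        this.root = root;
    }

    /**
     * Directory for the environment that holds one metric.
     * Characters outside [A-Za-z0-9_.-] are replaced with '_', so two different
     * metric names could collide; a real scheme would need to rule that out.
     */
    Path environmentDir(String metric) {
        return root.resolve(metric.replaceAll("[^A-Za-z0-9_.-]", "_"));
    }

    /** Store name for the UTC day that contains the given timestamp. */
    String storeName(long epochMillis) {
        LocalDate day = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
        return "day-" + day;                          // LocalDate prints as yyyy-MM-dd
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(Paths.get("data"));
        long timestamp = Instant.parse("2017-08-11T12:31:00Z").toEpochMilli();
        System.out.println(router.environmentDir("cpu/load"));
        System.out.println(router.storeName(timestamp));
    }
}
```

With such a layout, a query for one metric over a date range touches one environment and a handful of per-day stores, while a query across metrics has to visit several environments. That is the retrieval trade-off mentioned above.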

Vyacheslav Lukianov
  • Can you kindly provide more details about the database design (maybe you have a link to a description)? Is the whole database spread across multiple .xd files? Are multiple .xd files created in order to handle transactions (MVCC)? – user12384512 Aug 11 '17 at 12:31
  • We don't have a more detailed description of the database design than the one on the Xodus wiki pages: https://github.com/JetBrains/xodus/wiki. Yes, the whole database is spread across multiple .xd files. At any time, only one of the .xd files (the newest) is writable, so MVCC is handled without multiple writable files. – Vyacheslav Lukianov Aug 16 '17 at 15:02
  • The whole database can be thought of as a tree. The tree is a partially persistent data structure (https://en.wikipedia.org/wiki/Persistent_data_structure), which makes it rather cheap to keep different versions (snapshots) of the database. – Vyacheslav Lukianov Aug 16 '17 at 15:12
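
The idea behind the last comment can be illustrated with a minimal path-copying binary search tree in plain Java. This is a generic textbook sketch, not the tree Xodus actually uses, and the class name `PNode` is made up. An update creates new nodes only along the path from the root to the changed key and shares every other subtree with the previous version, so an old root remains a valid snapshot.

```java
/**
 * Immutable binary search tree node. put() never modifies existing nodes;
 * it copies the path to the changed key and reuses all other subtrees.
 */
final class PNode {
    final long key;
    final String value;
    final PNode left;
    final PNode right;

    PNode(long key, String value, PNode left, PNode right) {
        this.key = key;
        this.value = value;
        this.left = left;
        this.right = right;
    }

    /** Returns the root of a new version; the tree under the old root is left untouched. */
    static PNode put(PNode root, long key, String value) {
        if (root == null) {
            return new PNode(key, value, null, null);
        }
        if (key < root.key) {
            return new PNode(root.key, root.value, put(root.left, key, value), root.right);
        }
        if (key > root.key) {
            return new PNode(root.key, root.value, root.left, put(root.right, key, value));
        }
        return new PNode(key, value, root.left, root.right);
    }

    /** Looks a key up in the version identified by the given root; null if absent. */
    static String get(PNode root, long key) {
        PNode node = root;
        while (node != null) {
            if (key < node.key) {
                node = node.left;
            } else if (key > node.key) {
                node = node.right;
            } else {
                return node.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PNode v1 = put(put(put(null, 50, "a"), 30, "b"), 70, "c");
        PNode v2 = put(v1, 70, "C");                  // new version; v1 is still readable
        System.out.println(get(v1, 70) + " " + get(v2, 70));
        System.out.println(v1.left == v2.left);       // the untouched subtree is shared
    }
}
```

Each root plays the role of a snapshot: a reader holding `v1` keeps seeing the old value while a writer produces `v2`, and the cost of the new version is proportional to the depth of the tree rather than to its size.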