
I'm using Xodus for storing time-series data (100-500 million rows are inserted daily).

I have multiple stores in one environment. A new store is created every day, and stores older than 30 days can be deleted. Recently my total environment size grew to 500 GB.

Read/write speed degraded dramatically. After an initial investigation it turned out that the Xodus background cleaner thread was consuming almost all I/O resources: iostat shows almost 90% utilization with 20 MB/s reading and 0 MB/s writing.

I decided to give the background thread some time to clean up the environment, but it kept running for a few days, so eventually I had to delete the whole environment.

Xodus is a great tool, but it looks like I've made the wrong choice: Xodus is not designed for inserting huge amounts of data because of its append-only modification design. If you insert too much data, the background cleaner thread cannot compact it and ends up consuming all the I/O.

Can you advise any tips and tricks for working with large data sizes in Xodus? I could create a new environment every day instead of creating a new store.

user12384512

1 Answer


If you are OK with fetching data from different environments, then you will definitely benefit from creating an instance of Environment every day instead of an instance of Store. In that case, GC will work on only a daily amount of data. The insertion rate will be more or less constant, whereas fetching will slowly degrade as the total amount of data grows.
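
A minimal sketch of that per-day layout (the root directory, the date-based naming, and the 30-day retention helper are assumptions for illustration, not part of the answer):

```java
import jetbrains.exodus.env.Environment;
import jetbrains.exodus.env.Environments;

import java.io.File;
import java.time.LocalDate;

public class DailyEnvironments {

    // Root directory and retention period are assumptions for illustration.
    private static final File ROOT = new File("/data/timeseries");
    private static final int RETENTION_DAYS = 30;

    // Open (or create) the environment that holds a single day of data,
    // e.g. /data/timeseries/2018-03-23.
    static Environment openFor(LocalDate day) {
        return Environments.newInstance(new File(ROOT, day.toString()));
    }

    // Expired days are dropped by deleting the whole per-day directory
    // (after closing its Environment), instead of waiting for the GC
    // to reclaim that space inside one huge environment.
    static void dropExpired(LocalDate today) {
        deleteRecursively(new File(ROOT, today.minusDays(RETENTION_DAYS).toString()));
    }

    private static void deleteRecursively(File file) {
        final File[] children = file.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        file.delete();
    }
}
```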

If working with several environments within a single JVM, make sure the exodus.log.cache.shared setting of EnvironmentConfig is set to true.
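
For example (assuming the typed setter on EnvironmentConfig for the exodus.log.cache.shared setting; the path is just an example):

```java
import jetbrains.exodus.env.Environment;
import jetbrains.exodus.env.EnvironmentConfig;
import jetbrains.exodus.env.Environments;

// One shared log cache for all environments opened in this JVM, so that each
// per-day environment doesn't allocate a cache of its own.
final EnvironmentConfig config = new EnvironmentConfig().setLogCacheShared(true);
final Environment env = Environments.newInstance("/data/timeseries/2018-03-23", config);
```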

Vyacheslav Lukianov
  • Thanks for your answer. Does the background cleaner thread have to traverse the whole environment to perform cleanup? Why does the background cleaner thread take so much time when working with a huge environment? – user12384512 Mar 23 '18 at 16:19
  • @user12384512 The background cleaner doesn't traverse the entire environment. GC maintains the utilization of each .xd file, and the utilization of the whole environment. Utilization is the fraction of usable space relative to the whole space. The cleaner always starts with the file with the least utilization (i.e. having the most free space) and moves the actual data to the end of the log (to the most recent .xd file). After an .xd file is cleaned, the GC transaction is committed and the cleaner can proceed with another file. In this process, some other data may also need to be moved in order to keep the database consistent. – Vyacheslav Lukianov Mar 30 '18 at 11:25
  • In case of a huge environment (especially in case of a huge single `Store`), the data gets more fragmented, so an attempt to clean a single file is more likely to require random read access within the environment. BTW, such a cleaning process reduces not only free space, but fragmentation as well. – Vyacheslav Lukianov Mar 30 '18 at 11:26
  • @user12384512 Hello, are you running Xodus on one server, or have you tried running Xodus on multiple servers? – quarks Aug 12 '18 at 05:12