While going through Elasticsearch's Definitive Guide, I stumbled upon a mystery. The guide first establishes that search is only near real-time: a change becomes visible to search only after a refresh writes it as a new segment into the filesystem cache (every second by default), and fsync is not used for this step because it would be too costly.

Then along comes the translog. For some reason, it CAN be used to provide real-time CRUD: the engine first looks through all the segments it knows about in the filesystem cache, then applies any changes it finds in the translog. If the translog can be kept up to date in real time, what is the inherent problem with keeping segments up to date in real time? Is it to avoid having too many segments in the cache?

Additionally, why can the translog be fsynced every 5 seconds by default with no problems, while segments can't?

user1610325

1 Answer

Segments are immutable. They are never updated in place; instead, they are merged with other segments into bigger segments. Because segments are immutable, Elasticsearch can offload caching to the OS via its page/file cache.

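The translog acts as an append-only log of operations. It is not itself turned into a segment: on a flush, the in-memory buffer is written out as a new segment, the segments are fsynced to disk, and the translog is cleared.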
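To make the moving parts concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described in the guide, not Elasticsearch's or Lucene's actual code: `ToyShard` and all of its method and attribute names are hypothetical, dictionaries stand in for segments, and "fsync" is modelled as a counter rather than a real disk call.

```python
from types import MappingProxyType


class ToyShard:
    """Toy model of a single shard (hypothetical, for illustration only)."""

    def __init__(self):
        self.buffer = {}             # in-memory buffer, not yet searchable
        self.segments = []           # immutable segments, oldest first
        self.translog = []           # append-only log of operations
        self.fsynced_ops = 0         # translog entries assumed durable on disk
        self.committed_segments = 0  # segments covered by the last flush

    def index(self, doc_id, doc):
        # Every write is a cheap append to the translog plus a buffer update.
        self.translog.append((doc_id, doc))
        self.buffer[doc_id] = doc

    def refresh(self):
        # Turn the buffer into a new read-only segment (no fsync modelled).
        if self.buffer:
            self.segments.append(MappingProxyType(dict(self.buffer)))
            self.buffer = {}

    def fsync_translog(self):
        # Mark everything appended so far as durable.
        self.fsynced_ops = len(self.translog)

    def flush(self):
        # Write out the buffer, "commit" all segments, then clear the translog.
        self.refresh()
        self.committed_segments = len(self.segments)
        self.translog = []
        self.fsynced_ops = 0

    def search(self, predicate):
        # Search only sees segments, so it lags behind until a refresh.
        visible = {}
        for segment in self.segments:  # newer segments override older ones
            visible.update(segment)
        return sorted(i for i, doc in visible.items() if predicate(doc))

    def get(self, doc_id):
        # Real-time get by ID: check the translog first, then the segments.
        for logged_id, doc in reversed(self.translog):
            if logged_id == doc_id:
                return doc
        for segment in reversed(self.segments):
            if doc_id in segment:
                return segment[doc_id]
        return None
```

In this model, a document is returned by `get` immediately after `index`, but `search` only finds it after `refresh`. Existing segments are never modified; a write only ever appends to the translog or produces a new segment.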

Andrew White
  • Thanks, but I got there. The thing is, the translog does get flushed (and thus made into a segment) every 5 seconds by default. Why is that different from the original issue it's trying to solve? – user1610325 Apr 26 '15 at 20:26
  • I'm sorry, I guess I don't understand your question then. – Andrew White Apr 26 '15 at 20:49