2

I understand that Apache Apex runs on Hadoop and YARN. Does it use HDFS for persistence and replication to protect against data loss, or does it have its own storage layer?

Community

2 Answers

2

Apache Apex checkpoints operator state for fault tolerance. It writes these checkpoints to HDFS for recovery, but the checkpoint store is configurable; for example, Apex also has an implementation that checkpoints to Apache Geode. Apex also uses HDFS to upload the artifacts needed to launch an application, such as the application package containing the application jar, its dependencies, and its configuration.
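To illustrate what a configurable checkpoint store means in practice, here is a minimal, self-contained sketch in plain Java. It is not Apex's actual API: `CheckpointStore`, `InMemoryCheckpointStore`, and `CheckpointDemo` are hypothetical names invented for this example, and the in-memory map only stands in for a durable backend such as HDFS or Geode. The idea it shows is that operator state is saved per operator and per window, and recovery reloads the most recent saved state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical interface (NOT the Apex API): a pluggable place to keep checkpoints.
interface CheckpointStore {
    void save(int operatorId, long windowId, Serializable state);
    Object load(int operatorId, long windowId);
    long latestWindowId(int operatorId); // returns -1 when nothing was saved
}

// Stand-in backend for illustration; a real store would write to durable storage.
final class InMemoryCheckpointStore implements CheckpointStore {
    private final Map<Integer, NavigableMap<Long, byte[]>> data = new HashMap<>();

    @Override
    public void save(int operatorId, long windowId, Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state); // serialize a snapshot of the operator state
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        data.computeIfAbsent(operatorId, k -> new TreeMap<>())
            .put(windowId, bytes.toByteArray());
    }

    @Override
    public Object load(int operatorId, long windowId) {
        NavigableMap<Long, byte[]> perOperator = data.get(operatorId);
        byte[] raw = perOperator == null ? null : perOperator.get(windowId);
        if (raw == null) {
            throw new IllegalArgumentException("no checkpoint for operator "
                + operatorId + " at window " + windowId);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public long latestWindowId(int operatorId) {
        NavigableMap<Long, byte[]> perOperator = data.get(operatorId);
        return (perOperator == null || perOperator.isEmpty()) ? -1L : perOperator.lastKey();
    }
}

final class CheckpointDemo {
    // Recover a counter from its latest checkpoint, or start from 0 if none exists.
    static long recover(CheckpointStore store, int operatorId) {
        long window = store.latestWindowId(operatorId);
        return window < 0 ? 0L : (Long) store.load(operatorId, window);
    }

    public static void main(String[] args) {
        CheckpointStore store = new InMemoryCheckpointStore();
        store.save(1, 10L, 5L); // counter state at the end of window 10
        store.save(1, 20L, 9L); // counter state at the end of window 20
        System.out.println(recover(store, 1)); // resumes from the window-20 state
    }
}
```

Because the demo code depends only on the interface, swapping the backend means supplying a different `CheckpointStore` implementation, which is the sense in which a checkpoint store can be "configurable".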

ashwin111
1

Apache Apex does not have its own file system. By default, streaming applications written with Apex use HDFS for checkpointing, persistence, and saving application-specific data.

PradeepKumbhar