
I run many jobs on Flink with the RocksDB state backend. One of my jobs hit an error and kept restarting all night.

The error message looks like this:

java.lang.IllegalStateException: Could not initialize keyed state backend.
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:330)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:221)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:679)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:666)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:708)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error while opening RocksDB instance.
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.openDB(RocksDBKeyedStateBackend.java:1063)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.access$3300(RocksDBKeyedStateBackend.java:128)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restoreInstance(RocksDBKeyedStateBackend.java:1472)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restore(RocksDBKeyedStateBackend.java:1569)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:996)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:775)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:319)
Caused by: org.rocksdb.RocksDBException: Corruption: Sst file size mismatch: /mnt/dfs/3/hadoop/yarn/local/usercache/sloth/appcache/application_1526888270443_0002/flink-io-84ec9962-f37f-4fbc-8262-a215984d8d70/job-1a72a5f09ac8a80914256306363505aa_op-CoStreamFlatMap_1361_4_uuid-0b019d7f-2d28-44dc-baf2-12774ed3518f/db/008919.sst. Size recorded in manifest 132174005, actual size 2674688
Sst file size mismatch: /mnt/dfs/3/hadoop/yarn/local/usercache/sloth/appcache/application_1526888270443_0002/flink-io-84ec9962-f37f-4fbc-8262-a215984d8d70/job-1a72a5f09ac8a80914256306363505aa_op-CoStreamFlatMap_1361_4_uuid-0b019d7f-2d28-44dc-baf2-12774ed3518f/db/008626.sst. Size recorded in manifest 111956797, actual size 14286848
Sst file size mismatch: /mnt/dfs/3/hadoop/yarn/local/usercache/sloth/appcache/application_1526888270443_0002/flink-io-84ec9962-f37f-4fbc-8262-a215984d8d70/job-1a72a5f09ac8a80914256306363505aa_op-CoStreamFlatMap_1361_4_uuid-0b019d7f-2d28-44dc-baf2-12774ed3518f/db/008959.sst. Size recorded in manifest 43157714, actual size 933888
    at org.rocksdb.TtlDB.openCF(Native Method)
    at org.rocksdb.TtlDB.open(TtlDB.java:132)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.openDB(RocksDBKeyedStateBackend.java:1054)
    ... 12 more

When I found this, I killed the job manually and started it again. Then it worked fine.

How does this error happen? I can't find anything about it on Google or anywhere else.


1 Answer


I found an earlier error that occurred before this exception:

No space left on device

I think that problem caused this one: the local disk filled up, so RocksDB wrote truncated SST files, and the size mismatch showed up when the incremental checkpoint was restored.
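If the cause really was a full local disk, one mitigation is to move RocksDB's working directory off the default YARN container directories onto volumes with enough free space. Below is a minimal sketch using the RocksDBStateBackend API from the Flink version in the stack trace; the checkpoint URI and the /mnt/bigdisk... paths are placeholders I made up, not values from the original job.

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbStoragePathExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Durable checkpoint location (placeholder URI), with incremental
            // checkpoints enabled, matching the RocksDBIncrementalRestoreOperation
            // seen in the stack trace above.
            RocksDBStateBackend backend =
                    new RocksDBStateBackend("hdfs:///flink/checkpoints", true);

            // Keep RocksDB's local working files on disks with enough free space
            // instead of the default YARN container temp directories.
            backend.setDbStoragePaths("/mnt/bigdisk1/flink-rocksdb", "/mnt/bigdisk2/flink-rocksdb");

            env.setStateBackend(backend);

            // ... define sources, keyed operators, sinks, then env.execute(...)
        }
    }

Monitoring free space on whatever directories RocksDB actually uses is still worthwhile, since the job only fails at restore time, long after the disk first filled up.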
