I am new to Apache Pulsar and am trying to recover from a failed upgrade. I have Lucidworks Fusion running on Amazon EKS, and Fusion uses Apache Pulsar as its distributed messaging platform. While upgrading Fusion I seem to have corrupted the BookKeeper ledger data, and since then I have been unable to get Fusion running again. Nearly all pods in the cluster depend on Pulsar, and more specifically on the BookKeeper pods. Because BookKeeper cannot start with the corrupted ledgers, the pods remain in a CrashLoopBackOff state and Fusion stays down. Here is the error from the bookkeeper pod logs:

17:41:13.184 [main] INFO org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage - Creating single directory db ledger storage on data/bookkeeper/ledgers/current
17:41:13.298 [main] INFO org.apache.bookkeeper.proto.BookieNettyServer - Shutting down BookieNettyServer
17:41:13.303 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
java.io.IOException: Error open RocksDB database
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:182) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:83) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:58) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex.<init>(EntryLocationIndex.java:58) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:162) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:149) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:775) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:136) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:105) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:41) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:301) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.server.Main.doMain(Main.java:221) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.server.Main.main(Main.java:203) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:313) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
Caused by: org.rocksdb.RocksDBException: While opening a file for sequentially reading: data/bookkeeper/ledgers/current/locations/MANIFEST-000159: No such file or directory
at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-5.13.3.jar:?]
at org.rocksdb.RocksDB.open(RocksDB.java:231) ~[org.rocksdb-rocksdbjni-5.13.3.jar:?]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:179) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
... 14 more
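
The `Caused by` line says RocksDB could not find `MANIFEST-000159` in the `locations` directory. RocksDB keeps the name of its active manifest in a small `CURRENT` file inside the database directory, so one way to confirm what is missing is to compare that file against what is actually on the volume. The following is only a minimal diagnostic sketch: the function name `check_rocksdb_manifest` is hypothetical, it assumes the directory is reachable from wherever the script runs (for example from a debug container with the bookie's volume mounted), and it only reads files, it does not repair anything.

```python
from pathlib import Path


def check_rocksdb_manifest(db_dir):
    """Return (manifest_name, exists) for a RocksDB directory.

    manifest_name is the file name recorded in CURRENT, or None when
    CURRENT itself is missing. exists tells whether that manifest file
    is actually present in db_dir.
    """
    db = Path(db_dir)
    current = db / "CURRENT"
    if not current.is_file():
        # Without CURRENT there is no manifest name to look up.
        return None, False
    # CURRENT holds a single line such as "MANIFEST-000159".
    name = current.read_text().strip()
    return name, (db / name).is_file()
```

Calling `check_rocksdb_manifest("data/bookkeeper/ledgers/current/locations")` (the path from the log above) and getting back `("MANIFEST-000159", False)` would mean `CURRENT` still points at a manifest that is no longer on the volume, which matches the exception.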

Is there a way that I could fix this ledger data?

If fixing the ledger data is not possible, how can I clear all data out of Pulsar and start fresh with a re-initialized Pulsar?

nabello
  • `Caused by: org.rocksdb.RocksDBException: While opening a file for sequentially reading: data/bookkeeper/ledgers/current/locations/MANIFEST-000159: No such file or directory` - how does your bookkeeper persistent volume work? Did the upgrade change anything about this? – gohm'c Apr 15 '22 at 02:21
  • @gohm'c those are great questions that I do not know the answers to. The Fusion upgrade should recycle all pods, and when the pods restart they should use a more recent version of the Docker image than before. Maybe BookKeeper's data was corrupted during that process? I am not sure; it is all I can think of with my limited knowledge of Apache Pulsar and specifically Apache BookKeeper. – nabello Apr 15 '22 at 15:27
