My code for creating the Berkeley DB file:
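    import os
    import bsddb3.db

    def create_bdb_object(filename, is_create=True):
        bdb = bsddb3.db.DB()
        # Sorted duplicates must be configured before the database is opened
        bdb.set_flags(bsddb3.db.DB_DUP | bsddb3.db.DB_DUPSORT)
        # DB_EXCL makes open() fail if the file already exists, so remove any stale file first
        open_flags = bsddb3.db.DB_CREATE | bsddb3.db.DB_EXCL
        if os.path.exists(filename) and is_create:
            os.remove(filename)
        bdb.open(filename, dbtype=bsddb3.db.DB_BTREE, flags=open_flags)
        return bdb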
After that, I wrote some pickled data into this file. The file is created without any problems.
Update #1: code for writing to the file:
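    import os
    import pickle

    def write_to_the_file(filename, kv_pair_rdd):
        # Build the database under a temporary name, then rename it over the target
        bdb_filename = f'{filename}.new'
        bdb = create_bdb_object(bdb_filename)
        # toLocalIterator() brings the RDD's records to the driver one partition at a time
        for url, record in kv_pair_rdd.toLocalIterator():
            bdb.put(url.encode(), pickle.dumps(record, protocol=2))
        bdb.close()
        os.rename(bdb_filename, filename)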
But when I read this file back, I don't get all of the data. The file should contain 9 records, but reading it returns only 4.
When I run `db_dump -p filename`, I get all 9 records.
Code for reading data from the file:
    import pickle
    import bsddb3.db

    bdb = bsddb3.db.DB()
    bdb.set_flags(bsddb3.db.DB_DUP | bsddb3.db.DB_DUPSORT)
    bdb.open(filename)
    bdb_cursor = bdb.cursor()
    # The loop relies on first()/next() returning None once the cursor runs out of records
    record = bdb_cursor.first()
    while record:
        print(record[0], pickle.loads(record[1]))
        record = bdb_cursor.next()
    bdb_cursor.close()
    bdb.close()
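One way to narrow this down (a diagnostic sketch, not a fix) is to walk the cursor while counting key/data pairs and distinct keys separately, then compare both numbers with the 9 records db_dump reports. The helper name `summarize_cursor` is hypothetical; it only assumes a bsddb3-style cursor whose `first()` and `next()` return a `(key, data)` tuple and `None` when the records run out, which is what the reading loop above relies on.

```python
from collections import Counter


def summarize_cursor(cursor):
    """Walk a cursor and count key/data pairs and distinct keys.

    Hypothetical helper: assumes a bsddb3-style cursor whose first() and
    next() return a (key, data) tuple, or None once no records are left.
    """
    per_key = Counter()
    record = cursor.first()
    while record is not None:
        key, _data = record
        per_key[key] += 1  # each duplicate stored under the same key counts separately
        record = cursor.next()
    total_pairs = sum(per_key.values())
    return total_pairs, len(per_key), per_key
```

For example, `pairs, keys, per_key = summarize_cursor(bdb.cursor())` right after `bdb.open(filename)`. Comparing `pairs` and `keys` with the 9 records from db_dump and the 4 printed by the original loop may show whether the missing entries are duplicate values stored under keys that were already seen.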
Could anybody explain to me what I'm doing wrong, please?