TLDR: How do I inspect / browse an LMDB binary file?

Complete py n00b here. I've just had an LMDB file dumped in my lap to analyze for data errors that are causing bugs in downstream software. I don't know the data schema, and the file is about 1 GB in size. I've spent about an hour looking for a Q&D way to use Jupyter and pandas to browse the file without knowing the schema, but no joy.

What is the shortest way to do this? A link would be fine.

Ophir Yoktan
empty
  • Do you have experience with other programming languages, or are you new to programming in general? Also, is the LMDB file a text file or something binary looking? If it is text and you provide a snippet, perhaps someone can help answer the question: "How to load this file into pandas?" – Gordon Bean Aug 08 '16 at 19:53
  • I have a _lot_ of programming experience, just not in Python. The LMDB is binary. @gordonbean – empty Aug 08 '16 at 20:20
  • If the LMDB is binary and you don't know the schema, you're in a pickle. You need to know something about how to interpret the bytes in order to load the file - pandas can't figure that out for you. However, if you know something about the schema, you may be able to piece together sufficient information to solve your problem. Do you have ANY information about what is in this file? Is it a table? Do you know any fields? Do you know any of the data that should be present (i.e. the first entry should have "foobar" for the name field)? – Gordon Bean Aug 08 '16 at 20:36

1 Answer

LMDB is an embedded key-value store.

You can use a Python binding for it (such as the `lmdb` package) to read the database, either by looking up specific keys or by iterating over all entries. Note that the values themselves are commonly binary-serialized objects, so you'll have to inspect them to see how they are formatted.
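As a rough starting point, here is a minimal sketch of iterating over the first few entries, assuming the third-party `lmdb` package (`pip install lmdb`) is the binding in use. The helper names `describe` and `peek_lmdb` are made up for illustration, and nothing here knows the actual schema: it only shows whether each key and value decodes as UTF-8 text or looks like opaque binary.

```python
import itertools


def describe(raw: bytes, width: int = 60) -> str:
    """Return a short, printable summary of a raw key or value."""
    try:
        text = raw.decode("utf-8")
        kind = "text"
    except UnicodeDecodeError:
        # Not valid UTF-8: show a hex dump instead of garbage characters.
        text = raw.hex()
        kind = "binary"
    if len(text) > width:
        text = text[:width] + "..."
    return f"{kind}, {len(raw)} bytes: {text}"


def peek_lmdb(path: str, limit: int = 20, subdir: bool = True) -> None:
    """Print the first `limit` key/value pairs of the LMDB at `path`.

    Assumption: `path` is a directory containing data.mdb (the usual layout).
    Pass subdir=False if you were handed a single file instead.
    """
    import lmdb  # third-party; imported here so describe() works without it

    # Open read-only so the file being analyzed cannot be modified.
    env = lmdb.open(path, readonly=True, lock=False, subdir=subdir)
    try:
        print("entries:", env.stat()["entries"])
        with env.begin() as txn:
            # A cursor iterates over (key, value) pairs in key order.
            for key, value in itertools.islice(txn.cursor(), limit):
                print(describe(key), "=>", describe(value))
    finally:
        env.close()
```

Calling something like `peek_lmdb("path/to/db")` from a Jupyter cell (the path is a placeholder) should print the entry count and a sample of records. If the values turn out to be binary, the leading bytes in the hex dump are often the best clue to which serialization format was used.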

Sep
Ophir Yoktan