
I tried to load a QVD file into a pandas DataFrame using this tool, as shown in the script below. It works, but it isn't optimized, and the library only provides a way to get rows by index, which is why I was forced to use a for-loop.

As a result, the runtime grows with the number of rows. I found that qvd.getRow() is what drives that cost, but I couldn't find any other way to parse the QVD file with this library. I'm looking for a similar tool that is more efficient, especially in runtime, since I'm dealing with files of ~1M records.


from qvdfile import QvdFile
import pandas as pd

qvd = QvdFile("file.qvd")

# Column names come from the first row's keys.
cols = list(qvd.getRow(0).keys())
df = pd.DataFrame(columns=cols)

# Fetch and append one row at a time -- this is the slow part.
for r in range(int(qvd.attribs["NoOfRecords"])):
    df = pd.concat([df, pd.DataFrame([qvd.getRow(r)], columns=cols)], ignore_index=True)

  • pd.concat is a terribly slow approach. If you're going this route, you're better off appending to a dictionary or list and, at the very end, constructing a DataFrame from it (see the sketch after these comments). – Derek Eden May 15 '21 at 05:02
  • Yes, you're right: compared to `append` it's less efficient, but the problem comes from the `qvd.getRow()` function itself. I tried iterating with it alone and it shows almost the same complexity as before. – Mahery Ranaivoson May 15 '21 at 05:06
  • True, the GitHub page says it was deliberately coded inefficiently for the sake of simplicity, and that you should contact the package author for more efficient methods. – Derek Eden May 15 '21 at 05:31
  • Yes, it's done. Thanks for your time @derek. – Mahery Ranaivoson May 15 '21 at 05:49
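For reference, the pattern Derek describes in the comments would look roughly like this. This is a minimal sketch assuming the same qvdfile API as in the question; it removes the quadratic cost of repeatedly concatenating DataFrames, though the per-row cost of qvd.getRow() itself remains:

from qvdfile import QvdFile
import pandas as pd

qvd = QvdFile("file.qvd")

# Collect each row as a plain dict, then build the DataFrame once at the
# end instead of growing it inside the loop with pd.concat.
rows = [qvd.getRow(r) for r in range(int(qvd.attribs["NoOfRecords"]))]
df = pd.DataFrame(rows)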

1 Answer


I think this project should fix your performance issue: https://pypi.org/project/qvd/

I was able to read 750k rows and 55 columns in about 15 seconds.

pip install qvd

from qvd import qvd_reader

df = qvd_reader.read('test.qvd')
print(df)
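If you want to reproduce that timing on your own file, here is a minimal measurement sketch (the file name is a placeholder):

import time
from qvd import qvd_reader

start = time.perf_counter()
df = qvd_reader.read('test.qvd')  # placeholder path
elapsed = time.perf_counter() - start

# Report how many rows/columns were loaded and how long it took.
print(f"Loaded {len(df)} rows x {len(df.columns)} columns in {elapsed:.1f}s")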
– Jérôme