
I tried to load a QVD file into a pandas DataFrame using this tool, as shown in the script below. It works, but it isn't optimized, and the library only provides a way to get rows by index, which is why I was forced to use a for-loop.

As a result, the runtime grows with the number of rows. I found that qvd.getRow() is what drives that cost, but I couldn't find any other way to parse the QVD file with this library. I'm looking for a similar tool that is more efficient, especially in runtime, since I'm dealing with files of ~1M records.


from qvdfile import QvdFile
import pandas as pd

qvd = QvdFile("file.qvd")

# Column names come from the first row's keys.
cols = list(qvd.getRow(0).keys())
df = pd.DataFrame(columns=cols)

# Fetch and append one row at a time -- this is the slow part.
for r in range(int(qvd.attribs["NoOfRecords"])):
    df = pd.concat([df, pd.DataFrame([qvd.getRow(r)], columns=cols)], ignore_index=True)

  • pd.concat is a terribly slow approach. If you're going this route, you're better off appending to a dictionary or list and, at the very end, constructing a DataFrame from it (see the sketch after these comments). – Derek Eden May 15 '21 at 05:02
  • Yes, you're right: compared to `append` it's less efficient, but the problem comes from the `qvd.getRow()` function itself. I tried iterating with it alone and it shows almost the same complexity as before. – Mahery Ranaivoson May 15 '21 at 05:06
  • True, the GitHub page says it was deliberately coded inefficiently for the sake of simplicity, and that you should contact the package author for more efficient methods. – Derek Eden May 15 '21 at 05:31
  • Yes, it's done. Thanks for your time @derek. – Mahery Ranaivoson May 15 '21 at 05:49
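For reference, the pattern Derek describes in the comments would look roughly like this. This is a minimal sketch assuming the same qvdfile API as in the question; it removes the quadratic cost of repeatedly concatenating DataFrames, though the per-row cost of qvd.getRow() itself remains:

from qvdfile import QvdFile
import pandas as pd

qvd = QvdFile("file.qvd")

# Collect each row as a plain dict, then build the DataFrame once at the
# end instead of growing it inside the loop with pd.concat.
rows = [qvd.getRow(r) for r in range(int(qvd.attribs["NoOfRecords"]))]
df = pd.DataFrame(rows)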

1 Answer


I think this project should fix your performance issue: https://pypi.org/project/qvd/

I was able to read 750k rows and 55 columns in about 15 seconds.

pip install qvd

from qvd import qvd_reader

df = qvd_reader.read('test.qvd')
print(df)
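If you want to reproduce that timing on your own file, here is a minimal measurement sketch (the file name is a placeholder):

import time
from qvd import qvd_reader

start = time.perf_counter()
df = qvd_reader.read('test.qvd')  # placeholder path
elapsed = time.perf_counter() - start

# Report how many rows/columns were loaded and how long it took.
print(f"Loaded {len(df)} rows x {len(df.columns)} columns in {elapsed:.1f}s")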
– Jérôme