In Python 3.6.3 under Anaconda I am trying to read a dbf file that contains memo data. The file was exported from commercial software. I use the dbf package, version 0.97.11. The file type is:
In [10]: dbf.table_type('C:\\Users\\kmec\\Documents\\Python Scripts\\misc\\test_dbf\\RelLinks.dbf')
Out[10]: (131, 'dBase III Plus w/memos')
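As a cross-check on the reported type, here is a minimal sketch that decodes the first 12 bytes of a dBase-style header by hand. It assumes the commonly documented layout (version byte, three date bytes, little-endian record count, header length, record length) and that the year is stored as an offset from 1900. The function name read_dbf_header is hypothetical, and the sample header length of 481 is only a guess computed as 32 + 14 * 32 + 1, not a value read from my file.

```python
import struct

def read_dbf_header(raw: bytes) -> dict:
    """Decode the fixed 12-byte prefix of a dBase-style header (assumed layout)."""
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack(
        "<4BIHH", raw[:12]
    )
    return {
        "version": version,                  # 0x83 == 131 for dBase III Plus with memos
        "last_update": (1900 + yy, mm, dd),  # assumes the year is stored as year - 1900
        "records": n_records,
        "header_length": header_len,
        "record_length": record_len,
    }

# Synthetic header using the figures printed for my table;
# the header length (481) is an assumption, see above.
sample = struct.pack("<4BIHH", 0x83, 120, 6, 25, 131847, 481, 146)
print(read_dbf_header(sample))
```

On the real file this would be called as read_dbf_header(f.read(12)) with the .dbf opened in binary mode.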
The file has an accompanying dbt file, RelLinks.DBT (about 460 MB), located in the same folder as the dbf file. My understanding (based on http://dbf-software.com/memo-blob.html) is that the memo data is stored in the .DBT file, so I assumed that the following code should work:
import dbf
import pandas as pd

d = dbf.Table('C:\\Users\\kmec\\Documents\\Python Scripts\\misc\\test_dbf\\RelLinks.dbf', ignore_memos=False, codepage='cp852')
d.open()
rellinks = []
for record in d[:10]:
    print(record)
    rellinks.append([record.id, record.srcyear, record.srcid, record.dstyear, record.dstid])
d.close()
rl = pd.DataFrame.from_records(rellinks, columns=['rl_id', 'rl_srcyear', 'rl_srcid', 'rl_dstyear', 'rl_dstid'])
However, the "Out" prompt in the IPython console (under Spyder 3.3.1) never shows up, so I had to close the console.
With ignore_memos=True the code runs, but of course the memo columns in the resulting data frame are empty.
So is there a way to read the dbf file with memo data in this case?
Edit:
print(d)
results in:
Table: C:\Users\kmec\Documents\Python Scripts\misc\test_dbf\RelLinks.dbf
Type: dBase III Plus
Codepage: ascii (plain ol' ascii)
Status: DbfStatus.READ_ONLY
Last updated: 2020-06-25
Record count: 131847
Field count: 14
Record length: 146
--Fields--
0) id N(11,0)
1) reltype M
2) subreltype M
3) srcyear M
4) srctype M
5) srcid N(11,0)
6) srcitemtyp M
7) srcitemid N(11,0)
8) dstyear M
9) dsttype M
10) dstid N(11,0)
11) dstitemtyp M
12) dstitemid N(11,0)
13) mjvazba M
Calling print(d) before and after d.open() changes only the status from CLOSED to READ_ONLY (as expected).
It turns out that the codepage is not cp852 but ascii.
After fixing this
d=dbf.Table( 'C:\\Users\\kmec\\Documents\\Python Scripts\\misc\\test_dbf\\RelLinks.dbf', ignore_memos=False, codepage='ascii')
and executing
for record in d[:10]:
    print(record)
it freezes as before, but ctrl+C forces a keyboard interrupt with the following output:
Traceback (most recent call last):
  File "<ipython-input-24-902d684f24ef>", line 3, in <module>
    print(record)
  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3024, in __str__
    result.append("%3d - %-10s: %r" % (seq, field, self[field]))
  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 2956, in __getitem__
    return self.__getattr__(item)
  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 2923, in __getattr__
    value = self._retrieve_field_value(name)
  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3122, in _retrieve_field_value
    datum = retrieve(record_data, fielddef, self._meta.memo, self._meta.decoder)
  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 4058, in retrieve_memo
    data = memo.get_memo(block)
  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3607, in get_memo
    return self._get_memo(block)
  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3655, in _get_memo
    newdata = self.meta.mfd.read(self.meta.memo_size)
KeyboardInterrupt
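The interrupt lands inside a read on the memo file, which suggests (though I have not confirmed it) that the reader keeps pulling blocks while looking for an end-of-memo marker. To test that idea independently of the dbf package, here is a minimal sketch of a bounded memo reader. It assumes the usual dBase III .DBT layout: 512-byte blocks, with each memo ended by a 0x1A byte. The names read_memo_bounded, BLOCK_SIZE and TERMINATOR are hypothetical, and the block numbers would have to come from the memo fields of the .dbf records.

```python
import io

BLOCK_SIZE = 512      # assumed dBase III memo block size
TERMINATOR = b"\x1a"  # assumed end-of-memo marker (Ctrl-Z)

def read_memo_bounded(memo_file, block, max_blocks=64):
    """Read one memo starting at `block`, giving up after `max_blocks` blocks.

    Returns (data, terminated); `terminated` is False when no marker was
    found within the limit or before the end of the file.
    """
    memo_file.seek(block * BLOCK_SIZE)
    chunks = []
    for _ in range(max_blocks):
        chunk = memo_file.read(BLOCK_SIZE)
        if not chunk:
            break  # reached end of file without a terminator
        end = chunk.find(TERMINATOR)
        if end != -1:
            chunks.append(chunk[:end])
            return b"".join(chunks), True
        chunks.append(chunk)
    return b"".join(chunks), False

# Synthetic memo file: header block, one terminated memo, then unterminated data.
fake = bytes(BLOCK_SIZE) + b"hello\x1a\x1a".ljust(BLOCK_SIZE, b"\x00") + b"x" * (3 * BLOCK_SIZE)
print(read_memo_bounded(io.BytesIO(fake), 1))
print(read_memo_bounded(io.BytesIO(fake), 2, max_blocks=2)[1])
```

Run against the real RelLinks.DBT opened in binary mode, a result with terminated set to False for the first few block numbers would point to a memo format that differs from what the file header claims.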