In Python 3.6.3 under Anaconda I am trying to read a dbf file that contains memo data. The file was exported from commercial software. I use the dbf package, version 0.97.11. The file type is:

In [10]: dbf.table_type('C:\\Users\\kmec\\Documents\\Python Scripts\\misc\\test_dbf\\RelLinks.dbf')
Out[10]: (131, 'dBase III Plus w/memos')

The file has an accompanying dbt file, RelLinks.DBT (about 460 MB), located in the same folder as the dbf file. My understanding (based on http://dbf-software.com/memo-blob.html) was that the memo data is stored in the .DBT file. Therefore I assumed that the following code should work:

import dbf
import pandas as pd

d = dbf.Table('C:\\Users\\kmec\\Documents\\Python Scripts\\misc\\test_dbf\\RelLinks.dbf', ignore_memos=False, codepage='cp852')
d.open()
rellinks = []
for record in d[:10]:
    print(record)
    rellinks.append([record.id, record.srcyear, record.srcid, record.dstyear, record.dstid])
d.close()
rl = pd.DataFrame.from_records(rellinks, columns=['rl_id', 'rl_srcyear', 'rl_srcid', 'rl_dstyear', 'rl_dstid'])

However, the code never finishes: the next prompt in the IPython console (under Spyder 3.3.1) never appears, so I had to close the console.
With ignore_memos=True the code runs, but of course the memo columns in the resulting data frame are empty. Is there a way to read the dbf file together with its memo data in this case?

Edit:

print(d) results in:

Table: C:\Users\kmec\Documents\Python Scripts\misc\test_dbf\RelLinks.dbf

    Type:          dBase III Plus
    Codepage:      ascii (plain ol' ascii)
    Status:        DbfStatus.READ_ONLY
    Last updated:  2020-06-25
    Record count:  131847
    Field count:   14
    Record length: 146
    --Fields--
      0) id N(11,0)
      1) reltype M
      2) subreltype M
      3) srcyear M
      4) srctype M
      5) srcid N(11,0)
      6) srcitemtyp M 
      7) srcitemid N(11,0) 
      8) dstyear M
      9) dsttype M
     10) dstid N(11,0)
     11) dstitemtyp M
     12) dstitemid N(11,0)
     13) mjvazba M 

Calling print(d) before and after d.open() shows that only the status changes, from CLOSED to READ_ONLY (as expected).

It turns out that the codepage is not cp852 but ascii. After fixing this

d = dbf.Table('C:\\Users\\kmec\\Documents\\Python Scripts\\misc\\test_dbf\\RelLinks.dbf', ignore_memos=False, codepage='ascii')

and executing

for record in d[:10]:
    print(record)

it freezes as before, but Ctrl+C raises a KeyboardInterrupt with the following output:

Traceback (most recent call last):

  File "<ipython-input-24-902d684f24ef>", line 3, in <module>
    print(record)

  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3024, in __str__
    result.append("%3d - %-10s: %r" % (seq, field, self[field]))

  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 2956, in __getitem__
    return self.__getattr__(item)

  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 2923, in __getattr__
    value = self._retrieve_field_value(name)

  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3122, in _retrieve_field_value
    datum = retrieve(record_data, fielddef, self._meta.memo, self._meta.decoder)

  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 4058, in retrieve_memo
    data = memo.get_memo(block)

  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3607, in get_memo
    return self._get_memo(block)

  File "C:\Users\kmec\Anaconda3\lib\site-packages\dbf\__init__.py", line 3655, in _get_memo
    newdata = self.meta.mfd.read(self.meta.memo_size)

KeyboardInterrupt  
runnerup

1 Answer

I was unable to find a problem in the dbf module. My guess is that you are either having memory issues (each memo field's data is between 3 and 4 million bytes), or you have a slow hard drive, or both.
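
One way to check the memo-size guess without going through the dbf module is to read a single memo straight from the .DBT file. The sketch below is a minimal, standard-library-only illustration. It assumes the usual dBase III memo layout (512-byte blocks, block 0 used as a header, memo text ended by two 0x1A bytes); `read_memo`, `BLOCK_SIZE`, `TERMINATOR` and `max_bytes` are hypothetical names for this example and are not part of the dbf package. The `max_bytes` cap is there so that a missing terminator cannot make the loop run through the whole 460 MB file.

```python
import io

BLOCK_SIZE = 512          # assumption: dBase III memo block size
TERMINATOR = b"\x1a\x1a"  # assumption: end-of-memo marker (two Ctrl-Z bytes)


def read_memo(memo_file, block, max_bytes=1_000_000):
    """Read one memo starting at `block` from an open binary .DBT file.

    Returns (data, terminated). `terminated` is False when the end of the
    file or the `max_bytes` cap was reached before a terminator was found.
    """
    memo_file.seek(block * BLOCK_SIZE)
    data = bytearray()
    while len(data) < max_bytes:
        chunk = memo_file.read(BLOCK_SIZE)
        if not chunk:
            # End of file without a terminator.
            return bytes(data), False
        # Search only the new chunk plus the last byte already read, so a
        # terminator split across two blocks is still found and the earlier
        # data is not rescanned on every iteration.
        start = max(len(data) - 1, 0)
        data += chunk
        end = data.find(TERMINATOR, start)
        if end != -1:
            return bytes(data[:end]), True
    # Cap reached without a terminator.
    return bytes(data), False


if __name__ == "__main__":
    # Synthetic memo file: one header block, then one short memo.
    fake = io.BytesIO(bytes(BLOCK_SIZE) + b"hello" + TERMINATOR + bytes(505))
    print(read_memo(fake, 1))
```

Against the real file, this would be called on `open(path_to_dbt, 'rb')` with a block number of 1 or higher. The length of the returned data and the `terminated` flag indicate whether a memo is really several megabytes long or whether the reader is simply running on because no terminator is found.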

Ethan Furman