What encoding to use to represent DBF (CDX) indexes

Question

I am trying to extract data from several FoxPro tables using an open-source solution. There are several .dbf tables and associated .cdx indexes. The trouble I have is that two of the columns in the tables are made up of characters which are not standard characters. These columns are foreign keys to other tables. In order to extract data, I need a way/program to recognize these symbols so that joins can be performed.

It seems like an issue of finding the correct encoding for the keys. However, I have included details on my various attempts in case I'm just too far down the rabbit hole to see a better solution.

1. Python dbf module

I have had the most luck with the python dbf module. I can do the following and actually see the keys:

import dbf
tbl = dbf.Table('table.dbf')
tbl.open()
print(tbl[1])

  0 - f_key1 : '\x00\x00\x01'
  1 - f_key2: '\x00\x01'
  2 - some_data1: ' 64'
  3 - some_data2: ' 13'

I have tried exporting this data, but get an error:

dbf.export(tbl, filename='table.csv', format='csv', header=True)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: character maps to <undefined>

I don't know what encoding to use.

2. pgdbf

Another approach I used was to install pgdbf (version 0.6.2) on a Debian VirtualBox. This allowed me to convert the tables to Postgres. I used the default settings,

pgdbf -c -e -n -Q -t table.dbf > table.pgsql

It seems, however, that the foreign key columns were not converted. The columns appear blank and performing an outer join on them creates a result far larger than it should. I have the results of a join performed using FoxPro on two of the tables. One table has 650404 rows and the other 149253. The result of a full outer join on the two foreign keys produced a table of 650404.

The FoxPro code was

SELECT Table1.id1, Table2.id2;
FROM ;
3Q04!TABLE1 ;
FULL JOIN 3Q04!TABLE2 ;
ON  Table1.f_key1 = Table2.f_key1;
AND  Table1.f_key2 = Table2.f_key2

When I try the same join in the converted Postgres table, the VirtualBox runs out of space (generating far more than 650404 rows). This implies, it seems, that the foreign keys were not converted and the join is matching on everything.

The Postgres query was

create view t1_t2 as
  select
      d.id as id1
    , e.id as id2
  from table1 d
  full outer join table2 e
  on d.f_key1 = e.f_key1
  and d.f_key2 = e.f_key2
;

3. pgloader

I started playing around with pgloader, but it too does not seem able to directly handle .cdx files\indexes.

Why wouldn't you simply use VFPOLEDB and get your data? You are not explaining what really you are trying to do. If you need to do this from a platform other than windows, you could use say Go as a language and "github.com/mattn/go-adodb" package. Or ask someone with VFP, that you could share the data, to give you the data, say, as an SQLite database. VFP data is UTF8. However, VFP has the ability to store any character -including char 0- in char fields. That might be your problem and you could get the data by casting to blob as binary. — Cetin Basoz, Jun 07 '18 at 15:04
I am the author of [`dbf`](https://pypi.org/project/dbf/). I'd be happy to look at some of those files and attempt a solution. My email address can be found at that page. — Ethan Furman, Jun 07 '18 at 16:07

What encoding to use to represent DBF (CDX) indexes

0 Answers0