I am trying to extract data from several FoxPro tables using an open-source solution. There are several .dbf
tables and associated .cdx
indexes. The trouble I have is that two of the columns in the tables are made up of characters which are not standard characters. These columns are foreign keys to other tables. In order to extract data, I need a way/program to recognize these symbols so that joins can be performed.
It seems like an issue of finding the correct encoding for the keys. However, I have included details on my various attempts in case I'm just too far down the rabbit hole to see a better solution.
1. Python dbf
module
I have had the most luck with the python dbf
module. I can do the following and actually see the keys:
import dbf
tbl = dbf.Table('table.dbf')
tbl.open()
print(tbl[1])
0 - f_key1 : '\x00\x00\x01'
1 - f_key2: '\x00\x01'
2 - some_data1: ' 64'
3 - some_data2: ' 13'
I have tried exporting this data, but get an error:
dbf.export(tbl, filename='table.csv', format='csv', header=True)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: character maps to <undefined>
I don't know what encoding to use.
2. pgdbf
Another approach I used was to install pgdbf
(version 0.6.2) on a Debian VirtualBox. This allowed me to convert the tables to Postgres. I used the default settings,
pgdbf -c -e -n -Q -t table.dbf > table.pgsql
It seems, however, that the foreign key columns were not converted. The columns appear blank and performing an outer join on them creates a result far larger than it should. I have the results of a join performed using FoxPro on two of the tables. One table has 650404 rows and the other 149253. The result of a full outer join on the two foreign keys produced a table of 650404.
The FoxPro code was
SELECT Table1.id1, Table2.id2;
FROM ;
3Q04!TABLE1 ;
FULL JOIN 3Q04!TABLE2 ;
ON Table1.f_key1 = Table2.f_key1;
AND Table1.f_key2 = Table2.f_key2
When I try the same join in the converted Postgres table, the VirtualBox runs out of space (generating far more than 650404 rows). This implies, it seems, that the foreign keys were not converted and the join is matching on everything.
The Postgres query was
create view t1_t2 as
select
d.id as id1
, e.id as id2
from table1 d
full outer join table2 e
on d.f_key1 = e.f_key1
and d.f_key2 = e.f_key2
;
3. pgloader
I started playing around with pgloader
, but it too does not seem able to directly handle .cdx
files\indexes.