1

I'm trying to export the GeoTools HSQL 2 database and load it back into HSQL 1 for a legacy system that needs the older database format. The tables include characters like the degree symbol. However, it's coming out as the escape sequence \u0080 rather the encoded character. I need to either fix that or have HSQL 1 import convert the escaped characters back into the correct encoding.

e.g.

cp modules/plugin/epsg-hsql/src/main/resources/org/geotools/referencing/factory/epsg/EPSG.zip /tmp
cd /tmp
unzip EPSG.zip
java -jar hsqldb-2.4.1.jar 
# For the file, put jdbc:hsqldb:file:/tmp/EPSG
SELECT 'epsg-dump'

And in the results I see things like this \u00b5:

INSERT INTO EPSG_ALIAS VALUES(389,'epsg_unitofmeasure',9109,7302,'\u00b5rad','')

Looking into hsqldb, I'm not sure how to control the encoding the of the data being written, assuming that this is the correct location to look:

https://github.com/ryenus/hsqldb/blob/master/src/org/hsqldb/scriptio/ScriptWriterText.java

Kurt Schwehr
  • 2,638
  • 3
  • 24
  • 41

2 Answers2

2

You can use the following procedure:

  1. In the source database, create TEXT tables with exactly the same columns as the original tables. Use CREATE TEXT TABLE thecopyname (LIKE thesourcename) for each table.
  2. Use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8' for each of the copy tables.
  3. INSERT into each thecopyname table with SELECT * FROM thesourcename.
  4. Use SET TABLE thecopyname SOURCE OFF for each thecopyname
  5. You will now have several thecopyname.csv files (each with its own name) with UTF8 encoding.
  6. Use the reverse procedure on the target database. You need to explicity create the TEXT tables then use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8'
fredt
  • 24,044
  • 3
  • 40
  • 61
  • I'm having trouble with the mechanics of doing this from the UI, so I've asked a follow up question here: https://stackoverflow.com/questions/54357312/how-to-do-multiple-commands-one-execute-in-the-hsqldb-gui – Kurt Schwehr Jan 25 '19 at 00:19
1

The encoding looks like Unicode (one to four hex digits). Try this in bash (quick & dirty):

echo -ne "$(< dump.sql)" > dump_utf8.sql

Freddy
  • 4,548
  • 1
  • 7
  • 17
  • Can you explain that please? – Bsquare ℬℬ Jan 20 '19 at 13:10
  • `$(< dumpl.sql)` reads the sql dump (bash command substitution, same as `$(cat dump.sql)`). The output is echoed interpreting the unicode backslash escaped characters (option -e) and without a trailing newline (option -n). The echoed result is then redirected to file dump_utf8.sql. I'm assuming your system is running in default locale UTF-8, to verify it use `echo $LANG` in your bash shell, thus resulting in a valid UTF-8 encoded sql dump. – Freddy Jan 20 '19 at 19:03
  • This is a very cool trick, but it gets tripped up with \u000a getting turned into a new line. – Kurt Schwehr Jan 22 '19 at 21:17