3

I am a former Digital Equipment Co engineer (worked with Rdb/RTR/VMS on lots of very big systems and have 10 years of distributed systems processing experience in C years ... but quite a long time ago).

I am looking for advice on what this error means in practice, as I go about diagnosing the problem. Here is the Postgres error message

ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x62 0x40

Environment MAC Maveriks XOS, GNU C, Xcode, Postgres 9.3. (server side) with libpq

I am new to Postgres but wrote the C query set to pull all of the metadata about user tables, columns, data types, lengths, ordinal positions out of the Postgres keep them in memory in a catalogue of my own and generate all simple user table queries dynamically.

This query made it past the query preparation call.

Writing defensively I checked twice for errors:

 if ((res = PQprepare(db, statement_name, insert_query, data->nParam, NULL)) == NULL)
     Dbms_Crash(db, NULL, "Dbms_insert() PQprepare returned NULL");

 if (PQresultStatus(res) != PGRES_COMMAND_OK)
     Dbms_Crash(db, res, "Dbms_insert() Failed");

 PQclear(res);

Here is my parser dump of my generated query

insert_query:
INSERT INTO image_metadata (latitude,longitude,altitude,filename,utc_datetime)
VALUES ($1::double precision,$2::double precision,$3::double precision,$4::character varying,$5::timestamp without time zone);

Here is the call that fails:

    res = PQexecPrepared(db,
                         statement_name,
                         data->nParam,
                         (const char* const *)data->ptParam[i],
                         data->pdlParam[i],
                         data->pParamfmt,
                         PGFORMAT_STRING);

ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x62 0x40

I got past a problem with the date format in utc_datetime where Postgres expects YYYY-MM-DD HH:MM:SS for OSI universal coordinated time dates and I was feeding it YYYY:MM:DD and this one error became the next. (So my array indirection, indexing etc is working as Postgres printed the offending date, of the fifth parameter)

I am writing the obvious parameter dumper to look at what I am feeding it, but I know it got the date right when the error went away when I fixed the date format.

Is the UTF8 message coming from the filename string or one of the double precision fields?
Is passing binary a bad idea and just passing strings likely to cause less problems?

My schema will eventually have say 30 tables and this type of error in memory alignment on the server side is very worrying. My aim is to write zero SQL code.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • What string did you pass as a _filename_ parameter? – klin May 10 '14 at 23:55
  • So I just succeeded in loading 15 rows the UTF8 error is intermittent.The filename is correct and so is the timestamp when I look from PGAdmin. The double precisions are crap for all three altitude, longitude and latitude. So I'll look there. My dumper came up withbms_Diagnostics dump parameter row for row: [0] Columnname [latitude] parameter length: 8 Columnname [longitude] parameter length: 8 Columnname [altitude] parameter length: 8 Columnname [filename] parameter length: 94 Columnname [utc_datetime] parameter length: 19 – user3624412 May 11 '14 at 01:15
  • What is the struct definition of "data" `data->ptParam[]` should be an array of pointers to pointers to string, which looks awkward. Maybe omit the `[i]` indexing ? – wildplasser May 11 '14 at 10:42
  • Its a char ***, as I am proceessing sets, not rows and not tuple elements. Sets are the fundemental abstraction of Relational Databases. Such a shame that langauge folks rule the day, and we get so fixated on syntax over sematics. i.e. most of the code I am writing is data conversation... Exif to Core Foundations, to C to Postgress. I come from a time when Jim Gray and Phil Bernstein where inventing the physical OLTP performance at DEC. At the firm where I learned software engineering we wrote our own SQL complier from lex and yacc into the BLR binary request langauge ... Rdb didn't have SQL. – user3624412 May 12 '14 at 19:23
  • To follow up on the comment of sets, if you make them self describing, and encapsulated, and mapped to contiguous virtual memory (we wrote a buddy heap manaager ... Knuth which is perfect for sets), you can ship generically them as one thing from the DMBS accross the network to an application and vice versa as one thing (by name)! The dbms/langauge/network translation becomes trivial and generic. Its great for running financial markets, (orders, trades and quotes) The power of semantics... versus artifical complexity of languages .. xml etc. This all came about when OO databases died. – user3624412 May 12 '14 at 19:40

1 Answers1

2

In your PQprepare() function the last parameter is NULL. That implies that all query parameters are in text format. If you passed some of them as binary, you sent just a garbage. My advice is to use text format for parameters in prepared queries.

klin
  • 112,967
  • 15
  • 204
  • 232
  • Fix done, thanks again. I noticed SQLCA and SQLDA structures in Postgres in the system tables, but they are not exposed by the libpq interface. If there is an access layer to this you can force memory alignment and drop C structure templates over the contiguous memory blocks annd solve a lot of problems in both data definition and mapping. Also cut memory allocations from gizzions to one or two calls. I have used this before to generate 70% of the data access layer of schemas with hundreds of tables with no impedance mismatch. I guess I could go look at the Postgres source code. – user3624412 May 11 '14 at 04:22
  • 1
    Please do. If you spot room for improvement in specific areas and can offer detailed proposals *please* let pgsql-hackers know. Patches are even better, but don't spend tons of time on a patch before raising it for discussion; sometimes things aren't done a particular way for a reason. Also, I suggest you check out `libpqtypes` (which I'd rather like to bring into `libpq` proper), it can make things easier. – Craig Ringer May 11 '14 at 04:47