
I'm trying to copy data from a view in either on-premises PostgreSQL or MSSQL to a table in Azure PostgreSQL. I can't get this to copy successfully when I map more than 6 columns from source to sink.

I suspected that one of the source columns was the issue, so I varied which columns were mapped, but any combination of columns succeeds as long as fewer than 6 are copied in total. I then tried different source and sink tables, which produced the same results.

If I copy via an intermediate CSV file, I can import/export an unlimited number of columns successfully.

Error output shown in the ADF console consistently is: "Operation on target Copy_7dp failed: Type=Npgsql.PostgresException,Message=08P01: invalid message format,Source=Npgsql,'"

Matt

2 Answers


After a discussion with the ADF team at Microsoft, they enlightened me to the fact that PostgreSQL has a 16-bit limit on the number of parameters that can be written at once, i.e. if (rows per batch × number of columns) exceeds 65,535, it raises this error.

The solution is simply to reduce the "Write batch size" in the Sink properties of the Copy Data activity so that the product stays under that limit; the sketch below shows the arithmetic.
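To pick a safe value, divide the limit by the column count. A minimal sketch of the arithmetic (PG_MAX_PARAMS and max_write_batch_size are illustrative names, not part of ADF or Npgsql):

```python
# 16-bit cap on parameters per statement in the PostgreSQL wire protocol.
PG_MAX_PARAMS = 65535

def max_write_batch_size(column_count: int) -> int:
    """Largest ADF 'Write batch size' that keeps
    batch size * column count within the 16-bit parameter limit."""
    return PG_MAX_PARAMS // column_count

print(max_write_batch_size(7))  # 9362
```

For 7 columns this gives 9,362 rows per batch, which lines up with the roughly 9,000-row failure threshold reported in the comments on the other answer.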

Matt

Error output shown in the ADF console consistently is: "Operation on target Copy_7dp failed: Type=Npgsql.PostgresException,Message=08P01: invalid message format,Source=Npgsql,'"

You may find some clues in this old case, which is similar to your error trace; see the solution here: https://github.com/npgsql/npgsql/issues/189#issuecomment-67001166

Basically, if the string value we try to persist using Npgsql was derived from MemoryStream.GetBuffer() it will explode, but if it came from MemoryStream.ToArray() it will work fine.

This is explained in the official documentation:

Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
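Here is a rough Python analogy of that .NET behavior (illustrative only, not Npgsql's actual code): exposing the whole allocated backing buffer appends unused padding bytes to whatever gets written on the wire, which would explain the 08P01 "invalid message format" response.

```python
class MemoryStreamLike:
    """Toy analogue of .NET's MemoryStream, for illustration."""

    def __init__(self, capacity: int = 256):
        self._buf = bytearray(capacity)  # allocated backing store, mostly unused
        self._length = 0                 # bytes actually written

    def write(self, data: bytes) -> None:
        self._buf[self._length:self._length + len(data)] = data
        self._length += len(data)

    def get_buffer(self) -> bytearray:
        return self._buf                        # whole allocation, padding included

    def to_array(self) -> bytes:
        return bytes(self._buf[:self._length])  # copy of the written bytes only

ms = MemoryStreamLike()
ms.write(b"test")
print(len(ms.get_buffer()))  # 256 -- trailing zeros would corrupt a wire message
print(len(ms.to_array()))    # 4
```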

However, I suspect this is a defect in the ADF PostgreSQL connector, and we can't adjust the connector's source code at the user level. (You could certainly submit feedback to the Azure ADF team to get an official statement.)

For now, as a workaround, you could use CSV files as an intermediary: transfer the data from the on-premises database to CSV files in Azure Blob Storage, then transfer it into the destination PostgreSQL database.
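If you need to verify the second hop outside ADF, a minimal sketch using psycopg2's COPY support can load the staged CSV directly (host, database, file, and table names below are placeholders, not from the original post):

```python
import psycopg2

# Placeholder connection details -- substitute your Azure PostgreSQL server.
conn = psycopg2.connect(
    host="myserver.postgres.database.azure.com",
    dbname="mydb",
    user="me@myserver",
    password="...",
)

# COPY streams the CSV text in bulk instead of binding per-column parameters.
with conn, conn.cursor() as cur, open("staged_export.csv") as f:
    cur.copy_expert(
        "COPY target_table FROM STDIN WITH (FORMAT csv, HEADER true)", f
    )
conn.close()
```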

Jay Gong
  • thanks for the response. I'd not discovered that specific npgsql issue but that seems to describe what's going on perfectly. Would assume that the 6/7 column transition causes the ADF connector to switch to GetBuffer from ToArray. I'll try and take this to the Azure team as you suggest. – Matt Oct 18 '19 at 11:44
  • @Matt You're so kind. I also would like to know that if you have any progress from azure team. Thanks a lot. – Jay Gong Oct 18 '19 at 12:54
  • CSV files with more than about 9,000 rows (or 200 KB in size) also won't import more than 6 columns without raising the 08P01 error. – Matt Oct 21 '19 at 14:33