I suddenly started getting a strange error in an ETL pipeline that migrates data from Oracle to Neo4j.

The ETL is implemented as a docker-compose setup of 3 images:

  • Pentaho PDI
  • source Oracle image
  • target Neo4j image

The main pipeline in PDI loads data from Oracle, converts it to CSV, and transfers the files to the Neo4j container, where they are processed further. At some point, the SFTP transfer of the zip containing the CSV files started failing with the following error:

2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - Started FTP job to ${remote_server}
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - ERROR (version 8.3.0.0-371, build 8.3.0.0-371 from 2019-06-11 11.09.08 by buildguy) : Error getting files from FTP : There was a problem while connecting to neo4j:22
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - ERROR (version 8.3.0.0-371, build 8.3.0.0-371 from 2019-06-11 11.09.08 by buildguy) : java.io.IOException: There was a problem while connecting to neo4j:22
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.Connection.connect(Connection.java:791)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.Connection.connect(Connection.java:563)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.entries.ftpdelete.JobEntryFTPDelete.SSHConnect(JobEntryFTPDelete.java:966)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.entries.ftpdelete.JobEntryFTPDelete.execute(JobEntryFTPDelete.java:746)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:686)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:565)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.entries.job.JobEntryJobRunner.run(JobEntryJobRunner.java:69)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at java.lang.Thread.run(Thread.java:748)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - Caused by: java.io.IOException: Key exchange was not finished, connection is closed.
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.KexManager.getOrWaitForConnectionInfo(KexManager.java:92)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.TransportManager.getConnectionInfo(TransportManager.java:230)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.Connection.connect(Connection.java:743)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  ... 14 more
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - Caused by: java.io.IOException: Cannot negotiate, proposals do not match.
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:413)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:754)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:469)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  ... 1 more

The error is difficult to google. There are a couple of similar problems (1, 2, 3, 4, 5), yet they hardly explain the cause of the sudden malfunction.

I suspect it has something to do with SSH key exchange, but I don't know SSH deeply enough to understand what happened.

Tomáš Záluský

1 Answer

My colleague later noticed that the neo4j:4.2.3 image had been re-pushed and that its new version is built on Debian Bullseye. By comparing it with a computer where the new version of the image had not yet been pulled, we found that the openssh version had changed from 7.9p1-10 to 8.4p1-5 (dpkg --list | grep openssh). After that we could easily reproduce the bug locally and prove that PDI works against the old Neo4j image but fails against the new one.
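
The same version difference can usually be seen from the client side, without entering the container, because an SSH server announces its software version in the identification string it sends right after the TCP connection opens. The following is only a minimal sketch of that check, not something we used at the time. The function names are hypothetical, and the sample banner in the usage note is an illustrative assumption about what a Debian build of OpenSSH typically sends.

```python
import socket
from typing import NamedTuple, Optional


class SshBanner(NamedTuple):
    protocol: str            # e.g. "2.0"
    software: str            # e.g. "OpenSSH_8.4p1"
    comments: Optional[str]  # optional free text after the first space


def parse_banner(line: str) -> SshBanner:
    """Parse 'SSH-protoversion-softwareversion[ SP comments]'."""
    line = line.strip()
    if not line.startswith("SSH-"):
        raise ValueError(f"not an SSH identification string: {line!r}")
    # Everything after the first space is the optional comment field.
    head, _, comments = line.partition(" ")
    _, protocol, software = head.split("-", 2)
    return SshBanner(protocol, software, comments or None)


def read_banner(host: str, port: int = 22, timeout: float = 5.0) -> SshBanner:
    """Connect to an SSH server and return its parsed identification string."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            # A server may send other lines first; skip until the SSH- line.
            for raw in stream:
                text = raw.decode("utf-8", errors="replace")
                if text.startswith("SSH-"):
                    return parse_banner(text)
    raise ConnectionError("connection closed before an SSH banner was received")
```

Calling read_banner("neo4j") from the PDI container against the old and the new image and comparing the software field should show the OpenSSH upgrade. For a line such as "SSH-2.0-OpenSSH_8.4p1 Debian-5", parse_banner yields protocol 2.0, software OpenSSH_8.4p1 and comments Debian-5.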

One option was to tweak the updated Neo4j image and force a downgrade of openssh to the previous version. This would probably work, but it would block future upgrades and patches and leave little room for manoeuvre if any problems came up. Hence we decided that the right solution was to upgrade the client.

Our version of Pentaho PDI (dictated, by the way, by the customer) uses the trilead-ssh2 library, build 213. Unfortunately, newer versions (we tried 217 and the newest, 222) failed too. Replacing the library with the Jenkins fork, build 217, finally made the SSH communication work again. The essential part seems to be pull request #60, which adds new KEX algorithms. The fork needs two dependencies (eddsa, which is in the central Maven repo, and jbcrypt, which I could find neither in central nor in the Spring repo, but which can be found here). Both must be copied into the Pentaho PDI data-integration/lib directory as well.
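
To illustrate why adding KEX algorithms on the client side can fix "Cannot negotiate, proposals do not match": during SSH key exchange both sides send a list of algorithm names, and, in simplified form, the first name on the client's list that the server also offers is chosen. If there is no common name, the connection is closed. The sketch below shows only that rule. All three algorithm lists are illustrative assumptions chosen to resemble an older client, an older server and a newer server; they were not read from trilead-ssh2, the Jenkins fork, or either Neo4j image.

```python
from typing import Optional, Sequence


def negotiate(client: Sequence[str], server: Sequence[str]) -> Optional[str]:
    """Return the first client-preferred name the server also offers, else None."""
    offered = set(server)
    for name in client:
        if name in offered:
            return name
    return None


# Illustrative assumption: an older client that only knows SHA-1 based KEX.
OLD_CLIENT_KEX = [
    "diffie-hellman-group-exchange-sha1",
    "diffie-hellman-group14-sha1",
    "diffie-hellman-group1-sha1",
]

# Illustrative assumption: an older server that still offers one SHA-1 method.
OLD_SERVER_KEX = [
    "curve25519-sha256",
    "ecdh-sha2-nistp256",
    "diffie-hellman-group14-sha256",
    "diffie-hellman-group14-sha1",
]

# Illustrative assumption: a newer server with the SHA-1 method dropped.
NEW_SERVER_KEX = [
    "curve25519-sha256",
    "ecdh-sha2-nistp256",
    "diffie-hellman-group14-sha256",
]

# Hypothetical patched client: newer algorithms added in front of the old ones.
PATCHED_CLIENT_KEX = ["curve25519-sha256", "diffie-hellman-group14-sha256"] + OLD_CLIENT_KEX

print(negotiate(OLD_CLIENT_KEX, OLD_SERVER_KEX))      # a common SHA-1 method is found
print(negotiate(OLD_CLIENT_KEX, NEW_SERVER_KEX))      # None: proposals do not match
print(negotiate(PATCHED_CLIENT_KEX, NEW_SERVER_KEX))  # a modern method is found
```

Under these assumed lists, the old client still connects to the old server but finds nothing in common with the new one, which matches the behaviour we saw. The real negotiation covers host key, cipher and MAC lists as well, so a mismatch in any of those could produce the same message.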

Tomáš Záluský