Currently our system (Client) on AWS EC2 instance encountered a sftp problem when trying to send a large file to a customer remote sftp server (“160.xxx.xxx.35.bc.googleusercontent.com). For the small sized files, the sftp transfer works fine but when the file size is about or larger than 1GB, we found that only 1068392448 bytes were transferred to Server sftp site. However, when we send the same large file with the same code and same environments to our own non-googleusercontent remote sftp server (with only the URL/UserName/Password differ), it was successful and all data transferred correctly.
This problem happened after the customer server side made some changes by adding the Load Balancer process. The customer server side did some investigation, adjusted the load balancer timeout but it did not help to resolve this issue. It is said the Client side stopped the data transfer after 1068392448 bytes and Server side waited and dropped the connection after the allowed idle time (~50second).
Our investigation noticed that the large source file was read from AWS S3 and saved to local correctly. When the large file data is written to Server sftp site reach 1068392448 bytes (consistently for all test results), the TCP socket connection status changed from ESTABLISHED to CLOSE_WAIT after Server allowed idle time (about ~50s). The process stayed in this status forever until it is manually stopped/killed. When the TCP socket connection is in CLOSE_WAIT status, the data transfer process showed in dump is in a wait loop at awaitSpace() method (in PipedInputStream class of java.io package). The buffer is indicated to be full and waiting to be written to the Server side. Below is the codes of the wait loop:
private void checkStateForReceive() throws IOException {
if (!connected) {
throw new IOException("Pipe not connected");
} else if (closedByWriter || closedByReader) {
throw new IOException("Pipe closed");
} else if (readSide != null && !readSide.isAlive()) {
throw new IOException("Read end dead");
}
}
private void awaitSpace() throws IOException {
while (in == out) {
checkStateForReceive();
/* full: kick any waiting readers */
notifyAll();
try {
wait(1000);
} catch (InterruptedException ex) {
throw new java.io.InterruptedIOException();
}
}
}
Below is the dump:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode):
"Connect thread files.liveramp.com session" #13 daemon prio=5 os_prio=0 tid=0x000000001e711800 nid=0x63c0 in Object.wait() [0x000000001f81f000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:273)
at java.io.PipedInputStream.receive(PipedInputStream.java:231)
- locked <0x00000006c1c944c8> (a com.jcraft.jsch.Channel$MyPipedInputStream)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at com.jcraft.jsch.IO.put(IO.java:64)
at com.jcraft.jsch.Channel.write(Channel.java:438)
at com.jcraft.jsch.Session.run(Session.java:1459)
at java.lang.Thread.run(Thread.java:745)
"org.apache.commons.vfs2.cache.SoftRefFilesCache$SoftRefReleaseThread" #11 daemon prio=5 os_prio=0 tid=0x000000001ea2d000 nid=0x3078 in Object.wait() [0x000000001efcf000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000006c1c24710> (a java.lang.ref.ReferenceQueue$Lock)
at org.apache.commons.vfs2.cache.SoftRefFilesCache$SoftRefReleaseThread.run(SoftRefFilesCache.java:74)
"Service Thread" #10 daemon prio=9 os_prio=0 tid=0x000000001dfc5000 nid=0x1944 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread2" #9 daemon prio=9 os_prio=2 tid=0x000000001df12800 nid=0x6848 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #8 daemon prio=9 os_prio=2 tid=0x000000001df07800 nid=0x5720 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #7 daemon prio=9 os_prio=2 tid=0x000000001df04800 nid=0x6358 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Monitor Ctrl-Break" #6 daemon prio=5 os_prio=0 tid=0x000000001df56000 nid=0x6910 runnable [0x000000001e1cf000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
- locked <0x00000006c1c24f88> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
- locked <0x00000006c1c24f88> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:64)
"Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x000000001c69e800 nid=0x4880 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x000000001c69d800 nid=0x2534 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=1 tid=0x000000001c67e800 nid=0x4908 in Object.wait() [0x000000001d9df000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006c1c26660> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000006c1c26660> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x00000000030b5000 nid=0x6340 in Object.wait() [0x000000001d8df000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006c1c26818> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000006c1c26818> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x0000000002fc4000 nid=0x56dc in Object.wait() [0x0000000002e0f000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at com.jcraft.jsch.Session.write(Session.java:1269)
- locked <0x00000006c1c943d0> (a com.jcraft.jsch.ChannelSftp)
at com.jcraft.jsch.ChannelSftp.sendWRITE(ChannelSftp.java:2646)
at com.jcraft.jsch.ChannelSftp.access$100(ChannelSftp.java:36)
at com.jcraft.jsch.ChannelSftp$1.write(ChannelSftp.java:806)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
- eliminated <0x00000006c1c27a88> (a org.apache.commons.vfs2.provider.sftp.SftpFileObject$SftpOutputStream)
at org.apache.commons.vfs2.util.MonitorOutputStream.write(MonitorOutputStream.java:123)
- locked <0x00000006c1c27a88> (a org.apache.commons.vfs2.provider.sftp.SftpFileObject$SftpOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
- eliminated <0x00000006c1c26a50> (a org.apache.commons.vfs2.provider.DefaultFileContent$FileContentOutputStream)
at org.apache.commons.vfs2.util.MonitorOutputStream.write(MonitorOutputStream.java:123)
- locked <0x00000006c1c26a50> (a org.apache.commons.vfs2.provider.DefaultFileContent$FileContentOutputStream)
at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:805)
at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:784)
at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:755)
at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:771)
at org.apache.commons.vfs2.FileUtil.copyContent(FileUtil.java:37)
at org.apache.commons.vfs2.provider.AbstractFileObject.copyFrom(AbstractFileObject.java:295)
at com.merkleinc.dat.sftptran.cli.SftpVfs2App.main(SftpVfs2App.java:88)
"VM Thread" os_prio=2 tid=0x000000001c657000 nid=0x28b4 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x0000000002fda800 nid=0x2b54 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x0000000002fdc000 nid=0x4528 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x0000000002fdd800 nid=0x4850 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x0000000002fdf800 nid=0x7f0 runnable
"VM Periodic Task Thread" os_prio=2 tid=0x000000001e023000 nid=0x5d58 waiting on condition
JNI global references: 225
Heap
PSYoungGen total 49664K, used 14491K [0x000000076b400000, 0x000000076ed00000, 0x00000007c0000000)
eden space 49152K, 29% used [0x000000076b400000,0x000000076c21efd0,0x000000076e400000)
from space 512K, 6% used [0x000000076e500000,0x000000076e508000,0x000000076e580000)
to space 4608K, 0% used [0x000000076e880000,0x000000076e880000,0x000000076ed00000)
ParOldGen total 175104K, used 2879K [0x00000006c1c00000, 0x00000006cc700000, 0x000000076b400000)
object space 175104K, 1% used [0x00000006c1c00000,0x00000006c1ecfc70,0x00000006cc700000)
Metaspace used 11230K, capacity 11448K, committed 11648K, reserved 1058816K
class space used 1265K, capacity 1318K, committed 1408K, reserved 1048576K
Currently, we are using com.github’s vfs-s3 Version 2.4.2 with related apache.commons vfs2 version 2.1. and com.jcraft’s jsch version 0.1.54 for sftp data transfer. Our basic codes are as below:
import org.apache.commons.vfs2.*;
Public void upload(String sourceUri, String targetUri) throws IOException, URISyntaxException {
StandardFileSystemManager fsManager = new StandardFileSystemManager();
fsManager.init();
try (FileObject sourceFile = getFileObject(sourceUri, fsManager);
FileObject remoteFile = fsManager.resolveFile(createConnectionString(targetUri), createOptions())) {
// Copy s3 file to sftp server
remoteFile.copyFrom(sourceFile, Selectors.SELECT_SELF);
} finally {
fsManager.close();
}
};
We tried to test the same code in the local Window environments. Occasionally we could see the same problem symptoms as in ec2 instance but most time the problem is not there (inconsistent). We tested on other linux system to send the large file to the customer server sftp site and there was no problem.
We have tried to upgrade to com.github.abashev’s vfs-s3 Version 3.0.0 and 4.0.0 with corresponding versions of vfs2 on AWS EC2 instance. However, it reproduced the same problem results for the same large files. Tried to set loadOpenSSHConfig to be true with proper config file values to keep the connection alive, but it did not help for this situation.
We tried to test sftp directly by sftp "put" command with large file to the remote server from the EC2 instance (and other Windows/Linux platforms) and the large file data transfer is always successful.
The question is where is the potential root cause of the problem? Why the server side stopped receiving the data after 1068392448 bytes or why the rest of the data could not be sent to the server side? Any hard limits on our EC2 environments blocked the data transfer operation (we tried to check a few limits but not still not clear)? Or the Server site incorrectly sent the connection close “FIN” request to Client site (how to prove it)? Thanks for any suggestions for the potential solution.