
UPDATE: 04 Jan 2015

I still have these issues. The number of users of our app has increased, and I see all kinds of network errors. Our app sends out an email every time a network-related error occurs in the app.

Our app performs financial transactions, so re-submits are not idempotent, and I am very wary of enabling HttpClient's retry feature. We have implemented some response caching on the server to handle re-submits made explicitly by the user. However, we still have no solution that works without a bad user experience.
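For illustration, one common way to make re-submits of a non-idempotent operation safe is to attach a client-generated key to each logical transaction and have the server replay the stored response whenever it sees the same key again. The sketch below shows the idea in plain Java. The class name `IdempotentProcessor`, the use of a UUID as the key, and the in-memory map are all assumptions made for this example; a real server would need persistent storage with expiry.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical server-side helper: runs each keyed request at most once. */
class IdempotentProcessor {
    // Maps idempotency key -> response already produced for that key.
    private final Map<String, String> responses = new ConcurrentHashMap<>();

    /**
     * Runs the handler only the first time a key is seen; later calls with
     * the same key return the stored response without running it again.
     * The handler must return a non-null response for it to be stored.
     */
    String process(String idempotencyKey, String payload, Function<String, String> handler) {
        return responses.computeIfAbsent(idempotencyKey, k -> handler.apply(payload));
    }

    public static void main(String[] args) {
        IdempotentProcessor server = new IdempotentProcessor();
        AtomicInteger charges = new AtomicInteger();

        // The client creates the key once per logical transaction and
        // reuses it on every retry of that transaction.
        String key = UUID.randomUUID().toString();
        Function<String, String> charge = p -> "charged#" + charges.incrementAndGet();

        String first = server.process(key, "amount=100", charge);
        String retry = server.process(key, "amount=100", charge); // e.g. after a timeout

        System.out.println(first.equals(retry)); // same response replayed
        System.out.println(charges.get());       // the charge ran only once
    }
}
```

With a scheme like this, a retry after a timeout is harmless whether or not the first attempt actually reached the server.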

Original Question

I have an Android app which posts data as part of a user operation. The data includes a few images, which I package as a Protobuf message (a byte array, in effect) and post to the server over an HTTPS connection.

Though the app works fine for the most part, we occasionally see connection errors. The issue has become more pronounced now that we have some users in areas with relatively slow networks (2G connections). However, it is not limited to slow connections; we also see it with customers using WiFi and 3G connections.

Here are a few exceptions we see in our app logs.

The one below happens after 5 minutes, as I had set the socket timeout to 5 minutes. The app was trying to post 145 KB of data in this case.

Stack trace:
java.net.SocketTimeoutException: Read timed out
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_read(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.read(OpenSSLSocketImpl.java:662)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:103)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:191)

The one below happened after 2.5 minutes (the socket timeout was set to 5 minutes). The client was sending 144 KB of data.

javax.net.ssl.SSLException: Write error: ssl=0x5e4f4640: I/O error during system call, Broken pipe
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_write(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLOutputStream.write(OpenSSLSocketImpl.java:704)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:109)
    at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)

The one below happened after 1 minute.

Stack trace:
javax.net.ssl.SSLException: Connection closed by peer
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:378)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.<init>(OpenSSLSocketImpl.java:634)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.getInputStream(OpenSSLSocketImpl.java:605)

The one below happened after 77 seconds.

Stack trace:
javax.net.ssl.SSLException: SSL handshake aborted: ssl=0x5e2baf00: I/O error during system call, Connection reset by peer
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:378)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.<init>(OpenSSLSocketImpl.java:634)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.getInputStream(OpenSSLSocketImpl.java:605)
    at org.apache.http.impl.io.SocketInputBuffer.<init>(SocketInputBuffer.java:70)

The one below happened after 15 seconds (the connect timeout is set to 15 seconds).

Time Taken : 15081
Stack trace:
org.apache.http.conn.ConnectTimeoutException: Connect to /103.xx.xx.xx:443 timed out
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:121)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:144)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:164)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:119)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:365)

Here are the source code snippets that I use for posting the request:

HttpParams params = new BasicHttpParams();
HttpConnectionParams.setConnectionTimeout(params, 15000); //15 seconds
HttpConnectionParams.setSoTimeout(params, 300000); // 5 minutes

HttpClient client = getHttpClient(params);
HttpPost post = new HttpPost(uri);
post.setEntity(new ByteArrayEntity(requestByteArray));
HttpResponse httpResponse = client.execute(post);

    ....

public static HttpClient getHttpClient(HttpParams params) {
    try {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);

        SSLSocketFactory sf = new TrustAllCertsSSLSocketFactory(trustStore);
        sf.setHostnameVerifier(SSLSocketFactory.STRICT_HOSTNAME_VERIFIER);


        HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
        HttpProtocolParams.setContentCharset(params, HTTP.UTF_8);

        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
        registry.register(new Scheme("https", sf, 443));

        ClientConnectionManager ccm = new ThreadSafeClientConnManager(params, registry);
        DefaultHttpClient client = new DefaultHttpClient(ccm, params);
        // Disable automatic retries: with a retry count of 0, a failed request
        // (for example, one that timed out) is never re-sent.
        client.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(0, false));
        return client;
    } catch (Exception e) {
        // Fallback: a default client, which does not use the params configured above.
        return new DefaultHttpClient();
    }
}

I have read on some forums that we should use the HttpURLConnection class. I made code changes to use https://code.google.com/p/basic-http-client/ as a hot fix. Though it worked on my Samsung phone, it seemed to have an issue on the phone the customer was using: it was not even able to connect to our site. I had to roll it back, though I can look at it again if the root cause can be pinned on DefaultHttpClient.
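For reference, a byte-array POST with HttpURLConnection might look roughly like the sketch below. This is only an illustration and not the code of the library linked above: the class name `UrlConnectionPoster`, the `application/octet-stream` content type, and the choice to return only the status code are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper showing a byte-array POST with HttpURLConnection. */
class UrlConnectionPoster {

    /** Posts the body to the given URL and returns the HTTP status code. */
    static int post(String url, byte[] body, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setConnectTimeout(connectTimeoutMs); // limit for establishing the connection
            conn.setReadTimeout(readTimeoutMs);       // limit for each blocking read
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Assumed content type for an opaque binary payload.
            conn.setRequestProperty("Content-Type", "application/octet-stream");
            // The length is known up front, so send it with a Content-Length
            // header instead of letting the connection buffer the body again.
            conn.setFixedLengthStreamingMode(body.length);

            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }

            int status = conn.getResponseCode();
            // Read the response fully before releasing the connection.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in != null) {
                try (InputStream response = in) {
                    byte[] buf = new byte[4096];
                    while (response.read(buf) != -1) {
                        // Discard; a real client would parse the response here.
                    }
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as `UrlConnectionPoster.post(uri, requestByteArray, 15000, 300000)` would mirror the timeouts used in the snippet above.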

Our web server is nginx, and our web service runs on Apache Tomcat. Customers are mostly using phones with Android 4.1+. The customer from whose phone I retrieved the above stack traces is using a Micromax A110Q with Android 4.2.1.

Any input on this will be highly appreciated. Thanks a lot!

Update:

  1. I noticed that we were not shutting down the connection manager, so I added the code below to the finally block of the code that uses the HTTP client.
    if (client != null) {
        client.getConnectionManager().shutdown();
    }
  2. I updated the nginx configuration to accept request bodies of up to 5 MB. The default is 1 MB, some clients were submitting more than that, and the server was severing the connection with a 413 error.
    client_max_body_size 5M;
  3. I also increased the nginx proxy read timeout, which controls how long nginx waits for the proxied server (Tomcat, in our case) to send its response.
    proxy_read_timeout 300;

With the above changes, the errors have reduced a bit. In the last week, I have seen the following two types of errors:

  1. org.apache.http.conn.ConnectTimeoutException: Connect to /103.xx.xx.xxx:443 timed out. This happens after 15 seconds, which is my connect timeout. I assume it happens because the client cannot reach the server due to network slowness or, as @JaySoyer pointed out, possibly due to network switching.

  2. java.net.SocketTimeoutException: SSL handshake timed out at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method). This happens when the socket timeout expires. I am now using 1 minute as the socket timeout for small requests, and 3 and 6 minutes for payloads of up to 75 KB and larger, respectively.

However, these errors have reduced considerably: I now see 1 failure in 100 requests, compared with 1 in 10 for the earlier version of my code.

Wand Maker
  • that may be sometimes since the server is unstable and the connection would be established.. what's the max hit of ur server? – KOTIOS Aug 05 '14 at 05:23
  • The SSLException occurs when the connection was already established but times out during the SSL handshake. Because the timeout occurs after the SSL handshake has started, the SSLException is thrown as the higher-level exception. Hence I would suggest increasing your timeout further, to almost 20-25 mins. I think that should do – KOTIOS Aug 05 '14 at 05:43
  • @adcom We are getting 1 hit every 5 to 10 minutes. We have very little traffic currently, hardly 25 to 30 users of our app – Wand Maker Aug 06 '14 at 06:18
  • ok try to increase ur timeouts i think it should work – KOTIOS Aug 06 '14 at 06:22
  • @adcom Yeah, I have reduced it a bit by increasing the timeouts (read timeout & socket timeout). I also noticed nginx rejecting some messages which were above 1 MB, so I increased the request size limit to 5 MB as a precaution. But some issues are still around, mainly read timeouts when the client is trying to connect to the server and occasional broken pipes. Thanks for your inputs. – Wand Maker Aug 06 '14 at 18:16
  • ok i would like to solve this issue , can u provide me ur apk or something – KOTIOS Aug 07 '14 at 03:29
  • @adcom Thanks for the offer, but APK in question is for a client, and we are not at liberty to share – Wand Maker Aug 07 '14 at 08:50
  • hmm i can try researching on the same – KOTIOS Aug 07 '14 at 08:59

2 Answers


I recently had to do an exhaustive analysis of my company's app as we were seeing a bunch of similar errors and didn't know why. We ended up handing out custom apps that logged their connection times, errors, signal quality, etc. to a file. We did that for weeks and collected thousands of data points. Keep in mind that we maintain a persistent connection while the app is open.

Turns out most of our errors were from switching networks. This is actually really common for an average user. Say a user is on an EDGE cell network, then walks within WiFi range, or vice versa. When this occurs, Android severs the cell connection and makes an entirely new connection over WiFi. From the app's perspective, it's similar to turning airplane mode on and then back off again. This even occurs when switching within a cell network, e.g. LTE to HSPA+. Each time this happens, Android fires the connectivity-changed broadcast.

Of those you listed, this behavior was causing the following similar errors:

  • javax.net.ssl.SSLException: Write error: ssl=0x5e4f4640
  • javax.net.ssl.SSLException: SSL handshake aborted:

Sometimes the network switch was fast, sometimes slow. Turns out, we were not cleaning up our resources in time during the fast switches. As a result, we were attempting to reconnect to our servers with stale TCP connections, which threw even more odd errors.

So I guess the takeaway is: if you maintain a connection for a long period of time, expect the phone to switch between networks constantly, especially when the signal is weak. When a network switch occurs, you'll see SSLExceptions, and that's completely normal. Just make sure you clean up your resources and reconnect properly.
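To make "clean up and reconnect" a bit more concrete, here is a minimal plain-Java sketch of the pattern, not our actual code. `ReconnectingHolder` and its method names are made up for this example; the idea is that whatever hears about the network switch (on Android, typically a receiver for the connectivity-changed broadcast) calls `onNetworkChanged()`, so the stale connection is closed and never handed out again.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

/**
 * Hypothetical holder for one long-lived connection. Call onNetworkChanged()
 * from wherever the app learns that the active network has switched.
 */
class ReconnectingHolder<C extends Closeable> {
    private final Supplier<C> connector; // opens a brand-new connection
    private C current;                   // null when nothing is open

    ReconnectingHolder(Supplier<C> connector) {
        this.connector = connector;
    }

    /** Returns the live connection, opening a fresh one if none is open. */
    synchronized C get() {
        if (current == null) {
            current = connector.get();
        }
        return current;
    }

    /** Drops the connection bound to the old network so it is never reused. */
    synchronized void onNetworkChanged() {
        C stale = current;
        current = null; // forget it first, so get() can never return it again
        if (stale != null) {
            try {
                stale.close();
            } catch (IOException ignored) {
                // The old socket is usually dead already; nothing to recover.
            }
        }
    }
}
```

The important part is the ordering: the old connection is forgotten before anything else happens, so a fast switch cannot leave a stale socket behind for the next request.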

Ifrit
  • During the connection process, any errors won't require cleanup because no connection was ever made. We then have a user authentication process after a successful connection, which we wrap in a try...catch. On any errors thrown, to be safe we try to disconnect and clean up. We use the SMACK library, which pipes all those SSLExceptions and socket errors to one location for us. On any of those errors we try to disconnect if possible, and clean up. So I'd say wherever you are catching and detecting these errors is where you'll need to handle it. – Ifrit Aug 10 '14 at 12:36
  • I am also using Smack 4.1 and facing a similar problem, only on the Galaxy S4 (Android 4.4.2). Could you please elaborate on what you mean by "clean up your resources" in terms of Smack 4.1? – shanraisshan Jun 30 '15 at 06:48

Since you are dealing with what looks like poor network connectivity, consider a more fault-tolerant HTTP client. The one I like is OkHttp. From their description:

OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to SSLv3 if the handshake fails.

The implementation would be mostly a drop-in replacement.

David S.
  • I would disagree with you; after my app is left sitting idle for 30 mins (and presumably some network switching takes place), SSL handshake exceptions are thrown until the app is restarted or until OkHttp is reinstantiated! See http://stackoverflow.com/q/37885391/550471 – Someone Somewhere Jul 11 '16 at 09:24