
We have a robotic coffee barista deployed at the Dell campus in North Austin, with internet service through Time Warner Cable.

We have been fighting intermittent connection losses between 10 PM and 1 AM on 3 of the past 5 nights. Time Warner has been investigating and has dispatched a technician twice. From their end, everything looks perfect.

I am a programmer and we are a startup, so I am nowhere near a networking expert, but I'm the only one even close to capable of figuring this problem out.

Last night I observed the following:

OSX Yosemite Laptop A Connected To Internet Directly Through Cable Modem

  • 20% - 30% packet loss pinging our us-east EC2 instance.
  • 0% packet loss pinging google.com.
  • 0% packet loss pinging wired.com.

iPhone B Connected To Internet Via LTE

  • 0% packet loss all around (one little blip of 2.2% packet loss on 45 packets to google, but every other test to google and wired and our server showed 0%).

Windows 7 Laptop C Connected To Internet through iPhone B's LTE

  • 0% packet loss all around.

Laptop A Through My Home Time Warner Connection

  • 0% packet loss all around.

Ubuntu 14.04.3 Laptop D Through My Home Time Warner Connection

  • 0% packet loss all around.

I know internet routes vary packet-to-packet, but I do not have the networking experience to explain this, let alone solve it. Since the whole point of the business is ordering coffee remotely via phone, our internet connection is our lifeline. We have been engaging our cellular backup as needed, so our customers are unaffected, but I need to figure this out, or hire someone who can help figure it out.

I read that there was a major AWS outage this weekend. We have applied for business support from AWS, but I suspect that will not bear fruit.

Rjak
  • Perform a long-running traceroute over the problematic connection to your EC2 endpoint and post the results. This may be able to give you a hint as to where the packet loss is happening. – EEAA Sep 22 '15 at 16:55
  • @EEAA Thank you. I will run one tonight when the problem is bound to come up again (right now we have a perfect connection, so there's probably no point?) – Rjak Sep 22 '15 at 19:12

1 Answer


It's possible this is a combination of issues with the way the Time Warner lines run and all the construction they're doing down Parmer and Dessau. I'm literally right down the road from you and have had odd connection problems with Time Warner around those same times. But the fact that the same machine drops packets only to the EC2 instance and nowhere else is weird. Were those continuous pings?

Also, ever since I switched to Time Warner's Arris DG1670A modem I've had ridiculous Wi-Fi issues with Yosemite, and I have oddly found the modem stuck in a continuous update cycle on more than one occasion. If you're hardwired, you can rule out the Yosemite Wi-Fi problems, but I'd also spot-check the modem's firmware landing page to make sure it's not stuck in an update while the issues are occurring.

What EEAA suggested is definitely a good step in the right direction to figuring it out.
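Alongside the traceroute, it may help to log packet loss to several targets all night rather than in short manual bursts, so the loss to EC2 can be lined up against loss to google.com at the same minute. Below is a rough Python sketch of that idea, not a tested monitoring tool. The function names (`parse_ping_summary`, `loss_percent`, `ping_once`, `monitor`) are made up for illustration, the EC2 hostname in the usage note is a placeholder, and it assumes the `ping -c` flag and summary line format used by OS X and Linux (Windows differs).

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the summary line printed by ping on OS X and Linux, e.g.
#   "50 packets transmitted, 38 packets received, 24.0% packet loss"
#   "50 packets transmitted, 50 received, 0% packet loss, time 49073ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_summary(output):
    """Return (sent, received) from ping output, or None if no summary line."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def loss_percent(sent, received):
    """Packet loss as a percentage; one lost packet out of 45 is about 2.2%."""
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent


def ping_once(host, count=20):
    """Run the system ping once. Assumes 'ping -c N host' (OS X / Linux)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_ping_summary(result.stdout)


def monitor(hosts, interval_seconds=60, count=20):
    """Loop forever, printing one timestamped loss figure per host per round."""
    while True:
        for host in hosts:
            stamp = datetime.now().isoformat(timespec="seconds")
            summary = ping_once(host, count)
            if summary is None:
                print(f"{stamp} {host} no summary line (ping failed?)")
            else:
                sent, received = summary
                print(f"{stamp} {host} {loss_percent(sent, received):.1f}% loss")
        time.sleep(interval_seconds)
```

To use it, you would call something like `monitor(["google.com", "wired.com", "your-ec2-hostname"])` with your real EC2 address in place of the placeholder, and redirect the output to a file overnight.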

My EC2 instances are in Oregon and I haven't had any problems there. Just for sanity's sake, if it's not too much of a hassle, you might try snapshotting your instance and moving it temporarily, just to cross another possibility off the list without having to wait on Amazon.

Moving between regions seems more difficult, but you could at least move it to a different data center (availability zone) on the East Coast:

http://www.serverwatch.com/server-tutorials/moving-ec2-instances-across-availability-zones-or-aws-regions.html

Mike Padg
  • I will absolutely try that out and post the results here. I will have to do it during non-business hours. Thank you VERY much! – Rjak Sep 22 '15 at 19:13
  • Sorry: to answer your question, I was running pings of 50-100 packets and then stopping in order to see the packet loss line. I will try EEAA's suggestion tonight. I have never really understood traceroute's output, but I have been doing a ton of reading the past day. – Rjak Sep 22 '15 at 19:15
  • With traceroute, you're really just looking to see where (or if) the connection is timing out. The 1st hop should be your local gateway address (192. or 10.). It should never stall on that hop; if it does, the problem is right there in something you control. Next should be something like tge3-2.ausutxla02h.texas.rr.com, which is Time Warner's Austin NOC. I believe you'll get 2 or 3 hops with rr.com at the end. Those should still be in Time Warner's network. If it's timing out there, it's an issue on their end. If it gets past them fine, chances are you'll start timing out toward Amazon's end, in the last several hops. – Mike Padg Sep 22 '15 at 20:11
  • Last night I set up AWS servers in Oregon and California so that I could do ping and traceroute tests. Unfortunately (fortunately!!), we did not experience any connection outage last night. I will keep trying the next few nights and will update here. Thank you, everyone! – Rjak Sep 23 '15 at 21:10
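
The hop-by-hop reading described in the comment above (local gateway first, then hops ending in rr.com, then everything beyond Time Warner) could be sketched in Python roughly as follows. This is only an illustration under assumptions: `classify_hops` is a made-up name, the parser expects the usual `hostname (ip)` layout printed by traceroute on OS X and Linux, and it treats any private address as "local" rather than only 192. and 10. addresses. Keep in mind that a `* * *` hop in the middle of a trace does not always mean loss, since some routers simply do not answer traceroute probes.

```python
import ipaddress
import re

# A hop line starts with the hop number, e.g. " 2  host (1.2.3.4)  9.8 ms ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# traceroute prints each responding router as "hostname (ip)"
ADDR_RE = re.compile(r"(\S+) \((\d+\.\d+\.\d+\.\d+)\)")


def classify_hops(traceroute_output, isp_suffix="rr.com"):
    """Label each hop as 'local', 'isp', 'beyond-isp', 'timeout' or 'unknown'.

    Returns a list of (hop_number, label) tuples in trace order.
    """
    hops = []
    for line in traceroute_output.splitlines():
        hop = HOP_RE.match(line)
        if hop is None:
            continue  # header line or continuation line, not a new hop
        number, rest = int(hop.group(1)), hop.group(2)
        addr = ADDR_RE.search(rest)
        if addr is None:
            # No router answered: "* * *" means every probe timed out
            hops.append((number, "timeout" if "*" in rest else "unknown"))
            continue
        name, ip = addr.groups()
        if ipaddress.ip_address(ip).is_private:
            label = "local"        # your own gateway or LAN
        elif name.endswith(isp_suffix):
            label = "isp"          # still inside the ISP's network
        else:
            label = "beyond-isp"   # transit or the destination's network
        hops.append((number, label))
    return hops
```

Feeding it the saved output of each night's trace gives a quick view of whether the first timeouts show up at the local hop, inside the rr.com hops, or past them.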