I created a new VPC in ca-central. I followed the same procedure as everywhere else:

  • New VPC (this created a network ACL which is wide open)
  • Three subnets, one for each availability zone, with CIDRs spaced out properly
  • All subnets associated with one routing table
  • That routing table routes 0.0.0.0/0 to the internet gateway
  • Instances use a security group which has port 22 open inbound and all traffic allowed outbound
  • Everything attached properly to the VPC
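The checklist above can be expressed as a few small checks. This is only a sketch: it assumes the inputs are dictionaries shaped like the entries returned by boto3's `describe_route_tables` and `describe_security_groups` calls, and the function names are made up for illustration. It does not look at the source CIDR of the SSH rule, and passing these checks does not prove that an instance is reachable.

```python
import ipaddress


def has_default_route_to_igw(route_table):
    """True if the route table sends 0.0.0.0/0 to an internet gateway."""
    for route in route_table.get("Routes", []):
        gateway = route.get("GatewayId") or ""
        if route.get("DestinationCidrBlock") == "0.0.0.0/0" and gateway.startswith("igw-"):
            return True
    return False


def allows_inbound_ssh(security_group):
    """True if any inbound rule covers TCP port 22 (source range is not checked)."""
    for perm in security_group.get("IpPermissions", []):
        proto = perm.get("IpProtocol")
        if proto == "-1":  # "-1" means all protocols and ports
            return True
        if proto == "tcp" and perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", -1):
            return True
    return False


def subnets_overlap(cidrs):
    """True if any two of the given subnet CIDRs overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])
```

With boto3 installed and credentials configured, the dictionaries could come from something like `ec2.describe_route_tables()["RouteTables"]` and `ec2.describe_security_groups()["SecurityGroups"]` on a client created for the ca-central-1 region.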

I cannot connect via SSH to any T3 instance created there, spot or on-demand. I even tried the stock Ubuntu AMI instead of our own AMIs, with the same result. Every attempt just times out. As a test, I allowed all ports in the security group, and that did not help. I nuked everything and recreated it from scratch, which didn't help either.
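When narrowing down a symptom like this, it can help to separate a timeout from a refused connection. A timeout usually means packets are being dropped somewhere along the path, while a refusal means the host was reached but nothing is listening on the port. Below is a minimal sketch using only the Python standard library; the function name `probe_tcp` and the default timeout are arbitrary choices for illustration.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Return 'open', 'refused', 'timeout', or 'error: ...' for a TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: traffic is likely being dropped before the host
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but no service accepts connections on this port
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar
        return f"error: {exc}"
```

Calling `probe_tcp("<instance public IP>")` against a T2 and a T3 instance in the same subnet would show whether both behave the same way at the TCP level.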

I do not know what I did wrong. Exactly the same setup exists in us-east-1 and works just fine. Does anyone have any idea what I am doing wrong?

PS: Instances have public IPs linked to internal IP on the VNIC attached to vpc

Edit: The CloudFormation script for the VPC: https://pastebin.com/VK3Cb6j8

Edit: The VPC is OK. T2 instances work, but T3 instances do not.

mmix
  • If you've done what you've described then it should work. Public subnet, public IP, routing, internet gateway, security group, and NACL should be sufficient. You would need to post a _lot_ of screenshots for us to check it. Try deleting the default VPC, if it's there, to make sure the resources have been created on the correct VPC. – Tim Apr 25 '23 at 20:02
  • @Tim. Default VPC has been removed before. I added a pastebin of CF script for the VPC. For some reason, the Former2 tool did not enumerate the Network ACL that was auto-created. I wonder if that is the problem. It is present in the console: https://ibb.co/YDzbQxS – mmix Apr 26 '23 at 08:44
  • NACLs look fine. In the absence of extensive screenshots I think you need someone to sit with you (physically or virtually) and go through this with you, it's easy to miss steps. – Tim Apr 26 '23 at 19:42
  • @Tim I did exactly the same steps, side by side, in the Ohio region, and there it works. I think there is something wrong with my account and the ca-central region. Something is broken, and I can't figure out what, nor can I raise this with Amazon, because apparently I have to pay them to report an issue in their own system. :( Unbelievable how fast you can get stranded in the cloud and break through the "it just works" facade. – mmix Apr 27 '23 at 06:36
  • Do you have a region disabled, or a service control policy affecting services in Canada? This is one of those things that someone experienced with AWS could probably figure out poking around, but difficult to diagnose like this. Try looking for error messages in the CloudTrail logs. – Tim Apr 27 '23 at 08:04
  • @Tim, regions are enabled (some are disabled, but not ca-central), CloudTrail has no errors, and I am literally accessing with root account. This is my personal account, so I do not have the "Organizations" feature enabled, so no control policies. – mmix Apr 27 '23 at 09:33
  • @Tim, OK, a new development. I started experimenting and launched a t2.medium instance instead of a t3.medium. And the t2 instance WORKS. It appears that it's about the instance type. The only difference I see is that t3 is Nitro and t2 is not, but obviously the VPC works. – mmix Apr 27 '23 at 10:05
  • That's very odd. I think t2 is nitro, just an earlier version. The VPC looks pretty basic and standard. The only difference I've ever found between t2 and t3 is disk device names, and it's pretty rare you need to reference those. – Tim Apr 27 '23 at 22:40
  • @Tim, in the end I paid for developer support and created a ticket. They confirmed the configuration is correct, acknowledged the issue, and escalated it to an internal team. Waiting to see what they come up with. I'll post the resolution as an answer when it's over. – mmix Apr 29 '23 at 09:22
  • @Tim, see the answer, not crazy after all :D – mmix May 02 '23 at 20:53

1 Answer

Well, after some back and forth with customer support, I got this:

"Thank you for your patience. I have worked with my internal team and they have allowed other regions to be fully functional. You should now be able to access the instances over SSH. "

And now it works. Apparently there are some hidden settings, not reachable through the administrative screens, which limit regions and/or instance types. I am leaving this answer here in case other people run into this problem. Unfortunately, you'll have to pay for one month of developer support to reach them so they can look into what was obviously their problem and one you had no way of correcting yourself. I intend to ask for a refund of the support subscription, but I'm not getting my hopes up. Either way, the problem was resolved by L2 support.

Edit: They accepted my claim that I shouldn't have had to pay for customer support to resolve this, and they refunded me. Ultimately, their recommendation is to raise this kind of issue with Account and Billing support, as they are the ones handling changes to those hidden limitations.

mmix
  • Interesting. They didn't say what the issue was, just that they had fixed or changed something. Not uncommon with AWS. – Tim May 03 '23 at 01:00
  • @Tim, I pressed for the cause, and it appears that my account was briefly suspended for billing back in 2015 (though I do not remember it), and it's the only remaining explanation. They still cannot explain why only some services (and not all) were disabled if it was a billing-related suspension, or why this was not communicated in any way. Not impressed by hidden kill switches, honestly. If I were a big crew running dozens of fleets for revenue, this would freak the hell out of me. I suggested they send this case to a product manager, since it could have been avoided. – mmix May 04 '23 at 06:15