
I received an email from AWS about one of my multi-AZ RDS instances. They basically said that there was going to be an upgrade during a certain period:

We are contacting you to inform you that one or more of your Amazon RDS DB instances is scheduled to receive system upgrades during your maintenance window between July 21 2:00 PM and July 28 2:00 PM PDT.

The window seems big and I want to reduce the impact even though we are on a multi-AZ setup. From my experience with EC2 instances, it is possible to reboot the instance and the upgrade will be applied. Is it the same for RDS instances?

Thanks a lot!

0x9BD0
  • The message is ambiguously worded. What it actually means is *one or more of your Amazon RDS DB instances is scheduled to receive system upgrades between July 21 2:00 PM and July 28 2:00 PM PDT **during your maintenance window***. Your maintenance window for each instance is shown in the console. If you are running MySQL < 5.6.34, then there **should be** something you can do to perform the maintenance on your own terms. – Michael - sqlbot Jul 17 '17 at 02:31

2 Answers


If you're on a multi-AZ database you don't have to do anything. AWS will upgrade the standby instance, change DNS so your applications use the standby, then upgrade the primary. Note that it won't change your primary back to the first, but you can do that manually if you think it's worthwhile.

Maintenance will happen some time in that window. It doesn't really matter when, since it's a managed service.

To directly answer your question, no, I don't think a reboot will schedule these upgrades. RDS Instances have a weekly maintenance window. Updates will be applied at the time you specify.
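As a rough illustration of picking that weekly slot: RDS expresses the maintenance window as a `ddd:hh24:mi-ddd:hh24:mi` string in UTC, with a minimum length of 30 minutes. The helper below is a hypothetical sketch (the name `maintenance_window` is mine, not part of any AWS SDK) that only builds such a string; you would then set it in the console or pass it as the preferred maintenance window when modifying the instance.

```python
DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")
MINUTES_PER_DAY = 24 * 60


def maintenance_window(day, hour, minute=0, duration_minutes=30):
    """Return a 'ddd:hh24:mi-ddd:hh24:mi' window string (times are UTC)."""
    if day not in DAYS:
        raise ValueError(f"unknown day: {day!r}")
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("invalid start time")
    if duration_minutes < 30:
        raise ValueError("the window must be at least 30 minutes long")

    start = DAYS.index(day) * MINUTES_PER_DAY + hour * 60 + minute
    # Wrap around so a window starting late on Sunday can end on Monday.
    end = (start + duration_minutes) % (7 * MINUTES_PER_DAY)

    def fmt(total_minutes):
        day_index, rest = divmod(total_minutes, MINUTES_PER_DAY)
        hh, mm = divmod(rest, 60)
        return f"{DAYS[day_index]}:{hh:02d}:{mm:02d}"

    return f"{fmt(start)}-{fmt(end)}"


print(maintenance_window("sun", 5))  # sun:05:00-sun:05:30
```

Remember that the times are UTC, so a "quiet Sunday morning" in your local time zone may map to a different day or hour.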

How Failover Works

From the Amazon RDS documentation:

In the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes for the failover to complete depends on the database activity and other conditions at the time the primary DB instance became unavailable. Failover times are typically 60-120 seconds. However, large transactions or a lengthy recovery process can increase failover time. When the failover is complete, it can take additional time for the RDS console UI to reflect the new Availability Zone.

The failover mechanism automatically changes the DNS record of the DB instance to point to the standby DB instance. As a result, you will need to re-establish any existing connections to your DB instance.
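Since existing connections have to be re-established after the DNS change, the application side usually needs some retry logic. The following is only a minimal sketch, not tied to any particular database driver: `connect` is a hypothetical zero-argument callable that opens a fresh connection (and so resolves the endpoint name again), and it is assumed to raise `OSError` (for example `ConnectionRefusedError`) while the standby is still coming up. Real drivers often raise their own exception classes, so swap in whatever yours uses.

```python
import time


def connect_with_retry(connect, attempts=8, base_delay=0.5, max_delay=10.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, backing off between failures.

    connect: hypothetical callable that opens a new connection.
    sleep:   injectable so the backoff can be observed without waiting.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            # Assumed failure mode during failover, e.g. "Connection refused".
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

With the defaults this waits roughly 30 seconds in total before giving up; raise `attempts` or `max_delay` if you want to ride out a failover at the upper end of the quoted 60-120 seconds.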

Conclusion

Based on this, RDS on its own isn't suitable for situations where you can't tolerate a couple of minutes of downtime occasionally. It's probably better than any individual EC2 instance running a single database, but if you really want high availability SQL some kind of clustering may be required.

Tim
  • That is good to know. From the email they sent us, it seemed like there was going to be a minute or two where the database would be unavailable. But after some tests on a testing environment, I noticed that there was no downtime. I feel better about this update, but do you think there is a way to schedule it at a specific date and time? – 0x9BD0 Jul 16 '17 at 20:55
  • I've expanded my answer for you. Yes, you can choose the day and time with your weekly maintenance window, which you can set on your RDS instance. – Tim Jul 16 '17 at 21:20
  • Great! I figured it out as you were editing the answer. Your answer is complete and I now have faith in the procedure! Thanks a hundred times Tim! – 0x9BD0 Jul 16 '17 at 21:22
  • *"I hope (without any evidence) that after the standby is promoted to primary and DNS is changed there's a short delay before the primary is taken down."* That would be nice, but no. Testing (RDS for MySQL) shows that the primary goes down, the DNS changes, the secondary comes up. I don't know how Multi-AZ does its magic, but it doesn't use standard replication and it doesn't seem to allow both instances to be up simultaneously. The wait time will be between ~1 and ~5 minutes. – Michael - sqlbot Jul 17 '17 at 02:37
  • The worst case scenario is a recently-created Multi-AZ instance that was created from a snapshot, or an instance recently converted to Multi-AZ. The first switchover in these cases can be 2-3 minutes longer than any subsequent switchover, presumably due to the first-touch penalty of new EBS volumes. – Michael - sqlbot Jul 17 '17 at 02:40
  • @Michael-sqlbot somehow I missed that. AWS says RDS has synchronous replication at the storage level, but I always (incorrectly) thought there were two instances running. Based on what you're saying I guess they're just replicating the disk between AZs, and when one instance goes down another comes up. That does suggest there will be downtime during maintenance windows. So for high availability you'd need to implement something yourself. – Tim Jul 17 '17 at 03:15
  • There's downtime. There are 2 instances running, but apparently MySQL isn't running on the instance until the failover occurs, because after the DNS switches over, you see error 111 ("Connection refused") for a few seconds, then you get a connection and `SHOW STATUS LIKE 'UPTIME';` indicates a fresh restart, although that may be a side effect of the *forced* failover. I've never captured any forensic data from a failover that occurred due to a fault. Avoiding any downtime requires a significantly manual migration process that's complicated and "unofficial," but I'm working on a write-up of it. – Michael - sqlbot Jul 17 '17 at 04:25
  • I'd be interested to see what you come up with @Michael-sqlbot – Tim Jul 17 '17 at 07:15
  • All right, so I will have to prepare my applications for a short downtime. To know when the maintenance will occur, I changed my weekly maintenance window to a specific day of the week when we have low traffic. You guys helped to clarify things, thanks a lot! – 0x9BD0 Jul 17 '17 at 15:18

As stated above, the maintenance may result in a short downtime when traffic is transferred between the Multi-AZ instances.

However, it is also possible to avoid any downtime during the maintenance. The way to do it is to briefly launch a new RDS instance from a read replica snapshot and configure it for active/active master-to-master replication. Once it's configured, you can switch application traffic one app server at a time without any downtime. We use this approach every time AWS announces RDS maintenance, as well as during our own scheduled maintenance, to avoid downtime.

https://workmarket.tech/zero-downtime-maintenances-on-mysql-rds-ba13b51103c2

Here are the details:

M1 - Original Master

R1 - Read Replica of the M1

SNAP1 - Snapshot of the R1

M2 - New Master

M2 creation sequence: M1 → R1 → SNAP1 → M2

  • Since we can’t use the SUPER privilege on RDS, we don’t use mysqldump with the --master-data=2 option on M1. Instead, we launch R1 to obtain the binlog position of M1 from it, then create a snapshot (SNAP1) from R1 and launch M2 from SNAP1.

  • Create two separate RDS parameter groups with the following offsets to avoid PK conflicts:

    M1: auto_increment_increment = 4 and auto_increment_offset = 1

    M2: auto_increment_increment = 4 and auto_increment_offset = 2

  • Create a replication user on M1

    GRANT EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl'@'%' IDENTIFIED BY PASSWORD <secret>;
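To see why the auto_increment_increment / auto_increment_offset settings above keep the two masters from handing out the same primary key, here is a small simulation. It is a simplified model: it assumes each server simply generates offset, offset + increment, offset + 2 × increment, and so on, and ignores how MySQL picks the next value relative to rows that already exist. The function name is hypothetical.

```python
from itertools import islice


def auto_increment_ids(offset, increment):
    """Yield the ids one server generates: offset, offset + increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


# Settings from the two parameter groups described above.
m1_ids = list(islice(auto_increment_ids(offset=1, increment=4), 5))
m2_ids = list(islice(auto_increment_ids(offset=2, increment=4), 5))

print(m1_ids)  # [1, 5, 9, 13, 17]
print(m2_ids)  # [2, 6, 10, 14, 18]
print(set(m1_ids).isdisjoint(m2_ids))  # True
```

An increment of 4 with offsets 1 and 2 also leaves offsets 3 and 4 unused, which gives room for further writers without changing the existing series.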

1. Create R1 from M1

-- Connect to R1 and stop replication
CALL mysql.rds_stop_replication;
-- Obtain M1's (!!) current binlog file and position
mysql> show slave status\G
     Master_Log_File: mysql-bin.000622
     Exec_Master_Log_Pos: 9135555

2. Create SNAP1 from R1

3. Create M2 from SNAP1 with the attributes obtained from M1

  • Assign a parameter group to M2 with a different auto_increment_offset from M1 to avoid M/M replication key conflicts

4. Setup M/M replication

-- Connect to M2 and configure it as a slave of M1
CALL mysql.rds_set_external_master ('m1.xyxy24.us-east-1.rds.amazonaws.com', 3306, 'repl', 'mypassword', 'mysql-bin.000622', 9135555, 0);
CALL mysql.rds_start_replication;
-- Still on M2, obtain its current binlog file and position
mysql> show master status\G
     File: mysql-bin.004444
     Position: 6666622
-- Connect to M1 and configure it as a slave of M2
CALL mysql.rds_set_external_master ('m2.xyxy24.us-east-1.rds.amazonaws.com', 3306, 'repl', 'mypassword', 'mysql-bin.004444', 6666622, 0);
CALL mysql.rds_start_replication;
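Copying the binlog file and position by hand is the easiest place to make a mistake in this step (a missing quote or a wrong digit is enough). As a hedged sketch, the two helpers below, both hypothetical names of my own, pull `Master_Log_File` and `Exec_Master_Log_Pos` out of pasted `show slave status` output and build the matching `mysql.rds_set_external_master` call. The values are inlined into the statement as plain text, so only feed it input you trust.

```python
def binlog_position(status_text):
    """Extract (log_file, log_pos) from pasted 'show slave status' output."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not 'Name: value' pairs
            fields[key.strip()] = value.strip()
    return fields["Master_Log_File"], int(fields["Exec_Master_Log_Pos"])


def set_external_master_sql(host, user, password, log_file, log_pos,
                            port=3306, ssl=0):
    """Build the CALL statement for the replication setup step."""
    return (
        "CALL mysql.rds_set_external_master ("
        f"'{host}', {port}, '{user}', '{password}', "
        f"'{log_file}', {log_pos}, {ssl});"
    )


sample = """
     Master_Log_File: mysql-bin.000622
     Exec_Master_Log_Pos: 9135555
"""
log_file, log_pos = binlog_position(sample)
print(set_external_master_sql(
    "m1.xyxy24.us-east-1.rds.amazonaws.com", "repl", "mypassword",
    log_file, log_pos,
))
```

The printed statement is the one to run on M2; the same helper can build the reverse call for M1 from the `show master status` values.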

5. Delete R1 and SNAP1 as they’re no longer needed

6. Update M2 via AWS Console

Use the standard procedure to Modify the Instance as per your needs.

7. Perform Graceful Switchover to M2

Once M/M replication is set up successfully, we are ready to proceed with DB maintenance without downtime by gracefully switching app servers over one at a time.

Here are more details on how it works.

https://workmarket.tech/zero-downtime-maintenances-on-mysql-rds-ba13b51103c2