Reading from the restored volume should be sufficient.
When you create a volume from an existing snapshot, it loads lazily in the background so that you can begin using it right away. If you access a piece of data that hasn't been loaded yet, the volume immediately downloads the requested data from Amazon S3, and then continues loading the rest of the volume's data in the background.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html
Anecdotally, it seems like sequential "forced reading" using dd performs better than the more random reads that would result from reading through the filesystem, but you can of course do both at the same time -- go ahead and mount it and start doing whatever you need to do, but also read and discard the entire block device with dd.
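For illustration, a forced sequential read of the whole device might look something like this; the device name /dev/xvdf is just a placeholder (on Nitro instance types the volume shows up as something like /dev/nvme1n1):

```
# Read the entire block device sequentially and discard the data, forcing
# EBS to pull every block down from S3. Point this at the whole device,
# not at a partition or the mounted filesystem.
sudo dd if=/dev/xvdf of=/dev/null bs=1M status=progress
```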
This apparent difference would make sense, particularly if the EBS snapshot infrastructure doesn't actually store the snapshot data in "block"-sized (4,096-byte) chunks. It seems like that would be a pretty inefficient design, requiring hundreds of operations for every megabyte.
It might further improve restoration if you did multiple sequential reads starting at different offsets. Untested, but GNU dd can "skip" blocks and begin reading somewhere other than the beginning, as sketched below.
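A rough sketch of that idea -- again untested, and again with /dev/xvdf as a placeholder -- splitting a hypothetical 100 GiB volume across four concurrent readers:

```
# Four parallel sequential readers, each covering a 25 GiB slice of a
# 100 GiB volume. With bs=1M, skip= and count= are in 1 MiB units,
# so 25600 blocks = 25 GiB per reader.
for i in 0 1 2 3; do
  sudo dd if=/dev/xvdf of=/dev/null bs=1M skip=$((i * 25600)) count=25600 &
done
wait
```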
But you definitely don't need to create "fresh" volumes. Once the blocks have been loaded by a read, they're "in" EBS and are no longer fetched from the snapshot's backing store.
If there's a stack of 40 snapshots backing the volume then it's presumably having some difficulty quickly locating the block in the most recent snapshot it appears in and fetching it.
It shouldn't really matter how many snapshots were backing it. The data isn't stored "in" the snapshots. Each snapshot contains a complete record of what I'll casually call "pointers" to all of the data blocks comprising it (not just the changed ones), and presumably where they are stored in the backing store (S3) used by the snapshot infrastructure.
If you have snapshots A, B, and C taken in order from the same volume, and then you delete snapshot B, all of the blocks that changed from A to B but not from B to C are still available for restoring snapshot C, but they are not literally moved from B to C when you delete snapshot B.
When you delete a snapshot, EBS uses reference counting to purge the backing store of blocks that are no longer needed. Blocks not referenced by any snapshot are handled in the background by a multi-step process that first flags them as unneeded, which stops billing you for them, and then actually deletes them a few days later, once it has been confirmed that they are genuinely at a reference count of 0. Source.
Because of this, the number of snapshots that originally contributed blocks to your restored volume should have no impact on restore performance.
Additional, possibly useful info: the following doesn't change the accuracy of the answer above, but it might be of value in certain situations.
In late 2019, EBS announced a new feature called Fast Snapshot Restore that allows volumes created in designated availability zones from designated snapshots to be instantly hot with no warmup required.
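Enabling it is a single API call; with the AWS CLI it looks roughly like this (the availability zone and snapshot ID are placeholders):

```
# Enable Fast Snapshot Restore for one snapshot in one availability zone.
# The hourly charge for the feature starts as soon as it's enabled.
aws ec2 enable-fast-snapshot-restores \
    --availability-zones us-east-1a \
    --source-snapshot-ids snap-0123456789abcdef0
```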
Using a credit bucket and based on the size of the designated snapshot (that is, the size of the disk volume it was taken from) -- not the size of the target volume (which can be larger than that of the snapshot) -- this feature allows you to create roughly 1024 GiB ÷ snapshot size volumes per hour, so a 128 GiB snapshot could create 8 pre-warmed volumes per hour. As snapshots get smaller, the number of volumes you can create per hour, per snapshot, per availability zone is capped at 10.
The service is also startlingly expensive -- $0.75 per hour, per snapshot, per availability zone (!?) -- however, this may not be something you would need to leave running continuously, and in that light it seems to have some potential value.
When you activate the feature, the service API can tell you when it's actually ready to use; the stated timetable for "optimizing a snapshot" is 60 minutes per TiB. Reading between the lines, "optimizing" appears to mean building and warming up a hidden primary volume inside EBS from the snapshot, which the service then clones to create additional EBS volumes. The feature appears to be usable only after this stage is complete; volumes created from the same snapshot before that point are just ordinary volumes.
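To find out when that stage has finished, you can poll the feature's state with the same placeholder snapshot ID; it progresses from "enabling" through "optimizing" to "enabled":

```
# Show the Fast Snapshot Restore state per availability zone; the feature
# is only fully warmed once the state reaches "enabled".
aws ec2 describe-fast-snapshot-restores \
    --filters Name=snapshot-id,Values=snap-0123456789abcdef0 \
    --query 'FastSnapshotRestores[*].[AvailabilityZone,State]' \
    --output table
```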
As long as you have time to wait for the "optimizing" stage, and processes in place to terminate the fast restore behavior when you no longer need it (to avoid a very large unexpected billing charge), this does seem to have applicability in limited use cases.
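Turning it back off when you're done -- the part worth automating, given the pricing -- is the mirror image of enabling it:

```
# Disable Fast Snapshot Restore to stop the per-hour, per-AZ charges.
# Volumes that were already created while it was enabled are not affected.
aws ec2 disable-fast-snapshot-restores \
    --availability-zones us-east-1a \
    --source-snapshot-ids snap-0123456789abcdef0
```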