3

I'm running some startup scripts (cmd/bat) on my Small Azure VM instances. They include a file-transfer operation from a mounted VHD, and normally the whole thing finishes in about 3 minutes (copying files and extracting a ~500 MB zip file with command-line 7z).
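
For reference, the operation is roughly equivalent to this (a minimal PowerShell sketch of the cmd/bat logic; the drive letter, paths, and archive name are all illustrative):

```powershell
# F:\ stands in for the mounted VHD's drive letter; C:\payload is an assumed
# working directory on the OS disk.
Copy-Item -Path 'F:\payload\*' -Destination 'C:\payload' -Recurse -Force

# Extract the ~500 MB archive with command-line 7-Zip.
& 'C:\Program Files\7-Zip\7z.exe' x 'C:\payload\data.zip' "-oC:\payload\data" -y
```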

When I scale out to ~150 instances, the same operation is very slow: up to 15 minutes in total, most of which is spent in 7z. Also, the nodes that are slowest to complete the boot procedure are very hard to access at first via mstsc (the animation lags and logging in takes a long time), but that might not be related.

What could be the problem?

We had the idea to examine the cache, but it would be nice to know of any other potential bottlenecks that may be present in this situation.

UPDATE: I tried extracting to the D:\ drive instead of C:\, and while scaling to 200 instances, the unzip takes about a minute! So it seems the problem is that C:\ is backed by a blob. But again, I have 3 GB of data in 40 files, so 60 MB/s per blob should be enough to handle it. Or can it be that there is a cap across all blobs?
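
Concretely, the only change in the test was the extraction target (again a sketch; D:\ is the instance's local temporary disk, and the destination folders are illustrative):

```powershell
# Before: extracting onto the OS disk (C:\), which is backed by blob storage.
& 'C:\Program Files\7-Zip\7z.exe' x 'C:\payload\data.zip' "-oC:\payload\data" -y

# After: extracting onto D:\, the instance's local (ephemeral) temporary disk.
& 'C:\Program Files\7-Zip\7z.exe' x 'C:\payload\data.zip' "-oD:\extracted\data" -y
```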

user2520968

2 Answers

6

The VM sizes each have their own bandwidth limitations.

| VM Size     | Bandwidth (Mbps) |
| ----------- |:----------------:|
| Extra Small | 5                |
| Small       | 100              |
| Medium      | 200              |
| Large       | 400              |
| Extra Large | 800              |

I suspect you have a single copy of your mounted VHD with ~150 instances hitting it. Increasing the size of the VM hosting the VHD would be a good test but an expensive solution. Longer term, put the files in blob storage; that means modifying your scripts to access the blobs' RESTful endpoints.
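
As a rough sketch of that longer-term approach, assuming the zip has been uploaded to a container and is reachable via a public container or SAS URL (the account, container, and file names below are hypothetical):

```powershell
# Hypothetical blob URL; in practice this would carry a SAS token unless the
# container is public.
$blobUrl = 'https://myaccount.blob.core.windows.net/bootstrap/data.zip'
$dest    = 'D:\payload\data.zip'

# Plain HTTP GET against the blob's RESTful endpoint -- no drive mounting, and
# each instance downloads the file independently.
New-Item -ItemType Directory -Path (Split-Path $dest) -Force | Out-Null
(New-Object System.Net.WebClient).DownloadFile($blobUrl, $dest)
```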

It might be easiest to create 2-3 copies of the drive on 2-3 different VMs and write a script that keeps their files in sync. Your startup scripts could then pick one of the 2-3 mounted VHDs at random to spread out the load, as sketched below.
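
A minimal sketch of that idea, assuming the mirrored copies are reachable as network shares (the share names are hypothetical, and each share is assumed to hold an identical copy of the payload):

```powershell
# Mirrored sources, kept in sync by a separate job.
$mirrors = @('\\filevm1\payload', '\\filevm2\payload', '\\filevm3\payload')

# Pick one mirror at random so ~150 booting instances don't all hit the same VHD.
$source = $mirrors | Get-Random

Copy-Item -Path (Join-Path $source '*') -Destination 'C:\payload' -Recurse -Force
```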

Here are the most recent limitations per VM size; unfortunately, that table doesn't include network bandwidth: http://msdn.microsoft.com/en-us/library/windowsazure/dn197896.aspx

-Rich

p.s. I got the bandwidths from a PowerPoint slide in the Microsoft-provided Azure Training Kit dated January 2013.

richstep
  • Actually, when we mount the blob, we create a snapshot, and once we are done working with it, we unmount it. The snapshot files actually remain in the container after the fact - we're also looking for a way to delete them so that they don't pile up, but that's a different story. – user2520968 Jun 25 '13 at 18:11
  • In your script, are you accessing the files using a network path? \\servername\sharename\myscript.cmd or via http? – richstep Jun 25 '13 at 18:31
  • Cloud Storage Studio lets you delete snapshots. http://www.red-gate.com/products/azure-development. Otherwise you have to use Azure's API. There is no Microsoft-provided PowerShell cmdlet yet. – richstep Jun 25 '13 at 18:35
  • Since the drive is mounted, the path looks like a local address: F:\yada\yada\yada We're working on deleting them (snapshots) using the API, but breaking the lease is not always straightforward. I'll probably post a question one day when that becomes an issue. – user2520968 Jun 25 '13 at 19:02
  • confirmed, extra small http://www.speedtest.net/result/3168917566.png and for small instance http://www.speedtest.net/result/3168444648.png – ewwink Dec 16 '13 at 16:11
4

One thing to consider is the per-storage-account scalability targets (http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx). With geo-replication enabled, you have 10 Gbps egress and 20K transactions/sec, which you could be bumping into. Figure that with 150 instances, you could potentially be pulling 150 × 100 Mbps = 15 Gbps as all of your instances start up.

Not sure about the "mounted VHD" part of your question. With Azure's drive-mounting, only one virtual machine instance can mount a given drive at any time. For this type of file-copy operation, you'd typically grab the file directly from a storage blob rather than from a file stored in a VHD (which, in turn, is stored in a page blob).

EDIT: Just wanted to mention that an individual blob is limited to 60 MB/sec (also mentioned in the blog post I referenced). This could also be the source of your throttling.

David Makogon
  • I think I was unclear, sorry. Yes, we are mounting from a blob. We are also creating snapshots first, so effectively it's the read-only snapshot which is being used. Is there a way to speed up the bootup though, if you are right about the 10Gbps cap? – user2520968 Jun 25 '13 at 18:20
  • Got it. One thing worth trying is storing your zip file directly in a blob rather than in your data-disk vhd, and avoiding the low-level disk driver (which effectively takes care of the file-system access for you), instead just doing a straightforward download-from-blob (easily done through PowerShell or via the numerous language SDKs published). – David Makogon Jun 25 '13 at 18:26
  • Yes, that's something that we intend to do in the future. Thank you. Could you please give me some references for that 10Gbps cap that you've mentioned? I never knew Azure had such a limit (or, rather, I never knew when to expect one). We can scale much bigger than 150 instances (10x more), so it would be nice to know about that. Also, will storing the zip file in blob storage resolve this limitation? – user2520968 Jun 25 '13 at 18:34
  • Storage account targets (revised): http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx – BrentDaCodeMonkey Jun 25 '13 at 19:08
  • What @BrentDaCodeMonkey said. Also, I linked to it in the 1st sentence of my answer. – David Makogon Jun 25 '13 at 19:10
  • Thanks, that was very useful to read. I haven't figured out whether that was the exact cause, but after scaling to 500 VMs, the startup time increased yet more, so maybe you're right. So I tried copying the files first and then extracting, but it got even stranger: yes, copying from the VHD is slow, but extracting locally is also much slower when you have more instances! I have no explanation for that whatsoever. All the script has to do is extract a local zip file to the same local drive AFTER it has been copied, and it still takes some 20 minutes to complete. – user2520968 Jun 26 '13 at 13:39
  • Resolved. Apparently, the snapshots only contain the differences from the original blob, so all access goes through one single blob, and we're hitting the 60 MB/s-per-blob cap. – user2520968 Jul 02 '13 at 10:41