
I'm using the azure-sdk-for-python BlobClient start_copy_from_url to copy a remote file into my destination storage account.

However, the file always ends up as an AppendBlob instead of a BlockBlob. I can't see how to force the destination blob type to be BlockBlob.

from azure.storage.blob import BlobServiceClient

connection_string = "connection string to my dest blob storage account"
container_name = "myContainerName"
dest_file_name = "myDestFile.csv"
remote_blob_url = "http://path/to/remote/blobfile.csv"

client = BlobServiceClient.from_connection_string(connection_string)
dest_blob = client.get_blob_client(container_name, dest_file_name)
dest_blob.start_copy_from_url(remote_blob_url)
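
For reference, the resulting blob type can be checked after the copy with get_blob_properties (azure-storage-blob v12); here it comes back as an append blob:

props = dest_blob.get_blob_properties()
print(props.blob_type)  # reports BlobType.AppendBlob for the copied file
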
Brig

3 Answers


You can't change a blob's type once it has been created. See the Copy Blob From URL REST API: it has no header for setting the destination blob type.

You could refer to my code below, which creates a block blob from an append blob (using the legacy azure-storage-blob SDK's BlockBlobService):

from datetime import datetime, timedelta
from io import BytesIO

import requests
from azure.storage.blob import BlobPermissions, BlockBlobService  # legacy azure-storage-blob (<= 2.x)

account_name = "***"
account_key = "***"
container_name = "test"
blob_name = "test2.csv"

block_blob_service = BlockBlobService(account_name, account_key)

# Generate a read-only SAS URL for the source (append) blob
sas_token = block_blob_service.generate_blob_shared_access_signature(container_name, blob_name,
                                                                     permission=BlobPermissions.READ,
                                                                     expiry=datetime.utcnow() + timedelta(hours=1))
blob_url_with_sas = block_blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)

# Download the source content and re-upload it as a new block blob.
# Note: r.content loads the whole blob into memory, so this suits smaller blobs.
r = requests.get(blob_url_with_sas, stream=True)
block_blob_service.create_blob_from_stream(container_name, "jay.block", stream=BytesIO(r.content))


Jay Gong

Here is what you want to do using the latest version (v12). According to the documentation:

The source blob for a copy operation may be a block blob, an append blob, or a page blob. If the destination blob already exists, it must be of the same blob type as the source blob.

Right now, you cannot use start_copy_from_url to specify a blob type. However, you can use the synchronous copy APIs to do so in some cases.

For example, for a block-to-page-blob copy, create the destination page blob first and invoke upload_pages_from_url on the destination for each 4 MB chunk of the source.
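
A rough sketch of that block-to-page path, for illustration only (not from the original answer): the connection string, container/blob names, source_page_blob_url, and source size below are placeholders, and the source must be readable via URL (for example with a SAS token).

import os
from azure.storage.blob import BlobClient

conn_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
source_page_blob_url = "https://<account>.blob.core.windows.net/<container>/<source>?<sas>"  # placeholder
source_size = 16 * 1024 * 1024                    # assumed source size; page blobs are 512-byte aligned

page_blob = BlobClient.from_connection_string(conn_str, "testcontainer", "mynewpageblob")
page_blob.create_page_blob(size=source_size)      # create the empty destination page blob first

CHUNK = 4 * 1024 * 1024                           # copy at most 4 MB per call
for offset in range(0, source_size, CHUNK):
    length = min(CHUNK, source_size - offset)
    page_blob.upload_pages_from_url(source_page_blob_url, offset=offset,
                                    length=length, source_offset=offset)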

Similarly, in your case, create an empty block blob first and then use the stage_block_from_url method.

from azure.storage.blob import ContainerClient
import os

conn_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
dest_blob_name = "mynewblob"
source_url = "http://www.gutenberg.org/files/59466/59466-0.txt"

container_client = ContainerClient.from_connection_string(conn_str, "testcontainer")

blob_client = container_client.get_blob_client(dest_blob_name)
# upload an empty blob so the destination exists as a block blob
# (add overwrite=True if the destination already exists)
blob_client.upload_blob(b'')

# this will only stage your block (the whole source is copied as a single block,
# so it must fit within the per-block size limit)
blob_client.stage_block_from_url(block_id="1", source_url=source_url)
# now it is committed
blob_client.commit_block_list(['1'])

# if you want to verify it's committed now
committed, uncommitted = blob_client.get_block_list('all')
assert len(committed) == 1

Let me know if this doesn't work.

EDIT: You can leverage the source_offset and source_length params to upload blocks in chunks. For example,

stage_block_from_url(block_id, source_url, source_offset=0, source_length=10)

will upload the first 10 bytes, i.e. bytes 0 through 9. So you can use a counter to keep incrementing the block_id and track your offset and length until you exhaust all your chunks.

EDIT2:

# assumes source_size holds the source blob's size in bytes
CHUNK_SIZE = 4 * 1024 * 1024                 # well under the 100 MB per-block limit
block_ids = []
for i, offset in enumerate(range(0, source_size, CHUNK_SIZE)):
    length = min(CHUNK_SIZE, source_size - offset)
    block_ids.append(str(i).zfill(6))        # block ids must all have the same length
    # stage each chunk from the source URL; do not commit inside the loop
    blob_client.stage_block_from_url(block_ids[-1], source_url,
                                     source_offset=offset, source_length=length)
# outside the for loop, commit all staged blocks at once
blob_client.commit_block_list(block_ids)
rakshith91
  • I get the error: The source request body for copy source is too large and exceeds the maximum permissible limit (100MB). FYI the file is 1.3GB – Brig Nov 20 '19 at 04:07
  • Did you try staging blocks with 4 mb chunks? Let me check something. – rakshith91 Nov 20 '19 at 05:17
  • https://learn.microsoft.com/en-us/rest/api/storageservices/put-block-from-url#remarks Put Block From URL uploads a block for future inclusion in a block blob. A block blob can include a maximum of 50,000 blocks. Each block can be a different size, up to a maximum of 100 MB. The maximum size of a block blob is therefore slightly more than 4.75 TB (100 MB X 50,000 blocks). This means you have to upload your blob in 100 MB chunks if you are using any synchronous copy APIs – rakshith91 Nov 20 '19 at 06:08
  • look at the EDIT – rakshith91 Nov 20 '19 at 06:35
  • I've created a loop to iterate over the blocks. I have a bug somewhere. It shows an upload of all 1.3 GB but only 4.8 MB ends up in storage. – Brig Nov 20 '19 at 17:58
  • Here is the updated code. I don't want to change the original question. https://gist.github.com/briglx/66b2c478ab531bf3d9fd123cafaa967e – Brig Nov 20 '19 at 21:01
  • You want to commit all the blocks at once. In the for loop, just stage the blocks and commit all of them at once outside the loop. look at the edit – rakshith91 Nov 21 '19 at 19:54
  • Fails when staging blocks get over 100MB before committing them. – Brig Nov 22 '19 at 05:42

As far as I know, there is no direct conversion between blob types. To do this you need to download the blob and re-upload it as a block blob.
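
A minimal sketch of that download/re-upload approach with azure-storage-blob v12 (the source blob name below is a placeholder, and readall() pulls the whole blob into memory, so this only suits blobs that fit in memory):

import os
from azure.storage.blob import BlobServiceClient

conn_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
service = BlobServiceClient.from_connection_string(conn_str)
source = service.get_blob_client("myContainerName", "source-append-blob.csv")  # placeholder source
dest = service.get_blob_client("myContainerName", "myDestFile.csv")

data = source.download_blob().readall()   # download the append blob's content
# upload_blob creates a BlockBlob by default, so the new copy is a block blob
dest.upload_blob(data, overwrite=True)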

Sajeetharan