
I hope somebody can help me debug this issue.

I have the following script:


from azure.cosmosdb.table.tableservice import TableService, ListGenerator
from azure.storage.blob import BlobServiceClient
from datetime import date

table_service_out = TableService(account_name='', account_key='')
table_service_in = TableService(account_name='', account_key='')
query_size = 100


# save data to storage2 and check if there is data left in the current table; if yes, recurse
def queryAndSaveAllDataBySize(tb_name, resp_data: ListGenerator, table_out: TableService, table_in: TableService, query_size: int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_or_replace_entity(tb_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(tb_name,data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()

for tb in tbs_out:
    #create table with same name in storage2
    table_service_in.create_table(table_name=tb.name, fail_on_exist=False)
    #first query
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(tb.name,data,table_service_out,table_service_in,query_size)

This code checks the tables in StorageA, copies their contents, and creates the same tables in StorageB; thanks to the marker I can use the x_ms_continuation token whenever a table returns more than 1,000 rows per request.

It goes without saying that this works just fine as it is.
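
For reference, the same marker-based pagination can also be written as a plain loop instead of the recursive call. This is just a sketch using the same query_entities call as above (the copy_table name is mine):

def copy_table(tb_name, table_out: TableService, table_in: TableService, query_size: int):
    marker = None
    while True:
        # fetch one page of entities; marker=None returns the first page
        page = table_out.query_entities(table_name=tb_name, num_results=query_size, marker=marker)
        for item in page:
            # remove etag and Timestamp appended by table service
            del item.etag
            del item.Timestamp
            table_in.insert_or_replace_entity(tb_name, item)
        marker = page.next_marker
        if not marker:   # no continuation token -> all pages copied
            break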

But yesterday I was trying to make some changes to the code, as follows:

If in StorageA I have a table named TEST, in StorageB I want to create a table named TEST20210930: basically the table name from StorageA plus today's date.
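
For the date suffix I just format today's date as YYYYMMDD; a small sketch (the exact format string is my choice):

from datetime import date

today = date.today().strftime("%Y%m%d")   # e.g. "20210930"
table = "TEST" + today                    # -> "TEST20210930"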

This is where the code starts breaking down.


table_service_out = TableService(account_name='', account_key='')
table_service_in = TableService(account_name='', account_key='')
query_size = 100
today = date.today().strftime("%Y%m%d")   # date suffix, e.g. "20210930"

# save data to storage2 and check if there is data left in the current table; if yes, recurse
def queryAndSaveAllDataBySize(tb_name, resp_data: ListGenerator, table_out: TableService, table_in: TableService, query_size: int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_or_replace_entity(tb_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(tb_name,data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()
print(tbs_out)

for tb in tbs_out:
    table = tb.name + today
    #create the table (source table name + today's date) in storage2
    table_service_in.create_table(table_name=table, fail_on_exist=False)

    #first query
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(table,data,table_service_out,table_service_in,query_size)

What happens here is that the code runs up to the query_size limit but then fails, saying that the table was not found.

I am a bit confused here; maybe somebody can help me spot my error.

Please, if you need more info, just ask.

Thank you so so so much.

HOW TO REPRODUCE: In the Azure portal create two storage accounts, StorageA and StorageB.

In StorageA create a table and fill it with more than 100 rows (based on the query_size). Set the configuration endpoints: table_service_out = StorageA and table_service_in = StorageB.
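
To seed the source table quickly, a sketch like the following (table name and values are placeholders) inserts more rows than query_size so the continuation-token path is exercised:

from azure.cosmosdb.table.tableservice import TableService

table_service_out = TableService(account_name='', account_key='')
table_service_out.create_table(table_name='TEST', fail_on_exist=False)

for i in range(250):   # anything above query_size (100) forces a second page
    table_service_out.insert_entity('TEST', {
        'PartitionKey': 'demo',
        'RowKey': str(i),
        'value': i,
    })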

Nayden Van

1 Answer


I believe the issue is with the following line of code:

data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)

If you notice, tb_name here is the name of the table in your target account, which is obviously not present in your source account. Because you're querying a table that does not exist, you're getting this error.

To fix this, you should also pass the name of source table to queryAndSaveAllDataBySize and use that when querying entities in that function.

UPDATE

Please take a look at code below:

table_service_out = TableService(account_name='', account_key='')
table_service_in = TableService(account_name='', account_key='')
query_size = 100
today = date.today().strftime("%Y%m%d")   # date suffix, e.g. "20210930"

# save data to storage2 and check if there is data left in the current table; if yes, recurse
def queryAndSaveAllDataBySize(source_table_name, target_table_name, resp_data: ListGenerator, table_out: TableService, table_in: TableService, query_size: int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_or_replace_entity(target_table_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=source_table_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(source_table_name, target_table_name, data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()
print(tbs_out)

for tb in tbs_out:
    table = tb.name + today
    #create the table (source table name + today's date) in storage2
    table_service_in.create_table(table_name=table, fail_on_exist=False)

    #first query
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(tb.name, table,data,table_service_out,table_service_in,query_size)
Gaurav Mantri
  • Yes, that's true. But here's what I have in mind: when I `create` the table in StorageB, I want that table to have the name of the table in StorageA + today's date. As this whole code will run once a week, I want it to be clear which date each backup relates to. And this is my main issue; I have no clue how to solve it. If in StorageA I have a table named `Table`, I want the same content to be copied over to a table in StorageB named `Table20210930` – Nayden Van Sep 30 '21 at 15:08
  • Give me 2 min and I will update my post with the new code, which does exactly what I want but fails once query_size reaches the threshold – Nayden Van Sep 30 '21 at 15:11
  • That's completely understandable, but there's a logical flaw in your code. The table specified by the `tb_name` variable does not exist in your source storage account (StorageA). That's what I am trying to say here. – Gaurav Mantri Sep 30 '21 at 15:11
  • Updated my answer with modified code. Please take a look at it. HTH. – Gaurav Mantri Sep 30 '21 at 15:16
  • Yessss that worked. omg thank you thank you thank you thank you thank you. May god give you a Ferrari brother. – Nayden Van Sep 30 '21 at 15:25