
Hope I can get some hints about this issue.

I have written the following code to back up a table from one storage account to another.

query_size = 100

#save data to storage2 and check if there is data left in the current table; if so, recurse
def queryAndSaveAllDataBySize(tb_name,resp_data:ListGenerator ,table_out:TableService,table_in:TableService,query_size:int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_or_replace_entity(tb_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(tb_name,data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()
print(tbs_out)

for tb in tbs_out:
    table = tb.name + today
    print(target_connection_string)
    #create table with same name in storage2
    table_service_in.create_table(table_name=table, fail_on_exist=False)
    #first query
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(table,data,table_service_out,table_service_in,query_size)

This should be a simple script that loops over the items in a table and copies them into another storage account. I already have this exact code running in an Azure Function, where it works just fine.

Today I tried to run it against several storage accounts. For a while it runs just fine, but then it stops and throws this error:

Traceback (most recent call last):
  File "/Users/users/Desktop/AzCopy/blob.py", line 205, in <module>
    queryAndSaveAllDataBySize(table,data,table_service_out,table_service_in,query_size)
  File "/Users/users/Desktop/AzCopy/blob.py", line 191, in queryAndSaveAllDataBySize
    data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/tableservice.py", line 738, in query_entities
    resp = self._query_entities(*args, **kwargs)
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/tableservice.py", line 801, in _query_entities
    return self._perform_request(request, _convert_json_response_to_entities,
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/tableservice.py", line 1106, in _perform_request
    return super(TableService, self)._perform_request(request, parser, parser_args, operation_context)
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/storageclient.py", line 430, in _perform_request
    raise ex
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/storageclient.py", line 358, in _perform_request
    raise ex
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/storageclient.py", line 343, in _perform_request
    _http_error_handler(
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/_error.py", line 115, in _http_error_handler
    raise ex
azure.common.AzureMissingResourceHttpError: Not Found
{"odata.error":{"code":"TableNotFound","message":{"lang":"en-US","value":"The table specified does not exist.\nRequestId:bbdb\nTime:2021-09-29T16:42:17.6078186Z"}}}

I do not understand exactly why this is happening, because all the script has to do is copy from one side to the other.

Please, if anyone can help fix this: I am totally burned out and can't think anymore :(

UPDATE: Reading my code again, I figured out I have this limitation here.

#query 100 items per request, to avoid consuming too much memory by loading all the data at once
query_size = 100

When I check my storage table, it in fact has only 100 rows. But I couldn't find anywhere how to set the query size so that all the data is loaded at once.

As far as I can understand, after I reach the query_size limit I need to look for the next x_ms_continuation token to get the next batch.
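
For example (a sketch only, using the same table_out, table_in, tb_name and query_size as in my script above), this is roughly the paging I expect, written as a plain loop instead of recursion:

#sketch: keep passing next_marker back into query_entities until it is empty
marker = None
while True:
    page = table_out.query_entities(table_name=tb_name, num_results=query_size, marker=marker)
    for item in page:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        table_in.insert_or_replace_entity(tb_name, item)
    marker = page.next_marker
    if not marker:
        #no continuation token left, the table has been fully copied
        break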

The code I have right now is still the same as shown above.

According to the Microsoft documentation, the marker should tell me whether there is a continuation token, and if there is one, the code should run again. But this is not happening in my case: once I reach the query_size, the code throws the error.

Can anyone help, please?

Nayden Van

1 Answer


Try replacing the for block with the one below, which creates the table in storage2 with the same name as the source table. In your original loop, the first query runs against tb.name, but queryAndSaveAllDataBySize then queries the source account with the renamed table (tb.name + today), which does not exist there, hence the TableNotFound:

for tb in tbs_out:
    #create table with same name in storage2
    table_service_in.create_table(tb.name)
    #first query 
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(tb.name,data,table_service_out,table_service_in,query_size)

Below is the full sample code:

from azure.cosmosdb.table.tableservice import TableService,ListGenerator

table_service_out = TableService(account_name='', account_key='')
table_service_in = TableService(account_name='', account_key='')

#query 100 items per request, to avoid consuming too much memory by loading all the data at once
query_size = 100

#save data to storage2 and check if there is data left in the current table; if so, recurse
def queryAndSaveAllDataBySize(tb_name,resp_data:ListGenerator ,table_out:TableService,table_in:TableService,query_size:int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_entity(tb_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(tb_name,data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()

for tb in tbs_out:
    #create table with same name in storage2
    table_service_in.create_table(tb.name)
    #first query 
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(tb.name,data,table_service_out,table_service_in,query_size)

Up to this point it should work properly. If you still have an issue with query_size, you can instead fetch the whole table and slice the result: calling query_entities without num_results returns a ListGenerator that keeps following continuation tokens as you iterate, so converting it to a list gives you every entity, from which you can take the first 100 records:

tasks = table_service.query_entities('tasktable')  #no num_results: the generator pages through everything
lst = list(tasks)  #materialize all entities
print(lst[99])  #the 100th record

Also check the below sample from azure-sdk-for-python:

def sample_query_entities_values(self):
    from azure.data.tables import TableClient
    from azure.core.exceptions import HttpResponseError

    print("Entities with 25 < Value < 50")
    # [START query_entities]
    with TableClient.from_connection_string(self.connection_string, self.table_name) as table_client:
        try:
            parameters = {u"lower": 25, u"upper": 50}
            name_filter = u"Value gt @lower and Value lt @upper"
            queried_entities = table_client.query_entities(
                query_filter=name_filter, select=[u"Value"], parameters=parameters
            )

            for entity_chosen in queried_entities:
                print(entity_chosen)

        except HttpResponseError as e:
            print(e.message)
    # [END query_entities]
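
If you can use the newer azure-data-tables package shown in that sample, the continuation handling goes away entirely: iterating the pager returned by list_entities follows continuation tokens for you. Below is a minimal sketch of the same backup under that assumption (the connection strings are placeholders to fill in):

from azure.data.tables import TableServiceClient

#placeholders: fill in your real connection strings
service_out = TableServiceClient.from_connection_string("<source-connection-string>")
service_in = TableServiceClient.from_connection_string("<target-connection-string>")

for table in service_out.list_tables():
    #create the table with the same name in storage2
    service_in.create_table_if_not_exists(table.name)
    src = service_out.get_table_client(table.name)
    dst = service_in.get_table_client(table.name)
    #iterating list_entities() pages through the whole source table automatically
    for entity in src.list_entities():
        dst.upsert_entity(entity)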
SaiKarri-MT
  • Hi, thank you so much for your reply. Yesterday I deep-dived into debugging and I think I figured out the issue I am facing. I opened a separate post with a better explanation. Can I post the topic here so you might help me understand what I am doing wrong? Because the issue is in the table name – Nayden Van Sep 30 '21 at 14:05
  • You are correct, the issue is with the table name; that's the reason it is throwing TableNotFound. To resolve this, just try to copy with the same table name in storage2. Later, after it is working, you can modify the table name and pass it correctly. – SaiKarri-MT Sep 30 '21 at 17:54