
I'm trying to process some data that I have in a bucket: download a file, work with it, and then remove it from my disk. For that purpose I have the following function:

import os

from google.cloud import storage


def downloadFileinBucket(pathKeyFile, projectID, bucketName, filename, outFolder):
    '''
    Download filename from bucket
    '''
    print('Downloading image from bucket')
    # Initialise a client
    storage_client = storage.Client.from_service_account_json(pathKeyFile, project=projectID)
    # Create a bucket object for our bucket
    bucket = storage_client.bucket(bucketName)
    # Create a blob object from the filepath
    blob = bucket.blob(filename)
    # Download the file to a destination
    print('til here ok')
    print(outFolder)
    print(filename.split('/')[-1])
    blob.download_to_filename(os.path.join(outFolder, filename.split('/')[-1]))
    print("Image %s downloaded from bucket" % filename)

The last three prints are just there to check that everything makes sense. filename is the name of the file in the bucket that I want to download, and outFolder is the path on my VM where I want to store it temporarily. I can use this function with a single file, or call it in a loop to process several files, and it works perfectly.
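
For reference, a standalone usage that works looks roughly like this (the key file, project ID, bucket name, file names, and output folder below are placeholders, not my real values):

import os

# Placeholder configuration, for illustration only
pathKeyFile = 'service_account.json'
projectID = 'my-project'
bucketName = 'my-bucket'
outFolder = '/tmp/downloads'

# Hypothetical blob names in the bucket
filesToProcess = ['params/params_20230101_a.csv', 'params/params_20230102_b.csv']

for f in filesToProcess:
    downloadFileinBucket(pathKeyFile, projectID, bucketName, f, outFolder)
    localPath = os.path.join(outFolder, f.split('/')[-1])
    # ... work with the local file here ...
    os.remove(localPath)  # remove the temporary copy from disk once processed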

The problem arises when I try to use this function inside a specific piece of code, whose structure is:

class prediction:
    def __init__(self, arg1, arg2, ..., d):
        ...
    def some_function(self, a1, a2):
        ...
        return b
    def pred2table(self, list1, list2, model1, model2, model3, d):
        for item in list1:
            ...
            downloadFileinBucket(self.pathKeyFile, self.projectID, self.bucketName, list2[0], path)

        # more code
        return something

    def main_predict(self):
        # create list1
        # create list2
        # load model: model1, model2, model3

        self.pred2table(list1, list2, model1, model2, model3, self.d)

if __name__ == '__main__':
    # arguments with argparse
    data = gcp_tools.list_blobs(bucketName)
    data = [d for d in data if d.startswith('params')]
    dates = []
    for item in data:
        d = item.split('_')[-2]
        dates.append(d)

    for d in dates:
        predict = prediction(arg1, arg2, ..., d)
        predict.main_predict()

When I use the function within this structure, it downloads the file correctly, but before printing the final message it throws a Segmentation fault error and stops the code. If I write the two loops from the structure (the date loop and the list1 loop) in a simple standalone way, the function works; see the sketch below.
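
For comparison, this is roughly what I mean by the simple version that works (same placeholder names as in the structure above, but outside the class):

# Minimal sketch of the standalone nested loops that do not crash.
# All names (dates, list1, list2, pathKeyFile, projectID, bucketName, path)
# are the same placeholders used in the structure above.
for d in dates:
    for item in list1:
        downloadFileinBucket(pathKeyFile, projectID, bucketName, list2[0], path)
        # ... process and then remove the local file ...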

The thing is that this is a new error: exactly the same code worked perfectly on my old VM. My VM and environment details:

  • Ubuntu 18.04
  • Python 3.8.0
  • gsutil 5.23
  • google-cloud-storage 2.6.0

Any kind of advice would be a big help. Thanks.

  • Based on the suggestions [here](https://stackoverflow.com/questions/73956311/gcp-cloud-run-container-fails-with-uncaught-signal-11-segmentation-fault-node) and [here](https://stackoverflow.com/questions/74284441/cloud-run-python3-7-fatal-python-error-segmentation-fault), this can be caused by insufficient memory and/or an outdated Python library. You may want to increase your memory limit or update the gcloud SDK, the Python library, etc. [Reference](https://stackoverflow.com/questions/60041175/segmentation-fault-for-gcloud-cli) – Robert G Apr 24 '23 at 14:08

0 Answers