I'm trying to process some data stored in a bucket: download a file, work with it, and then remove it from my disk. For that purpose I have the following function:
import os

from google.cloud import storage


def downloadFileinBucket(pathKeyFile, projectID, bucketName, filename, outFolder):
    '''
    Download filename from bucket
    '''
    print('Downloading image from bucket')
    # Initialise a client
    storage_client = storage.Client.from_service_account_json(pathKeyFile, project=projectID)
    # Create a bucket object for our bucket
    bucket = storage_client.bucket(bucketName)
    # Create a blob object from the filepath
    blob = bucket.blob(filename)
    # Download the file to a destination
    print('til here ok')
    print(outFolder)
    print(filename.split('/')[-1])
    blob.download_to_filename(os.path.join(outFolder, filename.split('/')[-1]))
    return print("Image %s downloaded from bucket" % filename)
I use the last three prints to check that everything makes sense. filename is the name of the file in the bucket that I want to download, and outFolder is the path on my VM where I want to store it temporarily. I can call this function for a single file or in a loop over several files, and it works perfectly.
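For context, the download / work / remove cycle I have in mind looks roughly like the sketch below. It is only an illustration: process_then_remove, download and process are hypothetical names that are not part of my real code, and the temporary folder stands in for outFolder.

```python
import os
import tempfile


def process_then_remove(download, filename, process):
    """Download `filename` into a throwaway folder, process it, then clean up.

    `download(filename, out_folder)` and `process(local_path)` are
    hypothetical callables supplied by the caller.
    """
    # The temporary folder and everything inside it are deleted when the
    # `with` block exits, even if `process` raises an exception.
    with tempfile.TemporaryDirectory() as out_folder:
        download(filename, out_folder)
        # Same local name that the download step is expected to use:
        # the last component of the blob name.
        local_path = os.path.join(out_folder, os.path.basename(filename))
        return process(local_path)
```

With the function above, download could be a small wrapper such as lambda name, folder: downloadFileinBucket(pathKeyFile, projectID, bucketName, name, folder).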
The problem arises when I use this function inside a larger program, whose structure is:
class prediction:
    def __init__(self, ...):
        ...

    def some_function(self, a1, a2):
        ...
        return b

    def pred2table(self, list1, list2, model1, model2, model3, d):
        for item in list1:
            ....
            downloadFileinBucket(self.pathKeyFile, self.projectID, self.bucketName, list2[0], path)
            # more code
        return something

    def main_predict(self):
        # create list1
        # create list2
        # load model: model1, model2, model3
        self.pred2table(list1, list2, model1, model2, model3)


if __name__ == '__main__':
    # arguments with argparse
    data = gcp_tools.list_blobs(bucketName)
    data = [d for d in data if d.startswith('params')]
    dates = []
    for item in data:
        d = item.split('_')[-2]
        dates.append(d)
    for d in dates:
        predict = prediction(arg1, arg2, ..., d)
        predict.main_predict()
When I use the function within this structure, it downloads the file correctly, but before the final print is returned the process crashes with a Segmentation fault and the program stops. If I reproduce just the two loops from the structure (the date loop and the list1 loop) in a simple script, the function works.
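One thing that might help to locate the crash is Python's standard-library faulthandler module, which dumps the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is a minimal way to switch it on; enable_crash_traceback is a hypothetical helper name, and I am assuming it is called once at the start of the script, before the date loop.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Dump the traceback of every thread to `stream` on a fatal signal.

    `stream` must be a real file object (it needs a file descriptor) and
    has to stay open for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Intended use, at the very top of the main script:
#     enable_crash_traceback()
#     ... then build the dates list and call predict.main_predict()
```

The same handler can be enabled without touching the code by starting the interpreter with python -X faulthandler. The traceback only shows the Python line that was running when the fault occurred, so it narrows the search rather than explaining the cause.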
This is a new error: on my old VM exactly the same code worked perfectly. My VM and environment details:
- Ubuntu 18.04
- Python 3.8.0
- gsutil 5.23
- google-cloud-storage 2.6.0
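To compare the old and new VMs more precisely, the versions actually installed in each environment could be listed with something like the sketch below. installed_versions is a hypothetical helper, and google-crc32c is only my guess at a compiled dependency that might differ between the two machines; the list can be changed freely.

```python
import sys
from importlib import metadata  # available in the standard library since Python 3.8


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package names below are assumptions about what is worth comparing.
print("python", sys.version.split()[0])
for name, version in installed_versions(["google-cloud-storage", "google-crc32c"]).items():
    print(name, version)
```

Running this on both machines and comparing the output line by line would show whether the environments really are identical.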
Any kind of advice would be a big help. Thanks.