I have a bucket that holds a massive amount of data, and I want to get only the objects (files) whose key contains a specific string (a UUID that is part of the file name).
Currently I list all the objects in the bucket, filter the summaries down to those whose key contains that string, collect the matching keys in a list, and return the list.
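    // Imports used by this method (AWS SDK for Java v1)
    import com.amazonaws.AmazonServiceException;
    import com.amazonaws.services.s3.model.ListObjectsRequest;
    import com.amazonaws.services.s3.model.ObjectListing;
    import com.amazonaws.services.s3.model.S3ObjectSummary;
    import java.util.ArrayList;
    import java.util.List;

    public List<String> getBucketList(String filterStr) {
        List<String> lst = new ArrayList<>();
        try {
            ListObjectsRequest listObjectsRequest =
                    new ListObjectsRequest()
                            .withBucketName(bucketName);
            ObjectListing objects = s3client.listObjects(listObjectsRequest);
            for (;;) {
                // One page of results
                List<S3ObjectSummary> summaries = objects.getObjectSummaries();
                if (summaries.size() < 1) {
                    break;
                }
                // Keep only the keys that contain the UUID
                for (S3ObjectSummary summary : summaries) {
                    if (summary.getKey().contains(filterStr)) {
                        lst.add(summary.getKey());
                    }
                }
                // Fetch the next page
                objects = s3client.listNextBatchOfObjects(objects);
            }
        } catch (AmazonServiceException e) {
            // error handling omitted here for brevity
            throw e;
        }
        return lst;
    }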
Expected: the listing returns only the objects that are relevant to me, i.e. those whose key contains 'filterStr' (a variable whose value is the UUID).
Actual: I fetch all the objects and then filter them by checking whether each key contains 'filterStr'. This eventually does what I intend, but it takes a lot of time, and I wonder whether I can reduce it.
EDIT: Inside my bucket I have multiple folders, for example:
alert
alert_archived
channel
device
Inside each folder, the objects are organized by date, like this:
alert/2019/08/26
An example of a file that I want to get follows this convention:
s3://<bucket_name>/<name_of_folder_out_of_many>/2019/08/25/<UUID>_<name_of_the_file>.csv.gz
I want to iterate over all the folders in the bucket and get only the files with this specific UUID (<UUID>_<name_of_the_file>.csv.gz). The date matters too: I only want the files under the current date.
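Given that key convention, one way to narrow the listing might be to build a key prefix per folder for the date and UUID, then list with that prefix instead of listing the whole bucket. Below is a minimal sketch of the prefix-building part only, with no S3 calls. The class name `KeyPrefixSketch`, the methods `prefixesFor` and `matches`, and the hard-coded folder list are hypothetical names for illustration, and it assumes the date part of the key is always zero-padded as `yyyy/MM/dd`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyPrefixSketch {
    // Assumption: dates appear in keys as zero-padded yyyy/MM/dd, e.g. 2019/08/25
    private static final DateTimeFormatter DATE_PATH =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Builds one prefix per folder, of the form "<folder>/yyyy/MM/dd/<uuid>_"
    public static List<String> prefixesFor(List<String> folders, LocalDate date, String uuid) {
        List<String> prefixes = new ArrayList<>();
        for (String folder : folders) {
            prefixes.add(folder + "/" + date.format(DATE_PATH) + "/" + uuid + "_");
        }
        return prefixes;
    }

    // Client-side check: the file name (the part after the last '/')
    // starts with "<uuid>_" and ends with ".csv.gz"
    public static boolean matches(String key, String uuid) {
        String fileName = key.substring(key.lastIndexOf('/') + 1);
        return fileName.startsWith(uuid + "_") && fileName.endsWith(".csv.gz");
    }

    public static void main(String[] args) {
        // Hypothetical folder names taken from the example listing above
        List<String> folders = Arrays.asList("alert", "alert_archived", "channel", "device");
        String uuid = args.length > 0 ? args[0] : "<UUID>";
        for (String prefix : prefixesFor(folders, LocalDate.now(), uuid)) {
            System.out.println(prefix);
        }
    }
}
```

Each prefix could then be passed to `ListObjectsRequest.withPrefix(...)` in place of the unfiltered request, so that S3 returns only keys starting with that folder, date, and UUID. The `matches` check would remain as a final filter on the `.csv.gz` suffix. Note that `LocalDate.now()` uses the JVM's default time zone, which may not match the zone used when the objects were written.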