I've been doing some research on how to move gzipped S3 data into Elasticsearch. The AWS documentation describes creating a Lambda that unzips each file, re-uploads it, and then moves it to ES. Right now, since my dataset isn't very large, I download the data to my local machine and send it to Elasticsearch in the correct format. Both methods seem inefficient, so I'm wondering whether there is a way to unzip a file and move it to Elasticsearch without downloading or re-uploading the data.
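Ideally I'd like to do something like the following, streaming and decompressing the object straight from S3 instead of saving it to disk first. This is only a sketch of what I have in mind; I'm assuming boto3's get_object streaming body can be wrapped directly by gzip.GzipFile, and the bucket/key values are placeholders:

import gzip
import boto3

# Sketch: stream and decompress the gzipped object directly from S3,
# without writing it to local disk first. 'my-bucket' / 'logs/access.log.gz'
# are placeholders for my real bucket and key.
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='my-bucket', Key='logs/access.log.gz')

# gzip.GzipFile only needs .read(), which the streaming body provides
with gzip.GzipFile(fileobj=response['Body']) as gz:
    for raw_line in gz:
        line = raw_line.decode('utf-8')
        # ...parse the line and index it into Elasticsearch here...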
Right now this is my code:
import re
import gzip

import boto3

# Download the gzipped log file from S3 to a local path
s3 = boto3.resource('s3')
s3.Bucket(bucket).download_file(key, 'download_path')

# Regexes for the IP, timestamp, and quoted message in each log line
ip_pattern = re.compile(r'(\d+\.\d+\.\d+\.\d+)')
time_pattern = re.compile(r'\[(\d+/\w\w\w/\d\d\d\d:\d\d:\d\d:\d\d\s\+\d\d\d\d)\]')
message_pattern = re.compile(r'"(.+)"')

with gzip.open('download_path') as files:
    for line in files:
        line = line.decode("utf-8")  # decode bytes to str
        # group(1) keeps just the captured value, without brackets/quotes
        ip = ip_pattern.search(line).group(1)
        timestamp = time_pattern.search(line).group(1)
        message = message_pattern.search(line).group(1)
        document = {"ip": ip, "timestamp": timestamp, "message": message}
If there isn't a better way, I will stick with the code above.