I have a huge txt file and I need to load it into DynamoDB. The file structure is:
223344|blue and orange|Red|16/12/2022
223344|blue and orange|Red|16/12/2022 ...
The file has more than 200M lines.
I have tried to convert it to a JSON file using the code below:
import json

filename = 'smini_final_data.json'

with open('mini_data.txt', 'r') as f_in:
    for line in f_in:
        parts = line.strip().split('|')
        result = {"field1": parts[0], "field2": parts[1],
                  "field3": parts[2].replace(" ", ""), "field4": parts[3]}
        # re-read and re-write the whole JSON array for every input line
        with open(filename, "r") as file:
            data = json.load(file)
        data.append(result)
        with open(filename, "w") as file:
            json.dump(data, file)
But this isn't efficient, and it's only the first part of the job (converting the data to JSON); after that I still need to put the JSON into DynamoDB.
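One idea I had to make the conversion less painful is to write one JSON object per line (JSON Lines) instead of re-reading and re-writing the whole array on every iteration, roughly like this (just a sketch, not tested on the full file; the output filename is a placeholder):

import json

# Sketch: append one JSON object per line so the output file is never re-read
with open('mini_data.txt', 'r') as f_in, open('final_data.jsonl', 'w') as f_out:
    for line in f_in:
        parts = line.strip().split('|')
        record = {"field1": parts[0], "field2": parts[1],
                  "field3": parts[2].replace(" ", ""), "field4": parts[3]}
        f_out.write(json.dumps(record) + "\n")

I'm not sure whether that is the right direction, though.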
For the DynamoDB load itself I have used this code (it looks OK):
import decimal
import json

import boto3

def insert(self):
    if not self.dynamodb:
        self.dynamodb = boto3.resource(
            'dynamodb', endpoint_url="http://localhost:8000")
    table = self.dynamodb.Table('fruits')
    with open("final_data.json") as json_file:
        orange = json.load(json_file, parse_float=decimal.Decimal)
    # batch_writer buffers items and sends them in batches of 25
    with table.batch_writer() as batch:
        for fruit in orange:
            batch.put_item(
                Item={
                    'field1': fruit['field1'],
                    'field2': fruit['field2'],
                    'field3': fruit['field3'],
                    'field4': fruit['field4'],
                }
            )
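I am also wondering whether I can skip the intermediate JSON file entirely and stream the txt straight into the batch writer, something like this (only a sketch against the same local endpoint and table, not tested at the 200M-line scale):

import boto3

# Sketch: stream the txt file straight into DynamoDB, skipping the JSON step
dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")
table = dynamodb.Table('fruits')

with open('mini_data.txt', 'r') as f_in, table.batch_writer() as batch:
    for line in f_in:
        parts = line.strip().split('|')
        batch.put_item(
            Item={
                'field1': parts[0],
                'field2': parts[1],
                'field3': parts[2].replace(" ", ""),
                'field4': parts[3],
            }
        )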
So, does anyone have suggestions for processing this txt file more efficiently?
Thanks