I am trying to read data from a CSV file and load it into a DynamoDB table. The problem is that the description column contains full sentences, which themselves contain commas. How do I split the columns on the comma delimiter while ignoring the commas inside cells?

Currently, I am using this code to read the CSV file and write to the DB:

import boto


def import_csv_to_dynamodb(table_name, csv_file_name, col_names, column_types):
    '''
    Import a CSV file to a DynamoDB table
    '''
    dynamodb_conn = boto.connect_dynamodb(aws_access_key_id=MY_ACCESS_KEY_ID,
                                          aws_secret_access_key=MY_SECRET_ACCESS_KEY)
    dynamodb_table = dynamodb_conn.get_table(table_name)
    BATCH_COUNT = 2  # 25 is the maximum batch size for Amazon DynamoDB

    items = []

    count = 0
    csv_file = open(csv_file_name, 'r', encoding="utf-8-sig")
    for cur_line in csv_file:
        count += 1
        # Splits on every comma, including commas inside quoted cells
        cur_line = cur_line.strip().split(',')

        # Convert each cell with the type given for its column
        row = {}
        for col_number, col_name in enumerate(col_names):
            row[col_name] = column_types[col_number](cur_line[col_number])

        item = dynamodb_table.new_item(
            attrs=row
        )
        items.append(item)

        if count % BATCH_COUNT == 0:
            print('batch write start ... ', end='')
            do_batch_write(items, table_name, dynamodb_table, dynamodb_conn)
            items = []
            print('batch done! (row number: ' + str(count) + ')')

    # flush remaining items, if any
    if len(items) > 0:
        do_batch_write(items, table_name, dynamodb_table, dynamodb_conn)

    csv_file.close()
Will Simpson
  • Generally, if a cell contains a comma, the whole cell is enclosed in quotation marks in the CSV. That is how you tell the two apart, and then you can use Python's built-in csv library as mentioned here - https://stackoverflow.com/questions/8311900/read-csv-file-with-comma-within-fields-in-python – KingJames Mar 28 '18 at 23:30
  • There are a number of different quasi-standard dialects of CSV that handle this in different ways. If you want to build this from scratch, you need to either find out how the CSV was created and read up on the docs, or reverse-engineer the rules for yourself. But if you use the stdlib `csv` module, it already knows the most popular dialects, which makes things a whole lot easier. – abarnert Mar 28 '18 at 23:41
  • Anyway, if you actually do want to work out the dialect and parse it yourself, you will need to provide a [mcve], including enough sample input to work out the dialect. But really, you're almost certainly better off switching to the `csv` module and closing this question. – abarnert Mar 28 '18 at 23:42

1 Answer

Python's built-in csv module handles this. It parses quoted fields correctly, so a comma inside a quoted cell is not treated as a delimiter. The documentation covers the details:

https://docs.python.org/3/library/csv.html
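
As a rough sketch of how the row parsing in the question could look with `csv.reader` in place of `split(',')`: the helper name `read_typed_rows` and the sample data are made up for illustration, and this assumes the file quotes any cell containing a comma with double quotes (the default `csv` dialect). The DynamoDB part is left out.

```python
import csv
import io


def read_typed_rows(csv_file, col_names, column_types):
    """Yield one dict per CSV row, converting each cell with its column type.

    Hypothetical helper, not part of the original code.
    """
    # csv.reader honours quoting, so a comma inside "..." stays in the cell.
    for fields in csv.reader(csv_file):
        if not fields:
            continue  # skip blank lines
        yield {
            name: convert(fields[index])
            for index, (name, convert) in enumerate(zip(col_names, column_types))
        }


# Made-up sample data standing in for the real file (no header row assumed).
sample = io.StringIO(
    '1,"A sentence, with a comma."\n'
    '2,No comma here\n'
)
rows = list(read_typed_rows(sample, ["id", "description"], [int, str]))
print(rows)
```

With a real file, the csv documentation recommends opening it with `newline=''`, for example `open(csv_file_name, newline='', encoding='utf-8-sig')`, and passing that file object in place of `sample`. Each yielded dict could then be handed to `new_item(attrs=row)` as in the question.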

mafrosis