
I am using AWS Data Pipeline to copy SQL data to a CSV file in AWS S3. Some of the data contains a comma inside a quoted string, e.g.:

{"id":123455,"user": "some,user" .... }

While importing this CSV data into DynamoDB, the import treats that comma as the end of the field value. This results in errors, because the data given in the mapping no longer matches the schema we have provided.

My solution is to separate the CSV fields with a ; (semicolon) while copying the data from SQL to the S3 bucket. That way a quoted value containing a comma is kept as a single field, and the data would look like this (note the blank space inside the quoted string after the comma):

{"id" : 12345; "user": "some, user";....}

My stack looks like this:

  - database_to_s3:
      name: data-to-s3
      description: Dumps data to s3.
      dbRef: xxx
      selectQuery: >
        select * FROM USER;
      s3Url: '#{myS3Bucket}/xxxx-xxx/'
      format: csv

Is there any way I can specify the delimiter so that the fields are separated with a ; (semicolon)?

Thank you!

Farukh Khan
  • I have edited your question to make it more readable – maslick Dec 13 '21 at 08:10
  • 1
    The question mentions CSV but data samples like `{"id" : 12345; "user": "some, user";....}` are all JSON and not CSV. – ElmoVanKielmo Dec 13 '21 at 08:13
  • @ElmoVanKielmo true, I am thinking of a better title... – maslick Dec 13 '21 at 08:17
  • @Khan, what's your stack snippet referring to? Is it a CloudFormation template (e.g. AWS::DataPipeline::Pipeline PipelineObject) or something else? Please elaborate – maslick Dec 13 '21 at 10:02
  • @Khan another question: are you exporting from RDS into S3? Or into DynamoDB? Or do you first export data from RDS into S3, and then from S3 into DynamoDB? – maslick Dec 13 '21 at 10:05

1 Answer


Give AWS Glue a try: it lets you marshal (transform) your data before inserting it into DynamoDB.
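
A minimal PySpark sketch of such a Glue job could look roughly like this (the S3 path and DynamoDB table name are placeholders, not values from the question):

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the CSV export from S3. quoteChar tells the reader that a quoted
    # value like "some, user" is one field even though it contains a comma.
    users = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://my-bucket/xxxx-xxx/"]},  # placeholder path
        format="csv",
        format_options={"withHeader": True, "separator": ",", "quoteChar": '"'},
    )

    # Write the parsed records into DynamoDB.
    glue_context.write_dynamic_frame_from_options(
        frame=users,
        connection_type="dynamodb",
        connection_options={
            "dynamodb.output.tableName": "User",  # placeholder table name
            "dynamodb.throughput.write.percent": "1.0",
        },
    )

Since Glue's CSV reader honours the quote character, the comma delimiter can stay as it is; if the export really does use semicolons, changing "separator" to ";" would be the only adjustment.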

MrOverflow