I have the following scenario:
An SQL table needs to be transferred to a MySQL database daily. I first tried Data Pipeline with a CopyActivity, but the exported CSV has empty fields instead of \N (MySQL's NULL marker), so MySQL imports those fields as "", which is not acceptable for our application.
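For context on the import side: MySQL's LOAD DATA INFILE loads a literal \N as NULL, while an empty field is loaded as '' for string columns (or 0 for numeric ones). A minimal sketch of such an import; the database, table, and file names below are made up for illustration:

# Hypothetical names/paths, for illustration only.
# \N in the CSV becomes NULL; an empty field becomes '' (string) or 0 (numeric).
mysql mydb -e "LOAD DATA LOCAL INFILE '/tmp/sqltable.csv'
               INTO TABLE sqltable
               FIELDS TERMINATED BY ','
               LINES TERMINATED BY '\n';"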
I then tried a slightly different approach: export the table to S3 via CopyActivity, then have a ShellCommandActivity download the file, run the script below on it, and upload the result back to S3:
#!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/*.csv |cat ${INPUT1_STAGING_DIR}/*.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv
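To show what the script is supposed to do, here is the same set of sed expressions run over a sample row on my test instance (the s/,,/,\N,/g expression is applied twice on purpose, because a single global pass cannot replace overlapping ,, pairs in runs of consecutive empty fields):

# Sample row with leading, trailing, and consecutive empty fields:
printf ',a,,,b,\n' | sed -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g'
# Prints: \N,a,\N,\N,b,\N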
The script above works perfectly on my test Linux instance, but nothing happens when it is executed on the temporary EC2 resource. I get no errors whatsoever, just the same unmodified CSV with empty fields on the output S3DataNode.
I have no idea what I'm doing wrong, or why the script doesn't behave the same way as it does on my test Linux instance.
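The only theory I have so far: sed -i edits the file in place and writes nothing to stdout, so the pipe into cat does nothing useful, and since both sides of a pipeline start concurrently, cat may read the staged file before sed has finished rewriting it. That would explain the unmodified rows ending up in the output directory. If that is the problem, a single pass without -i and without the pipe should behave deterministically; this is a sketch I have not yet run on the pipeline itself:

#!/bin/bash
# One pass: read the staged input CSVs and write the transformed rows
# directly to the output staging dir. No in-place edit, no pipe, no race.
sed -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' \
    ${INPUT1_STAGING_DIR}/*.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv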
Pipeline log:
18 Jul 2016 10:23:06,470 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.ShellCommandActivity@515aa023
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Local File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Download just the root file to the local dir. Updated File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is sqltable.csv
18 Jul 2016 10:23:06,649 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading S3 file s3://s3-bucket/mysqlexport/sqltable.csv to /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd/sqltable.csv
18 Jul 2016 10:23:06,824 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,862 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: #!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/sqltable.csv |cat ${INPUT1_STAGING_DIR}/sqltable.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv
18 Jul 2016 10:23:06,865 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: configure ApplicationRunner with stdErr file: output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdError and stdout file :output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdOutput
18 Jul 2016 10:23:06,866 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: output/tmp/df-09799242T7UHHPMT072T-de05e7a112c440b4a42df69d554d8a9a/ShellCommandActivityId18OqM20160718T101838Attempt1_command.sh with env variables :{INPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd, OUTPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e} with argument : null
18 Jul 2016 10:23:06,952 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Uploading local directory:output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to S3 s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:06,977 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Upload single file to S3:s3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:06,978 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed uploading of all files
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of local dir output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageFromS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:07,050 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageInS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e
18 Jul 2016 10:23:07,051 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @DefaultShellCommandActivity1_2016-07-18T10:18:38_Attempt=1
18 Jul 2016 10:23:07,052 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Work ShellCommandActivity took 0:0 to complete