
I have a large CSV file that I need to get into Redshift.

It has ~5 million rows. A couple of issues:

1) The file's first 10 lines are gibberish that I want deleted/excluded

2) Whenever I try to upload CSV files, I always get a weird glitch where it says the load succeeded, but 0 rows are uploaded. The way I resolve this is by re-formatting the columns in the CSV (formatting numbers as numbers, and dates as the custom format "YYYY-MM-DD") and then re-saving it. For whatever reason, it works after that.

Does anyone have similar issues, and what is the best way to get this CSV file into Redshift?

UPDATE - so I'm using the COPY command to copy my CSV file from S3 into a Redshift table. HOWEVER, I get an error, and when I look in stl_load_errors, the "colname" and the "raw_field_value" don't match. It's really bizarre. For example, the "colname" will be something like "name" but the "raw_field_value" will be a phone number. I double-checked my CSV file and this is NOT the case there. Any thoughts? It's not just one occurrence but a LOT, with several rows getting mixed up.
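
For reference, this is roughly how I'm pulling the errors (my real table name swapped out):

    -- Most recent load errors, newest first
    SELECT starttime, filename, line_number, colname,
           raw_field_value, err_reason
    FROM stl_load_errors
    ORDER BY starttime DESC
    LIMIT 20;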

jc315
  • Use SSIS to do this – Daniel Marcus Mar 28 '18 at 19:00
  • SSMS has an import wizard that has an option to skip initial [number] rows. Does Redshift have something like that? – Jan Mar 28 '18 at 19:28
  • Your question mentions CSV but you've tagged this question with "Excel" - which file format are you using? You can't import an Excel workbook directly into Redshift, you need to save the relevant worksheet as a CSV file first. Also you should add an example of the data from your file (to illustrate the format being used, so not actual data, just something with the same CSV format) and state which COPY options you are using - without these it's difficult to say what may be causing your issues. – Nathan Griffiths Mar 28 '18 at 22:29
  • Assuming you are actually using a CSV file, check that the CSV options specified in your COPY statement match the format of the file (https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv) - in particular whether you are specifying CSV or DELIMITER, whether you need to specify QUOTE or ESCAPE. – Nathan Griffiths Mar 28 '18 at 22:32
  • To skip the first 10 rows use IGNOREHEADER 10 – Nathan Griffiths Mar 28 '18 at 22:33
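
Putting the comments together, a minimal COPY sketch would look something like this (bucket, file, table, and role names are placeholders):

    COPY my_table
    FROM 's3://my-bucket/my-file.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV                       -- parse as CSV, honoring double-quoted fields
    IGNOREHEADER 10           -- skip the 10 gibberish lines at the top
    DATEFORMAT 'YYYY-MM-DD';  -- match the date format in the file

If the file uses a different delimiter or quoting convention, the DELIMITER, QUOTE, and ESCAPE options covered in the linked docs are the alternative.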

2 Answers


Redshift needs the data in the proper format before it can load it. Since your data has gibberish rows at the beginning, you should either fix those rows or skip them by specifying IGNOREHEADER x, where x is the number of leading rows to skip.
The mismatched-column errors you're seeing are typically caused by rows with extra or missing columns, or by a field that contains the delimiter (a comma) but isn't double-quoted. An unquoted comma shifts every subsequent field over by one, which is why a phone number can end up under "name".

To keep COPYing data into Redshift even when it encounters errors, you can add MAXERROR x to the COPY command. It will tolerate up to x errors before it stops the load.
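
For example (table, bucket, and role are placeholders, and the threshold of 100 is arbitrary):

    COPY my_table
    FROM 's3://my-bucket/data.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV
    IGNOREHEADER 10   -- skip the bad leading rows
    MAXERROR 100;     -- tolerate up to 100 bad rows before aborting

Rows skipped this way are still recorded in stl_load_errors, so you can inspect and fix them after the load finishes.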

Shailesh

I had the same issue. Are you writing your data out chunk by chunk to produce multiple CSV files (which is what Redshift recommends)? If so, you need to make sure every chunk has its columns in the same order.
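
Since COPY loads every S3 object whose key starts with the given prefix, all the chunks can be loaded in one statement, and each file must then share the same column order (paths here are placeholders):

    -- Loads part-001.csv, part-002.csv, ... in parallel;
    -- every file must have its columns in the same order.
    COPY my_table
    FROM 's3://my-bucket/chunks/part-'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV
    IGNOREHEADER 1;   -- only if each chunk has its own header row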

A.Ali