0

I am trying to read large CSV files with lots of Newline characters in them.

this is how the data looks like in the CSV file.

                    "LastValueInRow",
"FirstValueInNextRow",

I would like to use " + , + NEWLINE + " as records delimiter to prevent it from reading all other return characters as new records.

The following code reads most CSV records correctly by using NEWLINE (\n) + "

ORGANIZATION EXTERNAL 
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "IMPORT_TEST"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY '\n"'
                         BADFILE SNOW_IMPORT_TEST:'TEST_1.bad'
                         LOGFILE SNOW_IMPORT_TEST:'TEST_1.log'
                         SKIP 1
                         FIELDS TERMINATED BY '","'
                         MISSING FIELD VALUES ARE NULL
                                                                  )
      LOCATION
       ( "IMPORT_TEST":'TEST_1.csv'
       )
    )

Adding any characters before the \n doesn't return any rows, below is what I want which doesn't work:

      ( RECORDS DELIMITED BY '",\n"'

Is it possible to use " + , + \n + " as records delimiter.

Thanks.

jarlh
  • 42,561
  • 8
  • 45
  • 63
Mario Douh
  • 49
  • 6
  • Your example data looks fine, though you might want a `trailing nullcols` directive.. Do you mean you have other data that has newlines *within* a quote-enclosed value? Or something else - it isn't very clear at the moment, more sample data might help, but please try to explain the problem some more too. What doesn't work, and what errors or wrong results do you get, for example? (You *might* be looking for [something like this](https://stackoverflow.com/a/21334074/266304), but... hard to tell.) – Alex Poole Sep 13 '21 at 17:31

1 Answers1

0

After a lot of research I have found that the best solution is to replace the return characters in the CSV file to a different character using Windows PowerShell then update the records delimiter in the external table.

I have created the following Powershell script to remove all the return characters in the CSV file (where $loc is the directory and $file_name is the file name)

(Get-content -raw -path $loc\$file_name".csv") -replace '[\r\n]', '|' | Out-File -FilePath $loc\$file_name"_PP.csv" -Force -Encoding ascii -nonewline

Then I have updated the external table parameter to read the records based on the new delimiter '",||"'.

ORGANIZATION EXTERNAL 
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "IMPORT_TEST"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY '",||"'
                         BADFILE SNOW_IMPORT_TEST:'TEST_1_PP.bad'
                         LOGFILE SNOW_IMPORT_TEST:'TEST_1_PP.log'
                         SKIP 1
                         FIELDS TERMINATED BY '","'
                         MISSING FIELD VALUES ARE NULL
                                                                  )
      LOCATION
       ( "IMPORT_TEST":'TEST_1_PP.csv'
       )
    )

Now the external table is reading all the records correctly.

Mario Douh
  • 49
  • 6