0

I have that kind of file

info1;info2;info3";info4;info5

And after parsing I have that error

Error: [42636] ETL-2106: Error while parsing row=0 (starting from 0) [CSV Parser found at byte 5 (starting with 0 at the beginning of the row) of 5 a field delimiter after an quoted field (with an additional whitespace) in file '~/path'. Please check for correct enclosed fields, valid field separators and e.g. unescaped field delimiters that are contained in the data (these have to be escaped)]

I'm sure that the reason is here info3"; but how can I solve this problem I have no idea

Also I can't rid of quotes, because it should be in report

The main part of python code is

 # Transform data to valid CSV format: remove BOM, remove '=' sign, remove repeating quotes in Size column
    decoded_csv = r.content.decode('utf-8').replace(u'\ufeff', '').replace('=', '')
    
    print(decoded_csv)
    
    cr = csv.reader(decoded_csv.splitlines(), delimiter=';')
    lst = list(cr)[1:]

    f = csv.writer(open(base_folder + 'txt/' + shop, "w+"), delimiter=';')
    for row in lst:
        f.writerow(row[:-2])

After this code I get that kind of file

info1;info2;"info3""";info4;info5

And it is not what I need But when I change code a little by adding "quoting=csv.QUOTE_NONE, quotechar='')"

  # Transform data to valid CSV format: remove BOM, remove '=' sign, remove repeating quotes in Size column
    decoded_csv = r.content.decode('utf-8').replace(u'\ufeff', '').replace('=', '')
    
    print(decoded_csv)
    
    cr = csv.reader(decoded_csv.splitlines(), delimiter=';')
    lst = list(cr)[1:]

    f = csv.writer(open(base_folder + 'txt/' + shop, "w+"), delimiter=';' quoting=csv.QUOTE_NONE, quotechar='')
    for row in lst:
        f.writerow(row[:-2])

I get what I need

info1;info2;info3";info4;info5

It is a 2nd step (exasol) and code returned the error

MERGE INTO hst AS dst
USING (
    SELECT DISTINCT
      ar,
      ar_na,
    FROM (
        IMPORT INTO
        (
            ar                  VARCHAR(100) UTF8 COMMENT IS 'ar',
            ar_na           VARCHAR(100) UTF8 COMMENT IS 'ar na',
        
        )
        FROM CSV /*SS:R*/
        AT '&1'
        USER '&2'
        IDENTIFIED BY '&3'
        FILE '~/path'
        SKIP = 0
        ROW SEPARATOR = 'CRLF'
        COLUMN SEPARATOR = ';'
        TRIM
     )
    GROUP BY
        ar,
        ar_na,
        
) src ON src.ar = dst.ar

WHEN MATCHED THEN UPDATE SET
    dst.ar_na = src.ar_na,
    
WHEN NOT MATCHED THEN
INSERT (
        ar
        ar_na,      
)
VALUES (
        src.ar,
        src.ar_na,

);

If file looks like info1;info2;info3;info4;info5 everything works fine, all scripts work

CyclikP
  • 1
  • 2
  • Can you please post the code that returned the error? It'll make it a lot easier to help you out. – Marco Kurepa Aug 12 '22 at 12:24
  • added code, hope it will help – CyclikP Aug 12 '22 at 16:04
  • It would help more if you identified what tool you are using (or which SQL variant, if that's SQL). And why did you tag your question [tag:python]? – rici Aug 12 '22 at 18:40
  • I added python because at first python prepares file - rid off quotes “info1”;”info2”;”info3”””;”info4” and than on 2nd step in exasol I got the error – CyclikP Aug 13 '22 at 10:23
  • @CyclikP: OK, I changed the python tag to exasol, which is surely much more relevant to the error. Although it's quite possible that you would get a better response if you show the entire processing pipeline, starting with the original file. (Why do you need to strip the spaces at all? Doesn't exasol's CSV parser handle quotes?) – rici Aug 14 '22 at 18:21
  • @rici also added part of python code and clarify what I do and what I get after every step, the main problem that If I delete all quotes info1;info2;info3;info4;info5 everything works well, but when I left one quotes, I get the error – CyclikP Aug 15 '22 at 05:24
  • Did you try just leaving `info1;info2;"info3""";info4;info5` in the CSV file? It looks to me like Exasol is supposed to handle that correctly (i.e. interpret the third field as `info3"`). Certainly, removing the quotes is not going to produce the correct result. – rici Aug 15 '22 at 06:21

1 Answers1

0

By default, Exaosl treats double quotes (") as column delimiter. This enables you to specify values that contain the column separator (in your case that's the semicolon). See the entry "Special characters" in the documentation.

You have two options here:

  1. Disable the column delimiter by passing COLUMN DELIMITER = '' to the import statement.
  2. Duplicate all double quotes in the csv file. Exasol ignores the column delimiter if it occurs twice consecutively.
sirain
  • 918
  • 10
  • 19