
I am trying to import an HTML snippet that is part of one of the columns in a CSV file. The HTML snippet contains double quotes, which are escaped with a backslash. The CSV is created using Apache Spark.

To illustrate the issue, I have created a table with just two columns and minimal data.

CREATE TABLE logs.processing ( ts String,text String)  ENGINE = Log

cat sample.csv  # content of the file

"Fri, 01 May 2020 06:47:05 UTC","<html id=\"html-div\">"

When I issue the import command below, the following exception is thrown:

cat sample.csv | clickhouse-client --query="INSERT INTO logs.processing FORMAT CSV"

Exception

Code: 117. DB::Exception: Expected end of line

If I change the content of sample.csv to

"Fri, 01 May 2020 06:47:05 UTC","col2"

It works fine.

Could you please help me with this issue?

Thanks.

Shivakumar ss

2 Answers


The CSV specification (RFC 4180) requires:

  1. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"
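To make the rule concrete, here is a minimal Python sketch of the quoting step. `rfc4180_field` and `rfc4180_row` are hypothetical helper names, not part of any library. The sketch doubles every embedded quote, wraps the field in quotes, and then checks that Python's standard `csv` module parses the result back into the original values.

```python
import csv
import io
from typing import Iterable


def rfc4180_field(value: str) -> str:
    # Escape each embedded double quote by doubling it, then enclose the field.
    return '"' + value.replace('"', '""') + '"'


def rfc4180_row(values: Iterable[str]) -> str:
    # Join the quoted fields with commas, as in the spec's example.
    return ",".join(rfc4180_field(v) for v in values)


line = rfc4180_row(["aaa", 'b"bb', "ccc"])
print(line)  # "aaa","b""bb","ccc"

# Python's csv reader uses the same convention by default (doublequote=True).
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['aaa', 'b"bb', 'ccc']
```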

You need to either generate a valid CSV file in the first place or fix it before passing it to clickhouse-client:

cat sample.csv | sed 's/\\"/""/g' | clickhouse-client --query="INSERT INTO logs.processing FORMAT CSV"
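The `sed` substitution works for the sample line. As a hedged alternative, here is a Python sketch that parses field boundaries with the standard `csv` module instead of doing a plain text substitution. It reads the input with backslash as the escape character and writes it back with doubled quotes. The function name `spark_csv_to_rfc4180` is hypothetical, and the sketch assumes the input uses backslash as its only escape character.

```python
import csv
import io


def spark_csv_to_rfc4180(text: str) -> str:
    """Re-encode CSV that escapes quotes as \\" into RFC 4180 style ("")."""
    # Reader: backslash is the escape character; "" pairs are not expected.
    reader = csv.reader(io.StringIO(text), doublequote=False, escapechar="\\")
    out = io.StringIO()
    # Writer: the default doublequote=True emits "" for embedded quotes, and
    # QUOTE_ALL quotes every field, as the original file does.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = '"Fri, 01 May 2020 06:47:05 UTC","<html id=\\"html-div\\">"\n'
print(spark_csv_to_rfc4180(sample), end="")
# "Fri, 01 May 2020 06:47:05 UTC","<html id=""html-div"">"
```

To use it in the pipeline, you could wrap the function in a small script (for example, a hypothetical `fix_csv.py` that calls `sys.stdout.write(spark_csv_to_rfc4180(sys.stdin.read()))`) and put it where `sed` sits. This version reads the whole file into memory, which is fine for small files.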
vladimir

I posted the question on the ClickHouse GitHub issue tracker. It looks like, as of now, ClickHouse's CSV format only supports the double quote as the escape character for quotes.

https://github.com/ClickHouse/ClickHouse/issues/10624

Shivakumar ss