1

I have a csv column that has data with \r character. How can write a query to eliminate such data

SELECT rv FROM s3object s

this gives me:

enter image description here

I don't want such rows. Want to eliminate it all.

This query still returns me the same results

SELECT rv FROM s3object s where rv!='\r'
Denil Parmar
  • 154
  • 12
  • what's the datatype of rv column? – eshirvana Jan 11 '22 at 19:52
  • Can you show us an example of the rows with digits? It might be possible to use `WHERE rv > 0` or something like that. – John Rotenstein Jan 12 '22 at 05:37
  • {"rv": "61\r"},{"rv": "62\r"},{"rv": "63\r"} .......... These are just mere numbers in that rv column in the csv file, but with s3 select I get this kind of output. @JohnRotenstein – Denil Parmar Jan 12 '22 at 06:09
  • Some rows contain 2-digit numbers and some rows are empty. And these empty ones have "\r" when parsed as json in the output @eshirvana – Denil Parmar Jan 12 '22 at 06:14
  • Oh! They also have the `\r` in that column! What does the file itself look like, if you open it in a Text Editor? – John Rotenstein Jan 12 '22 at 06:17
  • `CUST_AMT,RV 23.33, 200,61 300,62 400,63 ` This is how it looks like when opened with a text editor @JohnRotenstein – Denil Parmar Jan 12 '22 at 20:01
  • That format looks confusing -- there are spaces in strange places, or perhaps they are the `\r` characters. Do the values have commas to indicate decimals (eg `400,63` is the same as `400.63` using the European standard)? If so, why does `23.33` not follow this format? I was hoping to reproduce your situation, but I still can't see enough to create a similar input file with the \r characters. – John Rotenstein Jan 12 '22 at 21:19
  • https://controlc.com/22ac9244 This is the exact csv data paste file when opened with text editor @JohnRotenstein The first row is the header row and the last column is the RV column causing the problem. – Denil Parmar Jan 14 '22 at 03:18
  • I copied that data into a file and ran the query and there were no `\r` characters. I think this is because they are non-text characters, so when you used Copy & Paste, they didn't copy across. I'd need the actual BINARY file to test it, rather than a Copy & Paste of the visible text. – John Rotenstein Jan 14 '22 at 03:30
  • shorturl.at/fFS49 Here's a link to the file @JohnRotenstein – Denil Parmar Jan 14 '22 at 04:18

1 Answers1

1

Your file has 0x0d 0x0a (CR LF) at the end of each line. This is often generated by Windows software.

It appears that S3 Select doesn't know how to handle the combination, so the \r is treated as part of the last field.

You can 'fix' this by ignoring the last character of the last field:

SELECT
  SUBSTRING(rv FROM 1 FOR CHAR_LENGTH(rv) - 1) AS rv
FROM s3object s
WHERE char_length(rv) > 1 -- Optional
John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
  • `FOR CHAR_LENGTH(rv) - 1` this part can be ignored? as default is till the end of the string – Denil Parmar Jan 14 '22 at 05:11
  • No, you can't ignore that. The `\r` is appended to the **end** of the field, so you need to tell S3 Select to only retrieve the substring up to, but not including, the last character. – John Rotenstein Jan 14 '22 at 05:50