1

I have a dataset (8.5 mill rows), where all values in all columns must be enclosed in quotation symbols (" "). I have discovered that there is a problem - some few records holds values in some columns with the last quotation symbol missing. Now I need to try to get an overview on the issue - which columns have examples of this error (it is due to truncation upstream in the solution).

From the example dummy data inserted below:

How do I write a query, which outputs the columns "Last name" and "Age" due to the missing end quotation in row 2 and 3 in these columns? To be clear - how do I identify columns with sporadic truncated values?

Example data with missing quotation symbols

Thanks,

knn

KNN
  • 13
  • 3

1 Answers1

1

You need to check the last character in the string. It is done using the substring function, passing an argument of -1 means the last character. And you check to see if it is different than the double quote symbol.

SELECT * FROM YourTable
WHERE 
substr("Last Name", -1) <> '"' OR substr(Age,-1) <> '"'

You can play around with it here http://sqlfiddle.com/#!4/10a77e/1

Gabriel Durac
  • 2,610
  • 1
  • 12
  • 13