I have a lot of Parquet and Delta files sitting in ADLS Gen2, with SQL views defined over them in a Synapse serverless SQL pool (across several databases). I recently discovered that some views cannot return some of the data in the files because of rows with "bad data": 0001-01-01 date / timestamp values. These values are "bad" because Synapse has a bug that prevents selecting them, as discussed here. I want to programmatically identify all SQL views (or, equivalently, all offending Parquet / Delta files in ADLS) that fail to return data when queried because of this bad data. This is easy to do manually: run a query that selects all rows from a given view; if the query errors out, the view is affected. (I can guarantee that any query failure is due to this bad-data root cause.) But how can I do this programmatically? If it matters, I have Azure Databricks available to me, and I already have a list of all SQL views that are potentially affected.
- Could you provide your attempt? What is the error/issue in your attempt? – Aswin Oct 18 '22 at 05:29
- I don't even know where to begin with this. The link that I referenced gives the error, but knowing the error isn't the problem. The problem is being able to identify (en masse) which SQL views are currently affected by this error. My identification strategy for one view is to just execute the view and visually inspect the result to see if the error occurs. I don't know how to programmatically do this for all SQL views in question. – skyline01 Oct 18 '22 at 13:04
- One approach might be to "programmatically" write a single SQL script that queries all SQL views (and visually review the results). My guess is that there is a better approach, though. – skyline01 Oct 18 '22 at 13:21
1 Answer
Maybe this will get you started. It lists every view in the current database (you would need to run it against each of your serverless databases):

select schema_name(schema_id) as schema_name,
       name as view_name
from sys.views
order by schema_name, view_name;
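
That only enumerates the views, though. Since you have Databricks available, one way to flag the failing ones is to loop over the views from a notebook and try to read each one end to end, recording any that raise an error (which, per your guarantee, means bad data). Below is a minimal sketch, assuming pyodbc and the "ODBC Driver 17 for SQL Server" are available on the cluster and SQL authentication is used; the endpoint, database names, and credentials are placeholders you would replace. If you already have your list of views, you can iterate over that list instead of querying sys.views.

import pyodbc

# Placeholders -- replace with your serverless endpoint, databases, and credentials.
server = "your-workspace-ondemand.sql.azuresynapse.net"
databases = ["db1", "db2"]
user, password = "sql_user", "sql_password"

failing_views = []

for db in databases:
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};DATABASE={db};UID={user};PWD={password}"
    )

    # Enumerate the views in this database (same query as above).
    views = conn.cursor().execute(
        "select schema_name(schema_id), name from sys.views"
    ).fetchall()

    for schema, name in views:
        cur = conn.cursor()
        try:
            # Read every row so the bad 0001-01-01 values are actually touched;
            # fetch in batches and discard them to keep memory use low.
            cur.execute(f"select * from [{schema}].[{name}]")
            while cur.fetchmany(50000):
                pass
        except pyodbc.Error as e:
            failing_views.append((db, schema, name, str(e)))
        finally:
            cur.close()

    conn.close()

for row in failing_views:
    print(row)

The result is a list of (database, schema, view, error message) tuples identifying exactly which views cannot be fully read.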

– ASH