Using the following query on a large table:
pages = spark.sql('select * from table xx')
I found that the query itself runs in seconds, but as soon as I try to look at the data with pages.show(n=10), it takes minutes just to return a small sample. What is happening under the hood that makes this so slow?
The SQL command (spark.sql) takes less than 1 second, but pages.show(n=10) takes minutes.
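Spark DataFrames are lazily evaluated. For a SELECT, spark.sql(...) returns a DataFrame without running the query, and the work only happens when an action such as show() forces it. The toy model below reproduces that timing pattern in plain Python. LazyTable and slow_scan are hypothetical names, not part of PySpark. The per-row sleep stands in for I/O cost. The filter illustrates why even a small sample can require reading far more rows than the number displayed. It is an analogy under those assumptions, not a description of Spark's actual execution engine.

```python
import time


class LazyTable:
    """Hypothetical toy stand-in for a lazily evaluated DataFrame (not Spark's API).

    Creating it or chaining transformations only records a plan;
    no rows are read until an action such as show() runs.
    """

    def __init__(self, source, plan=None):
        self._source = source      # callable returning an iterator of rows (the "scan")
        self._plan = plan or []    # recorded filter predicates, not yet executed
        self.rows_read = 0         # rows pulled from the source by the last action

    def filter(self, predicate):
        # Transformation: returns a new LazyTable with one more plan step; reads nothing.
        return LazyTable(self._source, self._plan + [predicate])

    def show(self, n=20):
        # Action: executes the plan, stopping as soon as n matching rows are found.
        result = []
        self.rows_read = 0
        for row in self._source():
            self.rows_read += 1
            if all(pred(row) for pred in self._plan):
                result.append(row)
                if len(result) == n:
                    break
        for row in result:
            print(row)
        return result


def slow_scan(total=1000, delay=0.001):
    # Hypothetical data source: each row costs `delay` seconds, simulating I/O.
    for i in range(total):
        time.sleep(delay)
        yield {"id": i}


# Building the "query" returns immediately, like spark.sql(...) for a SELECT.
start = time.perf_counter()
pages = LazyTable(slow_scan)
filtered = pages.filter(lambda r: r["id"] % 100 == 0)
build_time = time.perf_counter() - start

# The action pays the full cost: 201 rows are scanned to produce 3 matches.
start = time.perf_counter()
sample = filtered.show(n=3)
show_time = time.perf_counter() - start

print(f"build: {build_time:.4f}s, show: {show_time:.4f}s, rows read: {filtered.rows_read}")
```

In this model the cost of show() depends on how much of the source must be read before enough rows come out, not on how many rows are displayed.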