I'm trying to read a table on Databricks into a DataFrame using pyspark.pandas.read_table
and receive the following error:
AnalysisException: [UC_COMMAND_NOT_SUPPORTED] AttachDistributedSequence is not supported in Unity Catalog.;
AttachDistributedSequence[__index_level_0__#767L, _c0#734, carat#735, cut#736, color#737, clarity#738, depth#739, table#740, price#741, x#742, y#743, z#744] Index: __index_level_0__#767L
+- SubqueryAlias spark_catalog.default.diamonds
+- Relation hive_metastore.default.diamonds[_c0#734,carat#735,cut#736,color#737,clarity#738,depth#739,table#740,price#741,x#742,y#743,z#744] csv
The table was created following the Databricks Quick Start notebook:
DROP TABLE IF EXISTS diamonds;
CREATE TABLE diamonds
USING csv
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true")
I'm trying to read the table with
import pyspark.pandas as ps
psdf = ps.read_table("hive_metastore.default.diamonds")
and get the error above.
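From the plan in the error, the failing node appears to be the default index that pandas-on-Spark attaches when no index column is given (the AttachDistributedSequence step). Two workarounds I'm considering, sketched under the assumption that ps.read_table's index_col parameter and the compute.default_index_type option behave as documented:

import pyspark.pandas as ps

# Option 1: use an existing column ("_c0" here, from the plan above) as
# the index, so no default sequence index has to be attached
psdf = ps.read_table("hive_metastore.default.diamonds", index_col="_c0")

# Option 2: switch the default index type away from "distributed-sequence";
# a "distributed" index is monotonically increasing but its values are not
# guaranteed to be consecutive
ps.set_option("compute.default_index_type", "distributed")
psdf = ps.read_table("hive_metastore.default.diamonds")

I haven't verified whether either of these avoids the Unity Catalog restriction.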
Reading the table into a pyspark.sql.DataFrame works fine with
df = spark.read.table("hive_metastore.default.diamonds")
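If converting afterwards is the recommended route, a minimal sketch (assuming DataFrame.pandas_api is available on Spark 3.3 and that passing index_col skips the default index generation) would be:

df = spark.read.table("hive_metastore.default.diamonds")
# convert the Spark DataFrame to pandas-on-Spark, keeping "_c0" as the
# index so no sequence index needs to be generated
psdf = df.pandas_api(index_col="_c0")

though I don't know whether this triggers the same AttachDistributedSequence step internally.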
The cluster versions are:
- Databricks Runtime Version 11.2
- Apache Spark 3.3.0
- Scala 2.12
I'm already familiar with pandas and would like to use pyspark.pandas.DataFrame,
since I assume its API will feel familiar and be quick for me to learn and use.
The questions I have:
- What does the error mean?
- What can I do to read the tables into a pyspark.pandas.DataFrame?
- Alternatively, should I just learn pyspark.sql.DataFrame and use that? If so, why?