I am unable to create an Impala table from a pandas DataFrame using ibis.

import ibis
import pandas as pd

# HDFS client that ibis uses to upload the DataFrame contents
hdfs = ibis.impala.hdfs_connect(
    host=host_name,
    port=n_port,
    protocol="https",
    auth_mechanism="GSSAPI",
    verify=False,
)
# Impala connection bound to that HDFS client
con = ibis.impala.connect(
    host=impala_host,
    port=impala_port,
    auth_mechanism="GSSAPI",
    kerberos_service_name="impala",
    use_ssl=True,
    hdfs_client=hdfs,
    database=schema,
)
df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
con.create_table('pandas_table', df)

The relevant line in the log is:

..../pandas_5cefdfabb2ae4c6d870a533bf234b689/0.csv has an invalid Parquet version number: 4,d\n\n
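
The trailing "4,d\n" in that message looks like the last four bytes of the uploaded file rather than a real version number. A Parquet file begins and ends with the 4-byte magic "PAR1", so a reader checking the end of a CSV file would find the tail of the last row there instead. Below is a minimal sketch of that check. It assumes the DataFrame is uploaded as header-less CSV without the index and with "\n" line endings, which I have not confirmed from the ibis source; `csv_bytes` and `trailing_magic` are hypothetical helper names.

```python
import csv
import io

# Parquet files begin and end with these four bytes.
PARQUET_MAGIC = b"PAR1"


def csv_bytes(rows):
    # Serialize rows as header-less CSV with "\n" line endings.
    # Assumption: this mirrors the layout of the uploaded 0.csv.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def trailing_magic(data):
    # Return the last four bytes, where a Parquet reader expects the magic.
    return data[-4:]


# Same values as the DataFrame in the question, one tuple per row.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
data = csv_bytes(rows)
tail = trailing_magic(data)
print(tail, tail == PARQUET_MAGIC)
```

If that assumption holds, the table is being read as Parquet while the file underneath it is plain CSV.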

Is it normal for ibis to upload a CSV file for a table that Impala then reads as Parquet? Shouldn't the uploaded file be Parquet as well? The only workaround I see is to create a text table over the CSV and then populate the Parquet table from it with a SELECT.
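
To make that workaround concrete, here is a rough sketch of the statements I have in mind. It only builds the SQL strings and does not talk to the cluster. The helper name `parquet_via_text_sql`, the staging table name, the column types and the HDFS directory are all hypothetical placeholders, not something ibis generates.

```python
def parquet_via_text_sql(target, staging, columns, hdfs_dir):
    # Build the statements for: text staging table -> Parquet table.
    # columns is a list of (name, impala_type) pairs.
    cols = ", ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return [
        # 1. Expose the uploaded CSV directory as a comma-delimited text table.
        f"CREATE EXTERNAL TABLE `{staging}` ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{hdfs_dir}'",
        # 2. Let Impala rewrite the rows as Parquet via CREATE TABLE AS SELECT.
        f"CREATE TABLE `{target}` STORED AS PARQUET "
        f"AS SELECT * FROM `{staging}`",
        # 3. Drop the staging table; it is external, so the CSV files remain.
        f"DROP TABLE `{staging}`",
    ]


statements = parquet_via_text_sql(
    target="pandas_table",
    staging="pandas_table_csv",            # hypothetical staging name
    columns=[("foo", "BIGINT"), ("bar", "STRING")],
    hdfs_dir="/tmp/pandas_table_csv",      # hypothetical HDFS directory
)
for stmt in statements:
    print(stmt)
```

Each statement could then be run in order through the connection, for example with `con.raw_sql(stmt)` if the installed ibis version provides it, after the CSV has been written to that directory.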
