I am unable to create an Impala table from a pandas DataFrame using ibis.
import ibis
import pandas as pd

# host_name, n_port, impala_host, impala_port and schema are set elsewhere
# for my environment.
hdfs = ibis.impala.hdfs_connect(
    host=host_name,
    port=n_port,
    protocol="https",
    auth_mechanism="GSSAPI",
    verify=False,
)
con = ibis.impala.connect(
    host=impala_host,
    port=impala_port,
    auth_mechanism="GSSAPI",
    kerberos_service_name="impala",
    use_ssl=True,
    hdfs_client=hdfs,
    database=schema,
)
df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
# This is the call that fails.
con.create_table('pandas_table', df)
The relevant line in the log is:
..../pandas_5cefdfabb2ae4c6d870a533bf234b689/0.csv has an invalid Parquet version number: 4,d\n\n
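As far as I can tell, the trailing `4,d\n` in that message is the last row of the dataframe above, so the file being read really is the CSV. A Parquet file begins and ends with the 4-byte magic `PAR1`. Here is a minimal sketch of that check on a local file; the helper name `looks_like_parquet` and the header-less CSV layout are my assumptions, not something taken from ibis:

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, os.SEEK_END)
        if f.tell() < 8:
            # Too short to hold both the leading and trailing magic.
            return False
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Write the same rows as the dataframe above as a header-less CSV
# (assumed layout of the staged 0.csv file).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1,a\n2,b\n3,c\n4,d\n")
    csv_path = tmp.name

print(looks_like_parquet(csv_path))
```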
Is it normal for ibis to try to load a CSV file into a Parquet table? Shouldn't the staged file be Parquet as well? The only workaround I see is to create a text table first and then populate the target table from it with a SELECT.
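For clarity, this is roughly the workaround I have in mind, sketched as the Impala SQL it would boil down to. The helper `text_staging_statements`, the staging table name, the column types and the HDFS path `/tmp/pandas_staging` are all hypothetical and only there for illustration:

```python
import re

# Only plain identifiers are accepted, to keep the generated SQL safe.
# Parameterised types such as DECIMAL(10,2) are deliberately not handled.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name):
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def text_staging_statements(staging, target, columns, hdfs_dir):
    """Build SQL for: text staging table -> load CSV -> Parquet table via SELECT."""
    if "'" in hdfs_dir:
        raise ValueError("hdfs_dir must not contain a single quote")
    cols = ", ".join(f"{_check(c)} {_check(t)}" for c, t in columns.items())
    create_staging = (
        f"CREATE TABLE {_check(staging)} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE"
    )
    load_csv = f"LOAD DATA INPATH '{hdfs_dir}' INTO TABLE {staging}"
    create_target = (
        f"CREATE TABLE {_check(target)} STORED AS PARQUET "
        f"AS SELECT * FROM {staging}"
    )
    return [create_staging, load_csv, create_target]


statements = text_staging_statements(
    staging="pandas_table_txt",          # hypothetical staging table name
    target="pandas_table",
    columns={"foo": "BIGINT", "bar": "STRING"},  # assumed Impala types
    hdfs_dir="/tmp/pandas_staging",      # hypothetical HDFS directory with the CSV
)
for stmt in statements:
    print(stmt)
```

Each statement could then be run through the connection, for example with `con.raw_sql(stmt)`, assuming the installed ibis version exposes that method.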