Questions tagged [ibis]

Ibis is an open source Python framework to access data and perform analytical computations from different sources (e.g. sqlite, duckdb, postgres, spark, clickhouse, bigquery, and more), in a standard way. Use for questions related to configuring Ibis, or problems using Ibis that are not covered in the official tutorial.

From https://ibis-project.org/docs/3.2.0/tutorial/01-Introduction-to-Ibis:

Introduction to Ibis

Ibis is a Python framework to access data and perform analytical computations from different sources, in a standard way.

In a way, you can think of Ibis as writing SQL in Python, with a focus on analytics, more than simply accessing data. And aside from SQL databases, you can use it with other backends, including big data systems.

Why not simply use SQL instead? SQL is great and widely used. However, SQL has different flavors for different database engines, and SQL is very difficult to maintain when your queries are very complex. Ibis solves both problems by standardizing your code across backends and making it maintainable. Since Ibis is Python, you can structure your code in different files, functions, name variables, write tests, etc.
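As a minimal sketch of the "writing SQL in Python" idea above, the function below builds an expression against an unbound table and asks Ibis to compile it to SQL for a chosen dialect. The table name `orders`, its columns, and the function name `grouped_total_sql` are hypothetical, and the calls used (`ibis.schema`, `ibis.table`, the deferred `_`, `ibis.to_sql`) are assumed to be available in the installed Ibis version; older releases may differ.

```python
def grouped_total_sql(dialect="duckdb"):
    """Build a small Ibis expression and return the SQL it compiles to."""
    # Imported here so this module can be loaded even if Ibis is missing.
    import ibis
    from ibis import _

    # An unbound table: only a schema, no connection and no data.
    # "orders", "region" and "amount" are hypothetical names.
    schema = ibis.schema({"region": "string", "amount": "float64"})
    orders = ibis.table(schema, name="orders")

    # The same Python expression can be compiled for different backends.
    expr = (
        orders.filter(_.amount > 0)
        .group_by("region")
        .aggregate(total=_.amount.sum())
    )

    # Assumption: ibis.to_sql accepts a `dialect` keyword in this version.
    return str(ibis.to_sql(expr, dialect=dialect))


if __name__ == "__main__":
    print(grouped_total_sql())
```

Because the query lives in an ordinary function, it can be imported, reused, and covered by tests like any other Python code, and passing a different `dialect` is one way to see how the generated SQL changes between engines.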

According to ibis/README.md, Ibis currently supports the following backends:

  • Apache Impala
  • Google BigQuery
  • ClickHouse
  • HeavyAI
  • Dask
  • DuckDB
  • MySQL
  • Pandas
  • PostgreSQL
  • PySpark
  • SQLite

34 questions
3
votes
1 answer

Efficiency in using pandas and parquet

People talk a lot about using parquet and pandas, and I am trying hard to understand whether we can take full advantage of the features of parquet files when they are used with pandas. For instance, say I have a big parquet file (partitioned on year) with 30 columns…
Xion
  • 319
  • 2
  • 11
2
votes
1 answer

Change case of all column names with Ibis

I have an Ibis table named t. Its column names are all lowercase. I want to change them all to uppercase. How can I do that?
ianmcook
  • 537
  • 4
  • 10
1
vote
1 answer

Creating a table from literal values in Ibis

I'd like to use Ibis to create a table from literal values instead of a table. In BigQuery SQL, I might do this with a combination of the array and struct data types. See this example from the BigQuery docs. WITH races AS ( SELECT "800M" AS…
Tim Swast
  • 14,091
  • 4
  • 38
  • 61
1
vote
1 answer

How to solve module 'pyarrow.lib' has no attribute 'NAType' when importing ibis?

In a Python notebook, importing ibis gives me the following error: import ibis Error: :219: RuntimeWarning: pyarrow._fs.FileSystem size changed, may indicate binary incompatibility. Expected 48 from C header,…
Sevval Kahraman
  • 1,185
  • 3
  • 10
  • 37
1
vote
0 answers

best practice for ibis database connections

What's the best practice for storing/re-using the ibis database connection between functions in python? Currently, I'm using a function to connect to the db that looks like this: def get_ibis_db(app): try: connection =…
1
vote
0 answers

Ibis can't write from pandas to parquet in Impala

I am unable to create a table from a pandas dataframe. hdfs = ibis.impala.hdfs_connect(host=host_name, port = n_port, protocol = "https", auth_mechanism='GSSAPI', verify = False ) con = ibis.impala.connect( host = impala_host, …
1
vote
1 answer

Adding a time interval to a date column

I am using ibis with the bigquery backend. I want to add a time interval to a date, using the .add() method. However, I can't figure out how to specify such a time interval: the "which_type_here" variable in the code below. Thanks for your…
jcmincke
  • 41
  • 2
1
vote
0 answers

Specifying datetime64 resolution in Ibis when converting to Pandas DataFrame

I have a MySQL database with datetime values shifted by arbitrary amounts for de-identification purposes. So, for example, I have a date value of datetime.datetime(2644, 1, 17, 0, 0) . If I query these values with pymysql or Pandas I get a fine…
1
vote
0 answers

How to concurrently run queries using Impala with Python code?

Context: I use Python (3.7) to run several queries on a Hadoop server. After several tests, I think Impala is the most efficient engine to query the database. So I set up a connection using the Ibis framework in order to force the use of Impala (Hive is…
Philippe
  • 11
  • 3
1
vote
3 answers

Using Python to connect to Impala database (thriftpy error)

What I'm trying to do is very basic: connect to an Impala db using Python: from impala.dbapi import connect conn = connect(host='impala', port=21050, auth_mechanism='PLAIN') I'm using the Impyla package to do so. I got this error: Traceback (most…
ds_enth
  • 49
  • 3
  • 9
1
vote
0 answers

Trying to connect to an Impala server that uses Kerberos using Ibis

I'm trying to connect to an Impala server that uses Kerberos using ibis.impala.connect like so: import ibis client = ibis.impala.connect(host='grid.company.corp', port=21050, …
ZuluagaSD
  • 73
  • 1
  • 7
1
vote
1 answer

how to set impala namenode rpc port number for python ibis or requests

I'm using 'ibis-framework'. I have hdfs_client = ibis.hdfs_connect(...) impala_client = ibis.impala.connect(..., hdfs_client=hdfs_client) db = impala_client.database('abc') data = pd.DataFrame(...) db.create_table('tb_name', obj=data,…
zpz
  • 354
  • 1
  • 3
  • 16
0
votes
1 answer

Obtaining query plans for Ibis expression

I'm trying to obtain query plans from the DuckDB backend for an Ibis expression using the con.explain interface, which throws an error. Am I using the con.explain function correctly? def init_ddb_from_csv(db_filename, tablename, csv_filename,…
0
votes
0 answers

Not able to execute Substrait plan using pyarrow

I've created a Substrait query plan using Ibis and stored it as a .proto file. Then I successfully executed it using DuckDB after reading the matching tables (following the ibis-substrait tutorial). Then I read the same tables using pyarrow and wanted…
Omri
  • 43
  • 1
  • 4
0
votes
1 answer

Replace missing values with mean using Ibis

How can I use Ibis to fill missing values with the mean? For example, if I have this data: import pandas as pd import ibis from ibis import _ ibis.options.interactive = True df = pd.DataFrame(data={'fruit': ['apple', 'apple', 'apple', 'orange',…
ianmcook
  • 537
  • 4
  • 10