Questions tagged [ibis]

Ibis is an open-source Python framework to access data and perform analytical computations from different sources (e.g. SQLite, DuckDB, PostgreSQL, Spark, ClickHouse, BigQuery, and more) in a standard way. Use this tag for questions about configuring Ibis, or about problems using Ibis that are not covered in the official tutorial.

From https://ibis-project.org/docs/3.2.0/tutorial/01-Introduction-to-Ibis:

Introduction to Ibis

Ibis is a Python framework to access data and perform analytical computations from different sources, in a standard way.

In a way, you can think of Ibis as writing SQL in Python, with a focus on analytics, more than simply accessing data. And aside from SQL databases, you can use it with other backends, including big data systems.

Why not simply use SQL instead? SQL is great and widely used. However, SQL has different flavors for different database engines, and SQL is very difficult to maintain when your queries are very complex. Ibis solves both problems by standardizing your code across backends and making it maintainable. Since Ibis is Python, you can structure your code in different files, functions, name variables, write tests, etc.

According to ibis/README.md, Ibis currently supports the following backends:

  • Apache Impala
  • Google BigQuery
  • ClickHouse
  • HeavyAI
  • Dask
  • DuckDB
  • MySQL
  • Pandas
  • PostgreSQL
  • PySpark
  • SQLite

34 questions
0
votes
0 answers

connecting to impala by ibis

I tried to connect to an Impala server from a Jupyter notebook. When I used the following code: con = ibis.impala.connect(host="sz11hdp06a.infra.bird.bi.eb-grp.net", port=21050, auth_mechanism='GSSAPI', use_ssl=True, …
Samira
  • 1
  • 1
0
votes
0 answers

Specify schemaname in table references in ibis

The following code works fine if table t is on the connecting user's search_path. I would like to be able to specify, in code, which schema to search. The only way I can make the code work is by issuing an alter user on the backend. alter user…
Niels Jespersen
  • 410
  • 4
  • 16
0
votes
1 answer

Why can't I connect Ibis to a postgres table with a tsvector column?

When trying to connect to a postgres table with a tsvector column, I get the following error message: KeyError Traceback (most recent call last) File…
0
votes
0 answers

ibis-framework: How to connect to Impala using ODBC / DSN?

I would like to use the Python library ibis (ibis-framework) to run queries against an Impala backend. When I use pyodbc to connect to Impala, I use the DSN: con_odbc = pyodbc.connect("DSN=my_dsn_name", autocommit=True) Unfortunately, I have not…
der_grund
  • 1,898
  • 20
  • 36
0
votes
1 answer

Importing IBIS in Python

I am trying to use ibis in my code, but importing it raises the error shown below, which says that a nonexistent attribute is being accessed. When I checked, I found that the attribute is indeed missing. It also doesn't have a window attribute inside the rules…
0
votes
1 answer

What are the changes to Ibis 2.0?

The Release Notes page on the Ibis project website does not describe the most recent update to Ibis (from 1.4 to 2.0). What are the changes?
adam.r
  • 247
  • 2
  • 12
0
votes
1 answer

impala.error.HiveServer2Error: Failed after retrying 3 times

I use impyla and ibis to connect to a Hive server, but I get an error. I tried the following code: from impala.dbapi import connect impcur = connect(host="kudu3", port=10000, database="yingda_test", password=None, user='admin',…
jxfruit
  • 1
  • 2
0
votes
1 answer

ibis ImpalaTable to pyspark dataframe

I need to load Impala data into Spark (PySpark) because I want to use FPGrowth from Spark MLlib. The data is in Kudu and was created by Impala. Connecting directly to Kudu from Spark was rejected by the relevant department. And I also failed…
0
votes
1 answer

Is there a way to iterate over table rows using Ibis (impala)

I have a fairly large Ibis TableExpr for which I would like to iterate over the rows to produce a specialized file output (FASTA nucleotide sequences). Is there any way to do this with Ibis, or should I just call execute to create a pandas DataFrame…
adam.r
  • 247
  • 2
  • 12
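
For the row-iteration question above, a minimal sketch of the "execute, then iterate" route is shown here. It assumes the rows have already been materialized as (identifier, sequence) pairs, for example via pandas `df.itertuples(index=False, name=None)` on the executed result. The `write_fasta` helper, the two-column layout, and the sample rows are hypothetical and not taken from the question.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(rows: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> int:
    """Write (identifier, sequence) pairs to `handle` in FASTA format.

    Hypothetical helper for illustration; returns the number of records written.
    """
    count = 0
    for identifier, sequence in rows:
        # Each record starts with a header line beginning with ">".
        handle.write(f">{identifier}\n")
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


# Stand-in rows (assumed data); with pandas these could come from
# df.itertuples(index=False, name=None) after executing the expression.
rows = [("seq1", "ACGTACGTAC"), ("seq2", "GGCC")]
buffer = io.StringIO()
written = write_fasta(rows, buffer, width=4)
fasta_text = buffer.getvalue()
```

Because the helper accepts any iterable, it can consume rows lazily, so a large result can be processed in chunks without holding all formatted output in memory.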
0
votes
2 answers

Ibis pandas dataframe connection

How can we use Ibis with a pandas DataFrame? conn = ibis.pandas.connect({'data': dataframe}) projection = conn.table('data') It throws the error: module 'ibis' has no attribute 'pandas'. Any suggestions would be appreciated.
Shivi
  • 9
  • 3
0
votes
1 answer

Ibis create impala table with pandas dataframe and get [Error 61] Connection refused

After running an impyla SQL statement, I convert the results into a pandas DataFrame. Now I want to automatically create a temporary table on Impala using Ibis and load a DataFrame into it. The following code is divided into 3…
Eric.XY
  • 11
  • 4
0
votes
0 answers

Unable to connect to Impala through AWS Lambda using Ibis

I have an AWS Lambda function written in Python 2.7. Using the Lambda function, I am trying to connect to Impala (installed on an EC2 instance). The Python Lambda function uses Ibis to connect to Impala. While trying to test from Lambda, I get the below…
SRMara
  • 11
  • 5
0
votes
1 answer

Converting simple impala sql query to ibis

I'm trying to convert a simple Impala sql query to an ibis query in python, but I'm having trouble understanding ibis's syntax when converting from sql. So far, I've tried this: agg = joblist_table_handle.lastupdatedate.max() joblist =…
Trincity
  • 149
  • 9
0
votes
1 answer

Ibis Impala JOIN problem with relabel/name 'column AS newName'

When you use the Ibis API to query impala, for some reason Ibis API forces it to become a subquery (when you join 4-5 tables it suddenly becomes super slow). It simply won't join normally, due to column name overlap problem on joins. I want a way to…
Dexter
  • 6,170
  • 18
  • 74
  • 101
0
votes
1 answer

Trying to load Python dataframe into Hadoop (Impala) using `ibis`, getting "AttributeError: module 'ibis' has no attribute 'impala' "

I'm running the following block of Python commands in a Jupyter notebook to upload my dataframe, labeled df, to Impala: import hdfs from hdfs.ext.kerberos import KerberosClient import pandas as pd import ibis hdfs = KerberosClient('< URL address…
RobertF
  • 824
  • 2
  • 14
  • 40