Say that I have a system that does not support SQL queries. This system can store tabular or maybe even non-tabular data.

This system has a REST API that allows me to access its data objects (a table, for example).

My current solution for running SQL queries on this data is to download the entire contents of the data object (table) into a pandas DataFrame and then use duckdb to execute SQL statements against it.

The obvious drawback of this method is that the entire data object is loaded into a DataFrame before the query even runs, including data the query does not need. This can cause memory issues, especially when querying large data objects.

What is a more efficient way to approach this? I am open to approaches using duckdb or otherwise.

Minura Punchihewa

0 Answers