Questions tagged [duckdb]

Issues related to the usage of DuckDB (www.duckdb.org)

180 questions
2
votes
1 answer

Does DuckDB create a copy of an R data frame when I register it?

I am trying to learn about using DuckDB in R. From my reading of the docs and what people say online, it sounds as if no copy is made when I register a data frame as a virtual table. Rather, a pointer is created that points to the data frame. If I…
2
votes
1 answer

DuckDB - Rank correlation is much slower than regular correlation

The following two code sections differ only in that the second one computes ranks first, yet the second is much slower than the first (~5x). Although the second section involves a few more extra…
lebesgue
  • 837
  • 4
  • 13
2
votes
1 answer

SQL/DuckDB: how to calculate Spearman rank correlation by groups?

I want to calculate Spearman (rank) correlation in a groupby context using DuckDB/SQL syntax. I tried the following, but failed. import duckdb import pandas as pd df = pd.DataFrame( { "a": [1, 1, 2, 2, 6, 1, 3, 6, 3], "b": [4,…
Keptain
  • 147
  • 7
2
votes
2 answers

Polars is much slower than DuckDB in conditional join + groupby/agg context

The following example involves a conditional self-join followed by a groupby/aggregate operation. It turns out that in this case DuckDB performs much better than Polars (~10x on a 32-core machine). My questions are: What…
lebesgue
  • 837
  • 4
  • 13
2
votes
1 answer

How to alter a data constraint in DuckDB (R)

I am trying to change a NOT NULL constraint to a NULL constraint in DuckDB (R API) and can't get it to stick. Here is an example of the problem. drv <- duckdb() con <- dbConnect(drv) dbExecute(con, "CREATE TABLE db(a varchar(1) NOT NULL, b varchar(1)…
matto
  • 77
  • 7
2
votes
1 answer

How to show user schema in a Parquet file using DuckDB?

I am trying to use DuckDB to show the user-created schema that I have written into a Parquet file. I can demonstrate in Python (using the code example at Get schema of parquet file in Python) that the schema is as I desire, but cannot seem to find…
rbmales
  • 143
  • 1
  • 8
2
votes
0 answers

Unsupported result column Struct()[] for DuckDB 0.7.1 from_json

I am trying to load a large set of nested JSON files into a table; each file is a single record and there are ~25k files. However, when I try to declare the schema, it errors out if a declared data type is a struct. For…
Mitchell Hamann
  • 313
  • 4
  • 18
2
votes
1 answer

How many threads is DuckDB using?

Using DuckDB from within R, e.g. library(duckdb) dbname <- "sparsemat.duckdb" con2 <- dbConnect(duckdb(), dbname) dbExecute(con2, "PRAGMA memory_limit='1GB';") how can I find out how many threads the (separate) process is using? I am aware…
Karsten W.
  • 17,826
  • 11
  • 69
  • 103
2
votes
1 answer

Querying last row of sorted column where value is less than specific amount from parquet file

I have a large parquet file where the data in one of the columns is sorted. A very simplified example is below. X Y 0 1 Red 1 5 Blue 2 8 Green 3 12 Purple 4 15 Blue 5 17 Purple I am interested in querying the last value…
jd0
  • 23
  • 3
2
votes
1 answer

Syntax for DuckDB > Python SQL with Parameter\Variable

I am working on a proof of concept using Python and DuckDB. I want to use a variable/parameter inside a DuckDB SELECT statement. For example, y = 2 dk.query("SELECT * FROM DF WHERE x > y").to_df() How can y be properly referenced? I was…
2
votes
1 answer

Problem with reading partitioned Parquet files created by Snowflake with pandas or Arrow

ArrowInvalid: Unable to merge: Field X has incompatible types: string vs dictionary ArrowInvalid: Unable to merge: Field X has incompatible types: decimal vs int32 I am trying to write the result of a…
2
votes
1 answer

Add columns to a table or records without duplicates in DuckDB

I have the following code: import time from watchdog.observers import Observer from watchdog.events import FileSystemEventHandler, PatternMatchingEventHandler import duckdb path = "landing/persistent/" global con con =…
Norhther
  • 545
  • 3
  • 15
  • 35
2
votes
1 answer

Does DuckDB support triggers?

I suspect the answer is no, but I just wanted to check if anyone has a way to implement triggers in DuckDB? I have a SQLite database that relies heavily on views with INSTEAD OF INSERT/ UPDATE/ DELETE triggers to mask the underlying table structure…
David
  • 21
  • 1
2
votes
1 answer

DuckDB not saving huge database

We are trying to embed DuckDB in our project, but it doesn't seem to be able to save the database after the connection is closed. Information: Database size: 16 GB. Number of tables: 3. I searched for information about data not persisting and found nothing…
xonturis
  • 98
  • 1
  • 5
2
votes
0 answers

How to determine cause of "RuntimeError: Resource temporarily unavailable" error in Python notebook

In a hosted Python notebook, I'm using the duckdb library and running this code: duckdb.connect(database=":memory:", read_only=False) This returns the following error sometimes: Traceback (most recent call last): File…
JKillian
  • 18,061
  • 8
  • 41
  • 74