Questions tagged [duckdb]

Issues related to the usage of DuckDB (www.duckdb.org)

180 questions
1 vote, 1 answer

R: DuckDB dbConnect is very slow. Why?

I have a *.csv file containing columns of numbers and strings (13 GB on disk), which I imported into a new DuckDB (or SQLite) database and saved so I can access it later in R. But reconnecting duplicates it and is very slow. Is this wrong? From…
HCAI
1 vote, 1 answer

PInvoke struct with nested struct array

I'm trying to PInvoke a method that has a struct parameter with a nested struct array pointer. The C declaration looks like this: duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result); typedef struct…
Giorgi
0 votes, 0 answers

How to view DuckDB database in DBeaver?

I created a DuckDB database in Python and I'm getting: con = duckdb.connect('../data/proposal.db') con.sql("SELECT COUNT(*) FROM proposals") >>> ┌──────────────┐ │ count_star() │ │ int64 │ ├──────────────┤ │ 200000…
ruslaniv
0 votes, 1 answer

Not seeing file-level pushdown predicate filtering querying hive-partitioned table in S3

I am using DuckDB in DuckDB-WASM. I am creating a view on top of a hive-partitioned table in S3 with SQL like: create or replace view my_view as select Part1 as part_1 , Part2 as part_2 , Column1 as column_1 , Column2 as column_2 from…
Dude0001
0 votes, 0 answers

Write a DataFrame as a Parquet file to an S3 bucket with the DuckDB Python API

I have a DuckDB DataFrame with 5 GB of data that I would like to write to an S3 bucket as a Parquet file. I can see the DuckDB SQL commands for this, but I cannot find the equivalent in the Python API. Any help is appreciated.
Sandeep540
0 votes, 1 answer

Correlated subqueries in DuckDB

I'm writing this correlated subquery in DuckDB and I can't figure out why it's not working. Can someone explain why? Many thanks! select o.__policy, o.__timestamp, q.__filename as lastquotefile from b_n_f b left join q_n_f q on b.__policy =…
smaillis
0 votes, 2 answers

Unable to access AWS S3 parquet file from AWS Lambda using duckdb

I have a Parquet file stored in AWS S3. Assume the location is s3://bucket/file.parquet. I defined a function in AWS Lambda to access this Parquet file by using the code below. import os def lambda_handler(event, context): import…
pass-by-ref
0 votes, 2 answers

How to count the total number of unique users making transactions each month

There is an order_table with the fields order_id, user_id, item_id, gmv, and order_time. I have already written a query to find the months from the transactions: %%sql SELECT DISTINCT bulan AS bulan_transaksi FROM ( SELECT …
Bin Ski.
0 votes, 1 answer

How to dynamically write a CSV in DuckDB?

I am running the same analysis across multiple directories with the same file structure. I just change the file_search_path to the right directory and it works great. The issue I'm having is how to dynamically save my CSV files with different…
yake84
0 votes, 1 answer

Writing .parquet from duckdb prefixes column names with "PARGO_PREFIX_"

DuckDB is changing my column names as I write out to a .parquet file, and I can't figure out why. In a memory-only DuckDB instance (on Ubuntu 23.04) I run: create table mytable (_id int, str varchar, num int); insert into mytable (_id, str, num)…
Jeff Breadner
0 votes, 2 answers

Replace whitespace in column names with underscores in the DuckDB Python client API

I have a DuckDB table whose column names contain spaces, and I'd like to just specify a blanket rule that says "for all columns with spaces, replace it with an underscore". I know how to do this by converting the table to a Polars DataFrame, but…
prrao
0 votes, 1 answer

Subquery returning multiple columns in DuckDB

I would like to group by first_name and, for each first_name, get the lowest age. My query works fine when I run it in an online SQL compiler, but when I use DuckDB in Python I get an error saying that I am trying to return multiple columns, but this is exactly what…
Kucharsky
0 votes, 1 answer

How can I select or alias a duckdb relation column which has an aggregate function in its column name using the Python-API?

The DuckDB Python API lets you compose complex queries by building them up from chained functions on a relation. For example, to do a group by, one can do a simple select and then use the aggregate function on the select relation like this: rel =…
tomanizer
0 votes, 1 answer

read_json_auto in DuckDB without involving files

I'm looking for a way to build up a DuckDB table akin to read_json_auto with the following constraints: it must work in-memory only (I want to avoid having to load a file from the file system), and it must be cross-platform compatible. Is there a way to do…
Bogey
0 votes, 1 answer

Linking two containers in a single task definition in AWS Fargate

I am trying to deploy two containers: one for DuckDB, which gets data from my S3 bucket, and a Streamlit container, which displays the frontend with some text boxes and dashboards on the data collected from S3 (a text box to run some…