Issues related to the usage of DuckDB (www.duckdb.org)
Questions tagged [duckdb]
180 questions
0
votes
0 answers
Clang error when installing DuckDB as part of a project
I was trying to set up a GitHub repo locally (owid/owid-grapher) but keep running into this clang error, which leads to node-gyp and node-pre-gyp errors.
I call it using npm_config_build_from_source=true yarn install and then everything works up…

You Know Ball
0
votes
2 answers
Fastest way to get exact count of rows for a 100GB CSV file stored on S3
What is the fastest way to get an exact row count for a 100GB CSV file stored on Amazon S3 without using Athena, Fargate, or an EC2 VM? I can't use Athena, because the CSV file isn't clean enough for it. I can't use Fargate or EC2 VMs,…

Ismael Ghalimi
0
votes
0 answers
DuckDB NOCASE collation not working with IN clause
I am not able to use the NOCASE collation within an IN clause.
When I query the data, no rows are returned (the expected result with a case-sensitive collation).
import duckdb
con = duckdb.connect(database=':memory:')
con.execute("SELECT col1, col2 FROM (values…

Peter Trcka
0
votes
0 answers
R crashes when executing SQL query
I have a database that I want to query in R using duckdb. The two tables in question are large, 183 million rows by eight columns. When I execute the following code:
compareid <- dbGetQuery(con, "(SELECT UserId FROM d14072021 EXCEPT SELECT UserId…

dominik hauser
0
votes
1 answer
duckdb_read_csv struggling with auto-detecting column data types in R
I have some very large CSV files (~183 million rows by 8 columns) that I want to load into a database using R. I use duckdb for this and its built-in function duckdb_read_csv, which is supposed to auto-detect data types for each column. If I enter the…

dominik hauser
0
votes
0 answers
How to query a database copied from clipboard using Pandas as DuckDB
I'm trying to test a simple SQL query which should do something like this:
import duckdb
import pandas as pd
df_test = pd.read_clipboard()
duckdb.query("SELECT * FROM df_test").df()
Which works but I can't get the following query to work.
select…

elksie5000
0
votes
1 answer
How to import SQLite data into DuckDB?
How can I import SQLite data into DuckDB? Or is it possible to query the SQLite data files directly from DuckDB? A presentation from the author of DuckDB mentioned such a feature.

vega77
0
votes
2 answers
Importing parquet file in chunks and insert in DuckDB
I am trying to load a parquet file with row group size = 10 into a DuckDB table in chunks. I can't find any documentation that covers this.
This is my work so far:
import duckdb
import pandas as pd
import gc
import numpy as np
# connect to…

Sushmitha
0
votes
1 answer
DuckDB for reading multiple parquet files on s3
I am trying to use DuckDB with the HTTPFS extension to query around 1000 parquet files with the same schema from an s3 bucket with a similar key.
When I query a single file with duckdb I'm able to get the table
import duckdb
import pandas as…

A Simple Programmer
0
votes
0 answers
How to restrict updates on duckdb view?
I have a table as items, and a view is created from it as below.
# I have a table items as below
dbas_db_con.execute("SELECT * FROM items").df()
# Created a view items_v1 from items
dbas_db_con.execute("CREATE VIEW items_v1 AS SELECT * FROM…

myamulla_ciencia
0
votes
2 answers
Read multiple files into DuckDB in R from CSV, with new variable indicating year from filename
I have several (8) large files (1M rows each) with the same variables/format saved individually by year.
I would like to save to a single table using the duckdb database format in R.
The duckdb_read_csv() command does this nicely.
The problem: there…

Matt L.
0
votes
1 answer
DuckDB Group by Pandas Index
I have a pandas DataFrame with a MultiIndex. How can I reference the index levels in a DuckDB query?
import duckdb
import pandas as pd
import numpy as np
df = pd.DataFrame({
'i1': np.arange(0, 100),
'i2': np.arange(0, 100),
'c':…

fny
0
votes
0 answers
How to make changes in Arrow and/or DuckDB datasets?
I am working on a project that includes cleaning a large dataset. I learned how to create a dataset from multiple parquet files, but I did not find a way to make changes to it, such as overwriting or deleting data or adding new columns.
Hope you can help…

Abdullah Abdelaziz
0
votes
0 answers
Efficient SQL Using duckdb
Say that I have a system that does not support SQL queries. This system can store tabular or maybe even non-tabular data.
This system has a REST API that allows me to access its data objects (a table, for example).
Now, my solution for allowing SQL…

Minura Punchihewa
0
votes
1 answer
DuckDB database causing Catalog Error: Serialisation on Mac but works on Windows
I am trying to run a tkinter+DuckDB program on an M2 MacBook Air and I get the following error:
The program works without any errors on my Windows machine. This is the line specified in the error.
try:
conn =…

pr0grmr