Questions tagged [duckdb]

Issues related to the usage of DuckDB (www.duckdb.org)

180 questions
0
votes
0 answers

Clang error when installing DuckDB as part of a project

I was trying to set up a GitHub repo locally (owid/owid-grapher) but keep running into this clang error, which leads to node-gyp and node-pre-gyp errors. I call it using npm_config_build_from_source=true yarn install and then everything works up…
0
votes
2 answers

Fastest way to get exact count of rows for a 100GB CSV file stored on S3

What is the fastest way of getting an exact count of rows for a 100GB CSV file stored on Amazon S3 without using Athena or any Fargate or EC2 VM? I can't use Athena, because the CSV file isn't clean enough for it. I can't use Fargates or EC2 VMs,…
Ismael Ghalimi
0
votes
0 answers

duckdb nocase collation not working with IN clause

I am not able to use nocase collation within IN clause. When I am querying data there are no rows (expected result due to case sensitive collation). import duckdb con = duckdb.connect(database=':memory:') con.execute("SELECT col1, col2 FROM (values…
Peter Trcka
0
votes
0 answers

R crashes when executing SQL query

I have a database that I want to query in R using duckdb. The two tables in question are large, 183 million rows by eight columns. When I execute the following code: compareid <- dbGetQuery(con, "(SELECT UserId FROM d14072021 EXCEPT SELECT UserId…
0
votes
1 answer

Duckdb_read_csv struggling with auto-detecting column data types in R

I have some very large CSV files (~183 million rows by 8 columns) that I want to load into a database using R. I use duckdb for this and its built-in function duckdb_read_csv, which is supposed to auto-detect datatypes for each column. If I enter the…
0
votes
0 answers

How to query a database copied from clipboard using Pandas and DuckDB

I'm trying to test a simple SQL query which should do something like this: import duckdb import pandas as pd df_test = pd.read_clipboard() duckdb.query("SELECT * FROM df_test").df() Which works but I can't get the following query to work. select…
elksie5000
0
votes
1 answer

How to import SQLite data into DuckDB?

How to import SQLite data into DuckDB? Or is it possible to query the SQLite data files directly from DuckDB? A presentation from the author of DuckDB mentioned such a feature.
vega77
0
votes
2 answers

Importing parquet file in chunks and insert in DuckDB

I am trying to load the parquet file with row group size = 10 into a duckdb table in chunks. I cannot find any documentation to support this. This is my work so far: import duckdb import pandas as pd import gc import numpy as np # connect to…
Sushmitha
0
votes
1 answer

DuckDB for reading multiple parquet files on s3

I am trying to use DuckDB with the HTTPFS extension to query around 1000 parquet files with the same schema from an s3 bucket with a similar key. When I query a single file with duckdb I'm able to get the table import duckdb import pandas as…
0
votes
0 answers

How to restrict updates on duckdb view?

I have a table as items, and a view is created from it as below. # I have a table items as below dbas_db_con.execute("SELECT *FROM items").df() # Created a view items_v1 from items dbas_db_con.execute("CREATE VIEW items_v1 AS SELECT *FROM…
myamulla_ciencia
0
votes
2 answers

Read multiple files into DuckDB in R from CSV, with new variable indicating year from filename

I have several (8) large files (1M rows each) with the same variables/format saved individually by year. I would like to save to a single table using the duckdb database format in R. The duck_read_csv() command does this nicely. The problem: there…
Matt L.
0
votes
1 answer

DuckDB Group by Pandas Index

I have a pandas dataframe with a multiindex. How can I reference the indexes in a duck db query? import duckdb import pandas as pd import numpy as np df = pd.DataFrame({ 'i1': np.arange(0, 100), 'i2': np.arange(0, 100), 'c':…
fny
0
votes
0 answers

How to make changes in Arrow and/or DuckDB datasets?

I am working on a project that includes cleaning a large dataset. I learned how to create a dataset using multiple parquet files, but I did not find a way to make changes and overwrite, delete, or mutate new columns to the dataset. Hope you can help…
0
votes
0 answers

Efficient SQL Using duckdb

Say that I have a system that does not support SQL queries. This system can store tabular or maybe even non-tabular data. This system has a REST API that allows me to access its data objects (a table, for example). Now, my solution for allowing SQL…
Minura Punchihewa
0
votes
1 answer

DuckDB database causing Catalog Error: Serialisation on Mac but works on Windows

I am trying to run a tkinter+DuckDB program on an M2 MacBook Air and the following error occurs: The program works without any errors on my Windows machine. This is the line specified in the error. try: conn =…
pr0grmr