Questions tagged [large-data-volumes]
302 questions
7 votes · 3 answers
Looking for an easy-to-use embedded key-value store for C++
I need to write a C++ application that reads and writes large amounts of data (more than the available RAM), but always in a sequential way.
In order to keep the data future-proof and easy to document, I use Protocol Buffers. Protocol buffer…

rodrigob · 2,891 · 3 · 30 · 34
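
The question is about C++, but the underlying pattern is language-agnostic: for strictly sequential access to data larger than RAM, length-delimited records appended to a flat file are often enough, and this is also the usual framing for streamed Protocol Buffer messages. A minimal sketch in Python, with plain bytes standing in for serialized messages:

```python
import struct

def write_records(path, records):
    """Append each record as a 4-byte little-endian length prefix + payload."""
    with open(path, "wb") as f:
        for payload in records:
            f.write(struct.pack("<I", len(payload)))
            f.write(payload)

def read_records(path):
    """Yield records one at a time; memory use is bounded by one record."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of file
            (size,) = struct.unpack("<I", header)
            yield f.read(size)

if __name__ == "__main__":
    write_records("cache.dat", [b"first", b"second", b"third"])
    print([r.decode() for r in read_records("cache.dat")])
```
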
7 votes · 5 answers
How to load 1 million records from a database fast?
We have a Firebird database with 1,000,000 records that must be processed after ALL of them are loaded into RAM. Extracting them with (select * first 1000 ...) takes 8 hours. What is the solution for this?

Leonard P. · 71 · 1 · 4
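
A common answer to this kind of question is to fetch in fixed-size batches through one cursor rather than re-running paged SELECTs. A sketch using the standard DB-API 2.0 fetchmany() call, shown against the stdlib sqlite3 module so it runs standalone; a Firebird driver such as fdb exposes the same cursor interface (the driver choice is an assumption, not something the question states):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                 [(i * 0.5,) for i in range(10_000)])

cur = conn.cursor()
cur.execute("SELECT id, amount FROM orders")
total = 0.0
while True:
    batch = cur.fetchmany(1000)  # one round trip per 1,000 rows
    if not batch:
        break
    for _id, amount in batch:    # process each batch, then let it go
        total += amount
print(f"processed 10,000 rows, total = {total}")
```
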
6 votes · 5 answers
MySQL table structure - one very large table or separate tables?
I'm working on a project which is similar in nature to website visitor analysis.
It will be used by hundreds of websites, each averaging tens of thousands to hundreds of thousands of page views a day, so the data volume will be very large.
Should I use a single table with…

Nir · 24,619 · 25 · 81 · 117
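
One frequently recommended design for this case is a single table with a composite index leading on the site id, so per-site queries never touch other sites' rows. A sketch of that layout, using sqlite3 as a stand-in for MySQL; the table and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (
        site_id   INTEGER NOT NULL,
        url       TEXT    NOT NULL,
        viewed_at TEXT    NOT NULL      -- ISO-8601 timestamp
    );
    -- Composite index: queries filtered by site (and then by time)
    -- touch only that site's slice of the table.
    CREATE INDEX idx_views_site_time ON page_views (site_id, viewed_at);
""")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [(s, f"/page/{n}", f"2011-06-0{1 + n % 9}T12:00:00")
     for s in range(1, 4) for n in range(100)],
)
row = conn.execute(
    "SELECT COUNT(*) FROM page_views WHERE site_id = ? AND viewed_at >= ?",
    (2, "2011-06-03"),
).fetchone()
print("site 2 views since June 3:", row[0])
```
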
6 votes · 4 answers
NTFS directory has 100K entries. How much performance boost if spread over 100 subdirectories?
Context
We have a homegrown filesystem-backed caching library. We currently have performance problems with one installation due to the large number of entries (e.g. up to 100,000). The problem: we store all fs entries in one "cache directory". Very…

user331465 · 2,984 · 13 · 47 · 77
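
The standard mitigation for this is hash-based fan-out: derive a subdirectory from the first hex digits of a digest of the cache key, so no single directory grows past a few hundred entries. A sketch, with the cache root path purely hypothetical:

```python
import hashlib
import os

CACHE_ROOT = "cache"  # hypothetical root for the on-disk cache

def path_for(key: str) -> str:
    """Map a cache key to a file path inside a 256-way directory fan-out."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    subdir = os.path.join(CACHE_ROOT, digest[:2])  # first byte picks 1 of 256 dirs
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, digest)

if __name__ == "__main__":
    # 100,000 entries land at roughly 390 per subdirectory
    # instead of one flat directory holding them all.
    print(path_for("http://example.com/some/resource"))
```
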
6 votes · 5 answers
Practical size limitations for RDBMS
I am working on a project that must store very large datasets and associated reference data. I have never come across a project that required tables quite this large. I have proved that at least one development environment cannot cope at the…

grenade · 31,451 · 23 · 97 · 126
6 votes · 5 answers
Processing Apache logs quickly
I'm currently running an awk script to process a large (8.1 GB) access-log file, and it's taking forever to finish. In 20 minutes it has written 14 MB of the (1000 ± 500) MB I expect it to write, and I wonder if I can process it much faster somehow.
Here…

konr · 1,173 · 12 · 26
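
One way to attack a job like this is to split the file by byte offsets and scan the chunks in parallel, aligning each chunk to a line boundary. A sketch that tallies the first field (the client IP in common log format); the log path and the sample-data generation are illustrative only:

```python
import os
from collections import Counter
from multiprocessing import Pool

LOG = "access.log"  # hypothetical path; a small sample file is generated below

def count_chunk(bounds):
    """Tally the first space-separated field of lines starting in [start, end)."""
    start, end = bounds
    counts = Counter()
    with open(LOG, "rb") as f:
        if start > 0:
            f.seek(start - 1)
            f.readline()  # advance to the first line starting at or after `start`
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            counts[line.split(b" ", 1)[0]] += 1
    return counts

def top_ips(workers=4):
    size = os.path.getsize(LOG)
    step = size // workers
    bounds = [(i * step, size if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    total = Counter()
    with Pool(workers) as pool:
        for partial in pool.imap_unordered(count_chunk, bounds):
            total.update(partial)
    return total.most_common(5)

if __name__ == "__main__":
    with open(LOG, "wb") as f:  # tiny stand-in for the real 8.1 GB log
        for i in range(100_000):
            f.write(b'10.0.0.%d - - [ts] "GET / HTTP/1.1" 200 123\n' % (i % 50))
    for ip, n in top_ips():
        print(ip.decode(), n)
```
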
5 votes · 5 answers
"Simulating" a 64-bit integer with two 32-bit integers
I'm writing a very computationally intensive procedure for a mobile device and I'm limited to 32-bit CPUs. In essence, I'm performing dot products of huge sets of data (>12k signed 16-bit integers). Floating-point operations are just too slow, so I've…

Phonon · 12,549 · 13 · 64 · 114
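
The technique the question names, carried out in outline: keep the 64-bit accumulator as two 32-bit limbs and propagate the carry by hand. Python integers are arbitrary precision, so the 32-bit registers are emulated with an explicit mask; in C this would be plain unsigned arithmetic:

```python
MASK32 = 0xFFFFFFFF

def accumulate(hi, lo, product):
    """Add one signed product into a 64-bit accumulator held as two 32-bit limbs."""
    s = lo + (product & MASK32)          # low limb plus low half of the product
    lo = s & MASK32
    # high limb gets the (sign-extended) upper half of the product plus the carry
    hi = (hi + ((product >> 32) & MASK32) + (s >> 32)) & MASK32
    return hi, lo

def to_signed64(hi, lo):
    """Recombine the limbs into a signed 64-bit value (for display only)."""
    v = (hi << 32) | lo
    return v - (1 << 64) if v >= (1 << 63) else v

def dot64(xs, ys):
    hi, lo = 0, 0
    for a, b in zip(xs, ys):
        hi, lo = accumulate(hi, lo, a * b)  # 16-bit * 16-bit fits in 32 bits
    return to_signed64(hi, lo)

if __name__ == "__main__":
    xs = [30000, -12345] * 6000
    ys = [30000, 20000] * 6000
    assert dot64(xs, ys) == sum(a * b for a, b in zip(xs, ys))
    print(dot64(xs, ys))
```
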
5 votes · 7 answers
Large primary key: 1+ billion rows MySQL + InnoDB?
I was wondering if InnoDB would be the best way to format the table? The table contains one field, the primary key, and will get 816k rows a day (est.). This will get very large very quickly! I'm working on a file storage way (would this be…

James Hartig · 1,009 · 1 · 9 · 20
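
Worth making the growth rate concrete: at 816k rows/day, a quick back-of-the-envelope calculation shows how soon the common integer key widths run out, which is usually what settles INT versus BIGINT:

```python
ROWS_PER_DAY = 816_000  # the question's estimated insert rate

print(f"1 billion rows after {1_000_000_000 / ROWS_PER_DAY / 365.25:.1f} years")
for name, cap in [
    ("signed INT (2**31 - 1)", 2**31 - 1),
    ("unsigned INT (2**32 - 1)", 2**32 - 1),
    ("signed BIGINT (2**63 - 1)", 2**63 - 1),
]:
    years = cap / ROWS_PER_DAY / 365.25
    print(f"{name}: key space exhausted after {years:,.1f} years")
```
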
5 votes · 1 answer
How to pick a chunksize for Python multiprocessing with large datasets
I am attempting to use Python to gain some performance on a task that can be highly parallelized using http://docs.python.org/library/multiprocessing.
Looking at the library docs, they say to use a chunk size for very long iterables. Now, my…

Sandro · 2,219 · 4 · 27 · 41
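
For reference, the rule below mirrors the heuristic CPython's Pool.map applies when no chunksize is given (roughly four chunks per worker); passing it explicitly makes it visible and tunable. A minimal sketch with a stand-in task:

```python
from multiprocessing import Pool, cpu_count

def work(x):
    return x * x  # stand-in for the real per-item task

def chunked_map(items, workers=None):
    workers = workers or cpu_count()
    # ~4 chunks per worker: large enough to amortize IPC overhead,
    # small enough that a slow chunk can't stall the whole pool.
    chunksize, extra = divmod(len(items), workers * 4)
    if extra:
        chunksize += 1
    with Pool(workers) as pool:
        return pool.map(work, items, chunksize=chunksize)

if __name__ == "__main__":
    results = chunked_map(range(1_000_000))
    print(results[:5], "...", len(results))
```
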
5 votes · 1 answer
MySql: Operate on Many Rows Using Long List of Composite PKs
What's a good way to work with many rows in MySql, given that I have a long list of keys in a client application that is connecting with ODBC?
Note: my experience is largely SQL Server, so I know a bit, just not MySQL specifically.
The task is to…

ErikE · 48,881 · 23 · 151 · 196
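
A common approach for a long client-side key list is to bulk-insert the keys into a temporary table and join against it, instead of building an enormous WHERE ... IN clause. A sketch using sqlite3 as a stand-in for MySQL-over-ODBC; the schema is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (a INTEGER, b INTEGER, payload TEXT, "
             "PRIMARY KEY (a, b))")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(a, b, f"row {a}/{b}") for a in range(100) for b in range(100)],
)

wanted = [(7, 42), (13, 5), (99, 99)]  # the client-side composite key list

# Bulk-load the keys, then join: one set-based query instead of a
# giant IN list or thousands of single-row lookups.
conn.execute("CREATE TEMP TABLE wanted (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO wanted VALUES (?, ?)", wanted)
rows = conn.execute("""
    SELECT i.a, i.b, i.payload
    FROM items AS i
    JOIN wanted AS w ON w.a = i.a AND w.b = i.b
""").fetchall()
print(rows)
```
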
5 votes · 4 answers
How to design a Real Time Alerting System?
I have a requirement where I have to send alerts when a record in the database has not been updated/changed within a specified interval. For example, if a received purchase order isn't processed within one hour, a reminder should be sent to the delivery…

Siva Arunachalam · 7,582 · 15 · 79 · 132
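
One simple design for this is a scheduled poll: every cycle, select rows that are still unprocessed and older than the allowed interval, and alert on each. A sketch with sqlite3 and an invented purchase_orders layout; a real deployment would have cron, Quartz, or a similar scheduler drive the poll:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE purchase_orders (
    id          INTEGER PRIMARY KEY,
    received_at TEXT    NOT NULL,        -- ISO-8601 timestamp
    processed   INTEGER NOT NULL DEFAULT 0
)""")
now = datetime(2012, 5, 1, 12, 0)
conn.executemany(
    "INSERT INTO purchase_orders (received_at, processed) VALUES (?, ?)",
    [((now - timedelta(minutes=m)).isoformat(), p)
     for m, p in [(30, 0), (90, 0), (120, 1), (240, 0)]],
)

def find_stale(conn, now, max_age=timedelta(hours=1)):
    """Return unprocessed rows older than max_age."""
    cutoff = (now - max_age).isoformat()
    return conn.execute(
        "SELECT id, received_at FROM purchase_orders "
        "WHERE processed = 0 AND received_at < ?", (cutoff,)).fetchall()

# The scheduler would call this each cycle and send a reminder per row.
for row in find_stale(conn, now):
    print("ALERT: stale order", row)
```
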
4 votes · 3 answers
Optimizing MySQL Aggregation Query
I've got a very large table (~100 million records) in MySQL that contains information about files. One of the pieces of information is the modified date of each file.
I need to write a query that will count the number of files that fit into specified…

Zenshai · 10,307 · 2 · 19 · 18
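
A usual shape for such a query is a single GROUP BY over a date bucket rather than one COUNT(*) per range. A sketch against sqlite3 (standing in for MySQL) with an invented files table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, modified TEXT)")
conn.executemany(
    "INSERT INTO files (modified) VALUES (?)",
    [(f"2011-{m:02d}-15",) for m in range(1, 13) for _ in range(m)],
)
# One scan produces every bucket; on the real MySQL table an index on
# `modified` keeps date-range predicates cheap as the table grows.
for month, n in conn.execute("""
    SELECT substr(modified, 1, 7) AS month, COUNT(*) AS files
    FROM files
    GROUP BY month
    ORDER BY month
"""):
    print(month, n)
```
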
4 votes · 2 answers
Trivial task - complex solution?
There is a trivial problem:
- assign a uniqueidentifier to any externalId
- do not overwrite the uniqueidentifier once it is assigned - just return the existing uniqueidentifier
Imagine a table
ExternalId | Guid
--------------------------------
…

Piotr · 817 · 1 · 8 · 21
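
The standard idempotent pattern for this mapping is insert-if-absent under a unique key, then read back. A sketch in sqlite3 syntax, where INSERT OR IGNORE plays the role that MERGE or an INSERT ... WHERE NOT EXISTS would play on SQL Server:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_map (external_id TEXT PRIMARY KEY, guid TEXT)")

def guid_for(conn, external_id):
    """Assign a GUID on first sight; return the existing one ever after."""
    conn.execute("INSERT OR IGNORE INTO id_map VALUES (?, ?)",
                 (external_id, str(uuid.uuid4())))
    return conn.execute("SELECT guid FROM id_map WHERE external_id = ?",
                        (external_id,)).fetchone()[0]

first = guid_for(conn, "ABC-1")
again = guid_for(conn, "ABC-1")   # same id in -> same guid out
assert first == again
print(first)
```
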
4 votes · 1 answer
Python - Search for items in hundreds of large, gzipped files
Unfortunately, I'm working with an extremely large corpus which is spread across hundreds of .gz files -- 24 gigabytes (packed) worth, in fact. Python is really my native language (hah), but I was wondering if I haven't run up against a problem that…

Georgina · 311 · 4 · 11
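
Since gzip members must be read sequentially anyway, the natural approach is to stream-decompress each file line by line and spread the files across a process pool. A sketch; the corpus path and search term are placeholders:

```python
import glob
import gzip
from multiprocessing import Pool

NEEDLE = b"target phrase"   # illustrative search term

def scan(path):
    """Return (path, line number) for every matching line in one .gz file."""
    hits = []
    with gzip.open(path, "rb") as f:        # decompresses as a stream
        for lineno, line in enumerate(f, 1):
            if NEEDLE in line:
                hits.append((path, lineno))
    return hits

if __name__ == "__main__":
    paths = glob.glob("corpus/*.gz")        # hypothetical corpus layout
    with Pool() as pool:                    # one file per worker at a time
        for hits in pool.imap_unordered(scan, paths):
            for path, lineno in hits:
                print(f"{path}:{lineno}")
```
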
4 votes · 1 answer
Storing Large Number of Graph Data Structures in a Database
This question asks about storing a single graph in a relational database. The solution is clear in that case: one table for nodes, one table for edges.
I have a graph data structure that evolves over time, so I would like to store "snapshots" of…

Alan Turing · 12,223 · 16 · 74 · 116
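
One common layout for snapshotting an evolving graph extends the nodes/edges design with a snapshot id on both tables, so each saved state is just a slice of two tables. A sketch with sqlite3 and invented names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (id INTEGER PRIMARY KEY, taken_at TEXT);
    CREATE TABLE nodes (
        snapshot_id INTEGER, node_id INTEGER, label TEXT,
        PRIMARY KEY (snapshot_id, node_id)
    );
    CREATE TABLE edges (
        snapshot_id INTEGER, src INTEGER, dst INTEGER,
        PRIMARY KEY (snapshot_id, src, dst)
    );
""")

def save_snapshot(conn, taken_at, nodes, edges):
    """Store one state of the graph under a fresh snapshot id."""
    cur = conn.execute("INSERT INTO snapshots (taken_at) VALUES (?)",
                       (taken_at,))
    sid = cur.lastrowid
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                     [(sid, n, lbl) for n, lbl in nodes])
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                     [(sid, s, d) for s, d in edges])
    return sid

save_snapshot(conn, "2012-01-01", [(1, "a"), (2, "b")], [(1, 2)])
s2 = save_snapshot(conn, "2012-02-01", [(1, "a"), (2, "b"), (3, "c")],
                   [(1, 2), (2, 3)])
print(conn.execute("SELECT COUNT(*) FROM edges WHERE snapshot_id = ?",
                   (s2,)).fetchone()[0])   # -> 2
```
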