Questions tagged [large-data-volumes]
302 questions
4 votes · 4 answers
Processing large amounts of data using multithreading
I need to write a C# service (it could be a Windows service or a console app) that processes a large amount of data (100,000 records) stored in a database.
Processing each record is also a fairly complex operation. I need to perform a lot of…

Sennin · 1,001 · 2 · 10 · 17
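
A rough sketch of the usual answer shape, in Python rather than C# (the pattern maps directly onto a TPL or thread-pool design): read the records in fixed-size batches and let a small worker pool process batches concurrently, so no thread ever holds all 100,000 records at once. fetch_batch and process_record are hypothetical placeholders, not from the question.

from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 500       # tune to how heavy one record's processing is
NUM_WORKERS = 8        # roughly the number of available cores

def fetch_batch(offset):
    # placeholder: e.g. SELECT ... ORDER BY id LIMIT BATCH_SIZE OFFSET offset
    return []

def process_record(record):
    # placeholder for the "fairly complex operation" on one record
    pass

def process_batch(offset):
    for record in fetch_batch(offset):
        process_record(record)

with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    # 100,000 records -> 200 batches of 500, processed concurrently
    list(pool.map(process_batch, range(0, 100_000, BATCH_SIZE)))
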
4 votes · 2 answers
IDEAs: how to interactively render large image series using GPU-based direct volume rendering
I'm looking for ideas on how to convert a 30+ GB series of 2,000+ colored TIFF images into a dataset that can be visualized in real time (at interactive frame rates) using GPU-based volume rendering (using OpenCL / OpenGL / GLSL). I want to use a direct volume…

bastijn · 5,841 · 5 · 27 · 43
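
A minimal preprocessing sketch for the question above, assuming Pillow and NumPy are available: downsample each TIFF slice and stack the slices into one raw 3D RGB volume small enough to upload as a GPU 3D texture for the OpenGL/OpenCL renderer. The file pattern, target resolution, and z-stride are illustrative assumptions.

import numpy as np
from PIL import Image

SLICE_PATTERN = "slice_{:04d}.tif"    # hypothetical file naming
NUM_SLICES = 2000
TARGET_XY = (512, 512)                # per-slice size after downsampling
Z_STEP = 4                            # keep every 4th slice -> 500 slices

slices = []
for i in range(0, NUM_SLICES, Z_STEP):
    img = Image.open(SLICE_PATTERN.format(i)).convert("RGB")
    slices.append(np.asarray(img.resize(TARGET_XY)))

volume = np.stack(slices)             # (500, 512, 512, 3), ~0.4 GB of RGB8
volume.tofile("volume_rgb8.raw")      # raw block for the GL/CL loader
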
4 votes · 6 answers
Computing token counters on a huge dataset
I need to go over a huge amount of text (> 2 TB, a full Wikipedia dump) and keep two counters for each token seen (each counter is incremented depending on the current event). The only operation I will need for these counters is increment. On a…

smola · 863 · 8 · 15
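
One standard approach, sketched in Python under the assumption that tokens arrive as a stream: hold the two per-token counters in an in-memory dict and spill sorted partial counts to disk whenever the dict grows too large; the sorted partial files can then be merged in a single pass (e.g. with heapq.merge). All names here are illustrative.

from collections import defaultdict

MAX_TOKENS_IN_RAM = 5_000_000
counts = defaultdict(lambda: [0, 0])     # token -> [counter_a, counter_b]
spill_id = 0

def observe(token, event):
    counts[token][0 if event == "a" else 1] += 1
    if len(counts) >= MAX_TOKENS_IN_RAM:
        spill()

def spill():
    global spill_id
    with open(f"partial_{spill_id}.tsv", "w", encoding="utf-8") as out:
        for token in sorted(counts):              # sorted -> mergeable
            a, b = counts[token]
            out.write(f"{token}\t{a}\t{b}\n")
    counts.clear()
    spill_id += 1

observe("wikipedia", "a")
observe("wikipedia", "b")
spill()     # flush whatever remains once the input is exhausted
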
4 votes · 3 answers
Cost of serialization in web service
My next project involves the creation of a data API within an enterprise framework. The data will be consumed by several applications running on different software platforms. While my colleagues generally favour SOAP, I would like to use a RESTful…

srmark · 7,942 · 13 · 63 · 74
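
An illustrative micro-benchmark, not from the question, of the serialization cost being debated: the same record encoded as JSON (typical of REST payloads) and as XML (the basis of SOAP envelopes), comparing payload size and encode time. The record fields are made up.

import json, time
import xml.etree.ElementTree as ET

record = {"id": 42, "name": "widget", "price": 19.99, "in_stock": True}

t0 = time.perf_counter()
as_json = json.dumps(record)
json_time = time.perf_counter() - t0

root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
t0 = time.perf_counter()
as_xml = ET.tostring(root)
xml_time = time.perf_counter() - t0

print(len(as_json), json_time)   # payload bytes and encode seconds
print(len(as_xml), xml_time)     # a SOAP envelope adds further overhead
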
4 votes · 1 answer
Fetch only N rows at a time (MySQL)
I'm looking for a way to fetch all data from a huge table in smaller chunks.
Please advise.

akosch · 4,326 · 7 · 59 · 80
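
A minimal keyset-pagination sketch, assuming MySQL, the mysql-connector-python driver, and an indexed integer primary key id (table and column names are hypothetical). Unlike LIMIT/OFFSET, each chunk seeks to where the previous one ended, so later chunks stay as fast as the first.

import mysql.connector

conn = mysql.connector.connect(user="user", password="...", database="db")
cursor = conn.cursor()

CHUNK = 10_000
last_id = 0
while True:
    cursor.execute("SELECT id, payload FROM huge_table "
                   "WHERE id > %s ORDER BY id LIMIT %s",
                   (last_id, CHUNK))
    rows = cursor.fetchall()
    if not rows:
        break
    for row in rows:
        pass                # handle one row here
    last_id = rows[-1][0]   # next chunk starts after the last id seen
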
4 votes · 4 answers
Common Lisp: What is the downside to using this filter function on very large lists?
I want to filter out all elements of list 'a from list 'b and return the filtered 'b. This is my function:
(defun filter (a b)
  "Filters out all items in a from b"
  (if (= 0 (length a))
      b
      (filter (remove (first a) a) (remove (first a) b))))

schellsan · 2,164 · 1 · 22 · 32
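
For comparison (in Python, as with the other sketches on this page): the recursive remove-based filter rescans b once per element of a, which is O(len(a) × len(b)), and calls the O(n) length at every step; building a hash set first makes the same filter a single linear pass.

def filter_out(a, b):
    exclude = set(a)                            # O(len(a)) to build
    return [x for x in b if x not in exclude]   # one O(len(b)) pass

print(filter_out([1, 2], [1, 2, 3, 4, 2]))      # -> [3, 4]
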
4 votes · 3 answers
Creating a large sitemap on Google App Engine?
I have a site with around 100,000 unique pages.
(1) How do I create a sitemap for all these links? Should I just list them flat in one large sitemap-protocol-compatible file?
(2) I need to implement this on Google App Engine, where there is a 1000-item…

demos · 2,630 · 11 · 35 · 51
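
A minimal sketch of the usual answer shape, assuming the URLs can be streamed out of the datastore: split them into child sitemap files and point a sitemap index at them. The sitemap protocol itself allows up to 50,000 URLs per file; the 1,000-row chunk below reflects the App Engine query limit the question mentions. URLs and file names are hypothetical.

CHUNK = 1000   # per-query limit from the question; the protocol allows 50,000

def write_sitemap(name, urls):
    with open(name, "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for u in urls:
            f.write("  <url><loc>%s</loc></url>\n" % u)
        f.write("</urlset>\n")

urls = ["http://example.com/page/%d" % i for i in range(100_000)]  # stand-in
names = []
for start in range(0, len(urls), CHUNK):
    name = "sitemap-%d.xml" % len(names)
    write_sitemap(name, urls[start:start + CHUNK])
    names.append(name)

with open("sitemap-index.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for n in names:
        f.write("  <sitemap><loc>http://example.com/%s</loc></sitemap>\n" % n)
    f.write("</sitemapindex>\n")
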
4 votes · 1 answer
Count the occurrence of each element in a large data stream
I have a simulation with N particles, running over T timesteps. At each timestep, each particle calculates some data about itself and the other particles nearby (within a radius), which is bit-packed into a C string 4-22 bytes long (depending on…

user3734029 · 85 · 7
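
When a stream has too many distinct keys to count exactly, one common answer is a count-min sketch: a few fixed-width hash rows give approximate per-key counts in constant memory, and the estimate can only overcount on collisions, never undercount. A minimal Python version with illustrative sizing:

import hashlib
from array import array

WIDTH, DEPTH = 1 << 20, 4                 # ~32 MB of 64-bit counters
table = [array("Q", [0]) * WIDTH for _ in range(DEPTH)]

def _buckets(key: bytes):
    for row in range(DEPTH):
        h = hashlib.blake2b(key, digest_size=8, salt=bytes([row]))
        yield row, int.from_bytes(h.digest(), "big") % WIDTH

def add(key: bytes):
    for row, col in _buckets(key):
        table[row][col] += 1

def count(key: bytes) -> int:
    # minimum over rows: overcounts on collisions, never undercounts
    return min(table[row][col] for row, col in _buckets(key))

add(b"\x07packed-particle-record")
print(count(b"\x07packed-particle-record"))   # -> 1
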
4 votes · 3 answers
Need to compare very large files (around 1.5 GB) in Python
"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2"
"Rail","00000.POO@GMAIL.COM","NR251764697478","24JUN2011","B2C","2025"
"DF","0000650000@YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792"
"Bus","00009.GAURAV@GMAIL.COM","N…

Geek · 1,369 · 1 · 14 · 25
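
A minimal memory-friendly sketch, assuming the task is to find which lines of one file are missing from the other: store a 16-byte digest of every line of the first file (16 bytes per line instead of the whole line), then stream the second file against that set. File names are hypothetical.

import hashlib

def line_digest(line: bytes) -> bytes:
    return hashlib.md5(line).digest()     # 16 bytes per line, not the line

seen = set()
with open("file_a.csv", "rb") as f:
    for line in f:
        seen.add(line_digest(line))

with open("file_b.csv", "rb") as f:
    for line in f:
        if line_digest(line) not in seen:
            print(line.decode("utf-8", "replace").rstrip())
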
4 votes · 1 answer
Mayavi visualizing huge 3D arrays
I have a 3D dataset with around 6 million points. Is there any way to plot it using contour3d? Every time I try, Mayavi goes out of memory.
Otherwise, is there a way to increase the number of colors in the volume() ctf to more than 256? I have…

pitc · 71 · 4
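
A minimal workaround sketch, assuming the data is a dense NumPy volume: strided downsampling drops the point count eightfold per factor-of-two step before the array ever reaches mlab.contour3d. The random array below is only a stand-in for the real dataset.

import numpy as np
from mayavi import mlab

# stand-in for the real ~6 million point dataset (256 * 256 * 96 ≈ 6.3M)
volume = np.random.rand(256, 256, 96).astype(np.float32)

step = 2                                   # 2x2x2 stride -> 1/8 of the points
small = volume[::step, ::step, ::step].copy()

mlab.contour3d(small, contours=8, opacity=0.4)
mlab.show()
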
4 votes · 4 answers
Getting random results from large tables
I'm trying to get 4 random results from a table that holds approx. 7 million records. Additionally, I also want to get 4 random records from the same table, filtered by category.
Now, as you would imagine, doing random sorting on a table this…

Brett · 19,449 · 54 · 157 · 290
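
A sketch of the usual workaround, assuming MySQL and a reasonably dense AUTO_INCREMENT id column (all names hypothetical): instead of ORDER BY RAND(), which sorts all 7 million rows, probe random points in the id range and take the first row at or after each probe. Gaps in the id sequence bias the draw slightly; for the category-filtered variant, add the WHERE clause and retry on misses.

import random
import mysql.connector

conn = mysql.connector.connect(user="user", password="...", database="db")
cursor = conn.cursor()

cursor.execute("SELECT MAX(id) FROM items")
max_id = cursor.fetchone()[0]

picks = []
while len(picks) < 4:
    probe = random.randint(1, max_id)
    cursor.execute("SELECT id, title FROM items WHERE id >= %s LIMIT 1",
                   (probe,))
    row = cursor.fetchone()
    if row is not None and row not in picks:
        picks.append(row)
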
4 votes · 1 answer
How do I update the database in the most efficient manner?
I am building a price comparison site that holds about 300,000 products and several hundred clients.
The site's prices and vendor stock availability need to be updated daily.
When a vendor needs updating I was thinking about deleting all the…

user937635 · 59 · 5
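
A minimal sketch of the common alternative to delete-and-reinsert, assuming MySQL and a unique key on (vendor_id, product_id) (the schema is hypothetical): one batched upsert touches only the rows in the vendor's feed and never leaves the catalogue empty mid-update.

import mysql.connector

conn = mysql.connector.connect(user="user", password="...", database="db")
cursor = conn.cursor()

rows = [(17, 1001, 19.99, 5), (17, 1002, 4.50, 0)]   # sample vendor feed
cursor.executemany(
    "INSERT INTO vendor_prices (vendor_id, product_id, price, stock) "
    "VALUES (%s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE price = VALUES(price), stock = VALUES(stock)",
    rows,
)
conn.commit()
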
3 votes · 4 answers
I have 100 trillion elements, each with a size from 1 byte to 1 trillion bytes (0.909 TiB). How do I store and access them very efficiently?
This is an interview question:
Suppose I have 100 trillion elements, each with a size from 1 byte to 1 trillion bytes (0.909 TiB).
How do I store and access them very efficiently?
My ideas:
They want to test the knowledge…

user1002288 · 4,860 · 10 · 50 · 78
3 votes · 2 answers
SQL Database design for huge datasets
I have a customer with the following data structure: for each patient, there may be multiple samples, and each sample may, after processing, have 4 million data objects. The maximum number of samples per patient is 20, so a single patient may…

Nicros · 5,031 · 12 · 57 · 101
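
A hedged schema sketch, not from the question: keep the ~4 million per-sample measurements in one narrow table keyed by (sample_id, sequence), so a whole sample can be range-scanned or dropped without touching other patients; hash partitioning spreads the worst case (20 samples × 4M objects = 80M rows per patient) across files. All names are hypothetical, and PARTITION BY assumes MySQL.

DDL = """
CREATE TABLE patient (patient_id BIGINT PRIMARY KEY, name VARCHAR(200));

CREATE TABLE sample (
    sample_id  BIGINT PRIMARY KEY,
    patient_id BIGINT NOT NULL            -- max 20 samples per patient
);

CREATE TABLE data_object (
    sample_id  BIGINT NOT NULL,
    object_seq INT    NOT NULL,           -- 0 .. ~4,000,000 per sample
    value      DOUBLE NOT NULL,
    PRIMARY KEY (sample_id, object_seq)
) PARTITION BY HASH (sample_id) PARTITIONS 64;
"""
print(DDL)   # feed to the target DBMS
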
3 votes · 6 answers
How do I count the number of rows in a large CSV file with Perl?
I have to use Perl in a Windows environment at work, and I need to be able to find out the number of rows that a large CSV file contains (about 1.4 GB).
Any idea how to do this with minimal waste of resources?
Thanks
PS This must be done within the…

Alex Wong · 761 · 3 · 9 · 15
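
The usual constant-memory technique, sketched in Python like the other examples on this page (in Perl, the same loop is a read() into a fixed buffer plus a tr/\n// per block): read fixed-size binary blocks and count newline bytes, so memory use stays at the block size no matter how big the CSV is.

def count_lines(path, block_size=1 << 20):
    # counts physical lines; a newline inside a quoted CSV field counts too
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total

print(count_lines("big.csv"))   # hypothetical file name
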