Questions tagged [large-data-volumes]

302 questions
4 votes · 4 answers

Processing large amounts of data using multithreading

I need to write a C# service (it could be a Windows service or a console app) that processes a large amount of data (100,000 records) stored in a database. Processing each record is also a fairly complex operation. I need to perform a lot of…
Sennin
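
A common shape for this kind of job is to stream the records in fixed-size batches and fan the per-record work out to a bounded worker pool, so memory stays flat regardless of row count. The question asks for C#; the sketch below shows the pattern in Python (used for all sketches in this listing), with the record source and the per-record work as hypothetical stand-ins:

    from concurrent.futures import ThreadPoolExecutor

    BATCH_SIZE = 1000

    def fetch_batches(records, batch_size=BATCH_SIZE):
        # Yield the record source in fixed-size slices so only one
        # batch is held in memory at a time. In the real service this
        # would page through the database instead of a list.
        for i in range(0, len(records), batch_size):
            yield records[i:i + batch_size]

    def process_record(record):
        # Hypothetical stand-in for the "fairly complex operation".
        return record * 2

    if __name__ == "__main__":
        records = list(range(100_000))
        with ThreadPoolExecutor(max_workers=8) as pool:
            for batch in fetch_batches(records):
                # pool.map blocks until the batch is done, which applies
                # natural backpressure between fetching and processing.
                results = list(pool.map(process_record, batch))

For CPU-bound per-record work the same shape applies with ProcessPoolExecutor, or in C# with Parallel.ForEach or TPL Dataflow.
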
4 votes · 2 answers

Ideas: how to interactively render a large image series using GPU-based direct volume rendering

I'm looking for ideas on how to convert a 30+ GB series of 2000+ color TIFF images into a dataset that can be visualized in real time (at interactive frame rates) using GPU-based volume rendering (OpenCL / OpenGL / GLSL). I want to use a direct volume…
bastijn
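
Before any GPU work, the series usually has to be shrunk and flattened into a single scalar volume the card can hold. A minimal preprocessing sketch, assuming the numpy and tifffile packages and a hypothetical slices/ directory; real pipelines would also split the volume into bricks for out-of-core streaming:

    import glob
    import numpy as np
    import tifffile  # assumed available: pip install tifffile

    STEP = 4  # downsample factor; 30+ GB raw will not fit in GPU memory

    slices = []
    for path in sorted(glob.glob("slices/*.tif")):   # hypothetical directory
        img = tifffile.imread(path)                  # (H, W, 3) color slice
        gray = img.mean(axis=2).astype(np.uint8)     # collapse color to a scalar field
        slices.append(gray[::STEP, ::STEP])          # spatial downsample

    volume = np.stack(slices[::STEP])                # downsample along Z as well
    volume.tofile("volume_uint8.raw")                # raw 3D texture, e.g. for glTexImage3D
    print(volume.shape)
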
4 votes · 6 answers

Computing token counters on huge dataset

I need to go over a huge amount of text (> 2 TB, a full Wikipedia dump) and keep two counters for each token seen (each counter is incremented depending on the current event). The only operation I will need on these counters is increment. On a…
smola
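
Since the only operation is increment, one single-machine approach is partition-then-count: one streaming pass appends each token occurrence to a shard file chosen by a stable hash, then each shard is counted entirely in memory. A rough sketch; event_of, the shard count, and the spill format are all assumptions:

    import collections
    import zlib

    NUM_SHARDS = 256   # pick so one shard's vocabulary fits in RAM

    def shard_of(token):
        # Stable hash: the same token always lands in the same shard file.
        return zlib.crc32(token.encode()) % NUM_SHARDS

    def partition(lines, event_of):
        # Pass 1: stream the dump once, appending "token event" records to
        # per-shard files. event_of is a hypothetical function mapping an
        # occurrence to counter 0 or 1.
        outs = [open(f"shard_{s:03d}.txt", "w") for s in range(NUM_SHARDS)]
        for line in lines:
            for token in line.split():
                outs[shard_of(token)].write(f"{token} {event_of(token)}\n")
        for f in outs:
            f.close()

    def count_shard(s):
        # Pass 2: each shard is small enough to count fully in memory.
        counts = collections.defaultdict(lambda: [0, 0])
        with open(f"shard_{s:03d}.txt") as f:
            for rec in f:
                token, event = rec.rsplit(" ", 1)
                counts[token][int(event)] += 1
        return counts

The trade is one extra sequential write and read of the data in exchange for the guarantee that no in-memory table ever exceeds roughly 1/256 of the vocabulary.
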
4 votes · 3 answers

Cost of serialization in web service

My next project involves creating a data API within an enterprise framework. The data will be consumed by several applications running on different software platforms. While my colleagues generally favour SOAP, I would like to use a RESTful…
srmark
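
Whatever the SOAP-vs-REST outcome, the serialization cost itself can be measured up front: serialize a representative payload both ways and compare bytes and wall time. A rough stdlib-only sketch with a made-up payload shape:

    import json
    import time
    import xml.etree.ElementTree as ET

    records = [{"id": i, "name": f"item-{i}", "price": i * 0.01} for i in range(10_000)]

    t0 = time.perf_counter()
    as_json = json.dumps(records)
    t_json = time.perf_counter() - t0

    t0 = time.perf_counter()
    root = ET.Element("records")
    for r in records:
        e = ET.SubElement(root, "record")
        for k, v in r.items():
            ET.SubElement(e, k).text = str(v)
    as_xml = ET.tostring(root)
    t_xml = time.perf_counter() - t0

    print(f"JSON: {len(as_json):>9} bytes in {t_json:.3f}s")
    print(f"XML : {len(as_xml):>9} bytes in {t_xml:.3f}s")
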
4 votes · 1 answer

Fetch only N rows at a time (MySQL)

I'm looking for a way to fetch all data from a huge table in smaller chunks. Please advise.
akosch
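
One standard pattern for this is keyset pagination: with an indexed, monotonically increasing key, remember the last key seen and ask for the next chunk after it, avoiding the growing cost of large OFFSETs. A sketch over a DB-API connection (e.g., mysql-connector-python or PyMySQL); huge_table and its id primary key are hypothetical:

    CHUNK = 10_000

    def fetch_in_chunks(conn):
        last_id = 0
        cur = conn.cursor()
        while True:
            # Seek on the primary key instead of OFFSET, which would
            # re-scan all skipped rows on every page of a huge table.
            cur.execute(
                "SELECT id, payload FROM huge_table "
                "WHERE id > %s ORDER BY id LIMIT %s",
                (last_id, CHUNK),
            )
            rows = cur.fetchall()
            if not rows:
                break
            yield rows
            last_id = rows[-1][0]
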
4 votes · 4 answers

Common Lisp: What is the downside to using this filter function on very large lists?

I want to filter out all elements of list 'a from list 'b and return the filtered 'b. This is my function:

    (defun filter (a b)
      "Filters out all items in a from b"
      (if (= 0 (length a))
          b
          (filter (remove (first a) a) (remove (first a)…
schellsan
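
The main downside: on Lisp lists, length walks the whole list and each remove copies it, and the recursion makes one such pass per element of 'a, so the function is quadratic (and deep recursion may also exhaust the stack on very large inputs). The linear fix is a single pass with a hash-based membership test; in Common Lisp that means loading 'a into a hash-table and using remove-if over 'b. The equivalent in Python (the language used for sketches here):

    def filter_out(a, b):
        # Build the exclusion set once: O(len(a)).
        exclude = set(a)
        # Single O(len(b)) pass instead of one full remove() per element of a.
        return [x for x in b if x not in exclude]

    print(filter_out([1, 2], [1, 2, 3, 4, 2]))  # [3, 4]
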
4 votes · 3 answers

Creating a large sitemap on Google App Engine?

I have a site with around 100,000 unique pages. (1) How do I create a sitemap for all these links? Should I just list them flat in one large sitemap-protocol-compatible file? (2) I need to implement this on Google App Engine, where there is a 1,000-item…
demos
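
The sitemap protocol itself answers (1): a single file is capped at 50,000 URLs, and a sitemap index file points at any number of child sitemaps, so large sites shard. A stdlib-only sketch honoring the 1,000-item figure from the question; the base URL and file names are made up:

    CHUNK = 1_000  # the App Engine-era limit mentioned in the question

    HEAD = '<?xml version="1.0" encoding="UTF-8"?>\n'

    def write_sitemaps(urls, base="https://example.com/sitemaps"):
        names = []
        for n, i in enumerate(range(0, len(urls), CHUNK)):
            name = f"sitemap-{n:03d}.xml"
            with open(name, "w") as f:
                f.write(HEAD + '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for u in urls[i:i + CHUNK]:
                    f.write(f"  <url><loc>{u}</loc></url>\n")
                f.write("</urlset>\n")
            names.append(name)
        # One index file pointing at every child sitemap.
        with open("sitemap-index.xml", "w") as f:
            f.write(HEAD + '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for name in names:
                f.write(f"  <sitemap><loc>{base}/{name}</loc></sitemap>\n")
            f.write("</sitemapindex>\n")
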
4 votes · 1 answer

Count the occurrence of each element in a large data stream

I have a simulation, with N particles, running over T timesteps. At each timestep, each particle calculates some data about itself and the other particles nearby (within a radius), which is bit-packed into a C string 4-22 bytes long (depending on…
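
When exact per-element counters no longer fit, a Count-Min sketch gives approximate counts in fixed memory: d hashed rows of w counters, increment one cell per row, and read back the row-wise minimum. A compact sketch with illustrative width/depth values:

    import zlib

    class CountMinSketch:
        def __init__(self, width=16384, depth=4):
            self.width, self.depth = width, depth
            self.rows = [[0] * width for _ in range(depth)]

        def _cells(self, item: bytes):
            for d in range(self.depth):
                # Salt the hash differently per row.
                yield d, zlib.crc32(bytes([d]) + item) % self.width

        def add(self, item: bytes):
            for d, w in self._cells(item):
                self.rows[d][w] += 1

        def count(self, item: bytes):
            # Each row can only over-count (collisions), so the minimum
            # across rows is the tightest estimate.
            return min(self.rows[d][w] for d, w in self._cells(item))

    cms = CountMinSketch()
    cms.add(b"particle-42")
    print(cms.count(b"particle-42"))  # 1 (an approximate upper bound in general)
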
4 votes · 3 answers

Need to compare very large files (around 1.5 GB) in Python

"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2" "Rail","00000.POO@GMAIL.COM","NR251764697478","24JUN2011","B2C","2025" "DF","0000650000@YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792" "Bus","00009.GAURAV@GMAIL.COM","N…
Geek
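
If whole-record comparison is enough, one memory-friendly screening approach is to keep only a fixed-size digest per line: build a set of line hashes from the first file, then stream the second and flag lines whose hash is absent. A sketch with hypothetical file names; hash collisions are theoretically possible, so treat this as screening, not proof of equality:

    import hashlib

    def line_hashes(path):
        with open(path, "rb") as f:
            # 16-byte digests: tens of millions of lines fit in a few GB
            # of RAM, far less than holding the 1.5 GB files as parsed rows.
            return {hashlib.md5(line).digest() for line in f}

    seen = line_hashes("file_a.csv")            # hypothetical file names
    with open("file_b.csv", "rb") as f:
        for line in f:
            if hashlib.md5(line).digest() not in seen:
                print("only in file_b:", line.decode(errors="replace").rstrip())
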
4 votes · 1 answer

Mayavi visualizing huge 3D arrays

I have a 3D dataset with around 6 million points. Is there any way to plot it using contour3d? Every time I try, Mayavi runs out of memory. Alternatively, is there a way to increase the number of colors in the volume() CTF to more than 256? I have…
pitc
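
A common workaround for the out-of-memory failure is to decimate the volume with NumPy striding before handing it to Mayavi; contour surfaces rarely need full resolution to be legible. A sketch with a synthetic 6-million-point array (mayavi assumed installed):

    import numpy as np
    from mayavi import mlab  # assumed installed

    x, y, z = np.mgrid[-3:3:200j, -3:3:200j, -3:3:150j]  # 200*200*150 = 6M points
    vol = np.sin(x * y + z)

    STEP = 4  # keep every 4th voxel on each axis: 64x less data
    small = vol[::STEP, ::STEP, ::STEP]

    mlab.contour3d(small, contours=8, opacity=0.4)
    mlab.show()
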
4 votes · 4 answers

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approximately 7 million records. Additionally, I want to get 4 random records from the same table filtered by category. Now, as you would imagine, doing random sorting on a table this…
Brett
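
ORDER BY RAND() has to sort all 7 million rows on every query. When ids are reasonably dense, a usual workaround is to pick random points in the id range and take the first row at or above each. A sketch over a DB-API cursor; the items table is hypothetical, and the category-filtered variant just adds a WHERE clause:

    import random

    def random_rows(cur, n=4):
        cur.execute("SELECT MIN(id), MAX(id) FROM items")
        lo, hi = cur.fetchone()
        rows = []
        while len(rows) < n:
            # Index seek instead of sorting the whole table.
            cur.execute(
                "SELECT * FROM items WHERE id >= %s ORDER BY id LIMIT 1",
                (random.randint(lo, hi),),
            )
            row = cur.fetchone()
            if row is not None and row not in rows:
                rows.append(row)
        return rows

Gaps in the id sequence bias selection toward rows that follow large gaps, which is usually acceptable for a "show 4 random items" feature.
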
4 votes · 1 answer

How do I update the database in the most efficient manner?

I am building a price comparison site that holds about 300,000 products and several hundred clients. The site's prices and vendor stock availability need updating daily. When a vendor needs updating, I was thinking about deleting all the…
user937635
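
Instead of deleting a vendor's rows and re-inserting them each day, an upsert touches only the rows that changed and keeps the table serving reads throughout. A sketch using MySQL's INSERT ... ON DUPLICATE KEY UPDATE; the schema is hypothetical and assumes a unique key on (vendor_id, product_id):

    def upsert_prices(cur, vendor_id, offers):
        # offers: iterable of (product_id, price, in_stock) tuples
        cur.executemany(
            "INSERT INTO vendor_offers (vendor_id, product_id, price, in_stock) "
            "VALUES (%s, %s, %s, %s) "
            "ON DUPLICATE KEY UPDATE price = VALUES(price), in_stock = VALUES(in_stock)",
            [(vendor_id, p, price, stock) for p, price, stock in offers],
        )
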
3 votes · 4 answers

I have 100 trillion elements, each ranging in size from 1 byte to 1 trillion bytes (0.909 TiB). How do I store and access them very efficiently?

This is an interview question. Suppose I have 100 trillion elements, each ranging in size from 1 byte to 1 trillion bytes (0.909 TiB). How do I store and access them very efficiently? My ideas: they want to test the knowledge…
user1002288
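
One standard shape for an answer separates index from payload: a fixed-width index entry per element (id → file number, offset, length) so any element's location can be looked up cheaply, with the bytes themselves appended to large sequential data files. A toy sketch of such an entry; the field widths are illustrative:

    import struct

    # 8-byte element id, 4-byte file number, 8-byte offset, 8-byte length
    # = 28 bytes per index entry, fixed width, so entry i lives at i * 28.
    INDEX = struct.Struct("<QIQQ")

    def pack_entry(elem_id, file_no, offset, length):
        return INDEX.pack(elem_id, file_no, offset, length)

    def unpack_entry(buf):
        return INDEX.unpack(buf)

    entry = pack_entry(42, 3, 1 << 30, 4096)
    print(INDEX.size, unpack_entry(entry))   # 28 (42, 3, 1073741824, 4096)

At 100 trillion entries the index alone is about 2.8 PB (28 bytes × 10^14), which already shows that the index, not just the data, has to be sharded across machines.
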
3 votes · 2 answers

SQL Database design for huge datasets

I have a customer with the following data structure: for each patient, there may be multiple samples, and each sample may, after processing, have 4 million data objects. The maximum number of samples per patient is 20. So a single patient may…
Nicros
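
With up to 80 million data objects per patient (20 samples × 4 million), the data-object table dominates everything; a narrow row keyed by (sample_id, seq_no) keeps one sample's rows physically together on engines that cluster on the primary key (e.g., InnoDB), so per-sample reads stay sequential. A hypothetical minimal schema, executable through any DB-API connection:

    DDL = """
    CREATE TABLE patient (
        patient_id  BIGINT PRIMARY KEY,
        name        VARCHAR(200)
    );
    CREATE TABLE sample (
        sample_id   BIGINT PRIMARY KEY,
        patient_id  BIGINT NOT NULL REFERENCES patient(patient_id)
    );
    -- The composite primary key clusters rows for one sample together,
    -- so reading a 4M-object sample is one sequential range scan.
    CREATE TABLE data_object (
        sample_id   BIGINT NOT NULL REFERENCES sample(sample_id),
        seq_no      INT    NOT NULL,
        value       DOUBLE PRECISION,
        PRIMARY KEY (sample_id, seq_no)
    );
    """

    def create_schema(conn):
        cur = conn.cursor()
        for stmt in DDL.split(";"):
            if stmt.strip():
                cur.execute(stmt)
        conn.commit()
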
3 votes · 6 answers

How do I count the number of rows in a large CSV file with Perl?

I have to use Perl in a Windows environment at work, and I need to find out how many rows a large CSV file contains (about 1.4 GB). Any idea how to do this with minimal waste of resources? Thanks. PS: This must be done within the…
Alex Wong
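
The low-memory trick is to read the file in fixed-size binary chunks and count newline bytes, never materializing lines. Sketched in Python like the other examples here; the direct Perl equivalent reads into a buffer with read() and counts with tr/\n//:

    def count_lines(path, chunk_size=1 << 20):
        count = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)   # 1 MiB at a time: constant memory
                if not chunk:
                    break
                count += chunk.count(b"\n")
        return count

    print(count_lines("big.csv"))  # hypothetical 1.4 GB file

Note this counts physical lines; if CSV fields can contain embedded newlines, a real CSV parser is needed for an exact record count.
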