Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large"; what counts as large depends on the situation: on the web, 1MB or 2MB might be large, while in an application meant to clone hard drives, 5TB might be. A specific number is unnecessary anyway, since this tag is for questions about problems caused by too much data, whatever that amount turns out to be.

2088 questions
15 votes · 4 answers

How much data can R handle?

By "handle" I mean manipulate multi-columnar rows of data. How does R stack up against tools like Excel, SPSS, SAS, and others? Is R a viable tool for looking at "BIG DATA" (hundreds of millions to billions of rows)? If not, which statistical…
AME · 5,234
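
Since base R keeps a data frame entirely in RAM, a back-of-envelope footprint calculation frames the question; a minimal sketch (plain Python, the column count is hypothetical):

```python
# Rough RAM footprint of an all-numeric table: rows * cols * 8 bytes for
# doubles, ignoring per-object overhead. Figures are illustrative only.
def table_gib(rows: int, cols: int = 10) -> float:
    return rows * cols * 8 / 2**30

for rows in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{rows:>13,} rows ~ {table_gib(rows):6.1f} GiB")
# ~0.1 GiB fits anywhere; ~7.5 GiB strains a laptop; ~75 GiB needs
# chunking, a database backend, or an out-of-core package.
```
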
15 votes · 2 answers

Computing the null space of a bigmatrix in R

I cannot find any function or package to compute the null space (or QR decomposition) of a bigmatrix (from library(bigmemory)) in R. For example: library(bigmemory) a <- big.matrix(1000000, 1000, type='double', init=0) I tried the following but…
Mahin · 193
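
One standard workaround, sketched below in Python/NumPy rather than bigmemory: for a tall m×n matrix, null(A) = null(AᵀA), so the small n×n Gram matrix can be accumulated from row blocks streamed through memory and its near-zero eigenvectors taken as a null-space basis. All shapes here are toy values:

```python
import numpy as np

def nullspace_streaming(row_blocks, n, tol=1e-10):
    """Null space of a tall matrix A given as an iterator of row blocks.

    Uses null(A) == null(A^T A): accumulate the n x n Gram matrix block
    by block, then keep eigenvectors whose eigenvalues are ~zero.
    """
    gram = np.zeros((n, n))
    for block in row_blocks:          # each block: (k, n) rows at a time
        gram += block.T @ block
    w, v = np.linalg.eigh(gram)       # symmetric eigendecomposition
    return v[:, w < tol * w.max()]    # columns spanning the null space

# Toy example: 10,000 x 5 matrix whose last column duplicates the first.
blocks = []
for _ in range(10):
    b = np.random.rand(1000, 5)
    b[:, 4] = b[:, 0]
    blocks.append(b)
print(nullspace_streaming(iter(blocks), 5).shape)  # (5, 1): one null vector
```
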
14 votes · 2 answers

Most efficient way to store and access a huge data matrix in MySQL

I am going to store a huge amount of matrix data in a MySQL DB. What is the most efficient way to store and access the data? Efficiency is most important when getting the data; the table will not be updated regularly. The matrix is about 100.000…
david · 141
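
A common layout for read-heavy matrix data is a coordinate (row, column, value) table with a composite primary key, so fetches become index range scans; a minimal sketch using Python's bundled sqlite3 as a stand-in for MySQL (table name hypothetical; in MySQL, InnoDB clusters rows by this primary key):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE matrix (
    i   INTEGER NOT NULL,
    j   INTEGER NOT NULL,
    val REAL    NOT NULL,
    PRIMARY KEY (i, j)
)""")
db.executemany("INSERT INTO matrix VALUES (?, ?, ?)",
               [(i, j, i * 10.0 + j) for i in range(100) for j in range(100)])

# Fetching one full matrix row is a contiguous scan of the primary key.
print(db.execute("SELECT val FROM matrix WHERE i = 42 ORDER BY j").fetchall()[:5])
```
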
14 votes · 5 answers

Handling very large amount of data in MyBatis

My goal is to dump all the data of a database to an XML file. The database is not terribly big, about 300MB, but I have a memory limit of only 256MB in the JVM. So obviously I cannot just read everything into memory.…
Alvin · 10,308
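
The usual fix is to stream rows from a cursor straight to the output instead of materializing a List (in MyBatis, a cursor or result handler); a rough Python equivalent of the pattern, with sqlite3 standing in for the real database and a hypothetical schema:

```python
import sqlite3
from xml.sax.saxutils import escape

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER, name TEXT)")
db.executemany("INSERT INTO person VALUES (?, ?)",
               [(i, f"name-{i}") for i in range(100_000)])

# Stream each row straight from the cursor to the file, so only one row
# is ever held in memory regardless of database size.
with open("dump.xml", "w", encoding="utf-8") as out:
    out.write("<persons>\n")
    for pid, name in db.execute("SELECT id, name FROM person"):
        out.write(f'  <person id="{pid}">{escape(name)}</person>\n')
    out.write("</persons>\n")
```
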
14 votes · 4 answers

Is it possible to generate PDF with StreamingHttpResponse as it's possible to do so with CSV for large dataset?

I have a large dataset that I have to generate CSV and PDF for. With CSV, I use this guide: https://docs.djangoproject.com/en/3.1/howto/outputting-csv/ import csv from django.http import StreamingHttpResponse class Echo: """An object that…
good_evening · 21,085
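
Unlike CSV, a PDF ends with a cross-reference table, so it cannot be emitted row by row; a common workaround is to render to a temporary file and stream that back. A sketch assuming Django and reportlab; the view name and page content are hypothetical:

```python
import tempfile
from django.http import FileResponse
from reportlab.pdfgen import canvas  # assumed installed; any PDF writer works

def large_pdf_view(request):          # hypothetical view
    # Render the whole PDF to a temp file first: the format's trailer and
    # cross-reference table prevent true row-by-row streaming.
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf")
    pdf = canvas.Canvas(tmp.name)
    for page in range(1000):
        pdf.drawString(72, 720, f"page {page}")
        pdf.showPage()                # finish the current page
    pdf.save()
    tmp.seek(0)
    # FileResponse streams the finished file in chunks, keeping memory flat.
    return FileResponse(tmp, as_attachment=True, filename="report.pdf")
```
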
13 votes · 5 answers

Split large CSV text file based on column value

I have CSV files that have multiple columns that are sorted. For instance, I might have lines like…
user788171 · 16,753
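
A single streaming pass that routes each row to one output file per key is usually enough; a minimal Python sketch (split column and file names are hypothetical):

```python
import csv

# Demo input, sorted on the split column as the question describes.
with open("input.csv", "w", newline="") as f:
    csv.writer(f).writerows([["a", 1], ["a", 2], ["b", 3], ["c", 4]])

SPLIT_COL = 0
files, writers = {}, {}
with open("input.csv", newline="") as src:
    for row in csv.reader(src):
        key = row[SPLIT_COL]
        if key not in writers:                      # open one file per key
            files[key] = open(f"out_{key}.csv", "w", newline="")
            writers[key] = csv.writer(files[key])
        writers[key].writerow(row)                  # stream, never buffer
for f in files.values():
    f.close()
```

Because the input is already sorted on that column, each handle could instead be closed as soon as its key changes, capping the number of simultaneously open files at one.
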
13 votes · 5 answers

What's the best way to paginate and filter a large set of data in Firebase?

I have a large Firestore collection with 10,000 documents. I want to show these documents in a table by paging and filtering the results at 25 at a time. My idea, to limit the "reads" (and therefore the costs), was to request only 25 documents at a…
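
Cursor-based pagination is the standard answer here: order the query, cap it at the page size, and resume after the last snapshot. A sketch assuming the google-cloud-firestore Python client; collection and field names are hypothetical:

```python
from google.cloud import firestore  # assumed: google-cloud-firestore package

db = firestore.Client()
PAGE = 25

# First page: order by a field and cap the reads at one page.
query = db.collection("items").order_by("created").limit(PAGE)
docs = list(query.stream())

# Next page: resume after the last snapshot of the previous page, so each
# click costs only 25 reads instead of re-reading earlier pages.
if docs:
    next_page = (db.collection("items")
                   .order_by("created")
                   .start_after(docs[-1])
                   .limit(PAGE))
    docs = list(next_page.stream())
```

Filters are added with .where(...) ahead of the order_by; combining a filter with an order on a different field generally requires a composite index in Firestore.
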
13 votes · 5 answers

Time performance in Generating very large text file in Python

I need to generate a very large text file. Each line has a simple format: Seq_num num_val, e.g. 12343234 759. Let's assume I am going to generate a file with 100 million lines. I tried 2 approaches and surprisingly they are giving very different time…
doubleE · 1,027
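
The gap between such approaches usually comes down to how many write calls reach the file; a small benchmark sketch contrasting per-line writes with batched joins (line format and sizes are toy values):

```python
import time

def write_per_line(path, n):
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i} {i % 1000}\n")      # one small write per line

def write_batched(path, n, batch=100_000):
    with open(path, "w") as f:
        for start in range(0, n, batch):
            # Build a large chunk in memory, then hand it over in one call.
            f.write("".join(f"{i} {i % 1000}\n"
                            for i in range(start, min(start + batch, n))))

for fn in (write_per_line, write_batched):
    t = time.perf_counter()
    fn("lines.txt", 1_000_000)                # 1M lines for a quick demo
    print(fn.__name__, round(time.perf_counter() - t, 2), "s")
```
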
13 votes · 1 answer

Using index on inner join table in MySQL

I have table Foo with 200 million records and table Bar with 1000 records; they are connected many-to-one. There are indexes on columns Foo.someTime and Bar.someField. In Bar, 900 records have someField = 1 and 100 have someField = 2. (1) This…
Yurii Shylov · 1,219
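
A composite index covering the join key plus the filtered column lets the engine locate each bar's matching rows without scanning the 200-million-row table; a sketch of verifying that on an analogous schema with Python's sqlite3 (in MySQL the equivalent check is EXPLAIN):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Bar (id INTEGER PRIMARY KEY, someField INTEGER)")
db.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, "
           "barId INTEGER, someTime INTEGER)")
# Composite index: jump to one bar's rows, then range-scan by time.
db.execute("CREATE INDEX foo_bar_time ON Foo(barId, someTime)")

plan = db.execute("""
    EXPLAIN QUERY PLAN
    SELECT Foo.* FROM Foo JOIN Bar ON Foo.barId = Bar.id
    WHERE Bar.someField = 1 AND Foo.someTime > 100""").fetchall()
for step in plan:
    print(step)   # expect: SEARCH Foo USING INDEX foo_bar_time (...)
```
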
13 votes · 3 answers

How to sample large database and implement K-means and K-nn in R?

I'm a new user to R, trying to move away from SAS. I'm asking this question here as I'm feeling a bit frustrated with all the packages and sources available for R, and I can't seem to get this working, mainly due to data size. I have the following: A…
erichfw · 344
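
A common pattern is to fit k-means on a manageable random sample and then assign the remaining rows in chunks; a sketch assuming scikit-learn (all sizes are toy values):

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed installed (scikit-learn)

rng = np.random.default_rng(0)
full = rng.normal(size=(200_000, 4))          # stand-in for the big table

# Fit on a sample that fits comfortably in memory...
sample = full[rng.choice(len(full), 10_000, replace=False)]
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(sample)

# ...then label the rest in a chunked assignment pass.
labels = np.concatenate([km.predict(chunk)
                         for chunk in np.array_split(full, 20)])
print(np.bincount(labels))
```
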
13 votes · 4 answers

PHP and the million array baby

Imagine you have the following array of integers: array(1, 2, 1, 0, 0, 1, 2, 4, 3, 2, [...] ); The integers go on up to one million entries; only instead of being hardcoded, they've been pre-generated and stored in a JSON-formatted file (of…
Mahn · 16,261
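
The memory trick translates across languages: pack the integers into a contiguous typed buffer instead of a general-purpose array of boxed values (in PHP, SplFixedArray or pack(); sketched here with Python's stdlib array module):

```python
import sys
from array import array

n = 1_000_000
as_list = [i % 5 for i in range(n)]    # one boxed object per element
as_packed = array("i", as_list)        # contiguous 4-byte machine ints

# getsizeof on the list counts only its pointer array, not the elements,
# so the real gap is even larger than this comparison shows.
print(sys.getsizeof(as_list) // 1024, "KiB: list's pointer array alone")
print(as_packed.buffer_info()[1] * as_packed.itemsize // 1024, "KiB: packed")
```
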
12 votes · 2 answers

file based merge sort on large datasets in Java

Given large datasets that don't fit in memory, is there any library or API to perform a sort in Java? The implementation would presumably be similar to the Linux utility sort.
user775187 · 22,311
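
The classic recipe is an external merge sort: sort fixed-size chunks to temp files, then lazily k-way merge them. A minimal Python sketch of the pattern (a Java version would merge BufferedReaders through a PriorityQueue):

```python
import heapq
import os
import random
import tempfile

def _flush(buf):
    """Sort one in-memory chunk and spill it to a temp file."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(sorted(buf))
    return path

def external_sort(lines, chunk_size=100_000):
    """Yield lines in sorted order without ever holding them all in RAM."""
    paths, buf = [], []
    for line in lines:
        buf.append(line)
        if len(buf) >= chunk_size:
            paths.append(_flush(buf))
            buf = []
    if buf:
        paths.append(_flush(buf))
    files = [open(p) for p in paths]
    try:
        yield from heapq.merge(*files)      # lazy k-way merge of sorted runs
    finally:
        for f in files:
            f.close()
        for p in paths:
            os.remove(p)

data = (f"{random.random()}\n" for _ in range(1_000_000))
print(sum(1 for _ in external_sort(data)))  # 1000000 lines, sorted
```
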
12 votes · 3 answers

How to get a sorted subvector out of a sorted vector, fast

I have a data structure like this: struct X { float value; int id; }; and a vector of those (size N, think 100000), sorted by value (which stays constant during the execution of the program): std::vector<X> values; Now, I want to write a function void…
etarion · 16,935
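
With the vector sorted by value, both endpoints of the subrange can be found by binary search, which in C++ is std::lower_bound / std::upper_bound with a comparator on X::value; the same idea with Python's bisect (toy data):

```python
from bisect import bisect_left, bisect_right

# Sorted by the float key, mirroring the question's vector<X> sorted
# by .value, with the int playing the role of X::id.
values = [(0.5, 7), (1.0, 3), (1.0, 9), (2.5, 1), (4.0, 4)]

def subrange(lo: float, hi: float):
    """All entries with lo <= value <= hi, in O(log N) + output size."""
    i = bisect_left(values, (lo, float("-inf")))
    j = bisect_right(values, (hi, float("inf")))
    return values[i:j]

print(subrange(1.0, 2.5))  # [(1.0, 3), (1.0, 9), (2.5, 1)]
```
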
12 votes · 4 answers

When writing a large array directly to disk in MATLAB, is there any need to preallocate?

I need to write an array that is too large to fit into memory to a .mat binary file. This can be accomplished with the matfile function, which allows random access to a .mat file on disk. Normally, the accepted advice is to preallocate arrays,…
Flyto · 676
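
One way to see why on-disk preallocation can still matter: with a memory-mapped file sized up front, later writes only seek and fill rather than growing the file. A sketch using numpy.memmap as an analogy to matfile (file name and shape are hypothetical, ~80 MB here):

```python
import numpy as np

# mode="w+" sizes the file on disk immediately -- the on-disk analogue of
# preallocating an array (in MATLAB, writing the highest index first has
# a similar effect on a matfile).
big = np.memmap("big.dat", dtype="float64", mode="w+", shape=(10_000, 1_000))

for start in range(0, 10_000, 1_000):        # fill in row blocks
    big[start:start + 1_000] = np.random.rand(1_000, 1_000)
big.flush()                                  # push dirty pages to disk
```
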
12 votes · 3 answers

Read large data from csv file in php

I am reading a CSV in PHP and checking against MySQL whether each record is present in my table. The CSV has about 25,000 records, and when I run my code it displays a "Service Unavailable" error after 2m 10s (onload: 2m 10s). Here is the code I have added: // for set…
Andy Martin · 179
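
The usual culprit is one query per CSV row; batching keys into IN (...) lookups cuts roughly 25,000 round trips to about 50. A sketch of the pattern with Python's sqlite3 standing in for MySQL (schema and file names are hypothetical):

```python
import csv
import sqlite3

def lookup(db, keys):
    """One IN (...) query for a whole batch of keys."""
    marks = ",".join("?" * len(keys))
    rows = db.execute(f"SELECT id FROM records WHERE id IN ({marks})", keys)
    return {r[0] for r in rows}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id TEXT PRIMARY KEY)")
db.executemany("INSERT INTO records VALUES (?)",
               [(str(i),) for i in range(0, 50_000, 2)])  # even ids only

with open("input.csv", "w", newline="") as f:             # demo CSV
    csv.writer(f).writerows([[str(i)] for i in range(25_000)])

found = set()
with open("input.csv", newline="") as src:
    batch = []
    for row in csv.reader(src):
        batch.append(row[0])
        if len(batch) == 500:          # flush every 500 keys
            found.update(lookup(db, batch))
            batch = []
    if batch:
        found.update(lookup(db, batch))
print(len(found), "of 25000 present")  # 12500: the even half
```
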