Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size usually exceeds the limits of the software used to analyze it.

A large amount of data. There is no exact number that defines "large", because what counts as large depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application meant to clone hard drives, 5 TB might be. A specific number is also unnecessary, since this tag is for questions about problems caused by too much data, whatever that amount happens to be.

2088 questions
0 votes, 1 answer

Process a large CSV file at a transaction level

I have to process a large CSV file (~1 GB) in Java that looks as below. Trans1, 1, 2, 3, 4 Trans1, 2, 3, 4, 5 Trans1, 4, 5, 2, 1 Trans2, 1, 2, 3, 4 Trans2, 2, 3, 4, 5 Trans2, 4, 5, 2, 1 Trans2, 1, 2, 3, 4 Trans3, 2, 3, 4, 5 Trans3, 4, 5,…
PraveenM • 23 • 7
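The question above targets Java, but the standard pattern is language-agnostic: stream the file instead of loading it whole, and group consecutive rows that share a transaction ID. A minimal Python sketch, assuming the file is sorted by its first column (as in the sample) and that process_transaction is a hypothetical per-transaction handler:

```python
import csv
from itertools import groupby

def process_transaction(trans_id, rows):
    # Hypothetical handler: replace with the real per-transaction logic.
    print(trans_id, len(rows))

with open("transactions.csv", newline="") as f:
    reader = csv.reader(f)
    # groupby yields one group per run of consecutive rows sharing the
    # same first field, so only one transaction is in memory at a time.
    for trans_id, rows in groupby(reader, key=lambda row: row[0]):
        process_transaction(trans_id, [row[1:] for row in rows])
```

The Java equivalent is a BufferedReader loop that flushes its accumulated rows whenever the first field changes.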
0 votes, 1 answer

Is there an R package for working with very large graphs?

I'm trying to find maxflow/mincut in a very large graph using the R language. I tried the RBGL package, which is a wrapper for a C library and so supposed to be much faster than pure-R packages, but I'm getting stuck on creating a graph object.…
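For maxflow/mincut specifically, the igraph library (available both as an R package and as python-igraph) has a C core and routinely handles very large graphs. A minimal sketch on a toy graph, shown in Python since the question's graph-construction code isn't included:

```python
# pip install python-igraph -- the core is C, so it scales far beyond
# pure-R (or pure-Python) graph packages.
import igraph as ig

# Toy directed graph; the capacity list is index-aligned with the edge list.
g = ig.Graph(edges=[(0, 1), (0, 2), (1, 3), (2, 3)], directed=True)
capacities = [3.0, 2.0, 2.0, 3.0]

flow = g.maxflow(0, 3, capacity=capacities)
print("max-flow value:", flow.value)          # 4.0 for this toy graph

cut = g.mincut(source=0, target=3, capacity=capacities)
print("min-cut value:", cut.value)
print("partition sizes:", [len(p) for p in cut.partition])
```

The equivalent calls on the R side are igraph::max_flow() and igraph::min_cut().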
0 votes, 0 answers

How to fix: "Error: cannot allocate vector of size 5.1 Gb" when running a spatial error model?

I'm running a spatial error model on a large dataset (n=26,000) for a hedonic price analysis. I have built a nearest neighbor (k=10) spatial weights file and listw object. However, when I try running the actual "errorsarlm" function, I get the…
emereif • 1 • 1 • 1
0 votes, 1 answer

How to reshape a 183,223,040x4 matrix into 140 matrices of dimensions 1145x1145 without MemoryError?

I have a matrix of dimensions 183,223,040x4 with the variables shown below. There are 140 different values in 'REG', and 1145 different values of both 'SAMAC' and 'SAMAC.1'. I want to iterate over REG to get 140 matrices of size 1145*1145, with the…
Cuisilopez • 39 • 1 • 7
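One workable pattern for the question above is to never hold all 140 matrices at once: group by REG and pivot one region at a time. A minimal pandas sketch; the value-column name VALUE and the parquet source are hypothetical, since the excerpt doesn't show them:

```python
import numpy as np
import pandas as pd

# Hypothetical source file and value-column name.
df = pd.read_parquet("data.parquet")           # columns: REG, SAMAC, SAMAC.1, VALUE
df["VALUE"] = df["VALUE"].astype(np.float32)   # downcasting halves memory use

for reg, group in df.groupby("REG", sort=False):
    # Pivot one region at a time: a 1145 x 1145 float32 matrix is ~5 MB,
    # so only one of the 140 matrices is ever in memory here.
    mat = group.pivot_table(index="SAMAC", columns="SAMAC.1",
                            values="VALUE", aggfunc="sum").to_numpy()
    np.save(f"reg_{reg}.npy", mat)             # persist, then let `mat` go
```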
0 votes, 0 answers

Handling large non-sparse matrices for computing SVD

I have a large matrix (currently about 450000 x 50, might be even larger) whose SVD I want to compute. The matrix isn't sparse, and numpy can't seem to handle it: it exits with a MemoryError. I tried using np.float16 and it didn't…
HadarM • 113 • 1 • 9
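This particular MemoryError usually comes from requesting the full SVD: for a 450000 x 50 input, the full U matrix is 450000 x 450000 (~1.6 TB in float64). The reduced ("economy") SVD keeps U at 450000 x 50 and fits easily; a minimal numpy sketch:

```python
import numpy as np

A = np.random.rand(450_000, 50).astype(np.float32)  # stand-in for the real data

# full_matrices=False requests the reduced SVD:
# U is (450000, 50), s is (50,), Vt is (50, 50) --
# instead of a dense (450000, 450000) U that cannot fit in RAM.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)
```

np.float16 does not help here: the LAPACK routines behind np.linalg.svd work in float32/float64, and the problem is the shape of U, not the element size.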
0 votes, 2 answers

Handling extremely large Numpy arrays

I want to create a Numpy kernel matrix of dimensions 25000*25000. I want to know the most efficient way to handle such a large matrix in terms of saving it to disk and loading it. I tried dumping it with Pickle, but it threw an error saying it…
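Pickle is a poor fit here: a 25000 x 25000 float64 array is ~5 GB, and pickle protocols before 4 fail on objects over 4 GB. numpy's native format handles this directly, and memory-mapping lets the matrix live on disk; a minimal sketch:

```python
import numpy as np

n = 25_000   # 25000 x 25000 float64 is ~5 GB

# Create the array directly on disk so it never has to fit in RAM at once.
km = np.lib.format.open_memmap("kernel.npy", mode="w+",
                               dtype=np.float64, shape=(n, n))
for i in range(0, n, 5_000):
    km[i:i + 5_000] = 0.0        # placeholder: compute real kernel rows here
km.flush()

# Later, memory-map it read-only: indexing touches only the pages needed.
km2 = np.load("kernel.npy", mmap_mode="r")
print(km2[123, 456])
```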
0 votes, 2 answers

AWS Lambda: extract large data and upload to S3

I am trying to write a Node.js Lambda function to query data from our database cluster and upload it to S3; we need this for further analysis. My doubt is: if the data to be queried from the DB is large (9 GB), how does the Lambda function…
Sreerag • 1,381 • 3 • 11 • 16
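Lambda's memory limits and 15-minute timeout mean a 9 GB result should be streamed out in pieces rather than buffered. The question uses Node.js, where the AWS SDK offers the same primitives; a minimal sketch in Python/boto3, with a hypothetical fetch_chunks() paging generator and hypothetical bucket/key names:

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-analysis-bucket", "export/rows.csv"   # hypothetical names

def fetch_chunks():
    # Hypothetical generator: page through the DB query (e.g. keyset
    # pagination) so the full 9 GB result is never held in memory.
    for page in range(3):
        yield (f"row-{page}\n" * 1_000_000).encode()

mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
parts = []
for num, chunk in enumerate(fetch_chunks(), start=1):
    # Every part except the last must be at least 5 MB.
    resp = s3.upload_part(Bucket=BUCKET, Key=KEY, PartNumber=num,
                          UploadId=mpu["UploadId"], Body=chunk)
    parts.append({"PartNumber": num, "ETag": resp["ETag"]})

s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=mpu["UploadId"],
                             MultipartUpload={"Parts": parts})
```

If even a paged export would exceed the 15-minute cap, the usual move is to hand the job to a longer-lived service (Fargate, Glue, or an EC2 task) instead of Lambda.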
0 votes, 0 answers

Large logs in JSON - data processing and analysis

I am new here, and a beginner in the field of data processing and analysis, so I am asking for your understanding. I would like to ask for help with my task. I have three datasets (logs) in JSON format. Each of them has a size of approximately 1.5…
AWL • 31 • 1 • 2
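If the logs are JSON Lines (one object per line), they can be processed incrementally instead of being loaded whole. A minimal pandas sketch; the level column is hypothetical, since the excerpt doesn't show the log schema:

```python
import pandas as pd

# chunksize requires lines=True; each chunk is an ordinary DataFrame.
reader = pd.read_json("log1.json", lines=True, chunksize=100_000)

counts = {}
for chunk in reader:
    for level, n in chunk["level"].value_counts().items():
        counts[level] = counts.get(level, 0) + int(n)
print(counts)
```

If each file is instead one big JSON array, an incremental parser such as ijson can walk it without materializing the whole document.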
0 votes, 1 answer

WPF ComboBox Large Data Source MVVM

I have a ComboBox that is bound to a list of customers with around 5k entries.
0 votes, 2 answers

R: Match values in two data frames, like VLOOKUP but with multiple criteria and no key [large data]

I have two large data frames (500k rows) from two separate sources, with no shared key. Instead of merging on a key, I want to merge the two data frames by matching other columns, such as age and amount. It is not a perfect match between…
Raz89 • 45 • 1 • 6
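For "almost-equal" keys, pandas offers merge_asof: exact match on some columns (by=) plus nearest-value match on one sorted numeric column (on=), with an optional tolerance. A minimal sketch with hypothetical toy frames, shown in Python; on the R side, data.table's rolling joins cover the same ground:

```python
import pandas as pd

# Hypothetical toy frames: 'age' must match exactly, 'amount' approximately.
left = pd.DataFrame({"age": [30, 30, 41], "amount": [99.5, 250.0, 13.0]})
right = pd.DataFrame({"age": [30, 30, 41], "amount": [100.0, 251.0, 12.8],
                      "source_id": [7, 8, 9]})

# merge_asof requires both frames to be sorted by the "on" column.
left = left.sort_values("amount")
right = right.sort_values("amount")

matched = pd.merge_asof(left, right, on="amount", by="age",
                        direction="nearest", tolerance=1.0)
print(matched)
```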
0 votes, 0 answers

Send server-side data rows in chunks to the client

I have an app built on Node.js + Express + React. On the Node.js side I am fetching a large volume of data rows using a MySQL stored procedure and sending the response to the client asynchronously. But this is causing delays in the browser because of the large…
C. T. • 95 • 1 • 13
0 votes, 1 answer

How long should a SELECT INTO query take when selecting from a table with ~200 million rows in SQL Server 2005?

I have a table with 193,569,270 rows in a SQL Server 2005 database. The table stores activities performed by users of our website. It is defined as: Name DataType ID int (identity) …
Wayne E. Pfeffer • 245 • 1 • 7 • 15
0 votes, 2 answers

How to output a large image to the browser using PHP?

I have a very large image generated on the fly with PHP and output to the browser (it's 5000 px wide and 1000-2000 px tall: a plot of the daily user activity on my site). The problem is that nowadays the plot is too big and the PHP script…
Calmarius • 18,570 • 18 • 110 • 157
0 votes, 0 answers

jQuery: How to loop through a large-volume XML array

I have an XML file with 54,000 address lines, and I want to make an autocomplete input to search for an address with the help of jQuery. My jQuery code does not work well because of the large data volume: it takes a lot of time to show results. How…
0 votes, 2 answers

Displaying very large data sets more efficiently

I have a logic analyser project that records several hundred million 16-bit values (~100-500 million), and I need to display anything from a few hundred samples to the entire capture as the user zooms. When you zoom out, the whole system gets a huge…
uMinded • 595 • 1 • 9 • 21
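The standard fix for the question above is min/max peak decimation: reduce the visible capture to one (min, max) pair per screen column, so drawing cost scales with pixel width rather than sample count. A minimal numpy sketch:

```python
import numpy as np

def minmax_decimate(samples: np.ndarray, width_px: int):
    """One (min, max) pair per screen column, preserving narrow glitches
    that plain subsampling would drop."""
    n = (len(samples) // width_px) * width_px    # trim the ragged tail
    buckets = samples[:n].reshape(width_px, -1)  # one row per pixel column
    return buckets.min(axis=1), buckets.max(axis=1)

capture = np.random.randint(0, 2**16, size=20_000_000, dtype=np.uint16)
lo, hi = minmax_decimate(capture, width_px=1920)  # draw lo/hi as a band
```

Zoomed-in views decimate only the visible slice; viewers that must stay fluid at every zoom level often precompute a pyramid of these min/max summaries.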