Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because "large" depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application meant to clone hard drives, 5 TB might be large. A specific number is also unnecessary, since this tag is meant for questions about problems caused by too much data, regardless of how much that is.

2088 questions
0
votes
1 answer

compare values in different chunks using pandas

Say I have a large file loaded in memory using chunksize in pandas. Now I have to compare every value with the ones adjacent to it. My problem is that I can't seem to select at the same time the extreme values (in the first and last positions) of two…
apocalypsis
  • 520
  • 8
  • 19
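For the question above, a minimal sketch of the usual workaround: carry the last row of each chunk into the next one so the values at a chunk boundary still have a neighbour to compare against. The file name and the column name ("value") are hypothetical.

```python
import pandas as pd

prev_tail = None
for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
    if prev_tail is not None:
        # prepend the last row of the previous chunk so the first row of this
        # chunk has its left-hand neighbour available
        chunk = pd.concat([prev_tail, chunk])
    diffs = chunk["value"].diff()   # each value compared with the one before it
    # ... use `diffs` here; the first entry belongs to the carried-over row
    prev_tail = chunk.tail(1)       # remember the boundary row for the next chunk
```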
0
votes
1 answer

database/sql rows.scan hangs after 350K rows

I have a task to pull data from an Oracle database, and I am trying to pull a huge amount of data: > 6MM records with 100 columns for processing. I need to convert the data to a Map. I was able to process 350K records in less than 35 seconds.…
Jaya
  • 1
  • 1
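The question above is about Go's database/sql, but the usual fix is language-agnostic: fetch in bounded batches instead of scanning everything in one pass. A rough Python sketch of the same idea; the connection details, table name, and use of python-oracledb are assumptions.

```python
import oracledb  # assumption: python-oracledb is installed and the DB is reachable

conn = oracledb.connect(user="app", password="secret", dsn="dbhost/service")  # placeholders
cur = conn.cursor()
cur.arraysize = 10_000                       # pull rows from Oracle in large round trips
cur.execute("SELECT * FROM big_table")       # hypothetical 6MM-row, 100-column table

cols = [d[0] for d in cur.description]
total = 0
while True:
    rows = cur.fetchmany()                   # returns at most `arraysize` rows
    if not rows:
        break
    batch = [dict(zip(cols, r)) for r in rows]   # row -> map, one bounded batch at a time
    total += len(batch)                      # ...process the batch here, then let it go
print(total)
```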
0
votes
1 answer

Efficient way to read and write a Networkx Graph

For an open-source project, I am trying to use NetworkX to find the attractors of a graph (called a State Transition Graph). The thing is, over nearly 2**33 loop iterations, a function with a variety of inputs returns a list of tuples (nearly 5000 tuples) in…
Uday
  • 111
  • 6
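For the question above, a small sketch of one low-overhead way to persist and reload a big graph with NetworkX: a plain-text edge list. The graph contents and file name here are made up.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from((i, (i * 31) % 10_000) for i in range(10_000))   # placeholder transitions

# An edge list is written and parsed line by line, so neither call needs to
# hold much more than one edge in memory at a time.
nx.write_edgelist(G, "state_transition_graph.edgelist", data=False)

H = nx.read_edgelist("state_transition_graph.edgelist",
                     create_using=nx.DiGraph, nodetype=int)
print(H.number_of_nodes(), H.number_of_edges())
```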
0
votes
0 answers

Setting the Dask DataFrame index from a column whose size is larger than available memory

I have a large parquet file (~1TB on disk) that I would like to process with Dask, and 512GB RAM available. One of the processing steps requires a join with a smaller DataFrame. I would like to join the DataFrames on indexes, as this should be more…
Steve OB
  • 63
  • 6
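For the question above, a sketch of the common pattern: set the index with an explicit partition count so the shuffle stays out-of-core, then join on the index. The column name, file names, and partition count are assumptions, not from the question.

```python
import dask.dataframe as dd
import pandas as pd

big = dd.read_parquet("big_table.parquet")     # ~1 TB on disk in the question
small = dd.from_pandas(pd.read_csv("small_table.csv").set_index("key"),
                       npartitions=1)          # small enough for plain pandas

# set_index shuffles the data; asking for many partitions keeps each one well
# below worker memory even though the index column itself exceeds RAM.
big = big.set_index("key", npartitions=512)

joined = big.join(small)                       # index-aligned join with the small frame
joined.to_parquet("joined.parquet")
```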
0
votes
1 answer

Error using writeRaster on a large RasterStack

I have a RasterStack in R called "preds2" that is 4.1 GB and was outputted from 4 RasterStacks and 2 RasterLayers (wveg, wfps_lag, wfps, ndvi, swt, lu): cl <- makeCluster(4) registerDoSNOW(cl) preds<-foreach(j = 1:nlayers(ndvi))%dopar%{ …
rachell
  • 19
  • 1
  • 5
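The question above is R-specific (raster::writeRaster), but the general tactic for rasters that don't fit in memory is the same everywhere: process and write block by block. A rough Python/rasterio analogue of that idea; the file paths are placeholders.

```python
import rasterio

with rasterio.open("preds2_input.tif") as src:
    profile = src.profile                       # reuse driver, dtype, CRS, transform, ...
    with rasterio.open("preds2_output.tif", "w", **profile) as dst:
        for _, window in src.block_windows(1):  # iterate over the file's native blocks
            block = src.read(window=window)     # only this block is ever in memory
            dst.write(block, window=window)     # ...transform the block here if needed
```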
0
votes
3 answers

ASP.NET application out of memory exception for no reason

Here is the deal: when my web server starts up, it creates a couple of lengthy (20M-element) arrays of really small objects (like 1-2-3 ints). The cumulative size of any individual array is NOT larger than 2GB (the limitation of the CLR, see the…
Schultz9999
  • 8,717
  • 8
  • 48
  • 87
0
votes
0 answers

Parsing a large JSON file and downloading the URLs of every object in Python

In Python I'm trying to download every single URL contained in a 180 MB JSON file. Even though it is only 180 MB, when I try to open it with a text editor it uses 5.9 GB of memory. So Jupyter crashes when I try to read the JSON and…
erikci
  • 159
  • 7
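For the question above, a sketch using a streaming parser so the whole 180 MB file is never materialised at once; it assumes the file is a top-level JSON array and that each object has a "url" field (both assumptions).

```python
import ijson      # third-party streaming JSON parser (pip install ijson)
import requests

with open("objects.json", "rb") as f:
    for obj in ijson.items(f, "item"):          # yields one array element at a time
        url = obj["url"]
        resp = requests.get(url, timeout=30)
        filename = url.rsplit("/", 1)[-1] or "download"
        with open(filename, "wb") as out:
            out.write(resp.content)
```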
0
votes
2 answers

MySQL - Executing intensive queries on live server

I'm having some issues dealing with updating and inserting millions of rows in a MySQL database. I need to flag 50 million rows in Table A, insert some data from the marked 50 million rows into Table B, then update those same 50 million rows in…
Ryan
  • 17,511
  • 23
  • 63
  • 88
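For the question above, the usual way to keep a live server responsive is to split the 50 million rows into bounded primary-key ranges and commit each range separately. A sketch with made-up table and column names, using MySQL Connector/Python.

```python
import mysql.connector  # assumption: mysql-connector-python is installed

conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="mydb")  # placeholders
cur = conn.cursor()

BATCH = 10_000
last_id = 0
while True:
    cur.execute(
        "SELECT MAX(id) FROM (SELECT id FROM table_a "
        "WHERE id > %s ORDER BY id LIMIT %s) t",
        (last_id, BATCH),
    )
    (upper,) = cur.fetchone()
    if upper is None:
        break
    # flag this id range, copy it into table_b, then move the window forward
    cur.execute("UPDATE table_a SET flag = 1 WHERE id > %s AND id <= %s",
                (last_id, upper))
    cur.execute("INSERT INTO table_b (a_id, payload) "
                "SELECT id, payload FROM table_a WHERE id > %s AND id <= %s",
                (last_id, upper))
    conn.commit()        # short transactions keep lock times short on the live server
    last_id = upper
```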
0
votes
1 answer

Is it possible to use MongoDB geospatial indexes with GridFS

I have a large GeoJSON feature collection which is over 16 MB. I am hoping to insert the data into MongoDB so that I can utilize the geospatial functionality that MongoDB offers ($geoIntersects, $geoWithin, etc.). Due to the large size of the file, I…
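For the question above: GridFS stores opaque binary chunks, so a geospatial index cannot look inside it. A sketch of the usual alternative, splitting the FeatureCollection into one document per feature; the database, collection, and file names are made up.

```python
import json
from pymongo import MongoClient, GEOSPHERE

client = MongoClient("mongodb://localhost:27017")
features = client["gis"]["features"]

with open("collection.geojson") as f:
    fc = json.load(f)

# each feature easily fits under the 16 MB document limit on its own
features.insert_many(fc["features"])
features.create_index([("geometry", GEOSPHERE)])   # enables $geoIntersects / $geoWithin
```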
0
votes
0 answers

Large scale linearly-constrained convex quadratic optimization - R/Python/Gurobi

I have a series of linearly-constrained convex quadratic optimization problems that have around 100,000 variables, 1 linear constraint, and 100,000 bound constraints (the same as the number of variables; the solution has to be positive). I am…
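For the question above, a scaled-down SciPy sketch of the structure described (one linear constraint plus positivity bounds). The Hessian, linear term, and constraint here are invented stand-ins; at the real size (~100,000 variables) a sparse Hessian and a dedicated QP solver such as Gurobi or OSQP would matter.

```python
import numpy as np
import scipy.sparse as sp
from scipy.optimize import minimize, LinearConstraint, Bounds

n = 1_000                                   # stand-in for ~100,000 variables
Q = sp.eye(n, format="csr")                 # hypothetical sparse Hessian
c = np.random.default_rng(0).standard_normal(n)

def f(x):
    return 0.5 * x @ (Q @ x) + c @ x        # convex quadratic objective

def grad(x):
    return Q @ x + c

lin = LinearConstraint(np.ones((1, n)), lb=1.0, ub=1.0)   # the single linear constraint
bounds = Bounds(0.0, np.inf)                               # positivity bounds

res = minimize(f, x0=np.full(n, 1.0 / n), jac=grad,
               method="trust-constr", constraints=[lin], bounds=bounds)
print(res.status, res.fun)
```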
0
votes
1 answer

How to store a TB-sized C++ array on a cluster

I want to do a huge simulation that requires ~1 TB of data to describe a bunch of interacting particles (each has different interactions). Is it possible to store this data in a C++ array? I have access to a 60-node cluster. Each node has 2 CPUs…
Thermodynamix
  • 349
  • 2
  • 12
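Not an answer to the C++/cluster part of the question above, but a compact Python illustration of the single-node fallback such questions usually come down to: keep the array on disk and let the OS page pieces in on demand. The shape and file name are made up, and the real array would be far larger.

```python
import numpy as np

shape = (1_000_000, 128)                     # scaled-down stand-in for the ~1 TB array
data = np.memmap("interactions.dat", dtype=np.float64, mode="w+", shape=shape)

data[12345, :] = 1.0       # touching a slice pages in only that part of the file
part = data[0:1000]        # slices are views; pages load when values are accessed
data.flush()               # push dirty pages back to disk
```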
0
votes
1 answer

Django Postgres migration: Fastest way to backfill a column in a table with 100 Million rows

I have a Postgres table, Thing, that has 100 million rows. I have a column, populated over time, that stores some keys. The keys were prefixed before storing. Let's call it prefixed_keys. My task is to use the values of this column to…
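For the question above, a sketch of a keyset-paginated backfill with bulk_update. The target field new_key, the prefix value, the app path, and the batch size are assumptions; a raw batched UPDATE inside a RunPython migration is often faster still.

```python
from django.db import transaction
from myapp.models import Thing          # hypothetical app/model path

PREFIX = "prefix:"                       # hypothetical prefix
BATCH = 10_000

last_pk = 0
while True:
    rows = list(
        Thing.objects.filter(pk__gt=last_pk)
        .order_by("pk")
        .only("pk", "prefixed_keys")[:BATCH]
    )
    if not rows:
        break
    for row in rows:
        row.new_key = row.prefixed_keys[len(PREFIX):]   # hypothetical target field
    with transaction.atomic():
        Thing.objects.bulk_update(rows, ["new_key"])     # one bulk UPDATE per batch
    last_pk = rows[-1].pk
```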
0
votes
0 answers

Solving Massive Latency Issues with SQL Left Join

My computer is currently approaching hour 48 of a left join statement. This left join is meant to concatenate two matrices: one is 47 million x 3, the other 45 million x 2. The computer I'm running it on is a 9th-gen i7 with 32 GB of memory and…
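For the question above, the first thing to rule out is a missing index on the join keys; without one, every row on the left can force a scan of the right-hand table. A self-contained miniature of the idea in SQLite, with invented table and column names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (key INTEGER, c1 REAL, c2 REAL);
    CREATE TABLE b (key INTEGER, c3 REAL);
""")
con.executemany("INSERT INTO a VALUES (?, ?, ?)", ((i, 0.0, 0.0) for i in range(200_000)))
con.executemany("INSERT INTO b VALUES (?, ?)", ((i, 1.0) for i in range(0, 200_000, 2)))

# indexes on the join keys turn per-row scans into lookups
con.execute("CREATE INDEX idx_a_key ON a(key)")
con.execute("CREATE INDEX idx_b_key ON b(key)")

cur = con.execute("SELECT a.key, a.c1, a.c2, b.c3 FROM a LEFT JOIN b ON a.key = b.key")
print(sum(1 for _ in cur))
```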
0
votes
1 answer

h5py: Merge matched lines from a huge HDF5 file into smaller datasets

I have two huge hdf5 files, each with an index of ids, and each containing different information about each of those ids. I have read one into a small masked dataset (data), using only a select few ids. I now want to add to the dataset, using…
tom davison
  • 112
  • 6
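For the question above, a sketch of reading only the rows whose ids match an existing selection; it assumes each file exposes datasets named "ids" and "values", which is not stated in the question.

```python
import numpy as np
import h5py

wanted = np.load("selected_ids.npy")            # ids already present in the small dataset

with h5py.File("second_big_file.h5", "r") as f:
    ids = f["ids"][...]                         # only the id column is read in full
    idx = np.flatnonzero(np.isin(ids, wanted))  # increasing order, as h5py requires
    matched = f["values"][idx, :]               # pulls just the matching rows from disk

print(matched.shape)
```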
0
votes
0 answers

Manipulating large data sets in MATLAB: advice on cell and numeric array operations, with performance in mind

This is a cross-post from here: Link to post in the MathWorks community. Currently I'm working with large data sets; I've saved those data sets as MATLAB files, with the two biggest files being 9.5 GB and 5.9 GB. These files contain a cell array each of…