Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because "large" depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application meant to clone hard drives, 5 TB might be large. A specific number is also unnecessary: this tag is for questions about problems caused by too much data, whatever that amount happens to be.

2088 questions
0 votes • 1 answer

Python multiprocessing: how to set max_workers properly?

Background: I have a huge DataFrame with 40 million rows. I have to run some functions on some columns. The loops were taking too long, so I decided to go with multiprocessing. CPU: 8 cores, 16 threads. RAM: 128 GB. Question: How many chunks should I…
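A minimal sketch of one common setup, assuming a CPU-bound per-chunk function: size the pool from the machine's core count and split the frame into a few chunks per worker. The function process_chunk and the column value are hypothetical stand-ins for the asker's work.

```python
import os
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd

def process_chunk(chunk: pd.DataFrame) -> pd.Series:
    # Stand-in for the real per-column work; CPU-bound, so processes (not threads) pay off.
    return chunk["value"] ** 2

if __name__ == "__main__":
    df = pd.DataFrame({"value": np.arange(1_000_000)})
    workers = os.cpu_count() or 4              # counts logical threads (16 on the asker's box)
    chunks = np.array_split(df, workers * 4)   # a few chunks per worker evens out the load
    with ProcessPoolExecutor(max_workers=workers) as pool:
        result = pd.concat(pool.map(process_chunk, chunks))
    print(len(result))
```

Each chunk is pickled to a worker process, so fewer, larger chunks reduce serialization overhead, while more, smaller chunks balance load; a small multiple of the worker count is a common compromise.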
0 votes • 0 answers

Loading and merging large CSV files in Python

I'm trying to open 10 CSV files in pandas as a DataFrame using the read_csv() function; however, I keep getting the following error: "MemoryError: Unable to allocate 207. MiB for an array with shape (10, 2718969) and data type int64". 8 of the csv…
Aastha Jha • 153 • 1 • 2 • 14
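The error message points at one very large int64 allocation, so two hedged mitigations are sketched below: stream each file in chunks and downcast integer columns that fit in a smaller dtype. The file pattern data_*.csv is a hypothetical stand-in for the asker's ten files.

```python
import glob

import pandas as pd

frames = []
for path in glob.glob("data_*.csv"):               # hypothetical file pattern
    for chunk in pd.read_csv(path, chunksize=100_000):
        # int64 -> int32/int16 where the values allow; often halves memory or better
        for col in chunk.select_dtypes("int64").columns:
            chunk[col] = pd.to_numeric(chunk[col], downcast="integer")
        frames.append(chunk)

df = pd.concat(frames, ignore_index=True)
print(df.memory_usage(deep=True).sum())
```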
0 votes • 2 answers

4D plot with gnuplot

I have a large data set arranged in four columns in a file like this: # X Y Z f 0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0 7 0 0 0 ... where (x, y, z) is the coordinate of each point in a 3D mesh (between [0,1] in each direction and each direction…
mehdi_bm • 381 • 1 • 3 • 16
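The usual approach for a fourth dimension is to draw (x, y, z) as points and encode f as color; in gnuplot that is roughly splot 'file' using 1:2:3:4 with points palette. Since this listing's other examples are Python, the same idea is sketched below with matplotlib, on made-up data standing in for the file's columns.

```python
import matplotlib.pyplot as plt
import numpy as np

x, y, z = np.random.rand(3, 500)   # stand-ins for the X, Y, Z columns
f = x + y + z                      # stand-in for the fourth column

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
sc = ax.scatter(x, y, z, c=f, cmap="viridis")
fig.colorbar(sc, label="f")
plt.show()
```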
0 votes • 1 answer

How to create an array based on a generator?

Say I have a large array: A = 2*np.ones([100, 100, 100]). I want to do some calculations on it, for example: def squared_elements(M): yield M**2. I chose to use a generator function because my array is very big and I don't need all the results.…
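If the goal is to materialize values from a generator without building an intermediate list, np.fromiter is the usual tool; the sketch below assumes an element-wise variant of the asker's generator (the original yields the whole squared array in a single step).

```python
import numpy as np

A = 2 * np.ones([100, 100, 100])

def squared_elements(M):
    for v in M.flat:          # one element at a time, lazily
        yield v ** 2

# count lets numpy preallocate; values are computed only as fromiter pulls them
out = np.fromiter(squared_elements(A), dtype=A.dtype, count=A.size)
print(out.reshape(A.shape)[0, 0, :5])
```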
0 votes • 0 answers

How to chunk a large data set and generate a PDF using dompdf in CodeIgniter

I have to generate a PDF file for a very large data set (more than 1M). Can anyone explain how to chunk the data set into smaller units and download all the data in one single PDF file? When I try to generate it like below, it gives a memory exhausted error…
sanji • 1,310 • 1 • 12 • 21
0 votes • 1 answer

Reading a 6 GB Stata (.dta) dataset into R

I have a large data file that is 6.1 GB on my iMac (OS: Catalina 10.15.4, processor: 3.1 GHz). I have tried multiple ways to read the file into my R global environment: library(foreign); data <- read.dta(file = "File.dta", missing.type =…
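Since .dta is Stata's format, R-side alternatives include haven::read_dta and readstata13::read.dta13, which generally handle large files better than foreign. Another hedged option is to stream the file from Python, where pandas can read .dta in chunks; "File.dta" is the asker's filename.

```python
import pandas as pd

total = 0
for chunk in pd.read_stata("File.dta", chunksize=50_000):
    total += len(chunk)       # replace with real per-chunk processing
print(total)
```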
0 votes • 2 answers

jqGrid huge data load problems

I am doing some investigation on jqGrid. Everything works fine until I load huge data, about 1M rows in the database; jqGrid doesn't display now. When I downsize the database to 100K rows, the data displays, but I still…
Victor • 1 • 1 • 2
0 votes • 1 answer

Impact of large dataset on the trained model size?

If the dataset is large, does that mean the model size will also be large?
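A quick way to see the usual answer is to compare serialized model sizes as the training set grows: a parametric model's size tracks its parameter count, while an instance-based model such as kNN stores the training data and grows with it. A small hedged demonstration with scikit-learn:

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

for n in (1_000, 100_000):
    X = np.random.rand(n, 5)
    y = X.sum(axis=1)
    lin = LinearRegression().fit(X, y)
    knn = KNeighborsRegressor().fit(X, y)
    print(n, len(pickle.dumps(lin)), len(pickle.dumps(knn)))
# LinearRegression stays roughly constant; KNeighborsRegressor scales with n.
```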
0 votes • 1 answer

NumPy memory error with a 1 GB matrix, 64-bit Python and loads of RAM

[Note: although there are already some posts about dealing with large matrices in numpy, they do not address my specific concerns.] I am trying to load a 30820x12801 matrix stored in a .txt file of size 1.02G with numpy.loadtxt(). I get a Memory…
Soap • 309 • 2 • 14
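numpy.loadtxt buffers heavily while parsing text, so the process can need several times the final array's footprint. Two hedged mitigations, assuming a whitespace-delimited file named matrix.txt: parse straight to float32 to halve the result, and convert once to binary so later runs can memory-map instead of re-parsing.

```python
import numpy as np

# 1) Parse directly to float32: half the memory of the default float64
arr = np.loadtxt("matrix.txt", dtype=np.float32)

# 2) Save as binary once; mmap_mode then pages the matrix from disk on demand
np.save("matrix.npy", arr)
mm = np.load("matrix.npy", mmap_mode="r")
print(mm.shape, mm[0, :5])
```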
0 votes • 2 answers

Why does a function with setTimeout not lead to a stack overflow?

I was writing a test for handling huge amounts of data. To my surprise, if I added a setTimeout to my function, it would no longer lead to a stack overflow (how appropriate for this site). How is this possible? The code seems to be really…
Corno • 5,448 • 4 • 25 • 41
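setTimeout does not call the function; it schedules it and returns, so the current frame unwinds and the event loop later invokes the callback on a fresh, empty stack. The question is JavaScript, but the mechanism is the same in any event loop; a hedged Python analogue with asyncio:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    count = 0

    def step():
        nonlocal count
        count += 1
        if count < 100_000:        # plain recursion this deep would overflow
            loop.call_soon(step)   # schedule and return: the stack unwinds first
        else:
            done.set_result(count)

    step()
    print(await done)

asyncio.run(main())
```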
0 votes • 1 answer

Update a large number of rows in a MySQL table

I'm using a relational database (MySQL 5.7). In this database I have a table called customer_transaction with 4 columns: id, customer_id, type, amount. |id|customer_id |type |amount| |--|------------|---------|------| |1 |44 …
Hasan Hafiz Pasha • 1,402 • 2 • 17 • 25
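For tables with millions of rows, the usual pattern is to update in keyed ranges so each transaction stays small and locks are held briefly. A sketch of that pattern follows, using the question's table shape; it is shown with sqlite3 so it runs anywhere, but the loop translates directly to MySQL 5.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_transaction "
             "(id INTEGER PRIMARY KEY, customer_id INT, type TEXT, amount INT)")
conn.executemany("INSERT INTO customer_transaction VALUES (?, ?, ?, ?)",
                 [(i, i % 100, "credit", 10) for i in range(1, 100_001)])

BATCH = 10_000
max_id = conn.execute("SELECT MAX(id) FROM customer_transaction").fetchone()[0]
for lo in range(1, max_id + 1, BATCH):
    with conn:  # one short transaction per batch
        conn.execute("UPDATE customer_transaction SET amount = amount * 2 "
                     "WHERE id BETWEEN ? AND ?", (lo, lo + BATCH - 1))

print(conn.execute("SELECT SUM(amount) FROM customer_transaction").fetchone()[0])
```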
0 votes • 0 answers

Clustering a very large dataset of discrete-valued samples

I am trying to cluster (AgglomerativeClustering, KMeans) a very large dataset of the following type: [0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5] That is, a sample of integers that repeat multiple times. In short, I would like to…
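Because the samples are small integers that repeat many times, one hedged shortcut is to cluster only the unique values and pass their multiplicities as sample_weight, which scikit-learn's KMeans supports; the algorithm then sees a handful of points instead of the full sample.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 4, 4, 4,
                 5, 5, 5, 5, 5, 5, 5, 5, 5])
values, counts = np.unique(data, return_counts=True)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(values.reshape(-1, 1), sample_weight=counts)
print(dict(zip(values.tolist(), labels.tolist())))
```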
0 votes • 1 answer

Join two large spark dataframes persisted in parquet using Scala

I'm trying to join two large Spark DataFrames using Scala and I can't get it to perform well. I really hope someone can help me. I have the following two text files: dfPerson.txt (PersonId: String, GroupId: String), 2 million rows (100MB) …
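When one side of a large join fits in executor memory, broadcasting it avoids the shuffle that usually dominates the cost. The question uses Scala, but the API is the same in PySpark; the second file's name and the tab delimiter below are assumptions, with dfPerson.txt taken from the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-sketch").getOrCreate()

# Schemas follow the question; the delimiter and dfGroup.txt are assumptions.
df_person = spark.read.csv("dfPerson.txt", sep="\t").toDF("PersonId", "GroupId")
df_group = spark.read.csv("dfGroup.txt", sep="\t").toDF("GroupId", "GroupName")

# Broadcast the smaller side; if both sides are too big for that,
# repartitioning both on the join key before joining is the usual fallback.
joined = df_person.join(broadcast(df_group), on="GroupId")
joined.write.mode("overwrite").parquet("joined.parquet")
```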
0 votes • 0 answers

How to deal with 10 million rows in a WinForms list or grid from an in-memory List?

I have to build a small viewer application that can import a large archive table from a USB stick: several million rows with 4 fields (two UInt64 IDs, a timestamp and an Int32 ID). The application needs to simply show the data, allow for sorting by…
Rob • 11,492 • 14 • 59 • 94
0 votes • 1 answer

Best way to store large amounts of high resolution images for my React portfolio?

I am developing a personal portfolio for myself using React and Gatsby, and I'm looking for a way to implement a gallery there with all my photography in it. I need a way to efficiently store and retrieve large amounts of high-res images to use in…
PRR • 153 • 4 • 13