Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because it depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application meant to clone hard drives, 5 TB might be large. A specific number is also unnecessary: this tag is for questions about problems caused by too much data, regardless of how much that is.

2088 questions
0
votes
2 answers

Best way to run a macro over 500K+ rows?

I have a file with a bunch of rows that contain data for certain part numbers from different configurations. Some of these part numbers are repeated throughout the file; some of those duplicates may contain certain data and some may…
Jeff
  • 43
  • 1
  • 6
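The usual fix for a slow per-row macro is a single pass that groups rows with a dictionary keyed by part number (in VBA, a `Scripting.Dictionary`). A minimal Python sketch of that merge logic, with made-up column names, assuming later duplicates fill in fields the first occurrence left blank:

```python
def merge_by_part_number(rows):
    """Collapse duplicate part numbers, keeping the first non-empty
    value seen for each field (column names here are hypothetical)."""
    merged = {}  # part number -> accumulated row
    for row in rows:
        part = row["part"]
        if part not in merged:
            merged[part] = dict(row)
        else:
            for field, value in row.items():
                if not merged[part].get(field) and value:
                    merged[part][field] = value
    return list(merged.values())

rows = [
    {"part": "A1", "config": "X", "weight": ""},
    {"part": "A1", "config": "",  "weight": "3kg"},
    {"part": "B2", "config": "Y", "weight": "1kg"},
]
result = merge_by_part_number(rows)
```

One pass over 500K rows with dictionary lookups avoids the quadratic cost of re-scanning the sheet for every duplicate.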
0
votes
1 answer

PowerShell question - Looking for the fastest method to loop through 500k objects looking for a match in another 500k-object array

I have two large .csv files that I've imported using the import-csv cmdlet. I've done a lot of searching and trying and am finally posting to ask for some help to make this easier. I need to move through the first array that will have anywhere from…
bribri
  • 1
  • 1
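The standard answer here is to build a hash table from one array once and probe it per element, instead of nesting loops (in PowerShell, a `[hashtable]` checked with `ContainsKey`). A Python sketch of the idea, with made-up field names:

```python
# Two collections to join on "id"; with a nested loop this is O(n*m),
# with a dict index it is O(n + m).
left = [{"id": i, "name": f"item{i}"} for i in range(5)]
right = [{"id": i, "status": "ok"} for i in range(3, 8)]

right_by_id = {r["id"]: r for r in right}   # one pass: build the index
matches = [(l, right_by_id[l["id"]])        # one pass: probe the index
           for l in left if l["id"] in right_by_id]
```

For two 500k-element arrays this turns roughly 250 billion comparisons into about a million dictionary operations.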
0
votes
0 answers

Handle a 47 GB JSON file

I have a 47 GB JSON file which has the following form { "data": { "Key_A": {small JSON ~50mb}, "Key_B": {small JSON ~50mb}, "Key_C": {large JSON ~47gb} } } The exact structure and content of file.data.Key_C is unknown and I'd like to…
user101
  • 476
  • 1
  • 4
  • 9
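For a file this size the usual tool is a streaming parser (e.g. the third-party ijson package), which yields one value at a time instead of loading 47 GB into memory. As a toy illustration of the one-value-at-a-time idea using only the standard library, the sketch below parses each top-level value of `data` independently with `json.JSONDecoder.raw_decode`; it assumes the exact shape quoted in the question (an object of key/object pairs):

```python
import json

def iter_data_items(text):
    """Yield (key, value) pairs from the "data" object one value at a
    time, so each small value can be handled on its own (a toy stand-in
    for a real streaming parser)."""
    dec = json.JSONDecoder()
    i = text.index("{", text.index('"data"')) + 1  # step inside "data"
    while True:
        q = text.find('"', i)           # locate the next key, if any
        if q == -1:
            return
        key, i = dec.raw_decode(text, q)
        i = text.index(":", i) + 1
        while text[i].isspace():        # raw_decode rejects leading spaces
            i += 1
        value, i = dec.raw_decode(text, i)
        yield key, value

doc = '{"data": {"Key_A": {"x": 1}, "Key_B": {"y": 2}}}'
items = dict(iter_data_items(doc))
```

With a genuinely huge file you would feed a streaming parser from disk rather than a string, and skip or divert `Key_C` the moment its key appears.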
0
votes
0 answers

Appending large image dataset to an array

I am doing classification using a CNN on fake images. My data contains 100K+ images of two classes. I'm using Google Colab for the work. I already increased the RAM to 25 GB, but while appending the images to the array it keeps crashing. The…
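Rather than appending every decoded image to one in-memory list, the usual workaround is to load lazily in fixed-size batches so only one batch is resident at a time. A minimal sketch, with `load_image` as a hypothetical stand-in for the real decoding step:

```python
def batches(paths, batch_size):
    """Yield lists of decoded images batch by batch, so memory use is
    bounded by batch_size rather than by the whole dataset."""
    for i in range(0, len(paths), batch_size):
        yield [load_image(p) for p in paths[i:i + batch_size]]

def load_image(path):          # placeholder: pretend to decode a file
    return f"pixels-of-{path}"

paths = [f"img_{n}.png" for n in range(10)]
sizes = [len(b) for b in batches(paths, batch_size=4)]
```

Most training frameworks ship variants of this pattern, e.g. Keras data generators or tf.data input pipelines, which feed the model batch by batch instead of from one giant array.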
0
votes
1 answer

Efficient way of storing a large amount of character data between transactions in C++

For our application we have the following scenario: firstly, we get a large amount of data (in some cases, this can be more than 100 MB) through a 3rd-party API into our class via a constructor, like: class DataInputer { public: DataInputer(int id,…
Ferenc Deak
  • 34,348
  • 17
  • 99
  • 167
0
votes
2 answers

Filling in missing value based on values in both preceding and succeeding rows

I have a dataset analogous to the one below where for a website I have the number of views every month for two years (2001-2002). However, due to the way the data was gathered, I only have information for a website if it had > 0 views. So, I am…
ucmom
  • 1
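A common approach is to rebuild the complete month grid for 2001-2002 and fill any month absent from the sparse data with 0. A Python sketch with made-up view counts:

```python
# Sparse input: only months with > 0 views are present.
views = {(2001, 1): 120, (2001, 3): 45, (2002, 12): 7}

# Rebuild the full 24-month grid, defaulting missing months to 0.
filled = {
    (year, month): views.get((year, month), 0)
    for year in (2001, 2002)
    for month in range(1, 13)
}
```

In pandas the same step is a `DataFrame.reindex` over the complete month index followed by `fillna(0)`.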
0
votes
2 answers

Export a large SQL Server result to Excel (row count = 1 million) using C#

I need to export a SQL Server result to an Excel file (at least a million rows and a minimum of 50 columns). Constraints: we can't use Interop; data should be in only one sheet; the process should not use more than 5 GB of RAM (time consumption up to 45 min). Excel…
Dhrup
  • 71
  • 11
0
votes
2 answers

Large-records table insertion issue in MySQL

I am a developer and I am facing an issue while managing a table which has a large number of records. I am executing a cron job to fill up data in a primary table (Table A) which has 5-6 columns and approx 400,000 to 500,000 rows, and then creating…
Suketu
  • 371
  • 1
  • 7
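When filling a table with hundreds of thousands of rows, the usual speed-up is to insert in large batches with one commit per batch, instead of one statement and commit per row. Sketched below with the standard library's sqlite3 so the demo is self-contained; the same idea applies to MySQL via multi-row INSERT statements (or LOAD DATA INFILE when loading from a file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, val TEXT)")

rows = [(i, f"v{i}") for i in range(10_000)]
batch = 1_000
for i in range(0, len(rows), batch):
    with conn:                              # commit once per batch
        conn.executemany("INSERT INTO a VALUES (?, ?)",
                         rows[i:i + batch])

count = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
```

Batching amortizes the per-transaction overhead (fsync, log writes) over a thousand rows at a time, which is usually where most of the insertion time goes.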
0
votes
0 answers

Sparse Matrix Coding in Matlab

I have a dataset in which I have a 25,000 by 25,000 matrix for each timepoint with around 60% of the cells being zeroes. I need to be able to take the eigenvalues of my matrices and reorder the columns and rows. Currently, I am running into memory…
vins
  • 1
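MATLAB's `sparse()` stores only the nonzero entries, which is the key to fitting a 25,000 x 25,000 matrix in memory. The sketch below shows the same dict-of-keys idea in plain Python, including a column reorder, which becomes a simple index rewrite. (For eigenvalues of a large sparse matrix, MATLAB's `eigs` accepts sparse input directly.)

```python
# A small dense example; only the nonzero cells are kept.
dense = [[0, 5, 0],
         [0, 0, 2],
         [7, 0, 0]]

sparse = {(i, j): v
          for i, row in enumerate(dense)
          for j, v in enumerate(row) if v != 0}

# Reordering columns is just rewriting the column index per entry.
perm = {0: 2, 1: 0, 2: 1}                  # old column -> new column
reordered = {(i, perm[j]): v for (i, j), v in sparse.items()}
```

At 60% zeros the savings are modest but real; the bigger win is that reorderings and updates never materialize a dense copy.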
0
votes
0 answers

MySQL ALTER TABLE partition by hash on month/year - Error: A PRIMARY KEY must include all columns in the table's partitioning function

I have a MySQL table (with large data): CREATE TABLE `rider_orders` ( `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY, `date` date NOT NULL, `shift_id` INT NOT NULL, `rider_id` INT NOT NULL, `product_id` INT NOT NULL ) I want to add partitions…
0
votes
2 answers

Is there any way to test Highcharts with a large data set using Jest and React?

I am trying to write tests for Highcharts. But how do I test the chart with a larger data set, to check the performance and to verify that Highcharts works with a larger data set? Library: highcharts-react-official, Jest, React
0
votes
2 answers

matplotlib wxPython backend crashing after large image stack processing

I got stuck with a wxAssertionError (shown below) when processing a large image stack. Let me explain with an example. I have made an interface with wxPython with just a panel, a button and a gauge bar. Once the user clicks on the button, the code…
0
votes
1 answer

Combining overlapping matrix data to create a single matrix

I have measurement data from four sensors, each gives current speed at a given water depth (d) and time (t). Below is the matrix dimension of these four current measurements: cs1 = [d1 x t1]; cs2 = [d2 x t2]; cs3 = [d3 x t3]; cs4 = [d4 x t4] The…
Amitava
  • 431
  • 2
  • 6
  • 21
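One way to combine overlapping sensor grids is to key every reading by its (depth, time) cell, accumulate a sum and a count per cell, and average where sensors overlap. A sketch with made-up current-speed values:

```python
def combine(sensors):
    """Merge per-sensor {(depth, time): speed} grids into one grid,
    averaging cells covered by more than one sensor."""
    sums, counts = {}, {}
    for grid in sensors:
        for key, speed in grid.items():
            sums[key] = sums.get(key, 0.0) + speed
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

cs1 = {(10, 0): 1.0, (20, 0): 2.0}
cs2 = {(20, 0): 4.0, (30, 0): 5.0}   # overlaps cs1 at depth 20
merged = combine([cs1, cs2])
```

The resulting keys define the union grid's depth and time axes, so the merged dict can then be laid out as a single matrix.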
0
votes
0 answers

Is there overhead in using file_get_contents with offset and length vs. splitting the original file and reading those single files when needed?

I have a very large file (50 GB) that I could either split into many single 2 MB chunk files, or access with file_get_contents using offsets and a length of 2 MB, where the offsets used are not necessarily contiguous. So I wonder…
JSmith
  • 1
  • 1
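PHP's `file_get_contents($file, false, null, $offset, $length)` boils down to an open, a seek, and a bounded read, so the per-access cost difference versus pre-split chunk files is mostly the open/directory overhead of many small files, not the read itself. The seek-then-read pattern, sketched in Python:

```python
import os
import tempfile

# A small stand-in file: three 10-byte regions instead of 2 MB chunks.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"A" * 10 + b"B" * 10 + b"C" * 10)

with open(path, "rb") as f:
    f.seek(10)                  # jump straight to the chunk's offset
    window = f.read(10)         # read only that 10-byte window

os.remove(path)
```

Keeping one big file also lets the OS page cache work across chunks; splitting only pays off if it simplifies bookkeeping or distribution.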
0
votes
1 answer

How to update/merge two huge Lists with hundreds of properties in C#, based on a common matching key, in the most efficient way

I have two large collection Lists with hundreds of properties, e.g. an original collection List and an updated collection List. The UpdatedCollection will contain values in certain columns which most probably…
Maulik
  • 510
  • 1
  • 7
  • 22
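The usual O(n + m) approach is to index the updated list by its key once with a dictionary (in C#, `ToDictionary`), then walk the original list and copy over only the non-empty fields. A Python sketch with made-up field names:

```python
def apply_updates(original, updates, key="id"):
    """Update records in `original` in place from matching records in
    `updates`, skipping empty values (field names are hypothetical)."""
    by_key = {u[key]: u for u in updates}   # one pass over updates
    for rec in original:                    # one pass over originals
        upd = by_key.get(rec[key])
        if upd:
            for field, value in upd.items():
                if value not in (None, ""):
                    rec[field] = value
    return original

orig = [{"id": 1, "name": "a", "qty": 5}, {"id": 2, "name": "b", "qty": 1}]
upds = [{"id": 2, "name": "", "qty": 9}]
result = apply_updates(orig, upds)
```

With hundreds of properties, the per-field copy dominates; the dictionary merely guarantees each original record is matched in constant time rather than by scanning the whole updated list.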