Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because "large" depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application that clones hard drives, 5 TB might be large. A specific threshold is also unnecessary: this tag is for questions about problems caused by having too much data, whatever that amount happens to be.

2088 questions
8 votes · 1 answer

Import/export a very large MySQL database in phpMyAdmin

I have a database in phpMyAdmin with 3,000,000 records that I want to export to another PC. When I export it, only 200,000 entries end up in the .sql file, and that file also fails to import on the other PC.

Aamir · 273 · 2 · 5 · 10
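
A hedged sketch of the usual workaround (not the accepted answer verbatim): export and import from the command line with mysqldump and the mysql client, which stream the data and avoid phpMyAdmin's PHP upload and timeout limits. The database name, credentials, and file names below are placeholders; Python is used only to keep this listing's examples in one language.

    import subprocess

    DB = "mydb"                 # placeholder database name
    DUMP_FILE = "mydb.sql"

    # Export: stream every table into one .sql file (prompts for the password).
    with open(DUMP_FILE, "wb") as out:
        subprocess.run(
            ["mysqldump", "--user=root", "--password", "--single-transaction", DB],
            stdout=out,
            check=True,
        )

    # Import on the other PC: feed the dump back into the mysql client.
    with open(DUMP_FILE, "rb") as dump:
        subprocess.run(
            ["mysql", "--user=root", "--password", DB],
            stdin=dump,
            check=True,
        )
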
8 votes · 2 answers

What is the fastest way to perform an FFT on a large file?

I am working on a C++ project that needs to perform an FFT on large 2D raster data (10 to 100 GB). In particular, performance is quite bad when applying the FFT to each column, whose elements are not contiguous in memory (placed with a stride of the…

rooot · 103 · 4
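
One common mitigation, sketched here under assumptions (NumPy instead of the question's C++, made-up file names and dimensions): copy each strided block of columns into contiguous memory before transforming it, so the per-column FFTs read sequential data.

    import numpy as np

    rows, cols = 8192, 8192                      # assumed raster dimensions
    data = np.memmap("raster.f32", dtype=np.float32, mode="r", shape=(rows, cols))
    out = np.memmap("raster_fft.c64", dtype=np.complex64, mode="w+", shape=(rows, cols))

    block = 256                                  # columns transformed per pass
    for start in range(0, cols, block):
        stop = min(start + block, cols)
        # Copy the strided column slice into a contiguous buffer, transposed so
        # each 1-D FFT below walks sequential memory.
        chunk = np.ascontiguousarray(data[:, start:stop].T)      # (block, rows)
        out[:, start:stop] = np.fft.fft(chunk, axis=1).T.astype(np.complex64)

    out.flush()
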
8 votes · 2 answers

DynamoDB items larger than 400 KB

I am planning to create a merchant table that will hold the merchant's store locations. Most merchants are small businesses and they only have a few stores. However, there is the odd multi-chain/franchise who may have hundreds of locations. What…

Bluetoba · 885 · 1 · 9 · 16
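
The usual answer to the 400 KB item limit is to split the data across items rather than grow one item. A rough sketch with boto3, assuming a made-up table and key schema (merchant_id as partition key, a LOCATION#-prefixed sort key): each store location becomes its own small item, and one query returns all of a merchant's locations.

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Merchants")     # assumed table: PK merchant_id, SK entity_id

    def put_location(merchant_id, location_id, address, lat, lon):
        table.put_item(Item={
            "merchant_id": merchant_id,                  # partition key
            "entity_id": f"LOCATION#{location_id}",      # sort key
            "address": address,
            "lat": str(lat),                             # coords as strings (or Decimal)
            "lon": str(lon),
        })

    def get_locations(merchant_id):
        # One query pulls every location item for the merchant, however many there are.
        resp = table.query(
            KeyConditionExpression=Key("merchant_id").eq(merchant_id)
            & Key("entity_id").begins_with("LOCATION#")
        )
        return resp["Items"]
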
8 votes · 3 answers

Square root of a number greater than 10^2000 in Python 3

I'd like to calculate the square root of a number bigger than 10^2000 in Python. If I treat this number like a normal integer, I always get this result back: Traceback (most recent call last): File "...", line 3, in print(…

cubeAD · 83 · 1 · 1 · 5
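
A minimal sketch of the usual fix: math.sqrt converts its argument to a float and overflows for integers this large, whereas math.isqrt (Python 3.8+) stays entirely in integer arithmetic.

    import math

    n = 10 ** 2000 + 12345            # any integer of this magnitude
    root = math.isqrt(n)              # floor of the exact square root, no float conversion
    assert root * root <= n < (root + 1) ** 2
    print(root)
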
8 votes · 7 answers

What are best practices for collecting, maintaining and ensuring accuracy of a huge data set?

I am posing this question looking for practical advice on how to design a system. Sites like amazon.com and Pandora have and maintain huge data sets to run their core business. For example, Amazon (and every other major e-commerce site) has millions…

Kyle West · 8,934 · 13 · 65 · 97
8 votes · 2 answers

How to update one table from another without specifying column names?

I have two tables with identical structure and a VERY LARGE number of fields (about 1000). I need to perform 2 operations: 1) insert all rows from the second table into the first, for example: INSERT INTO [1607348182] SELECT * FROM _tmp_1607348182; 2)…

amuliar · 1,318 · 15 · 26
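
A hedged sketch of one workaround, not necessarily the accepted answer: read the shared column list from information_schema and generate the UPDATE statement, so none of the ~1000 column names is typed by hand. The connector, credentials, and the assumption that the first column is the join key are placeholders; the question's bracket quoting may indicate a different DBMS, where the same information_schema idea applies with adjusted syntax.

    import mysql.connector

    conn = mysql.connector.connect(user="root", password="...", database="mydb")
    cur = conn.cursor()

    cur.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = %s ORDER BY ordinal_position",
        ("mydb", "_tmp_1607348182"),
    )
    columns = [row[0] for row in cur.fetchall()]

    key = columns[0]                            # assumed primary-key column
    assignments = ", ".join(f"t.`{c}` = s.`{c}`" for c in columns if c != key)

    sql = (
        "UPDATE `1607348182` AS t "
        "JOIN `_tmp_1607348182` AS s ON t.`{0}` = s.`{0}` "
        "SET {1}".format(key, assignments)
    )
    cur.execute(sql)
    conn.commit()
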
8 votes · 3 answers

Components with large datasets run slowly on IE11/Edge only

Consider the code below, and imagine that rows.length could be any value of 2000 or more, with each array holding about 8 columns in this example. I use a more expanded version of this code to render a part of a table that…

Perfection · 721 · 4 · 12 · 36
8 votes · 1 answer

R - Why does adding 1 column to a data.table nearly double peak memory used?

After getting help from 2 kind gentlemen, I managed to switch over from data frame + plyr to data tables. As I worked on, I noticed that peak memory usage nearly doubled from 3.5 GB to 6.8 GB (according to Windows Task…

NoviceProg · 815 · 1 · 10 · 22
8 votes · 1 answer

Python - repeating a NumPy array without replicating data

This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer. How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to…
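
A minimal sketch of the usual trick for the N-dimensional case: np.broadcast_to returns a read-only view whose repeated axis has stride 0, so no data is copied no matter how many repetitions are requested.

    import numpy as np

    base = np.arange(6).reshape(2, 3)             # small 2x3 source array
    tiled = np.broadcast_to(base, (1000, 2, 3))   # behaves like 1000 stacked copies

    print(tiled.shape)             # (1000, 2, 3)
    print(tiled.strides)           # first stride is 0: the repeated axis re-reads the same memory
    print(tiled.flags["OWNDATA"])  # False: it is a view, no new buffer was allocated
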
8 votes · 3 answers

How to calculate a hash of a file that is 1 terabyte and over?

So, I have a couple of system backup image files that are around 1 terabyte each, and I want to quickly calculate the hash of each one (preferably SHA-1). At first I tried to calculate the MD5 hash; 2 hours had passed and the hash hadn't been…

Light Flow · 539 · 1 · 6 · 18
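
A minimal sketch of the standard streaming approach: read the image in fixed-size chunks and feed them to a hash object, so memory use stays constant; for a 1 TB file the limiting factor is disk throughput rather than the hash itself.

    import hashlib

    def hash_file(path, algorithm="sha1", chunk_size=8 * 1024 * 1024):
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):   # read 8 MiB at a time
                h.update(chunk)
        return h.hexdigest()

    # print(hash_file("/backups/system.img"))    # path is a placeholder
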
8 votes · 4 answers

Errors due to Vowpal Wabbit's dependencies on the Boost library

I'm trying really hard to install Vowpal Wabbit, and it fails when I run the makefile, throwing: cd library; make; cd .. g++ -g -o ezexample temp2.cc -L ../vowpalwabbit -l vw -l allreduce -l boost_program_options -l z -l pthread ld:…

madCode · 3,733 · 5 · 26 · 31
8 votes · 6 answers

Generating large Excel files from MySQL data with PHP in corporate applications

We're developing and maintaining a couple of systems that need to export reports in Excel format to the end user. The reports are gathered from a MySQL database with some trivial processing and usually result in ~40,000 rows of data with 10-15…

lostcontrol · 81 · 1 · 1 · 4
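
The question is about PHP, but the memory problem is language-independent: write the spreadsheet row by row instead of building the whole report in memory first. A rough Python analogue (not the poster's stack) using openpyxl's write-only mode and a streaming cursor; query, credentials, and file name are assumptions.

    import mysql.connector
    from openpyxl import Workbook

    conn = mysql.connector.connect(user="report", password="...", database="reports")
    cur = conn.cursor()
    cur.execute("SELECT * FROM report_rows")        # ~40,000 rows, 10-15 columns

    wb = Workbook(write_only=True)                  # rows are flushed as they are appended
    ws = wb.create_sheet("Report")
    ws.append([col[0] for col in cur.description])  # header row from cursor metadata

    for row in cur:                                 # iterate the cursor instead of fetchall()
        ws.append(list(row))

    wb.save("report.xlsx")
    conn.close()
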
8 votes · 3 answers

Which data structure should I use for geocoding?

I am trying to create a Python script that takes an address as input and spits out its latitude and longitude, or latitudes and longitudes in the case of multiple matches, much like Nominatim. So, the possible input and outputs could be: In:…

AppleGrew · 9,302 · 24 · 80 · 124
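
A hedged sketch of one simple structure (not necessarily what the answers propose): a dictionary keyed by a normalized form of the address, where each key holds a list of candidate coordinates, so multiple matches fall out naturally. The sample places and coordinates are made up.

    from collections import defaultdict

    index = defaultdict(list)

    def normalize(address):
        # Lower-case and collapse punctuation/whitespace so lookups are forgiving.
        return " ".join(address.lower().replace(",", " ").split())

    def add_place(address, lat, lon):
        index[normalize(address)].append((lat, lon))

    def geocode(address):
        # Every stored match is returned; an empty list means "not found".
        return index.get(normalize(address), [])

    add_place("MG Road, Bangalore", 12.9756, 77.6068)
    add_place("MG Road, Pune", 18.5167, 73.8562)
    print(geocode("mg road,  bangalore"))   # one match despite the messy input
    print(geocode("Main Street"))           # [] - nothing indexed under that name
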
7 votes · 1 answer

Sampling permutations of [1,2,3,...,N] for large N

I have to solve the Travelling Salesman Problem using a genetic algorithm that I will have to write for homework. The problem consists of 52 cities, so the search space is 52!. I need to randomly sample (say) 1000 permutations of range(1,…

inspectorG4dget · 110,290 · 27 · 149 · 241
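
A minimal sketch of the usual answer: enumerating 52! permutations is impossible, but shuffling a fresh copy of the city list yields one uniformly random permutation per draw, which is all a GA's initial population needs (collisions among 1000 draws are astronomically unlikely).

    import random

    CITIES = list(range(1, 53))        # [1, 2, ..., 52]

    def random_permutation():
        tour = CITIES[:]               # copy so the template list is never mutated
        random.shuffle(tour)           # Fisher-Yates: uniform over all 52! orderings
        return tour

    population = [random_permutation() for _ in range(1000)]
    print(len(population), population[0][:5])
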
7 votes · 0 answers

FirebaseError: [code=resource-exhausted]: Resource has been exhausted (e.g. check quota)

I have an array of size 10000; all of them are document IDs. I am iterating over the array and need to fetch each document's data from Firestore and add new fields to each document. But I am facing errors like @firebase/firestore: Firestore…

Lakshmi · 278 · 5 · 19
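
This error usually means the write quota is being exhausted by thousands of individual updates. A rough sketch of the common mitigation, shown with the Python client only to keep this listing's examples in one language (the question appears to use the JavaScript SDK): group the updates into batches of at most 500 writes and pause between commits. Collection and field names are assumptions.

    import time
    from google.cloud import firestore

    db = firestore.Client()
    doc_ids: list[str] = []            # fill with the 10,000 document ids from the question

    BATCH_SIZE = 500                   # Firestore's per-batch write limit

    for start in range(0, len(doc_ids), BATCH_SIZE):
        batch = db.batch()
        for doc_id in doc_ids[start:start + BATCH_SIZE]:
            ref = db.collection("items").document(doc_id)
            batch.update(ref, {"new_field": "value"})
        batch.commit()                 # one round trip per 500 writes
        time.sleep(1)                  # back off so sustained throughput stays under quota
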