Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because "large" depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application meant to clone hard drives, 5 TB might be large. A specific number is also unnecessary, as this tag is meant for questions about problems caused by too much data, whatever that amount happens to be.

2088 questions
0
votes
1 answer

Reading only specific portions of a text file and writing them to different text files

I have a big text file whose content looks something like the following: 158 lines of text, 2000 lines of data, 140 lines of text, 2000 lines of data, 140 lines of text, . . . There are a total of 5 sets of 2000 lines of data which I would like Python to read…
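A minimal sketch of one way to do this in Python, assuming the block lengths quoted in the question (158 header lines, then 2000-line data blocks separated by 140 lines of text) and hypothetical file names; itertools.islice copies a fixed number of lines without reading the whole file into memory:

    from itertools import islice

    TEXT_HEADER, TEXT_BETWEEN = 158, 140   # text-block sizes from the question
    DATA_LINES, N_SETS = 2000, 5           # data-block size and count

    def skip_lines(f, n):
        for _ in range(n):
            next(f, None)                  # advance without storing lines

    with open("big_file.txt") as src:      # hypothetical input name
        skip_lines(src, TEXT_HEADER)
        for i in range(N_SETS):
            with open(f"data_{i + 1}.txt", "w") as out:
                out.writelines(islice(src, DATA_LINES))   # copy one data block
            skip_lines(src, TEXT_BETWEEN)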
0
votes
0 answers

How to upload large files with POST without loading them into RAM?

I need to upload large (2-20 GB) videos to a streaming service without loading them into RAM. The code I wrote is working, but I don't have the resources to handle large files. Is there a way in Python to do that? def upload(file): files =…
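If the upload uses requests, passing an open file object as the request body makes requests stream it in chunks rather than hold it in memory; a minimal sketch with a placeholder URL:

    import requests

    def upload(path):
        # An open file object passed as `data` is streamed chunk by
        # chunk, so the 2-20 GB video never sits in RAM at once.
        with open(path, "rb") as f:
            resp = requests.post("https://example.com/upload", data=f)
        resp.raise_for_status()
        return resp

If the service expects a multipart form (the truncated files =… suggests it might), requests-toolbelt's MultipartEncoder offers a streaming replacement for the files= argument.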
0
votes
1 answer

Loop through a really large list

I need to go through a really large VCF file to find matching information (matching rows according to column values). Here is what I have tried so far, but it is not working and is really problematic. target_id=('id1' 'id2' 'id3' ...) awk '!/#/'…
lambda
  • 97
  • 2
  • 7
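One common pattern for this kind of lookup, sketched here in Python rather than awk (file names hypothetical): load the target IDs into a set, then stream the VCF once, testing the ID column against the set in O(1) per row:

    target_ids = {"id1", "id2", "id3"}   # the IDs from the question

    with open("input.vcf") as vcf, open("matches.txt", "w") as out:
        for line in vcf:
            if line.startswith("#"):     # skip VCF header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if fields[2] in target_ids:  # column 3 is the ID in a standard VCF
                out.write(line)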
0
votes
1 answer

Read large text file in chunks in a loop

I have a large 30GB file that I want to process. I am trying to read it line-by-line in chunks since it cannot be loaded into memory. base::readLines and readr::read_lines_chunked are only able to read in chunks starting from the first line and…
upabove
  • 1,057
  • 3
  • 18
  • 29
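The question is about R's base::readLines and readr::read_lines_chunked, but the underlying pattern is language-neutral; a minimal Python sketch of a generator that skips to an arbitrary line and then yields fixed-size chunks (names hypothetical):

    from itertools import islice

    def chunks(path, chunk_size=100_000, start_line=0):
        # Stream the file: skip to start_line, then yield lists of
        # chunk_size lines until the file is exhausted.
        with open(path) as f:
            for _ in range(start_line):
                next(f, None)
            while True:
                chunk = list(islice(f, chunk_size))
                if not chunk:
                    return
                yield chunk

    for chunk in chunks("big_file.txt", start_line=1_000_000):
        pass   # per-chunk processing goes here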
0
votes
1 answer

Is there any way to fetch a huge amount of data stored in Realm in batches of a specific size, say 50?

I'm trying to load simple data stored in Realm into a UITableView in Swift, but I don't want Realm to load all of it at once; instead I want it to load chunk by chunk, every time the user reaches the bottom of the table. I've gone through all the…
robben
  • 637
  • 1
  • 7
  • 14
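Realm's query results are lazily evaluated, so indexing into them row by row usually avoids loading everything up front. Realm aside, the generic offset-based batching behind "load 50 more at the bottom of the table" looks roughly like this Python sketch (all names hypothetical):

    class BatchedSource:
        def __init__(self, records, batch_size=50):
            self.records = records        # e.g. a lazily evaluated query result
            self.batch_size = batch_size
            self.loaded = 0

        def next_batch(self):
            # Slice out the next batch and remember how far we have read.
            batch = self.records[self.loaded:self.loaded + self.batch_size]
            self.loaded += len(batch)
            return batch

    source = BatchedSource(range(10_000))
    rows = list(source.next_batch())      # call again when the user scrolls down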
0
votes
0 answers

Handling tracked values in Rails within a background job

In our application, we have users, tracked locations of users, and locations that can be filtered with some applied filters. In user.rb: class User has_many :tracked_locations end In tracked_locations.rb: class TrackedLocations belongs_to…
Aarthi
  • 1,451
  • 15
  • 39
0
votes
2 answers

Difference between server caching and client caching for a large dataset?

I am implementing a project in PHP with MySQL. Right now I don't have much data, but I was wondering about the future, when I have a large dataset: it will slow down my search in the table. So to decrease that search time, I was thinking of…
insomiac
  • 5,648
  • 8
  • 45
  • 73
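In broad terms, server caching keeps computed results near the database (for example, in Memcached or Redis) so repeated queries skip the slow search, while client caching keeps responses in the browser via HTTP cache headers. A minimal server-side sketch of the idea, in Python with a plain dict standing in for a real cache store:

    import time

    _cache = {}    # query -> (timestamp, rows); stand-in for Memcached/Redis
    TTL = 300      # seconds before a cached result is considered stale

    def search(query, run_query):
        # Serve repeated queries from the cache; only hit the database
        # when there is no fresh entry for this query string.
        hit = _cache.get(query)
        if hit and time.time() - hit[0] < TTL:
            return hit[1]
        rows = run_query(query)           # the slow table search
        _cache[query] = (time.time(), rows)
        return rows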
0
votes
2 answers

How to split a large Parquet file into multiple Parquet files and save them under different Hadoop paths by time column

My Parquet file looks like this: id, name, date 1, a, 1980-09-08 2, b, 1980-09-08 3, c, 2017-09-09 I would like the output to look like this: the folder 19800908 contains the data id, name, date 1, a, 1980-09-08 2, b, 1980-09-08 and the folder 20170909 contains the data id,…
free斩
  • 421
  • 1
  • 6
  • 18
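In Spark this is what partitionBy is for: derive a yyyyMMdd key and the writer creates one folder per distinct value (named day=19800908 rather than bare 19800908, but split the same way). A minimal PySpark sketch with hypothetical paths:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import date_format

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet("/input/data.parquet")      # hypothetical path

    (df.withColumn("day", date_format("date", "yyyyMMdd"))
       .write.partitionBy("day")                        # one folder per day value
       .parquet("/output/by_day"))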
0
votes
1 answer

Merge multiple large float32 matrices into one and store them

I am trying to merge 34 matrices, each of size 256 x 6000000 and of type numpy.float32, into a single matrix and store it on my system. Each matrix is stored in a separate .npy file. This is the script I am using: import numpy as np import os #…
deathracer
  • 305
  • 1
  • 20
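At those sizes the merged array is far larger than RAM, so the usual approach is a disk-backed .npy: np.lib.format.open_memmap creates the output on disk, and mmap_mode="r" reads each part lazily. A sketch assuming the 34 parts are stacked vertically (file names hypothetical):

    import numpy as np

    n_parts, rows, cols = 34, 256, 6_000_000   # sizes from the question
    paths = [f"part_{i}.npy" for i in range(n_parts)]

    # Create the merged .npy directly on disk instead of in memory.
    merged = np.lib.format.open_memmap(
        "merged.npy", mode="w+", dtype=np.float32,
        shape=(n_parts * rows, cols))

    for i, p in enumerate(paths):
        part = np.load(p, mmap_mode="r")        # lazy, memory-mapped read
        merged[i * rows:(i + 1) * rows] = part  # copy block by block
    merged.flush()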
0
votes
1 answer

Interactive plotting of large data (a few million points) with R

I am trying to visualize several hours of neuronal recordings sampled at 500 Hz using R on Ubuntu 16.04. Simply put, I want a 2D plot that shows a value (voltage) over time. It's important for me that the plot is interactive. I need to have…
Ali Nouri
  • 67
  • 7
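The question asks for R, but the usual trick for plotting millions of samples interactively is language-neutral: decimate the signal to roughly screen resolution before plotting, keeping per-bin minima and maxima so spikes survive. A Python sketch of that reduction:

    import numpy as np

    def decimate_minmax(signal, n_bins=4000):
        # Reduce a long 1-D signal to per-bin (min, max) pairs; 4000
        # bins are plenty for a screen-width overview of the trace.
        n = len(signal) // n_bins * n_bins
        bins = signal[:n].reshape(n_bins, -1)
        return np.column_stack((bins.min(axis=1), bins.max(axis=1))).ravel()

    # At 500 Hz, several hours is ~5-10 million samples; re-decimate
    # the visible window on every zoom for an interactive feel.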
0
votes
1 answer

Increase performance of PowerShell function removing duplicates from CSV

I have a requirement to use PowerShell for a problem where I have a large dataset contained within a CSV. I need to read the CSV into memory and remove all the duplicates from it. The primary problem with this, outside of using…
BDubs
  • 73
  • 1
  • 14
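The question asks for PowerShell, but the performance idea carries across languages: stream the rows once and test each against a hash set, instead of sorting or comparing rows pairwise. A Python sketch of the pattern (file names hypothetical):

    import csv

    seen = set()
    with open("input.csv", newline="") as src, \
         open("deduped.csv", "w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        writer.writerow(next(reader))      # copy the header row
        for row in reader:
            key = tuple(row)               # or a tuple of key columns only
            if key not in seen:            # O(1) membership test
                seen.add(key)
                writer.writerow(row)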
0
votes
0 answers

Speed of chunk processing slows down with each chunk

The speed of processing each chunk slows down with each successive chunk. I tried processing the chunks with numpy.vectorize functions, but it wasn't successful. def f(s): try: a = s s = s.replace('\\',' ') s = s.replace('=',':') s =…
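A frequent cause of each chunk being slower than the last is appending every processed chunk to an ever-growing object (for example, repeated pandas.concat inside the loop), which re-copies all earlier rows and makes the total work quadratic. A sketch of the usual fix, assuming a pandas-style chunked reader and a hypothetical per-chunk transform:

    import pandas as pd

    def process(df):
        return df                       # stand-in for the per-chunk work

    pieces = []
    for chunk in pd.read_csv("big.csv", chunksize=100_000):
        pieces.append(process(chunk))   # constant-time append per chunk
    # Concatenate once at the end, not once per chunk.
    result = pd.concat(pieces, ignore_index=True)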
0
votes
0 answers

Optimizing the processing of large data sets in JavaScript

I am sending a GET request and receiving a JSON object with an array of around 50000 elements. I need to loop through the 50000 array elements, group together all elements with the same username, and then add that data to another array in my code so…
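The question is JavaScript, but the key optimization is the same everywhere: group in a single pass over the array using a map keyed by username (O(n)), rather than re-scanning the array for each element (O(n^2)). A Python sketch with hypothetical field names:

    from collections import defaultdict

    records = [                      # stand-in for the ~50000-element array
        {"username": "alice", "value": 1},
        {"username": "bob", "value": 2},
        {"username": "alice", "value": 3},
    ]

    groups = defaultdict(list)
    for rec in records:              # one pass over the data
        groups[rec["username"]].append(rec)

    grouped = list(groups.values())  # one entry per distinct username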
0
votes
2 answers

Datatable recommendation for Django

I want to load millions of rows from Django with search bars on each column to search the whole database quickly on the server side. However, there is no datatable example for Django with server-side processing and per-column search bars. I tried django-datatable-view…
Elegant Lin
  • 25
  • 2
  • 10
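Whatever datatable widget ends up on the front end, the server side reduces to an endpoint that filters and paginates a QuerySet so the database does the work. A minimal Django sketch, with a hypothetical Record model and a single searchable column:

    from django.core.paginator import Paginator
    from django.http import JsonResponse

    from .models import Record            # hypothetical model

    def table_data(request):
        qs = Record.objects.all()
        name = request.GET.get("name")    # one per-column search bar
        if name:
            qs = qs.filter(name__icontains=name)   # filtering runs in the DB
        paginator = Paginator(qs.values("id", "name"), 50)  # 50 rows per page
        page = paginator.get_page(request.GET.get("page"))
        return JsonResponse({"rows": list(page),
                             "pages": paginator.num_pages})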
0
votes
1 answer

Matrix index matching for large raster data

I have large raster data (X) with dimensions of 32251*51333. The values of X are repetitions of another array (Y), which has a size of 3*10^6. Now I want to change the values of X by matching it against each value of Y; for example, I can program…
uPhone
  • 313
  • 4
  • 13
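Rather than looping over the 3*10^6 values of Y, the usual NumPy approach is one vectorized lookup: np.searchsorted locates every cell of X in a sorted Y, and fancy indexing applies the replacement table in a single pass. A sketch at reduced, hypothetical sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    Y = np.sort(rng.choice(10**6, size=1000, replace=False))  # sorted lookup keys
    new_vals = Y * 2                      # hypothetical replacement per key
    X = rng.choice(Y, size=(500, 800))    # raster whose cells repeat values of Y

    idx = np.searchsorted(Y, X)           # position of every cell's value in Y
    X_new = new_vals[idx]                 # vectorized remap, no Python loop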