Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because what counts as large depends on the situation: on the web, 1 MB or 2 MB might be large, while in an application meant to clone hard drives, 5 TB might be large. A specific number is also unnecessary, since this tag is for questions about problems caused by too much data, regardless of exactly how much that is.

2088 questions
0
votes
2 answers

Search for a string in a large amount of data (millions of records in a CSV file)

I have millions of records in a CSV file and I need to do string comparison and show the filtered records in a Bootstrap data table. The CSV files are updated on a daily basis with millions of records. Note: If I import the CSV file into a SQL database and apply…
Nauman
  • 218
  • 1
  • 3
  • 11
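
A minimal sketch of one common approach to this kind of question: filter the CSV in chunks with pandas instead of loading it whole, then hand only the matching rows to the front end. The file name, column name and search term below are hypothetical.

    import pandas as pd

    SEARCH_TERM = "foo"                     # hypothetical search string
    matches = []

    # Read the large CSV in manageable pieces and keep only matching rows.
    for chunk in pd.read_csv("records.csv", chunksize=100_000):
        mask = chunk["name"].astype(str).str.contains(SEARCH_TERM, case=False, na=False)
        matches.append(chunk[mask])

    result = pd.concat(matches, ignore_index=True)
    # `result` can then be serialized (e.g. to JSON) for a Bootstrap data table.
    print(result.head())
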
0
votes
1 answer

In Python using Pandas, is it possible to read 4B rows chunkwise and filter each chunk against a 30M row dataframe already in memory?

I have a 4B-row table in Oracle and a 30M-row CSV; both tables share 2 columns on which I want to filter the large table using the smaller one. Due to security restrictions, I cannot load the 30M-row CSV into Oracle and run a single join, which would…
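
A hedged sketch of the chunk-and-merge pattern the question describes, assuming the two shared columns are called key1 and key2 and that the Oracle table is read through SQLAlchemy (the connection string, table and column names are all hypothetical): stream the big table in chunks and inner-join each chunk against the small in-memory frame.

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string; replace with the real Oracle DSN.
    engine = create_engine("oracle+cx_oracle://user:pass@host:1521/?service_name=orcl")

    small = pd.read_csv("small_30m.csv", usecols=["key1", "key2"]).drop_duplicates()

    parts = []
    # Each chunk is inner-merged against the small frame, so only rows whose
    # (key1, key2) pair appears in the CSV survive; nothing else is kept in memory.
    for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=1_000_000):
        parts.append(chunk.merge(small, on=["key1", "key2"], how="inner"))

    filtered = pd.concat(parts, ignore_index=True)
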
0
votes
0 answers

Reading 39GB of CSV file data in chunks; I can't append the chunks into one DataFrame

I read it as follows: c_size=10000000; c_chunk = pd.read_csv("CHARTEVENTS.csv", index_col=0, chunksize=c_size). I want each chunk in df format: c_list = []; for chunk in c_chunk: chunk.columns=['ROW_ID', 'SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID',…
Abe
  • 1
  • 2
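
A minimal sketch of the usual fix for this pattern, reusing the names visible in the excerpt: collect each chunk in a list and concatenate once at the end rather than appending inside the loop. Note that 39 GB of raw CSV may still not fit in RAM, so in practice each chunk is usually filtered or downcast before being stored.

    import pandas as pd

    c_size = 10_000_000
    pieces = []
    for chunk in pd.read_csv("CHARTEVENTS.csv", index_col=0, chunksize=c_size):
        pieces.append(chunk)        # filter or downcast the chunk here if it is too big

    # A single concatenation at the end yields one DataFrame (memory permitting).
    df = pd.concat(pieces)
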
0
votes
1 answer

Most efficient way to decode a 1D array with a specific structure into a 3D or 4D array (with Python)

I have a 1D array of around 63M integer elements that represent a 4D dataset with axes x, y, channel, frame (all positive integers). The shape of the dataset is (512, 512, 4069, 239). The dataset represents an x-ray spectrum stream, where x and y…
DIN14970
  • 341
  • 2
  • 8
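
For the simplest case only: if the 1D buffer were a plain row-major flattening of the 4D shape (which is not quite the situation in the question, where the stream is encoded), the idiomatic NumPy decode is a reshape, which returns a view without copying. A toy illustration:

    import numpy as np

    flat = np.arange(2 * 3 * 4 * 5, dtype=np.int64)    # toy stand-in for the 1D stream
    cube = flat.reshape(2, 3, 4, 5)                    # view onto the same memory, no copy
    # C-order index arithmetic matches the reshaped view.
    assert cube[1, 2, 3, 4] == flat[1*3*4*5 + 2*4*5 + 3*5 + 4]
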
0
votes
3 answers

How to run supervised ML models on a large dataset (15GB) in R?

I have a dataset (15 GB): 72 million records and 26 features. I would like to compare 7 supervised ML models (classification problem): SVM, random forest, decision tree, naive bayes, ANN, KNN and XGBoosting. I created a sample set of 7.2 million…
0
votes
2 answers

NodeJS crashes after some time of loading a CSV file

I've been working on a project that outputs XML upon reading a CSV. I use the fs.createReadStream() method to read the CSV file, but after some time the terminal just crashes and I get C:\Users\username\Documents\Programming\Node Projects\DAE…
Himanshu Sardana
  • 123
  • 2
  • 10
0
votes
1 answer

What is the proper way to continuously/lazily load data from Django into Angular

I am trying to create a blog where all the comments get loaded on each blog post page. The issue is that some posts contain only a few comments, which take seconds to load, while others contain well over 100, which takes a lot longer. I want to…
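
One conventional way to handle this on the Django side is to paginate the comment queryset and let the Angular client request further pages as the user scrolls. A minimal sketch using Django's built-in Paginator; the app, model and field names are hypothetical.

    from django.core.paginator import Paginator
    from django.http import JsonResponse

    from myapp.models import Comment   # hypothetical app and model

    def comments_page(request, post_id):
        qs = Comment.objects.filter(post_id=post_id).order_by("-created")
        paginator = Paginator(qs, 20)                        # 20 comments per request
        page = paginator.get_page(request.GET.get("page", 1))
        return JsonResponse({
            "results": list(page.object_list.values("id", "author", "body")),
            "has_next": page.has_next(),
        })
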
0
votes
4 answers

Python: How to quickly create a pandas data frame with only specific columns from a big excel sheet?

I have an Excel file with only one sheet. The file is ~900 MB and contains thousands of rows and hundreds of columns. I want to extract only a few columns (say Name, Numbers & Address) from the sheet and do data…
Prashant Kumar
  • 501
  • 8
  • 26
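
A minimal sketch, assuming the sheet really has columns named Name, Numbers and Address as the excerpt suggests (the file name is hypothetical): pandas can be told to parse only those columns, which avoids materializing the hundreds of others.

    import pandas as pd

    # usecols restricts parsing to the listed columns of the first sheet.
    df = pd.read_excel(
        "big_file.xlsx",
        sheet_name=0,
        usecols=["Name", "Numbers", "Address"],
    )
    print(df.shape)
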
0
votes
1 answer

Dealing with large data?

I'm using WinForms and C# for my application and my data is mainly some strings, integers and many lists. Now I store them in xml and text files but I just found out that reading the data takes too long. I'm using XmlWriter and XmlReader. For…
user579674
  • 2,159
  • 6
  • 30
  • 40
0
votes
1 answer

Importing only a few columns of a CSV as a Python pandas dataframe?

I would like to import only a subset of a CSV as a dataframe, as it is too large to import the whole thing. Is there a way to do this natively in pandas without having to set up a database-like structure? I have tried only importing a chunk and then…
Bstampe
  • 689
  • 1
  • 6
  • 16
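
A sketch of the native pandas answer, with hypothetical file and column names: read_csv accepts a usecols list, and if the selected columns are still too big, a chunksize to stream them.

    import pandas as pd

    wanted = ["id", "name", "value"]            # hypothetical column names

    # Only the listed columns are parsed; everything else in the CSV is skipped.
    df = pd.read_csv("big.csv", usecols=wanted)

    # If even the column subset is too large, combine usecols with chunked reading.
    pieces = [c for c in pd.read_csv("big.csv", usecols=wanted, chunksize=500_000)]
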
0
votes
2 answers

How to design HTTP API to push massive data?

I need to provide an HTTP API for clients to push massive data, in the shape of a set of records. My first idea was to provide a set of three calls, like: "BeginPushData" (no parameters, returns an Id), "PushSomeData" (parameters: id, subset of…
Starnuto di topo
  • 3,215
  • 5
  • 32
  • 66
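
A hedged sketch of the begin/push/commit shape the question proposes, written as a small Flask service; the route names and the in-memory session store are purely illustrative, not a recommendation of the final design.

    import uuid
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    sessions = {}                                # upload_id -> list of record batches

    @app.post("/uploads")                        # "BeginPushData": returns an id
    def begin():
        upload_id = str(uuid.uuid4())
        sessions[upload_id] = []
        return jsonify({"id": upload_id})

    @app.post("/uploads/<upload_id>/records")    # "PushSomeData": append one batch
    def push(upload_id):
        sessions[upload_id].extend(request.get_json())   # body assumed to be a JSON list
        return "", 204

    @app.post("/uploads/<upload_id>/commit")     # finalize the upload
    def commit(upload_id):
        batch = sessions.pop(upload_id)
        # ... persist `batch` to durable storage here ...
        return jsonify({"received": len(batch)})
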
0
votes
3 answers

Faster way to extract data from a large file

I have a file containing about 40000 frames of Cartesian coordinates of 28 atoms. I need to extract the coordinates of atoms 21 to 27 from each frame. I tried using a bash script with a for-loop: for i in {0..39999} do cat $1 | grep -A 27 "frame $i " |…
Juicce
  • 33
  • 5
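
The usual cure for this kind of slowdown is a single pass over the file instead of re-reading it 40000 times with grep. A hedged Python sketch, assuming each frame starts with a header line containing "frame <n>" followed by 28 atom lines (the exact layout is not shown in the excerpt; the file names are hypothetical):

    # One pass: count atom lines after each frame header and emit atoms 21-27.
    with open("trajectory.txt") as src, open("atoms_21_27.txt", "w") as out:
        atom_index = None
        for line in src:
            if "frame" in line:
                atom_index = 0              # frame header: restart the atom counter
            elif atom_index is not None:
                atom_index += 1
                if 21 <= atom_index <= 27:
                    out.write(line)
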
0
votes
3 answers

How to use a large dataset in multiple processes without copying it?

I need to run multiple (1000-10000s) search queries on a large dataset (>10GB) in Python. To speed things up, I want to run the individual queries in parallel. However, as far as I understand, passing the dataset to different processes copies it…
Manish Goel
  • 843
  • 1
  • 8
  • 21
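
On Linux, the standard trick is to rely on fork: a dataset created at module level before the pool starts is visible to the workers through copy-on-write pages, so read-only queries do not serialize or duplicate it. A minimal sketch (the dataset and query logic are toy stand-ins):

    import multiprocessing as mp

    DATASET = ["alpha", "beta", "gamma"] * 1000   # stand-in for the >10GB structure

    def run_query(query):
        # Workers forked from this process see DATASET without it being pickled.
        return sum(1 for row in DATASET if query in row)

    if __name__ == "__main__":
        mp.set_start_method("fork")               # the trick relies on fork, not spawn
        queries = ["alp", "gam", "bet"]
        with mp.Pool(processes=4) as pool:
            counts = pool.map(run_query, queries)
        print(counts)
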
0
votes
1 answer

Sending large/big data in MPI (Java OpenMPI)

I want to send multiple large objects in my Java MPI program, and I am wondering what is the most efficient way to do this. Currently, my program serializes the objects into bytes before sending the byte[] objects to another processor…
jnarag
  • 1
  • 1
0
votes
1 answer

How to hold a Python list containing millions of dictionaries in RAM at once?

I am storing a huge CSV file as a list of dictionaries like below: dictlist=[{ 'col-1' : 'data-1','col-2' : 'data-2','col-3' : 'data-3'}, { 'col-1' : 'data-1','col-2' : 'data-2','col-3' : 'data-3'}] where keys 1 and 2 are row numbers and…
p.durga shankar
  • 967
  • 8
  • 18
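
A hedged sketch of the usual alternative: keep the rows in a single pandas DataFrame, where each column is one typed array, instead of millions of per-row dict objects, and materialize dicts only for the rows that are actually needed. The column dtypes below are hypothetical.

    import pandas as pd

    # One DataFrame is far more compact than a list of dicts, because the
    # per-row Python object overhead disappears.
    df = pd.read_csv(
        "huge.csv",
        dtype={"col-1": "category", "col-2": "category", "col-3": "category"},
    )

    # Convert to plain dicts only for the slice that is actually needed.
    some_rows = df.iloc[:1000].to_dict("records")
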