Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because what counts as large depends on the situation: on the web, 1 MB or 2 MB might be large, while in an application meant to clone hard drives, 5 TB might be. A specific threshold is also unnecessary, since this tag is meant for questions about problems caused by too much data, regardless of how much that is.

2088 questions
6
votes
4 answers

Sorting gigantic binary files with C#

I have a large file, roughly 400 GB in size, generated daily by an external closed system. It is a binary file with the following format: byte[8]byte[4]byte[n], where n is equal to the int32 value of byte[4]. This file has no delimiters and to…
Jeffrey Kevin Pry
  • 3,266
  • 3
  • 35
  • 67
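The standard answer to this class of problem is an external merge sort: cut the file into runs that fit in memory, sort each run by the 8-byte key, spill the runs to disk, then stream-merge them. The question is about C#, but the idea is language-agnostic; below is a minimal Python sketch, assuming the 8-byte field is the sort key and the length prefix is a little-endian int32 (both assumptions, not stated in the question).

```python
import heapq
import os
import struct
import tempfile
from contextlib import ExitStack

def read_records(f):
    """Yield (key, payload) records: 8-byte key, int32 length, then payload."""
    while True:
        header = f.read(12)
        if len(header) < 12:
            return
        (n,) = struct.unpack("<i", header[8:12])  # assumed little-endian
        yield header[:8], f.read(n)

def write_run(chunk):
    """Sort one in-memory chunk by key and spill it to a temporary run file."""
    chunk.sort(key=lambda r: r[0])
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for key, payload in chunk:
            f.write(key + struct.pack("<i", len(payload)) + payload)
    return path

def external_sort(src_path, dst_path, run_bytes=512 * 2**20):
    """Sort a huge record file in bounded memory: sorted runs + k-way merge."""
    runs, chunk, size = [], [], 0
    with open(src_path, "rb") as src:
        for key, payload in read_records(src):
            chunk.append((key, payload))
            size += 12 + len(payload)
            if size >= run_bytes:
                runs.append(write_run(chunk))
                chunk, size = [], 0
    if chunk:
        runs.append(write_run(chunk))
    with open(dst_path, "wb") as dst, ExitStack() as stack:
        streams = [read_records(stack.enter_context(open(p, "rb"))) for p in runs]
        # heapq.merge holds only one record per run, so memory stays bounded.
        for key, payload in heapq.merge(*streams, key=lambda r: r[0]):
            dst.write(key + struct.pack("<i", len(payload)) + payload)
    for p in runs:
        os.remove(p)
```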
6
votes
1 answer

pivot_longer with a very big data.frame, memory efficient approaches

I have a data.frame of hospital data with 11 million rows. Columns: ID (chr), outcome (1|0), 20x ICD-10 codes (chr). Rows: 10.6 million. I wish to make the data tidy to allow modelling of diagnostic codes to a binary outcome. I would normally use…
JisL
  • 161
  • 8
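The question asks about R, where the usual memory-frugal alternative to tidyr::pivot_longer is data.table::melt. The underlying tactic, reshaping wide to long in pieces rather than all at once, is language-agnostic; here is a hedged Python/pandas sketch with hypothetical column names (ID, outcome, icd_1 … icd_20):

```python
import pandas as pd

# Hypothetical layout: one ID column, one outcome column, 20 ICD-10 columns.
icd_cols = [f"icd_{i}" for i in range(1, 21)]

def wide_to_long_chunked(src_csv, dst_csv, chunksize=500_000):
    """Melt the ICD-10 columns into (ID, outcome, code) rows one chunk at a
    time, so the full long frame never has to exist in memory at once."""
    first = True
    for chunk in pd.read_csv(src_csv, chunksize=chunksize):
        long = (chunk.melt(id_vars=["ID", "outcome"], value_vars=icd_cols,
                           value_name="code")
                     .dropna(subset=["code"]))
        long.to_csv(dst_csv, mode="w" if first else "a",
                    header=first, index=False)
        first = False
```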
6
votes
1 answer

Using dplyr in r with large dataset (4 million rows)

I'm doing some data manipulation with dplyr on my huge data frame (b). I have been able to work successfully on smaller subsets of my data, so I guess my problem is with the size of my data frame. I have a data frame that has 4 million rows and 34…
Ozgur Alptekın
  • 505
  • 6
  • 19
6
votes
3 answers

How to handle large yet not big-data datasets?

I have a ~200 GB dataset of approximately 1.5 billion observations, on which I need to run some conditional analysis and data aggregation*. The thing is that I'm not used to (nor trained to handle) large datasets. I usually work in R or Python (with some Julia…
SomePhDStudentGuy
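Before reaching for a cluster, data of this shape often yields to a chunked split-apply-combine: stream the file, filter and partially aggregate each chunk, then combine the partials. A minimal Python/pandas sketch, assuming a CSV source and hypothetical column names:

```python
import pandas as pd

def conditional_mean_by_group(path, group_col, value_col, chunksize=1_000_000):
    """Grouped mean over a file too big for RAM, built from per-chunk
    (sum, count) partials that are combined at the end."""
    partials = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        chunk = chunk[chunk[value_col] > 0]        # example conditional filter
        partials.append(chunk.groupby(group_col)[value_col].agg(["sum", "count"]))
    total = pd.concat(partials).groupby(level=0).sum()
    return total["sum"] / total["count"]
```

Only the per-group partials stay in memory; each chunk is discarded once its contribution has been recorded.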
6
votes
2 answers

numpy.memmap for an array of strings?

Is it possible to use numpy.memmap to map a large disk-based array of strings into memory? I know it can be done for floats and suchlike, but this question is specifically about strings. I am interested in solutions for both fixed-length and…
NPE
  • 486,780
  • 108
  • 951
  • 1,012
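For the fixed-length case this does work: a dtype such as 'S16' gives every element a constant byte width, so offsets are computable and numpy.memmap behaves just as it does for floats. A small sketch (file name and sizes are placeholders):

```python
import numpy as np

# 'S16' = fixed-width 16-byte bytestrings; each element has a known offset.
arr = np.memmap("strings.dat", dtype="S16", mode="w+", shape=(1_000_000,))
arr[0] = b"hello"        # shorter values are null-padded on disk
arr.flush()

# Reopen read-only; nothing is loaded until elements are touched.
view = np.memmap("strings.dat", dtype="S16", mode="r", shape=(1_000_000,))
print(view[0])           # b'hello'
```

Variable-length strings have no computable offsets, so they cannot be memmapped directly; a common workaround is one flat bytes memmap holding the concatenated strings plus a separate integer offset array indexing into it.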
6
votes
0 answers

How do I load a large dataset into Python from MS SQL Server?

Setup: I have a pre-processed dataset on an MS SQL Server that is about 500,000,000 rows and 20 columns, where one is a rather long text column (varchar(1300)), which amounts to about 35 GB of data space on the SQL database. I'm working on the physical…
iraserd
  • 669
  • 1
  • 8
  • 26
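A common pattern here is to stream the result set instead of materialising all 500,000,000 rows at once: pandas.read_sql returns an iterator of DataFrames when given a chunksize. A sketch with a hypothetical connection string, table, and column names (requires SQLAlchemy plus an ODBC driver for SQL Server):

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical DSN; adjust server, database, and driver to your setup.
engine = create_engine(
    "mssql+pyodbc://user:password@my_server/my_db"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

total_rows = 0
for chunk in pd.read_sql("SELECT id, long_text FROM my_table",
                         engine, chunksize=100_000):
    total_rows += len(chunk)   # replace with real per-chunk processing
print(total_rows)
```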
6
votes
3 answers

How to identify all sequential numbers not covered by 'to' and 'from' positions?

I have a data table that defines the start and end coordinates for a set of sequences. For example: df1 <- data.frame(from = c(7, 22, 35, 21, 50), to = c(13, 29, 43, 31, 60)) Given start and end coordinates (i.e. 1 and 100), I am trying…
Powege
  • 685
  • 5
  • 12
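At genomic scale the trick is to avoid enumerating every position: sort the intervals by start and sweep once, emitting uncovered ranges directly. The question is in R; here is the same sweep as a Python sketch, using the question's example data:

```python
def gap_ranges(intervals, start, end):
    """Return (from, to) ranges within [start, end] covered by no interval.
    One sort plus one pass; overlapping intervals are handled by the max()."""
    gaps, cursor = [], start
    for lo, hi in sorted(intervals):
        if lo > cursor:
            gaps.append((cursor, lo - 1))
        cursor = max(cursor, hi + 1)
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps

df1 = [(7, 13), (22, 29), (35, 43), (21, 31), (50, 60)]
print(gap_ranges(df1, 1, 100))
# [(1, 6), (14, 20), (32, 34), (44, 49), (61, 100)]
```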
6
votes
1 answer

NodeJS socket.IO disconnects when sending large Json

I'm writing a multiplayer card game (like Hearthstone) with a Node.js back-end and an Angular front-end. I tried to connect the two with Socket.IO, but it turned out that if I send a JSON object over about 8,000 characters (the gameState object), then the…
Ez Az
  • 73
  • 4
6
votes
1 answer

Optimal CLion VM memory settings for very large projects

I'm currently working on a fork of a very large project with about 7-8 million LoC and 100,000+ classes. The problem is, of course, that the indexer, or CLion in general, runs out of memory or becomes very slow and unresponsive. I already saw the blog entry…
p0w3r
  • 133
  • 2
  • 13
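For what it's worth, the usual first step in any JetBrains IDE is raising the IDE's own JVM heap, either via Help | Change Memory Settings or a custom .vmoptions file. The values below are a guess for a project this size, not a JetBrains recommendation:

```
# clion64.vmoptions — opened via Help | Edit Custom VM Options
-Xmx8g
-XX:ReservedCodeCacheSize=512m
```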
6
votes
2 answers

All k nearest neighbors in 2D, C++

I need to find, for each point of the data set, all its nearest neighbors. The data set contains approx. 10 million 2D points. The data are close to a grid, but do not form a precise grid... This option excludes (in my opinion) the use of KD-trees,…
Ian
  • 169
  • 1
  • 3
  • 6
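For comparison with whatever grid-based scheme is chosen, it helps to know the baseline being ruled out: a batch all-nearest-neighbour query over ~10 million 2D points is a routine k-d tree workload. A Python/SciPy sketch of that baseline (random stand-in data):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
pts = rng.random((1_000_000, 2))     # stand-in for the ~10M real points

tree = cKDTree(pts)
# k=2: the first neighbour of each point is the point itself (distance 0).
dist, idx = tree.query(pts, k=2, workers=-1)  # workers=-1: all cores (SciPy >= 1.6)
nearest = idx[:, 1]                  # each point's nearest other point
```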
6
votes
2 answers

Maintaining a large table of unique values in MySQL

This is probably a common situation, but I couldn't find a specific answer on SO or Google. I have a large table (>10 million rows) of friend relationships on a MySQL database that is very important and needs to be maintained such that there are no…
eric
  • 1,453
  • 2
  • 20
  • 32
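The usual schema trick for undirected relationships is to store each pair in canonical order and put a UNIQUE index on the ordered pair of columns, so (A, B) and (B, A) map to the same row. A tiny Python sketch of the canonicalisation (the UNIQUE constraint itself lives in the MySQL schema):

```python
def canonical_pair(user_a: int, user_b: int) -> tuple[int, int]:
    """Order the pair so both directions of a friendship collide on one row."""
    return (user_a, user_b) if user_a < user_b else (user_b, user_a)

assert canonical_pair(42, 7) == canonical_pair(7, 42) == (7, 42)
```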
6
votes
1 answer

How to efficiently store a large Java map?

I am brute-forcing one game and I need to store data for all positions and outcomes. The data will likely be hundreds of GB in size. I considered SQL, but I am afraid that lookups in a tight loop might kill performance. The program will iterate over…
Stepan
  • 1,391
  • 18
  • 40
6
votes
1 answer

MongoDB server freeze - large amount of collections

We have a large MongoDB database (about 1.4 million collections), MongoDB 3.0 with the RocksDB engine, running on Ubuntu 14.04. This DB is located on a virtual machine (VMware vCloud) with 16 cores and 108 GB RAM (currently MongoDB uses 70 GB of memory without…
Kenny6
  • 116
  • 6
6
votes
1 answer

Memory mapped file for numpy arrays

I need to read in parts of a huge numpy array stored in a memory mapped file, process the data and repeat for another part of the array. The whole numpy array takes up around 50 GB and my machine has 8 GB of RAM. I initially created the memory…
KartMan
  • 369
  • 3
  • 19
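One pattern that fits these constraints: open the memmap read-only and copy one bounded window at a time into a genuine in-memory array, so the OS pages the file in lazily and is free to evict pages again. A minimal sketch with placeholder file name and sizes:

```python
import numpy as np

N = 6_250_000_000            # ~50 GB of float64; placeholder for the real shape
arr = np.memmap("big_array.dat", dtype=np.float64, mode="r", shape=(N,))

block = 50_000_000           # ~400 MB per window
total = 0.0
for start in range(0, N, block):
    window = np.array(arr[start:start + block])  # real copy, not a memmap view
    total += window.sum()    # replace with the actual processing step
```

Copying each window with np.array keeps the working set explicit; slices of the memmap are views that keep the mapping alive.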
6
votes
1 answer

SQL query on H2 database table throws ArrayIndexOutOfBoundsException

I have an H2 database on which some queries work, while others throw an ArrayIndexOutOfBoundsException. For example: SELECT COLUMN_1 FROM MY_TABLE; // works fine SELECT COUNT(COLUMN_1) FROM MY_TABLE; // gives the following error message: [Error…
Kaadzia
  • 1,393
  • 1
  • 14
  • 34