Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because "large" depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application meant to clone hard drives, 5 TB might be. A specific number is unnecessary, though, since this tag is meant for questions about problems caused by too much data, whatever that amount happens to be.

2088 questions
23
votes
3 answers

Best of breed indexing data structures for Extremely Large time-series

I'd like to ask fellow SO'ers for their opinions regarding best-of-breed data structures for indexing time-series (aka column-wise data, aka flat linear). Two basic types of time-series exist based on the sampling/discretisation…
Xander Tulip
  • 1,438
  • 2
  • 17
  • 32
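
A common baseline for this kind of question is to keep samples sorted by timestamp and binary-search the time axis for range lookups. A minimal Python sketch of that idea (the class and data are illustrative, not from the question):

    import bisect

    class TimeSeriesIndex:
        # sorted-timestamp index: O(log n) range lookups via binary search
        def __init__(self, timestamps, values):
            # assumes timestamps are already sorted ascending
            self.timestamps = timestamps
            self.values = values

        def range_query(self, t_start, t_end):
            # two binary searches bound the slice [t_start, t_end]
            lo = bisect.bisect_left(self.timestamps, t_start)
            hi = bisect.bisect_right(self.timestamps, t_end)
            return self.values[lo:hi]

    idx = TimeSeriesIndex([1, 2, 5, 9, 12], ["a", "b", "c", "d", "e"])
    print(idx.range_query(2, 9))  # ['b', 'c', 'd']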
23
votes
3 answers

Possibility to apply online algorithms on big data files with sklearn?

I would like to apply fast online dimensionality reduction techniques such as (online/mini-batch) Dictionary Learning on big text corpora. My input data naturally do not fit in memory (this is why I want to use an online algorithm), so I am…
register
  • 801
  • 1
  • 8
  • 15
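
sklearn's MiniBatchDictionaryLearning does expose a partial_fit method for exactly this streaming use case. A minimal sketch, assuming the corpus is loaded in mini-batches from disk (the generator below is hypothetical):

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    dico = MiniBatchDictionaryLearning(n_components=100, batch_size=256)

    def iter_minibatches():
        # hypothetical loader: yields (256, 1000) blocks one at a time,
        # so the full corpus never has to fit in RAM
        for _ in range(10):
            yield np.random.rand(256, 1000)

    for batch in iter_minibatches():
        dico.partial_fit(batch)  # update the dictionary incrementally

    codes = dico.transform(np.random.rand(5, 1000))  # sparse-code new samples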
22
votes
4 answers

How to store extremely large numbers?

For example, I have a factorial program that needs to save really huge integers that can be 50+ digits long. The largest primitive data type in C++ is unsigned long long int, with a maximum value of 18446744073709551615, which is only 20 digits…
Oleksiy
  • 37,477
  • 22
  • 74
  • 122
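
In C++ the usual answer is an arbitrary-precision library such as GMP or Boost.Multiprecision. For comparison, a Python sketch where integers are arbitrary precision out of the box:

    import math

    # Python ints grow as needed, so a 52-digit factorial just works
    n = math.factorial(42)
    print(n)            # 1405006117752879898543142606244511569936384000000000
    print(len(str(n)))  # 52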
20
votes
2 answers

How do you encrypt large files / byte streams in Go?

I have some large files I would like to AES-encrypt before sending over the wire or saving to disk. While it seems possible to encrypt streams, there seem to be warnings against doing this, and instead people recommend splitting the files into…
Xeoncross
  • 55,620
  • 80
  • 262
  • 364
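
The warnings are mostly about authenticating a stream as a whole; the commonly recommended pattern is to encrypt fixed-size chunks, each with its own counter-derived nonce and authentication tag, so chunks cannot be reordered or truncated unnoticed. A Python sketch of that pattern with the cryptography package (the question is about Go, and the 64 KiB chunk size and framing here are illustrative):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    CHUNK = 64 * 1024  # 64 KiB plaintext per chunk

    def encrypt_stream(src, dst, key):
        aead = AESGCM(key)
        prefix = os.urandom(8)  # random per-file nonce prefix
        dst.write(prefix)
        counter = 0
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            # 12-byte nonce = 8-byte prefix || 4-byte big-endian counter,
            # so each chunk gets a unique nonce and a fixed position
            nonce = prefix + counter.to_bytes(4, "big")
            dst.write(aead.encrypt(nonce, chunk, None))  # ciphertext + 16-byte tag
            counter += 1

    key = AESGCM.generate_key(bit_length=256)
    with open("big.bin", "rb") as src, open("big.enc", "wb") as dst:
        encrypt_stream(src, dst, key)

The decryptor reads the 8-byte prefix, then fixed CHUNK + 16 byte records (the last one shorter), verifying each tag before trusting the plaintext.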
20
votes
5 answers

C Programming File Reading/Writing Technique

It is my first time creating a program that involves file reading and writing. I'm wondering what the best technique for doing this is, because when I compared my work with my classmate's, our logic was very different. You…
newbie
  • 14,582
  • 31
  • 104
  • 146
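
Whatever the surrounding logic, the usual baseline is buffered block I/O rather than reading one byte or character at a time. A small Python sketch of the copy loop (in C the analogue is fread/fwrite with a fixed buffer; the file names are placeholders):

    BLOCK = 1 << 16  # 64 KiB per read

    with open("input.dat", "rb") as src, open("output.dat", "wb") as dst:
        while True:
            block = src.read(BLOCK)  # one buffered read per block
            if not block:
                break                # EOF
            dst.write(block)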
20
votes
1 answer

Split large file according to value in single column (AWK)

I would like to split a large file (10^6 rows) according to the value in the 6th column (about 10*10^3 unique values). However, I can't get it working because of the number of records. It should be easy but it's taking hours already and I'm not…
Elmer
  • 255
  • 1
  • 2
  • 10
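
With ~10,000 distinct keys, the usual AWK pitfall is exceeding the open-file limit, or reopening an output file for every single row. One workable pattern is to cache open handles and recycle them when a cap is reached; a Python sketch of that idea (the cap of 500 and the file naming are assumptions):

    handles = {}

    def out_for(key):
        # keep at most ~500 files open; close them all and reopen on demand
        if key not in handles:
            if len(handles) >= 500:
                for h in handles.values():
                    h.close()
                handles.clear()
            handles[key] = open(f"split_{key}.txt", "a")  # append survives reopening
        return handles[key]

    with open("big_file.txt") as f:
        for line in f:
            key = line.split()[5]  # 6th whitespace-separated column
            out_for(key).write(line)

    for h in handles.values():
        h.close()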
19
votes
3 answers

Removing duplicates on very large datasets

I'm working on a 13.9 GB CSV file that contains around 16 million rows and 85 columns. I know there are potentially a few hundred thousand rows that are duplicates. I ran this code to remove them: import…
Vlad
  • 395
  • 3
  • 9
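
When the frame itself will not fit in memory, one workable approach is to stream the CSV in chunks and remember only a compact hash of each row seen so far. A pandas sketch under that assumption (file name and chunk size are illustrative; ~16M 16-byte digests still cost a GB or two of RAM):

    import hashlib
    import pandas as pd

    seen = set()
    first = True

    for chunk in pd.read_csv("data.csv", chunksize=500_000, dtype=str):
        # hash every full row into a 16-byte digest
        digests = chunk.apply(
            lambda row: hashlib.md5("\x1f".join(row.fillna("")).encode()).digest(),
            axis=1,
        )
        mask = [d not in seen for d in digests]
        seen.update(digests[mask])
        chunk[mask].to_csv("deduped.csv", mode="w" if first else "a",
                           header=first, index=False)
        first = False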
19
votes
4 answers

Large fixed effects binomial regression in R

I need to run a logistic regression on a relatively large data frame with 480,000 entries and 3 fixed-effect variables. Fixed-effect var A has 3233 levels, var B has 2326 levels, and var C has 811 levels. So all in all I have 6370 fixed effects. The…
Phil
  • 954
  • 1
  • 8
  • 22
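
One route (sketched here in Python; the question itself is in R) is to one-hot encode the fixed effects into a sparse design matrix, so the 6,370 dummy columns cost only 3 nonzeros per row, and fit the logistic regression with a sparse-aware solver:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import OneHotEncoder

    # hypothetical data matching the question's shape
    rng = np.random.default_rng(0)
    n = 480_000
    X_cat = np.column_stack([
        rng.integers(0, 3233, n),  # var A levels
        rng.integers(0, 2326, n),  # var B levels
        rng.integers(0, 811, n),   # var C levels
    ])
    y = rng.integers(0, 2, n)

    X = OneHotEncoder().fit_transform(X_cat)  # sparse: 3 nonzeros per row

    model = LogisticRegression(solver="saga", max_iter=200)  # accepts sparse input
    model.fit(X, y)

Note this is plain (lightly regularised) logistic regression on dummies, not a specialised fixed-effects estimator.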
17
votes
6 answers

With Haskell, how do I process large volumes of XML?

I've been exploring the Stack Overflow data dumps and thus far taking advantage of the friendly XML and “parsing” with regular expressions. My attempts with various Haskell XML libraries to find the first post in document-order by a particular user…
Greg Bacon
  • 134,834
  • 32
  • 188
  • 245
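
Language aside, the streaming recipe is the same: parse incrementally and discard each element once it has been inspected. A Python sketch with ElementTree's iterparse over a posts dump (the dump's row elements carry an OwnerUserId attribute; the file name and user id below are placeholders):

    import xml.etree.ElementTree as ET

    def first_post_by(path, user_id):
        context = ET.iterparse(path, events=("start", "end"))
        _, root = next(context)  # grab the root element
        for event, elem in context:
            if event == "end" and elem.tag == "row":
                if elem.get("OwnerUserId") == user_id:
                    return elem.attrib
                root.clear()  # drop processed rows so memory stays bounded
        return None

    print(first_post_by("posts.xml", "12345"))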
17
votes
3 answers

Python tools for out-of-core computation/data mining

I am interested in using Python to mine data sets that are too big to fit in RAM but fit on a single hard drive. I understand that I can export the data as HDF5 files using pytables. The numexpr package also allows for some basic out-of-core computation. What would come…
user17375
  • 529
  • 4
  • 14
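
Underneath pytables/numexpr, the basic out-of-core loop is just slicing an on-disk HDF5 dataset block by block. A minimal h5py sketch (the file, dataset name, and block size are illustrative):

    import h5py
    import numpy as np

    total = 0.0
    with h5py.File("data.h5", "r") as f:
        dset = f["measurements"]  # on-disk array, never fully loaded
        for start in range(0, dset.shape[0], 1_000_000):
            block = dset[start:start + 1_000_000]  # one slice in RAM at a time
            total += np.square(block).sum()        # any per-block reduction

    print(total)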
17
votes
4 answers

How can I analyse ~13GB of data?

I have ~300 text files that contain data on trackers, torrents and peers. Each file is organised like this: tracker.txt time torrent time peer time peer ... time torrent ... I have several files per tracker and much of the information…
WilliamMayor
  • 745
  • 6
  • 15
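
At ~13 GB, one streaming pass per question you want answered is usually enough; only counters need to live in memory. A sketch that tallies peers per torrent line by line (the two-token line layout is a simplification of the question's description):

    import glob
    from collections import Counter

    peers_per_torrent = Counter()
    current = None

    for path in glob.glob("trackers/*.txt"):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) < 2:
                    continue
                time, kind = parts[:2]  # simplified: "time torrent" / "time peer"
                if kind == "torrent":
                    current = (path, time)
                elif kind == "peer" and current is not None:
                    peers_per_torrent[current] += 1

    print(peers_per_torrent.most_common(10))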
16
votes
6 answers

check 1 billion cell-phone numbers for duplicates

It's an interview question: there are 1 billion cell-phone numbers, each 11 digits long, stored randomly in a file, for example 12345678910; the first digit must be 1. Go through these numbers to see whether there is one with…
Alcott
  • 17,905
  • 32
  • 116
  • 173
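
Since every number starts with 1, only the 10 remaining digits vary, so one bit per possible number costs 10^10 bits ≈ 1.25 GB, and a single pass over the file flags duplicates. A Python sketch of that bitmap approach (runnable, though the 1.25 GB allocation is real):

    def find_duplicates(numbers):
        # one bit per possible 10-digit suffix: 10**10 bits ≈ 1.25 GB
        bits = bytearray(10**10 // 8)
        for num in numbers:
            suffix = int(num[1:])  # drop the leading '1'
            byte, bit = divmod(suffix, 8)
            if bits[byte] & (1 << bit):
                yield num          # seen before: a duplicate
            else:
                bits[byte] |= 1 << bit

    dups = find_duplicates(["12345678910", "12345678910", "19999999999"])
    print(list(dups))  # ['12345678910']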
16
votes
3 answers

Dealing with huge data in select boxes

Hi, I am using jQuery and retrieving "items" from one of my MySQL tables. I have around 20,000 "items" in that table, and it is going to be used as a search parameter in my form. So basically users can search for "purchases" which contain that…
Girish Dusane
  • 1,120
  • 4
  • 12
  • 19
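
The usual fix is not to ship all 20,000 options to the browser: the autocomplete widget sends the typed prefix, and the server returns only the top matches. A hypothetical Flask endpoint sketching the server side (jQuery UI's autocomplete sends the text as a term parameter; the table and database here are made up):

    import sqlite3
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/items")
    def items():
        term = request.args.get("term", "")
        con = sqlite3.connect("shop.db")  # hypothetical database
        rows = con.execute(
            "SELECT name FROM items WHERE name LIKE ? LIMIT 20",
            (term + "%",),  # prefix match, capped at 20 rows
        ).fetchall()
        con.close()
        return jsonify([name for (name,) in rows])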
16
votes
1 answer

Moore-Penrose generalized inverse of a large sparse matrix

I have a square matrix with a few tens of thousands of rows and columns, containing only a few 1s among tons of 0s, so I use the Matrix package to store it in R efficiently. The base::matrix object cannot handle that number of cells, as I run out of…
daroczig
  • 28,004
  • 7
  • 90
  • 124
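
A dense pseudoinverse at that scale is hopeless, but a truncated sparse SVD yields a rank-k approximation of it that you never have to materialise. A Python/scipy sketch (the question is in R; the matrix, rank k, and threshold are illustrative):

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    # stand-in for the question's huge, very sparse 0/1 matrix
    A = sparse_random(5000, 5000, density=1e-3, format="csr")

    u, s, vt = svds(A, k=20)  # k largest singular triplets, sparse-friendly
    keep = s > 1e-10          # invert only the nonzero singular values

    # apply pinv(A) to a vector without forming the 5000x5000 inverse:
    # pinv(A) @ b = V @ diag(1/s) @ U^T @ b
    b = np.ones(5000)
    x = vt[keep].T @ ((u[:, keep].T @ b) / s[keep])
    print(x.shape)  # (5000,)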
16
votes
2 answers

Why does MongoDB take up so much space?

I am trying to store records with a set of doubles and ints (around 15-20) in MongoDB. The records mostly (99.99%) have the same structure. When I store the data in ROOT, which is a very structured data storage format, the file is around 2.5GB for…
xcorat
  • 1,434
  • 2
  • 17
  • 34
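
Part of the answer is that BSON stores every field name inside every document (on top of per-document overhead and preallocated data files), which for 15-20 numeric fields can outweigh the numbers themselves. A small sketch with pymongo's bson module making that visible (the field names are made up):

    import bson

    doc_long = {"temperature_celsius": 21.5, "pressure_hpa": 1013, "run_id": 7}
    doc_short = {"t": 21.5, "p": 1013, "r": 7}

    # BSON embeds the key strings in each document, so shorter keys
    # shrink every single record on disk
    print(len(bson.BSON.encode(doc_long)))   # larger
    print(len(bson.BSON.encode(doc_short)))  # smaller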