Questions tagged [large-files]

Large files, whether binary or text, can be problematic even for an experienced programmer. Use this tag for issues that arise when opening or writing large files in a text editor, managing resources that run to gigabytes, or making strategic decisions about large amounts of data.

Think about how Notepad slows down appreciably when working with files that are hundreds of megabytes or larger. Some strategy is needed to work around such resource constraints, especially now that collecting large amounts of data is so easy.

Processing large amounts of text can also become a bottleneck when substantial work must be done on each record. Including this tag can also help draw out suggested optimisations for one's code.

1690 questions
13
votes
2 answers

How to open large files with PhpStorm 8?

I'm dealing with large SQL or XML files (can be up to 3GB) that I would like to open in my editor. I get the message: File is too large for PhpStorm editor. I have 32GB of RAM and Windows 7 Pro, 64-bit. Can I override that…
OwNAkS
  • 271
  • 1
  • 3
  • 10
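
A common workaround is to raise the IDE's file-size limits in idea.properties. The property names below come from the JetBrains platform documentation; the values are illustrative, and a multi-gigabyte file will also need a correspondingly large IDE heap (-Xmx in the VM options):

    # idea.properties -- limits are in kilobytes; values here are illustrative
    idea.max.content.load.filesize=3500000
    idea.max.intellisense.filesize=2500
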
13
votes
3 answers

Python - How to gzip a large text file without MemoryError?

I use the following simple Python script to compress a large text file (say, 10GB) on an EC2 m3.large instance. However, I always got a MemoryError: import gzip with open('test_large.csv', 'rb') as f_in: with gzip.open('test_out.csv.gz', 'wb')…
shihpeng
  • 5,283
  • 6
  • 37
  • 63
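
The usual fix is to stream the copy in bounded chunks rather than reading the whole file at once. A minimal sketch using only the standard library, with the file names taken from the question:

    import gzip
    import shutil

    # Stream in fixed-size buffers so memory use stays flat at ~1 MiB,
    # no matter how large the input file is.
    with open('test_large.csv', 'rb') as f_in:
        with gzip.open('test_out.csv.gz', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out, length=1024 * 1024)
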
13
votes
4 answers

Random access gzip stream

I'd like to be able to do random access into a gzipped file. I can afford to do some preprocessing on it (say, build some kind of index), provided that the result of the preprocessing is much smaller than the file itself. Any advice? My thoughts…
jkff
  • 17,623
  • 5
  • 53
  • 85
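
One workable strategy (a sketch of the idea, not the only answer; tools such as dictzip and zran implement refinements of it) is to recompress the file as many independent gzip members and store each member's compressed offset. Concatenated gzip members are still a valid gzip file, and the index is tiny compared to the data:

    import gzip
    import json

    BLOCK = 1 << 20  # 1 MiB of uncompressed data per independent gzip member

    def build_index(src, dst, idx):
        # Recompress src as concatenated gzip members, recording the
        # compressed offset where each member starts.
        offsets = []
        with open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
            while chunk := f_in.read(BLOCK):
                offsets.append(f_out.tell())
                f_out.write(gzip.compress(chunk))
        with open(idx, 'w') as f:
            json.dump(offsets, f)

    def read_at(dst, idx, pos, size):
        # Read `size` uncompressed bytes starting at uncompressed offset `pos`.
        with open(idx) as f:
            offsets = json.load(f)
        block_no, skip = divmod(pos, BLOCK)
        with open(dst, 'rb') as f:
            f.seek(offsets[block_no])
            # GzipFile reads across member boundaries transparently.
            data = gzip.GzipFile(fileobj=f).read(skip + size)
        return data[skip:skip + size]
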
13
votes
4 answers

grep -f alternative for huge files

grep -F -f file1 file2 file1 is 90 Mb (2.5 million lines, one word per line) file2 is 45 Gb That command doesn't actually produce anything whatsoever, no matter how long I leave it running. Clearly, this is beyond grep's scope. It seems grep…
cmo
  • 3,762
  • 4
  • 36
  • 64
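
Because file1 is one word per line, a hash-set lookup scales much better than asking grep to juggle 2.5 million patterns. A sketch in Python, assuming whole-token matches are acceptable (note that grep -F also matches substrings within a line, which this does not):

    # Load the 2.5M words once, then stream the 45 GB file line by line.
    with open('file1') as f:
        words = {line.rstrip('\n') for line in f}

    with open('file2') as f_in, open('matches.txt', 'w') as out:
        for line in f_in:
            if any(tok in words for tok in line.split()):
                out.write(line)
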
13
votes
2 answers

Opening a large file in Java is very slow

I have a large (12GB) file and I need to extract small pieces of data (a few kilobytes each) from it, using Java. Seeking and reading the data, once the file is open, is very fast, but opening the file itself takes a long time - about 90 seconds. Is…
Little Bobby Tables
  • 5,261
  • 2
  • 39
  • 49
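
If the 90 seconds is the cost of the open itself (network shares and virus scanners are typical culprits), the practical pattern is to pay that cost once and reuse the handle for all subsequent reads. A language-agnostic illustration in Python using a memory map; the file name and offsets are hypothetical:

    import mmap

    # Open the file once, then serve many small random reads from the mapping.
    with open('big.dat', 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        def read_at(offset, size):
            return mm[offset:offset + size]

        header = read_at(0, 4096)                # a few KB at the start
        record = read_at(10_000_000_000, 2048)   # ...and deep inside the file
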
12
votes
3 answers

How to detect X-Accel-Redirect (Nginx) / X-Sendfile (Apache) support in PHP?

About the application: I am working on an e-commerce application in PHP. To keep URLs secure, product download links are kept behind PHP. There is a file, say download.php, which accepts a few parameters via GET and verifies them against a database. If all…
rahul286
  • 959
  • 2
  • 10
  • 28
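
There is no standard probe for X-Accel-Redirect support from inside PHP; it is usually exposed as an application config flag. The mechanism itself is only a response header that hands the actual transfer to the web server. A hedged sketch of that idea in Python/Flask rather than the question's PHP (the route, file path, and verification step are hypothetical):

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route('/download/<token>')
    def download(token):
        # ... verify the token against the database here ...
        resp = Response(status=200)
        # Nginx intercepts this header and streams the file from an
        # internal location, so the application never buffers it.
        resp.headers['X-Accel-Redirect'] = '/protected/product.zip'
        resp.headers['Content-Disposition'] = 'attachment; filename=product.zip'
        return resp
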
12
votes
1 answer

How to deal with a huge, one-line file in Java

I need to read a huge file (15+GB) and perform some minor modifications (add some newlines so a different parser can actually work with it). You might think that there are already answers for doing this normally: Reading a very huge file in…
Jeutnarg
  • 1,138
  • 1
  • 16
  • 28
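
Whatever the language, the streaming pattern is the same: read fixed-size chunks, hold back a small tail so a delimiter can never be split across a chunk boundary, and write the modified bytes straight out. A Python sketch with a hypothetical record delimiter:

    DELIM, REPL = b'},{', b'},\n{'   # hypothetical record boundary

    with open('huge_one_line.txt', 'rb') as src, open('fixed.txt', 'wb') as dst:
        tail = b''
        while chunk := src.read(1 << 20):
            buf = tail + chunk
            # Hold back the longest suffix that is a proper prefix of DELIM,
            # so an occurrence straddling the boundary is never cut in half.
            keep = next((k for k in range(len(DELIM) - 1, 0, -1)
                         if buf.endswith(DELIM[:k])), 0)
            cut = len(buf) - keep
            dst.write(buf[:cut].replace(DELIM, REPL))
            tail = buf[cut:]
        dst.write(tail)
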
12
votes
1 answer

Uploading a large file (1GB to 2GB) using jQuery File Upload (blueimp, Ajax-based) with PHP / Yii shows an error in Firefox

I am trying to upload a large file (1GB to 2GB) using jQuery File Upload (blueimp, Ajax-based) with PHP / Yii Framework 1.15. I have set these values to allow larger uploads: memory_limit = 2048M upload_max_filesize = 2048M post_max_size = 2048M…
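
Beyond the three size limits quoted, multi-gigabyte uploads also routinely die on PHP's time limits, and blueimp's documented maxChunkSize option sidesteps server limits entirely by uploading in pieces. A hedged php.ini sketch; the values are illustrative:

    ; Size limits alone are not enough for 1-2 GB uploads: the request
    ; must also be allowed to run long enough to finish transferring.
    memory_limit = 2048M
    upload_max_filesize = 2048M
    post_max_size = 2048M
    max_execution_time = 3600
    max_input_time = 3600
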
12
votes
3 answers

Hadoop put performance - large file (20gb)

I'm using hdfs -put to load a large 20GB file into HDFS. Currently the process takes about 4 minutes, and I'm trying to improve the write time of loading data into HDFS. I tried different block sizes to improve write speed but got the below…
Irvo
  • 131
  • 1
  • 1
  • 4
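
Block size and replication factor are the two knobs usually tried first for raw put throughput. A hedged example of setting them per command (dfs.blocksize and dfs.replication are standard HDFS properties; the values are illustrative):

    # 256 MB blocks, single replica during the initial load
    hdfs dfs -D dfs.blocksize=268435456 -D dfs.replication=1 -put file.20g /data/
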
12
votes
6 answers

Finding k-largest elements of a very large file (while k is very LARGE)

Let's assume we have a very large file containing billions of integers, and we want to find the k largest of them. The tricky part is that k itself is very large too, which means we cannot keep k elements in memory (for…
Arian
  • 7,397
  • 21
  • 89
  • 177
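
Once k no longer fits in memory, top-k degenerates into an external sort: sort memory-sized chunks, spill each as a sorted run, then lazily merge the runs in descending order and stop after k values. A sketch; chunk size and file names are illustrative:

    import heapq
    import itertools
    import tempfile

    def k_largest_external(path, k, chunk_lines=1_000_000):
        runs = []
        with open(path) as f:
            # Spill each memory-sized chunk to disk as a descending sorted run.
            while chunk := list(itertools.islice(f, chunk_lines)):
                run = tempfile.TemporaryFile('w+')
                run.writelines(f'{n}\n' for n in sorted(map(int, chunk), reverse=True))
                run.seek(0)
                runs.append(run)
        # Stream-merge the runs; only one line per run is in memory at a time.
        merged = heapq.merge(*((int(line) for line in r) for r in runs), reverse=True)
        with open('k_largest.txt', 'w') as out:
            out.writelines(f'{n}\n' for n in itertools.islice(merged, k))
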
12
votes
3 answers

memory exhausted : for large files using diff

I am trying to create a patch from two large folders (~7GB). Here is how I'm doing it: $ diff -Naurbw . ../other-folder > file.patch But, perhaps due to the file sizes, the patch is not getting created and diff gives an error: diff: memory exhausted I…
pritam
  • 2,452
  • 1
  • 21
  • 31
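
GNU diff has a flag aimed at exactly this situation; whether it brings a 7 GB tree under your memory ceiling is worth testing before reaching for splitting strategies:

    # --speed-large-files trades patch compactness for memory and speed
    diff --speed-large-files -Naurbw . ../other-folder > file.patch
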
12
votes
8 answers

12 MB of text data downloaded from a URL and saved to SD card causes a heap memory problem. Any solution?

I use both of the following, but neither works for huge data of 12 MB: char[] chars = new char[1024]; int len; while((len=buffer.read(chars))>0) { data.append(chars,0,len); } and while ((line = reader.readLine()) != null) { sb.append(line +…
user1468129
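
Both snippets fail for the same reason: the whole body is accumulated in memory before being written. The fix in any language is to stream the response straight to storage in small buffers. A Python sketch of the pattern; the URL and path are hypothetical:

    import shutil
    import urllib.request

    # Copy network -> disk in 64 KiB buffers; nothing accumulates in memory.
    with urllib.request.urlopen('https://example.com/data.txt') as resp, \
            open('/sdcard/data.txt', 'wb') as out:
        shutil.copyfileobj(resp, out, length=64 * 1024)
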
11
votes
4 answers

Poor performance with large Java lists

I'm trying to read a large text corpus into memory with Java. At some point it hits a wall and just garbage collects interminably. I'd like to know if anyone has experience beating Java's GC into submission with large data sets. I'm reading an 8…
Jay Hacker
  • 1,835
  • 2
  • 18
  • 23
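
Boxed collections can easily inflate 8 GB of text to several times that in heap, so the first checks are whether the heap is sized for the real footprint and whether the data can be streamed instead of materialized. An illustrative JVM invocation (the jar name is hypothetical, the sizes indicative only):

    # Fixed, generous heap; -Xms = -Xmx avoids resizing the heap under pressure
    java -Xms12g -Xmx12g -jar corpus-reader.jar
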
11
votes
4 answers

Reading and processing big text file of 25GB

I have to read a big text file of, say, 25 GB and need to process this file within 15-20 minutes. This file will have multiple header and footer sections. I tried CSplit to split this file based on headers, but it is taking around 24 to 25 min to…
user1142292
  • 111
  • 1
  • 1
  • 3
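
A single streaming pass that starts a new output file at each header keeps memory flat and is bounded only by disk throughput. A sketch; the file name and header marker are hypothetical:

    # Stream once; open a new part whenever a header line is seen.
    part_no, out = 0, None
    with open('big.txt') as f:
        for line in f:
            if line.startswith('HEADER'):
                if out:
                    out.close()
                part_no += 1
                out = open(f'part_{part_no:04d}.txt', 'w')
            if out:
                out.write(line)
    if out:
        out.close()
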
11
votes
1 answer

Computing MD5SUM of large files in C#

I am using the following code to compute the MD5SUM of a file - byte[] b = System.IO.File.ReadAllBytes(file); string sum = BitConverter.ToString(new MD5CryptoServiceProvider().ComputeHash(b)); This works fine normally, but if I encounter a large file…
spkhaira
  • 821
  • 7
  • 18
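
The remedy is the same in any language: feed the hash incrementally instead of materializing the whole file (in C#, HashAlgorithm.ComputeHash also accepts a Stream). The same idea sketched in Python:

    import hashlib

    def md5sum(path, chunk=1 << 20):
        # Hash incrementally; memory use stays at one buffer regardless of size.
        h = hashlib.md5()
        with open(path, 'rb') as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()
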