Questions tagged [large-files]

Large files, whether binary or text, can be problematic even for an experienced programmer. Use this tag for issues that arise when opening or writing large files in a text editor, managing resources that run to gigabytes, or making strategic decisions about large amounts of data.

Consider how Notepad slows down appreciably when working with files that are hundreds of megabytes or larger. Some strategy is needed to work around such resource constraints, especially now that collecting large volumes of data is so easy.

Processing large amounts of text can also create bottlenecks when substantial work must be done per record. Including this tag can also invite suggestions for optimising the code in question.

1690 questions
15
votes
2 answers

Uploading Large files to AWS S3 Bucket with Django on Heroku without 30s request timeout

I have a Django app that allows users to upload videos. It's hosted on Heroku, and the uploaded files are stored in an S3 bucket. I am using JavaScript to upload the files directly to S3 after obtaining a presigned request from the Django app. This is due to…
Basel J. Hamadeh
  • 935
  • 1
  • 7
  • 19
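Below is a minimal sketch of the presigned-request pattern this question describes, using boto3; the bucket name and key are placeholders, not the asker's real values. The server hands the browser a short-lived signed POST target, so the upload goes straight to S3 and never occupies a Heroku request slot.

    # Sketch: issue a presigned POST so the browser uploads directly to S3.
    import boto3

    s3 = boto3.client("s3")

    def presign_upload(key, expires=3600):
        # Returns {"url": ..., "fields": ...} for the client-side form POST.
        return s3.generate_presigned_post(
            Bucket="my-video-bucket",   # assumed bucket name
            Key=key,
            ExpiresIn=expires,          # signature lifetime in seconds
        )

The Django view would return this dict as JSON; the JavaScript client then POSTs the file to S3 itself, so the 30 s router timeout only ever applies to the tiny signing request.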
15
votes
2 answers

How can I process a large file via CSVParser?

I have a large .csv file (about 300 MB), which is read from a remote host, and parsed into a target file, but I don't need to copy all the lines to the target file. While copying, I need to read each line from the source and if it passes some…
Alex Orlov
  • 171
  • 1
  • 1
  • 6
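The usual answer is to stream: read one row, test it, and write it out only if it passes, so a single row is all that is ever held in memory. A sketch in Python (the question itself is about Java's CSVParser; local paths stand in for the remote source, and the predicate is an assumed stand-in):

    import csv

    def copy_matching(src_path, dst_path, predicate):
        # Stream row by row; nothing but the current row stays in memory.
        with open(src_path, newline="") as src, \
             open(dst_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                if predicate(row):          # e.g. lambda row: row[2] == "OK"
                    writer.writerow(row)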
15
votes
4 answers

Good and effective CSV/TSV Reader for Java

I am trying to read big CSV and TSV (tab-separated) files with about 1000000 rows or more. I tried to read a TSV containing ~2500000 lines with opencsv, but it throws a java.lang.NullPointerException. It works with smaller TSV files with…
Robin
  • 3,512
  • 10
  • 39
  • 73
15
votes
2 answers

Large file not flushed to disk immediately after calling close()?

I'm creating large files with my Python script (more than 1 GB each; in fact there are 8 of them). Right after I create them, I have to create a process that will use those files. The script looks like: # This is a more complex function, but it basically does…
Vyktor
  • 20,559
  • 6
  • 64
  • 96
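A hedged sketch of the usual fix: close() only hands the data to the OS page cache. If the consumer runs on the same machine the cache is coherent and that is enough, but if the bytes must be durably on disk (for example, read via a shared volume), flush and fsync before spawning it. The file name and payload below are placeholders.

    import os

    payload = b"\0" * (1 << 20)      # stand-in for the real 1 GB+ data
    with open("big_output.bin", "wb") as f:
        f.write(payload)
        f.flush()                    # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())         # ask the kernel to commit the bytes to disk
    # only now start the process that consumes the file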
15
votes
4 answers

Java: InputStream too slow to read huge files

I have to read a 53 MB file character by character. When I do it in C++ using ifstream, it completes in milliseconds, but using Java InputStream it takes several minutes. Is it normal for Java to be this slow or am I missing something? Also, I…
pflz
  • 1,891
  • 4
  • 26
  • 32
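The common culprit is one system call per character. In Java the usual fix is wrapping the stream in a BufferedInputStream or BufferedReader; the same idea expressed in Python, purely for illustration, is to read large blocks and do the per-character work in memory:

    def count_chars(path, chunk_size=1 << 20):
        # One read() per MiB instead of one per character.
        total = 0
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                total += len(chunk)     # per-character work would go here
        return total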
15
votes
2 answers

Parsing large (20GB) text file with python - reading in 2 lines as 1

I'm parsing a 20 GB file and outputting lines that meet a certain condition to another file; however, occasionally Python will read in 2 lines at once and concatenate them. inputFileHandle = open(inputFileName, 'r') row = 0 for line in…
James
  • 1,397
  • 3
  • 21
  • 30
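A defensive reading pattern, assuming (as answers to questions of this shape often find) that the concatenation comes from unusual line endings rather than from Python itself: open in text mode with universal newlines so \r, \n and \r\n are all treated as line breaks. The file name is a placeholder.

    # newline=None enables universal-newline translation on read.
    with open("input.txt", "r", newline=None) as f:
        for row, line in enumerate(f):
            line = line.rstrip("\n")
            # ...handle exactly one logical line per iteration...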
14
votes
6 answers

Get Large File Size in C

Before anyone complains of "duplicate", I've been checking SO quite thoroughly, but there seems to be no clean answer yet, although the question looks quite simple. I'm looking for portable C code that can provide the size of a file, even…
Cyan
  • 13,248
  • 8
  • 43
  • 78
14
votes
8 answers

Reading Huge File in Python

I have a 384MB text file with 50 million lines. Each line contains 2 space-separated integers: a key and a value. The file is sorted by key. I need an efficient way of looking up the values of a list of about 200 keys in Python. My current approach…
moinudin
  • 134,091
  • 45
  • 190
  • 216
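One common answer to this shape of problem, sketched under the question's own assumptions (a file sorted by key, one "key value" integer pair per line): binary-search the file by byte offset, so each lookup costs O(log n) seeks instead of a full scan.

    def lookup(f, target, size):
        # f is open in binary mode; lines are b"<key> <value>\n", sorted by key.
        f.seek(0)
        first = f.readline()
        if first and int(first.split()[0]) == target:
            return int(first.split()[1])     # the loop below never tests line 1
        lo, hi = 0, size
        while hi - lo > 1:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()                     # skip the partial line we landed in
            line = f.readline()
            if not line or int(line.split()[0]) > target:
                hi = mid                     # candidate after mid is too big
            else:
                lo = mid                     # candidate after lo is <= target
        f.seek(lo)
        f.readline()
        line = f.readline()
        if line and int(line.split()[0]) == target:
            return int(line.split()[1])
        return None

Here size is os.path.getsize(path). For ~200 keys this costs a few thousand seeks in total, against 50 million lines read by a naive scan.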
14
votes
7 answers

Is there a distributed VCS that can manage large files?

Is there a distributed version control system (git, bazaar, mercurial, darcs etc.) that can handle files larger than available RAM? I need to be able to commit large binary files (e.g. datasets, source video/images, archives), but I don't need to be…
joelhardi
  • 11,039
  • 3
  • 32
  • 38
14
votes
8 answers

How to scan through really huge files on disk?

Considering a really huge file (maybe more than 4 GB) on disk, I want to scan through it and count the number of times a specific binary pattern occurs. My thought is: use a memory-mapped file (CreateFileMapping or boost mapped_file) to load the file to…
Jichao
  • 40,341
  • 47
  • 125
  • 198
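A Python sketch of exactly the memory-mapped idea the asker proposes: let the OS page the file in lazily and count matches with find(). One caveat the question itself hints at: in a 32-bit process a 4 GB+ file will not fit in the address space, in which case chunked reads with a len(pattern)-1 overlap between chunks are the fallback.

    import mmap

    def count_pattern(path, pattern):
        count = 0
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                pos = mm.find(pattern)
                while pos != -1:
                    count += 1
                    pos = mm.find(pattern, pos + 1)   # pos+1 also counts overlaps
        return count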
13
votes
5 answers

Time performance in Generating very large text file in Python

I need to generate a very large text file. Each line has a simple format: Seq_num num_val (e.g. 12343234 759). Let's assume I am going to generate a file with 100 million lines. I tried 2 approaches and surprisingly they are giving very different time…
doubleE
  • 1,027
  • 1
  • 12
  • 32
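Without reproducing the asker's two variants, here is a hedged sketch of the batching approach that usually wins comparisons like this: build many lines into one string and issue one write() per batch, rather than 100 million tiny writes. The value 759 is just the sample from the question.

    def generate(path, n_lines, batch=100_000):
        with open(path, "w") as f:
            for start in range(0, n_lines, batch):
                stop = min(start + batch, n_lines)
                # one write() per batch of formatted lines
                f.write("".join(f"{i} 759\n" for i in range(start, stop)))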
13
votes
4 answers

Error tokenizing data. C error: out of memory pandas python, large file csv

I have a large CSV file of 3.5 GB and I want to read it using pandas. This is my code: import pandas as pd tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False) df = pd.concat(tp,…
Amal Kostali Targhi
  • 907
  • 3
  • 11
  • 22
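The snippet in the question defeats itself twice: a chunksize of 20 million rows is hardly chunking, and pd.concat over every chunk rebuilds the whole frame in memory anyway. A sketch of the shape that actually saves memory, with an assumed filter (the "label" column is hypothetical) standing in for real per-chunk work:

    import pandas as pd

    kept = []
    for chunk in pd.read_csv("train_2011_2012_2013.csv", sep=";",
                             chunksize=100_000):
        kept.append(chunk[chunk["label"] == 1])   # assumed per-chunk filter
    df = pd.concat(kept, ignore_index=True)       # only the surviving rows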
13
votes
16 answers

Large File Download

Internet Explorer has a file download limit of 4 GB (2 GB on IE6). Firefox does not have this problem (I haven't tested Safari yet; more info here: http://support.microsoft.com/kb/298618). I am working on a site that will allow the user to download…
TonyB
  • 3,882
  • 2
  • 25
  • 22
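Browser quirks aside, the transfer-level workaround is ranged requests: fetch the file in pieces small enough that no single response hits a client limit. A hedged illustration of the idea as a Python client (it assumes the server honours the Range header and reports Content-Length):

    import requests

    def download_in_parts(url, dest, part=1 << 30):       # 1 GiB per request
        size = int(requests.head(url).headers["Content-Length"])
        with open(dest, "wb") as f:
            for start in range(0, size, part):
                end = min(start + part, size) - 1
                r = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
                f.write(r.content)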
13
votes
6 answers

how to extract files from a large (30Gb+) zip file on linux server

1) extract from large zip file I want to extract files from a large zip file (30 GB+) on a Linux server. There is enough free disk space. I've tried jar xf dataset.zip. However, there's an error ("push button is full"), and it failed to extract…
Irene W.
  • 679
  • 1
  • 6
  • 15
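Older zip implementations, jar's among them, commonly fail on ZIP64, the extension that archives over 4 GB require. If Python is available on the server, its zipfile module reads ZIP64 archives transparently, so a two-liner is often the quickest fix (paths as in the question):

    import zipfile

    with zipfile.ZipFile("dataset.zip") as zf:   # ZIP64 handled automatically
        zf.extractall("dataset/")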
13
votes
2 answers

Is O_LARGEFILE needed just to write a large file?

Is the O_LARGEFILE flag needed if all that I want to do is write a large file (O_WRONLY) or append to a large file (O_APPEND | O_WRONLY)? From a thread that I read titled "Cannot write >2gb index file" on the CLucene-dev mailing list, it appears…
Daniel Trebbien
  • 38,421
  • 18
  • 121
  • 193
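The same open-flags question, mirrored in Python purely for illustration: the os module exposes O_LARGEFILE only on platforms that define it, and the getattr default below hedges against its absence. On 64-bit builds, off_t is already 64 bits wide and the flag is a no-op; the file name is a placeholder.

    import os

    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT
    flags |= getattr(os, "O_LARGEFILE", 0)   # 0 where the platform doesn't define it
    fd = os.open("big.log", flags, 0o644)
    os.write(fd, b"appending to a file that may already exceed 2 GiB\n")
    os.close(fd)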