Questions tagged [large-files]

Large files, whether binary or text, can sometimes be problematic even for an experienced programmer. This tag should be used if issues arise relating to opening and/or writing large files in a text editor, managing resources that run to gigabytes, or strategic decisions for large amounts of data.

Think about how Notepad slows down appreciably when working with files that are hundreds of megabytes in size or larger. Some strategy is needed to work around such resource constraints, especially now that collecting data is so easy.

Processing large amounts of text can also become a bottleneck when each record needs substantial work. Adding this tag can help attract suggestions for optimising such code.
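
As a rough illustration of one such strategy, the sketch below scans a file in fixed-size chunks instead of loading it whole, so memory use stays bounded regardless of file size. The function name `count_lines_and_bytes` and the 1 MiB default chunk size are assumptions made for this example, not something the tag description prescribes.

```python
def count_lines_and_bytes(path, chunk_size=1024 * 1024):
    """Return (newline_count, byte_count) for the file at `path`.

    The file is read in chunks of at most `chunk_size` bytes, so only one
    chunk is held in memory at a time. Both the name and the default chunk
    size are illustrative assumptions.
    """
    lines = 0
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            total += len(chunk)
            lines += chunk.count(b"\n")
    return lines, total
```

The same loop shape applies to most streaming work: replace the counting with whatever per-chunk or per-line processing is needed, and keep only running totals rather than the data itself.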

1690 questions
33
votes
3 answers

Are there any good workarounds to the GitHub 100MB file size limit for text files?

I have a 190 MB plain text file that I want to track on GitHub. The text file is a pronunciation lexicon file for our text-to-speech engine. We regularly add and modify lines in the text file, and the diffs are fairly small, so it's perfect for…
josteinaj
  • 477
  • 1
  • 4
  • 11
33
votes
8 answers

large amount of data in many text files - how to process?

I have large amounts of data (a few terabytes) and accumulating... They are contained in many tab-delimited flat text files (each about 30MB). Most of the task involves reading the data and aggregating (summing/averaging + additional…
hatmatrix
  • 42,883
  • 45
  • 137
  • 231
30
votes
6 answers

Python: How to read huge text file into memory

I'm using Python 2.6 on a Mac Mini with 1GB RAM. I want to read in a huge text file $ ls -l links.csv; file links.csv; tail links.csv -rw-r--r-- 1 user user 469904280 30 Nov 22:42 links.csv links.csv: ASCII text, with CRLF line…
asmaier
  • 11,132
  • 11
  • 76
  • 103
30
votes
8 answers

Reading very large files in PHP

fopen is failing when I try to read in a moderately sized file in PHP. A 6 MB file makes it choke, though smaller files around 100 KB are just fine. I've read that it is sometimes necessary to recompile PHP with the -D_FILE_OFFSET_BITS=64 flag…
user5564
28
votes
6 answers

Processing large JSON files in PHP

I am trying to process somewhat large (possibly up to 200M) JSON files. The structure of the file is basically an array of objects. So something along the lines of: [ {"property":"value", "property2":"value2"}, {"prop":"val"}, ... …
The Mighty Rubber Duck
  • 4,388
  • 5
  • 28
  • 27
28
votes
7 answers

Best way to process large XML in PHP

I have to parse large XML files in PHP; one of them is 6.5 MB and they could be even bigger. The SimpleXML extension, as I've read, loads the entire file into an object, which may not be very efficient. In your experience, what would be the best way?
Petruza
  • 11,744
  • 25
  • 84
  • 136
27
votes
5 answers

Read lines by number from a large file

I have a file with 15 million lines (will not fit in memory). I also have a small vector of line numbers: the lines that I want to extract. How can I read out the lines in one pass? I was hoping for a C function that does it in one pass.
Aleksandr Levchuk
  • 3,751
  • 4
  • 35
  • 47
26
votes
13 answers

PHP x86 How to get filesize of > 2 GB file without external program?

I need to get the file size of a file over 2 GB in size (testing on a 4.6 GB file). Is there any way to do this without an external program? Current status: filesize(), stat() and fseek() fail; fread() and feof() work. There is a possibility to…
Honza Kuchař
  • 629
  • 2
  • 9
  • 16
26
votes
8 answers

ERROR: could not stat file "XX.csv": Unknown error

I run this command: COPY XXX FROM 'D:/XXX.csv' WITH (FORMAT CSV, HEADER TRUE, NULL 'NULL') In Windows 7, it successfully imports CSV files of less than 1 GB. If the file is larger than 1 GB, I get an “unknown error”. [Code: 0, SQL State: XX000] …
亚军吴
  • 391
  • 1
  • 3
  • 7
24
votes
4 answers

Is it possible to slim a .git repository without rewriting history?

We have a number of Git repositories which have grown to an unmanageable size due to the historical inclusion of binary test files and Java .jar files. We are just about to go through the exercise of git filter-branching these repositories,…
Mark Booth
  • 7,605
  • 2
  • 68
  • 92
24
votes
6 answers

How do I download a large file (via HTTP) in .NET?

I need to download a large file (2 GB) over HTTP in a C# console application. Problem is, after about 1.2 GB, the application runs out of memory. Here's the code I'm using: WebClient request = new WebClient(); request.Credentials = new…
Nick Cartwright
  • 8,334
  • 15
  • 45
  • 56
22
votes
1 answer

How does HTTP file upload work for large files?

I just want to elaborate on this question: How does HTTP file upload work?. This is the form from the question:
nxh
  • 1,073
  • 1
  • 11
  • 22
21
votes
2 answers

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory when processing large files with fs

I have a Node.js script that processes a bunch of large .csv files (1.3 GB in total). It runs for a moment and throws this error: FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory I have tried to put…
TOPKAT
  • 6,667
  • 2
  • 44
  • 72
21
votes
7 answers

Python Random Access File

Is there a Python file type for accessing random lines without traversing the whole file? I need to search within a large file; reading the whole thing into memory wouldn't be possible. Any types or methods would be appreciated.
Mantas Vidutis
  • 16,376
  • 20
  • 76
  • 92
21
votes
3 answers

Downloading a Large File - iPhone SDK

I am using Erica Sadun's method of Asynchronous Downloads (link here for the project file: download); however, her method does not work with large files (50 MB or above). If I try to download a file above 50 MB, it will usually crash…
lab12
  • 6,400
  • 21
  • 68
  • 106