3

If my FTP client intends to upload files over 4 gigs in size, assuming I'm streaming the data, my pointer is going to hit the wall at around 4 gigs if it's a 32-bit pointer, right? I'm trying to imagine what's going on behind the scenes and am not able to visualize how this could work... however it MUST work, since I have downloaded files larger than this in the past.

So, my question is twofold: what happens on the client (and does it need to be a 64-bit client, on a 64-bit machine), and what happens on the server (and does IT also have to be a 64-bit machine)?

I realize that the file will be broken into smaller files for transmission, but isn't the program going to explode just trying to address the parts of the file beyond the 4,294,967,295 mark?

I think this is a related post, but I'm not sure what conclusion they come to. The answers seem to point both to the limitations of the pointer (in their case PERL) and the OS. Why can't my Perl program create files over 4 GB on Windows?

Yevgeny Simkin
  • In Java you are limited to reading/writing 2^31-1 bytes ~ 2 GB at a time. However, the file can be any length using repeated writes/reads. – Peter Lawrey Jan 21 '11 at 16:31
  • if you look at that question you mention, look at the update at the top, it was a matter of the poster using FAT32, which has a limitation on file size. Not really the same issue. – Evan Teran Jan 21 '11 at 16:36
  • I did see that, but then some of the answers seemed to point to an addressing problem, so, I thought I'd add it as a reference, I guess I can remove it, if it only serves to muddy the waters. – Yevgeny Simkin Jan 21 '11 at 16:44

3 Answers

9

The client or server should read the data in chunks (I would do a multiple of the page size or something similar) and write the chunks to disk. There is no need to have the whole file in RAM all at once.

Something like this pseudo code (error checking and similar omitted) on the receiving end:

char chunk[4096];
ssize_t size;
while ((size = recv(socket, chunk, sizeof(chunk), 0)) > 0) {
    write(file, chunk, size);
}

The sample above is for the server; the client would do something similar:

char chunk[4096];
ssize_t size;
while ((size = read(file, chunk, sizeof(chunk))) > 0) {
    send(sock, chunk, size, 0);
}

EDIT:

To address your comment: one thing you have to keep in mind is that the offset into the file isn't necessarily 32-bit on a 32-bit system. It can be 64-bit, since it is not actually a pointer; it is simply an offset from the beginning of the file. If the OS supports 64-bit offsets (and modern Windows/Linux/OS X all do), then you don't have to worry about it. As noted elsewhere, the filesystem the OS is accessing is also a factor, but I figure if you have a file that is greater than 4GB, then it is clearly on a filesystem that supports it ;-).

Evan Teran
  • Evan, I understand the chunky part... what I'm concerned about is the reading part. I know it doesn't need to load the file into memory, but it still needs to read it, so that it can write any given chunk, right? So, as the file pointer is moving across the file (and copying those sections into a new file) what happens when the pointer exceeds the 4 gig mark? Maybe there's some kind of magic that happens behind the scenes that makes this a non-worry for me? If that's the case, is that true for all languages and OSs? – Yevgeny Simkin Jan 21 '11 at 16:26
  • 1
    That's a very system-specific question. Many operating systems and many standard libraries/APIs do not support files larger than 4GB--even 2GB in some cases. The same goes for file systems. There's no universal answer to your question. – Jonathan Grynspan Jan 21 '11 at 16:32
  • Ok, so just to be 100%... you're saying I simply don't have to worry about it? The internal mechanism of whatever read() method I use (probably java) will not toss its cookies as I move past the 4g mark, and I'll be able to just create as many chunk files as necessary as I read my way through the entire thing? – Yevgeny Simkin Jan 21 '11 at 16:41
  • Just to be clear, you can reuse the chunk so only a small portion is in memory at a time. But anyway, if the OS and filesystem support it, then you should be good to go. I'm not aware of any issue with java and >4GB files. – Evan Teran Jan 21 '11 at 16:47
  • 1
    @Dr.Dredel with Java you don't have to care, if the underlying platform supports that large files, it'll handle it fine. With e.g. C/C++ on a 32 bit *nix you'll often have to compile your code with a special define to enable largefile support (_FILE_OFFSET_BITS=64 to be specific) – nos Jan 21 '11 at 20:42
3

I think your confusion may stem from the overloaded use of the word "pointer". A file's current position pointer is not the same as a pointer to an object in memory. Modern 32-bit OSes support 64-bit file pointers just fine.

Ferruccio
2

Whether the client is 32- or 64-bit has nothing to do with maximum file size: a 32-bit OS supports files larger than 4GB; the only requirement is that the underlying file system supports them. FAT16 and FAT32 cannot hold a file of 4GB or more, but NTFS can.

Most programming SDKs support 64-bit file offsets, even on a 32-bit operating system. So even with a 32-bit server and client you can still transfer files larger than 4GB.

The file handle used inside a program maintains a long integer (8 bytes); see http://www.cplusplus.com/reference/clibrary/cstdio/ftell/ — long is 8 bytes in most systems.

However, if your SDK or OS only supports 32-bit file pointers, then you have a problem.

Akash Kava
  • nitpick: `long int` is 32-bit on pretty much all mainstream 32-bit operating systems, and it isn't a pointer. – Evan Teran Jan 21 '11 at 16:32
  • 1
    int is 32 bit, not long, long is 64 bit – Akash Kava Jan 21 '11 at 16:35
  • this is highly dependent on the language and compiler; in C and C++, `long` is typically 32-bit on a 32-bit system. Since you reference cplusplus.com, I have to stand by my statement. – Evan Teran Jan 21 '11 at 16:39
  • Just a follow up: MSVC uses 32-bit `long`s (http://msdn.microsoft.com/en-us/library/s3f49ktz(v=vs.80).aspx) and gcc uses the word size (32-bit on a 32-bit machine) for `long`s (http://gcc.gnu.org/onlinedocs/gccint/Type-Layout.html). – Evan Teran Jan 21 '11 at 18:31