I have been reducing the memory footprint of a program I am working on, which uses several large arrays, by backing the arrays with mapped files. Since I don't know the final size of these arrays in advance, I overestimate it to a level I know the arrays won't reach, and then truncate the files down to their final size once I am finished with the arrays. Fortunately, the code I am using to create the mapped files (at the bottom of this post) creates sparse files on every machine I have tried. If it didn't, there would be a disk space problem.

The question is: is calling lseek to extend the file prior to mapping guaranteed to create a sparse file, or can it at least be relied on to do so on any reasonable Linux distro as well as Solaris?

Also, is there any way of checking that the created file is sparse? It's probably better to exit than to attempt to create several hundred GB of non-sparse files.

output_data_file_handle = open(output_file_name, O_RDWR | O_CREAT, 0600);
/* Seek to the last byte and write it, so the file is exactly output_file_size bytes long. */
lseek(output_data_file_handle, output_file_size - 1, SEEK_SET);
write(output_data_file_handle, "", 1);
void *ttv = mmap(0, (size_t)output_file_size, PROT_WRITE | PROT_READ, MAP_SHARED, output_data_file_handle, 0);
alk
camelccc

2 Answers

4

Regarding your second question: to test whether a file is (at least partially) sparse, you can use the stat() system call.

Example:

#include <stdio.h>
#include <sys/stat.h>

...

struct stat st = {0};

int result = stat("filename", &st);
if (-1 == result)
  perror("stat()");
else
{
  printf("size/bytes: %lld\n", (long long) st.st_size); /* 'official' size in bytes */
  printf("block size/bytes: %ld\n", (long) st.st_blksize); /* preferred I/O block size */
  printf("blocks: %lld\n", (long long) st.st_blocks); /* number of 512-byte blocks actually on disk */

  /* st_blocks counts 512-byte units on Linux and Solaris, not units of st_blksize */
  if ((long long) st.st_size > 512LL * (long long) st.st_blocks)
       printf("file is (at least partially) a sparse file\n");
}

...
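As a complement to comparing sizes, some systems can report holes directly. The following is a minimal sketch, assuming a platform whose lseek() supports the SEEK_HOLE extension (Linux 3.1 or later and Solaris do; it is not available everywhere). The helper names `fd_has_hole` and `path_has_hole` are hypothetical. On file systems that do not track holes, the kernel typically reports the whole file as data, so a result of 0 means "no hole reported", not "definitely not sparse".

```c
#define _GNU_SOURCE /* glibc exposes SEEK_HOLE only with this; must precede all includes */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the file behind fd contains a hole
   before its end, 0 if no hole is reported, and -1 on error or if
   SEEK_HOLE is unavailable. The file offset of fd is restored. */
int fd_has_hole(int fd)
{
#ifdef SEEK_HOLE
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;                         /* an empty file has no holes */

    off_t saved = lseek(fd, 0, SEEK_CUR); /* remember the current offset */
    if (saved == (off_t) -1)
        return -1;

    /* Offset of the first hole at or after 0. Every file has an implicit
       hole at its end, so "no real hole" shows up as hole == st_size. */
    off_t hole = lseek(fd, 0, SEEK_HOLE);
    int result = (hole == (off_t) -1) ? -1 : (hole < st.st_size);

    if (lseek(fd, saved, SEEK_SET) == (off_t) -1) /* restore the offset */
        return -1;
    return result;
#else
    (void) fd;
    return -1;                            /* SEEK_HOLE not provided here */
#endif
}

/* Convenience wrapper that opens the file by name. */
int path_has_hole(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    int result = fd_has_hole(fd);
    close(fd);
    return result;
}
```

A caller would typically treat 1 as "sparse", and fall back to the stat() comparison when the result is 0 or -1.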
Thor
alk
  • I tested the comparison st.st_size > st.st_blksize * st.st_blocks on my sparse file and it does not seem to work. My sparse file takes 8 blocks, st_blksize is 4096 bytes, and its literal size is 10005 bytes; 8 * 4096 / 10005 = 3.27, so this comparison classifies it as non-sparse. If you are interested in my test case, check this [post](https://stackoverflow.com/q/72908715/14092446) – Steve Lau Jul 09 '22 at 01:09
4

The lseek manual page specifies the behaviour when seeking beyond the end of a file, but it does not promise that the resulting gap is stored as a hole. So it depends on the OS and especially on the file system used.
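What lseek does specify can be sketched as follows, assuming a POSIX system: seeking past the end does not change the file size by itself, a later write does, and the gap reads back as zero bytes. The helper name `create_extended_file` is hypothetical. Nothing in this sketch shows whether the gap occupies disk space; that part is up to the file system.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and extend it to exactly `size`
   bytes by seeking to the last byte and writing it. Returns an open
   file descriptor, or -1 on error. */
int create_extended_file(const char *path, off_t size)
{
    if (size <= 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    /* The seek alone leaves the file size at 0; the one-byte write at
       offset size - 1 is what extends the file to `size` bytes. */
    if (lseek(fd, size - 1, SEEK_SET) == (off_t) -1 ||
        write(fd, "", 1) != 1) {
        close(fd);
        unlink(path);
        return -1;
    }
    return fd;
}
```

An alternative with the same effect on the file size is ftruncate(fd, size), which avoids the explicit write.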

To test whether you can create sparse files on your system, you can run:

dd if=/dev/zero of=/path/to/sparse.txt bs=1k seek=1024 count=1
du /path/to/sparse.txt

This skips 1024 blocks of 1 kB each in the output file and then writes 1024 bytes. du should show only a few kB if the file is sparse, and around 1 MB if it is not.
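The same probe can be done from inside the program before any large file is created. This is a rough sketch under stated assumptions: `dir_supports_sparse_files` is a hypothetical helper, st_blocks is assumed to count 512-byte units (true on Linux and Solaris, not mandated by POSIX), and the probe writes non-zero bytes where dd writes zeros, so that the written block cannot itself be stored as a hole.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a probe file created in `dir` ends
   up sparse, 0 if it does not, and -1 on error. */
int dir_supports_sparse_files(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/sparse_probe_XXXXXX", dir);
    if (n < 0 || (size_t) n >= sizeof path)
        return -1;

    int fd = mkstemp(path);   /* creates and opens a uniquely named file */
    if (fd == -1)
        return -1;
    unlink(path);             /* the file goes away once fd is closed */

    char block[1024];
    for (size_t i = 0; i < sizeof block; i++)
        block[i] = 'x';

    int result = -1;
    struct stat st;
    /* Same layout as the dd command: skip 1024 KiB, then write 1 KiB. */
    if (lseek(fd, (off_t) 1024 * 1024, SEEK_SET) != (off_t) -1 &&
        write(fd, block, sizeof block) == (ssize_t) sizeof block &&
        fsync(fd) == 0 &&     /* flush so the block count is settled */
        fstat(fd, &st) == 0) {
        /* Assumption: st_blocks is in 512-byte units. */
        result = ((long long) st.st_blocks * 512 < (long long) st.st_size);
    }
    close(fd);
    return result;
}
```

A program could call this once on the output directory at startup and exit early when the result is not 1.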

Olaf Dietsche