When we allocate an array in Fortran or C, my understanding is that the memory is first reserved in virtual memory, while physical memory is committed only when we write data to (some part of) the array (e.g., based on this page). Does this mean that, if we allocate a very large array (say 10^9 elements) but use only a fraction of it (say the first 10^6 elements), we need physical memory only for the latter? If so, is it practically safe to exploit this feature to accommodate data of unknown (but not too large) size in a very large, pre-allocated array?
For example, the following Fortran code first allocates a large array of size 10^9, writes data to the first 10^6 elements, and then performs a reallocation to shrink the array to the used size.
program vmtest
    implicit none
    integer, allocatable :: a(:)
    integer :: nlarge, nsmall, i

    nlarge = 1000000000   !! 10^9 elements (~4 GB for default integer)
    nsmall = 1000000      !! 10^6 elements (~4 MB)

    allocate( a( nlarge ) )   !! reserves virtual memory only
    print *, "after allocation"
    call system( "ps aux | grep a.out" )

    do i = 1, nsmall
        a( i ) = i   !! writing commits physical pages (we assume "nsmall" is not known a priori)
    enddo
    print *, "after assignment"
    call system( "ps aux | grep a.out" )

    a = a( 1 : nsmall )   !! automatic reallocation on assignment (Fortran 2003) shrinks the array
    print *, "after reallocation"
    call system( "ps aux | grep a.out" )
end program vmtest
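(Note that call system() is a GNU extension; the standard Fortran 2008 equivalent, accepted by both gfortran and ifort, is call execute_command_line( "ps aux | grep a.out" ).)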
The output on my machine (Linux x86_64 with gfortran) is
after allocation
username 29064 0.0 0.0 3914392 780 pts/3 S+ 01:15 0:00 ./a.out
after assignment
username 29064 0.0 0.0 3914392 5188 pts/3 S+ 01:15 0:00 ./a.out
after reallocation
username 29064 0.0 0.0 12048 4692 pts/3 S+ 01:15 0:00 ./a.out
which shows that only ~5 MB of physical memory (RSS) is used, even though the virtual size (VSZ) is ~4 GB. Is it possible to utilize this feature to accommodate temporary data of unknown size (but below the physical-memory size)?
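As a side note, the resident set size can also be checked from within the program without shelling out to ps. Here is a minimal, Linux-specific sketch that scans /proc/self/status for the VmRSS line (the file format is assumed to be that of typical Linux kernels):

subroutine print_rss()
    implicit none
    character(len=256) :: line
    integer :: u, ios
    !! /proc/self/status holds one "key: value" pair per line (Linux-specific)
    open( newunit = u, file = "/proc/self/status", action = "read" )
    do
        read( u, '(a)', iostat = ios ) line
        if ( ios /= 0 ) exit
        if ( line(1:6) == "VmRSS:" ) print *, trim( line )   !! e.g. "VmRSS:  5188 kB"
    end do
    close( u )
end subroutine print_rss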
Edit
To be more specific, my target system is a typical workstation running Linux x86_64 (e.g. CentOS) with tens of GB of RAM, and the program is written in Fortran. The motivation for the question is that, when I wish to store data of unknown size into an array, I usually need to determine the size somehow and allocate the array accordingly. This is a bit tedious because Fortran has no built-in dynamic array. Typically, the situation occurs in two cases: (1) when reading data of unknown size from an external file; and (2) when collecting data that match specific conditions over multi-dimensional loops. In case 1, we typically scan the file twice (once to determine the data size and again to read the data), or alternatively pre-allocate a sufficiently large array as a buffer. So, I was interested in whether the virtual memory system could simplify this task by allowing the allocation of very large arrays without caring too much about the size.
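For reference, a user-defined dynamic array of the kind mentioned above can be built around move_alloc (Fortran 2003) with the usual grow-by-doubling strategy. This is only a minimal sketch; the names "dynarray", "append", "buf", and "n" are illustrative, and the module wrapper is there because the allocatable dummy argument needs an explicit interface:

module dynarray
    implicit none
contains
    !! Append "val" after the "n" elements already stored in "buf"
    !! (the caller initializes n = 0 before the first call).
    subroutine append( buf, n, val )
        integer, allocatable, intent(inout) :: buf(:)
        integer, intent(inout) :: n
        integer, intent(in)    :: val
        integer, allocatable   :: tmp(:)
        if ( .not. allocated( buf ) ) allocate( buf( 16 ) )   !! small initial capacity
        if ( n == size( buf ) ) then
            allocate( tmp( 2 * size( buf ) ) )   !! double the capacity when full
            tmp( 1 : n ) = buf( 1 : n )
            call move_alloc( tmp, buf )          !! transfers the allocation; no copy back
        end if
        n = n + 1
        buf( n ) = val
    end subroutine append
end module dynarray

After the collecting loop, a(1:n) (or a final a = a(1:n) via automatic reallocation) gives the exact-size result, and the doubling makes the amortized cost per append O(1).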
However, from further experiments I learned that this approach is rather limited... For example, if I change the array sizes as follows, ifort complains about "insufficient virtual memory" above ~80 GB, which probably corresponds to the sum of physical memory + swap space on my system (presumably the kernel's heuristic overcommit policy refuses reservations far beyond that sum). So, although "ulimit -a" says that virtual memory is "unlimited", it seems not to be unlimited in practice...
! compiled with: ifort -heap-arrays -assume realloc_lhs
use iso_fortran_env, only: long => int64
integer, allocatable :: a(:)
integer(long) :: nlarge, nsmall, i
! nlarge = 10_long**9       !! OK: 4 GB
! nlarge = 10_long**10      !! OK: 40 GB
nlarge = 2 * 10_long**10    !! OK: 80 GB
! nlarge = 3 * 10_long**10  !! NG: insufficient virtual memory (120 GB)
! nlarge = 4 * 10_long**10  !! NG: insufficient virtual memory (160 GB)
! nlarge = 10_long**11      !! NG: insufficient virtual memory (400 GB)
nsmall = 10**6              !! 4 MB
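Incidentally, a huge speculative allocation does not have to abort the program: the stat= specifier lets one detect the failure and fall back. A minimal sketch (the factor-of-10 fallback is just an illustration):

integer :: ierr
allocate( a( nlarge ), stat = ierr )   !! stat= suppresses the fatal error on failure
if ( ierr /= 0 ) then
    print *, "allocation of", nlarge, "elements failed; retrying smaller"
    nlarge = nlarge / 10
    allocate( a( nlarge ) )
end if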
Conclusion: It seems better to stick to the traditional approaches (i.e., allocate an array of the necessary size, re-allocate an allocatable array repeatedly as needed, or use a user-defined dynamic array). I'm sorry for this trivial conclusion...