As the code below shows, reading the large data files accounts for a large portion of the total run time. I believe there should be a way to make the reading more efficient. For instance, while one thread is reading data, the other threads could be processing data at the same time.
I have used OpenMP to speed up the data processing (Part 2), but I need help finding a way to optimize Part 1 as well (with the TASK or SECTIONS construct?).
---------------updated-----------------------
At this stage I do not want to do parallel reads/writes, which could be done with MPI (e.g. MPI_FILE_WRITE_ALL). What I want is this: one thread reads the data for the next time step while the other threads do the rest of the work for the current time step, using task or sections constructs. Any suggestions in this direction?
Program main
Implicit none
Integer i,j,k, ii, Count, rl   ! ii (time-step loop index) must be declared under Implicit none
Integer, Parameter :: Nxt=961, Nyt=526, Nzt=100
Integer OMP_GET_THREAD_NUM, TID, OMP_GET_NUM_THREADS, NTHREADS
Real(4), Dimension(Nxt,Nyt,Nzt) :: Ui, Vi, Wi, Pi
Real(4), Dimension(Nxt*4,Nyt,Nzt) :: Utotal
real*8:: start, finish, OMP_GET_WTIME
Character(len=50) :: filename
call OMP_SET_NUM_THREADS(6)
!---------=====OpenMP Number Threads=======------------
!$OMP PARALLEL PRIVATE(NTHREADS, TID)
!$ TID = OMP_GET_THREAD_NUM()
! Only master thread does this
!$ IF (TID .EQ. 0) THEN
!$ NTHREADS = OMP_GET_NUM_THREADS()
!$ PRINT *, 'Number of threads = ', NTHREADS
!$ END IF
!$OMP END PARALLEL
Do ii = 200000, 700000, 20
1912 format('../../../volume7/20_40/WI_Inst3Dsub_UVWP',I7.7)
1913 format('../../../volume8/40_60/WI_Inst3Dsub_UVWP',I7.7)
1914 format('../../../volume5/60_70/WI_Inst3Dsub_UVWP',I7.7)
if(ii .le. 400000) Write(filename,1912) ii
if(ii .gt. 400000) Write(filename,1913) ii
if(ii .ge. 600000) Write(filename,1914) ii
!$ start=OMP_GET_WTIME()
!---------Part 1---------------
inquire(iolength=rl) Utotal(:,:,:)
OPEN(10,FILE=trim(filename)//".dat",FORM='UNFORMATTED',&
ACCESS='DIRECT', RECL=rl, STATUS='OLD')
!,CONVERT='big_endian'
COUNT = 1; READ(10,REC=COUNT) Utotal(:,:,:)
CLOSE(10)
!---------Part 2 ---------------
!$OMP PARALLEL DO PRIVATE(i,j,k) SHARED(Ui,Vi,Wi,Pi)
DO k = 1, Nzt
DO j = 1, Nyt
DO i = 1, Nxt
Ui(i,j,k) = Utotal(i+Nxt*0,j,k)
Vi(i,j,k) = Utotal(i+Nxt*1,j,k)
Wi(i,j,k) = Utotal(i+Nxt*2,j,k)
Pi(i,j,k) = Utotal(i+Nxt*3,j,k)
END DO; End Do; End Do
!$OMP END PARALLEL DO
!$ finish=OMP_GET_WTIME()
!$ Write(*,*) ii,'Time cost per step', finish-start
! THERE IS ALSO OTHER WORK HERE
End DO
End program
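To make the overlap I am asking about concrete, here is a minimal sketch of the double-buffering idea, not a tested solution for the real files. It assumes OpenMP 4.5 or later (for `taskloop`). The array sizes, the short step range, the program name `overlap_sketch`, and the routine `read_step` are all hypothetical; `read_step` only fills the buffer with synthetic values and stands in for the OPEN/READ/CLOSE of Part 1.

```fortran
program overlap_sketch
  implicit none
  ! Small hypothetical sizes and step range so the sketch runs quickly.
  integer, parameter :: nx = 16, ny = 8, nz = 4
  integer, parameter :: first = 200000, last = 200100, stride = 20
  real(4), allocatable :: buf(:,:,:,:)            ! two copies of Utotal
  real(4), allocatable :: Ui(:,:,:), Vi(:,:,:), Wi(:,:,:), Pi(:,:,:)
  integer :: ii, i, j, k, cur, nxt, nbad

  allocate(buf(nx*4, ny, nz, 2))
  allocate(Ui(nx,ny,nz), Vi(nx,ny,nz), Wi(nx,ny,nz), Pi(nx,ny,nz))

  cur = 1; nxt = 2; nbad = 0
  call read_step(first, buf(:,:,:,cur))           ! prime the pipeline

  !$omp parallel default(shared) private(ii)
  !$omp single
  do ii = first, last, stride
     ! One task reads the NEXT step into the idle buffer.
     if (ii + stride <= last) then
        !$omp task default(shared) firstprivate(ii, nxt)
        call read_step(ii + stride, buf(:,:,:,nxt))
        !$omp end task
     end if

     ! Meanwhile the CURRENT buffer is unpacked (Part 2) by taskloop tasks.
     ! The taskloop finishes before execution continues past it.
     !$omp taskloop default(shared) firstprivate(cur) private(i, j)
     do k = 1, nz
        do j = 1, ny
           do i = 1, nx
              Ui(i,j,k) = buf(i+nx*0, j, k, cur)
              Vi(i,j,k) = buf(i+nx*1, j, k, cur)
              Wi(i,j,k) = buf(i+nx*2, j, k, cur)
              Pi(i,j,k) = buf(i+nx*3, j, k, cur)
           end do
        end do
     end do
     !$omp end taskloop

     ! Self-check against the synthetic data produced by read_step.
     if (any(Ui /= real(ii)) .or. any(Pi /= 4.0*real(ii))) nbad = nbad + 1

     !$omp taskwait          ! the read must finish before the buffers swap
     cur = 3 - cur; nxt = 3 - nxt
  end do
  !$omp end single
  !$omp end parallel

  print *, 'steps with mismatches: ', nbad

contains

  subroutine read_step(step, a)
    ! Hypothetical stand-in for the direct-access READ of one time step.
    ! Component c (U,V,W,P = 1..4) is filled with c*step.
    integer, intent(in)  :: step
    real(4), intent(out) :: a(:,:,:)
    integer :: c
    do c = 0, 3
       a(c*nx+1:(c+1)*nx, :, :) = real(c + 1) * real(step)
    end do
  end subroutine read_step

end program overlap_sketch
```

With gfortran this should build with `gfortran -fopenmp`. The second buffer doubles the memory needed for Utotal. Whether the overlap actually saves time depends on how the read and the processing compete for memory bandwidth, which I have not measured.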