As the code below shows, reading the large data file takes up a large portion of the total CPU time. I think there should be a way to make the data reading more efficient. For instance, while one thread is reading the data, the other threads could do some data processing at the same time.

I have used OpenMP to speed up the data processing (Part 2), but I need help figuring out how to optimize Part 1 further (with TASK or SECTION constructs).

---------------updated-----------------------

At this stage, I do not want to do multiple simultaneous reads/writes, which could possibly be done with MPI (MPI_FILE_WRITE_ALL). All I want is the following: one thread reads the data of the next time step while the other threads do the rest of the work for the current time step, using task or section constructs. Any suggestions in this direction?

Program main
  Implicit none  
  Integer i,j,k, ii, Count, rl   ! ii must be declared because of Implicit none
  Integer, Parameter :: Nxt=961, Nyt=526, Nzt=100

  Integer OMP_GET_THREAD_NUM,  TID, OMP_GET_NUM_THREADS,  NTHREADS
  Real(4), Dimension(Nxt,Nyt,Nzt)      :: Ui, Vi, Wi, Pi
  Real(4), Dimension(Nxt*4,Nyt,Nzt)    :: Utotal       
  real*8:: start, finish, OMP_GET_WTIME
  Character(len=50) :: filename  

  call OMP_SET_NUM_THREADS(6)
!---------=====OpenMP Number Threads=======------------
!$OMP PARALLEL PRIVATE(NTHREADS, TID)
!$  TID = OMP_GET_THREAD_NUM()
! Only master thread does this
!$ IF (TID .EQ. 0) THEN
!$  NTHREADS = OMP_GET_NUM_THREADS()
!$ PRINT *, 'Number of threads = ', NTHREADS
!$ END IF
!$OMP END PARALLEL

 Do ii = 200000, 700000, 20  

  1912 format('../../../volume7/20_40/WI_Inst3Dsub_UVWP',I7.7)
  1913 format('../../../volume8/40_60/WI_Inst3Dsub_UVWP',I7.7)
  1914 format('../../../volume5/60_70/WI_Inst3Dsub_UVWP',I7.7)

  if(ii .le. 400000) Write(filename,1912) ii
  if(ii .gt. 400000) Write(filename,1913) ii
  if(ii .ge. 600000) Write(filename,1914) ii    

!$ start=OMP_GET_WTIME()       
!---------Part 1---------------  
  inquire(iolength=rl) Utotal(:,:,:)

  OPEN(10,FILE=trim(filename)//".dat",FORM='UNFORMATTED',&
          ACCESS='DIRECT', RECL=rl, STATUS='OLD')
      !,CONVERT='big_endian'
      COUNT = 1; READ(10,REC=COUNT) Utotal(:,:,:)
  CLOSE(10)

!---------Part 2 --------------- 
!$OMP PARALLEL DO PRIVATE(i,j,k) SHARED(Ui,Vi,Wi,Pi)
  DO k = 1, Nzt   
  DO j = 1, Nyt   
  DO i = 1, Nxt
    Ui(i,j,k) = Utotal(i+Nxt*0,j,k)
    Vi(i,j,k) = Utotal(i+Nxt*1,j,k)
    Wi(i,j,k) = Utotal(i+Nxt*2,j,k)
    Pi(i,j,k) = Utotal(i+Nxt*3,j,k)
  END DO; End Do; End Do
!$OMP END PARALLEL DO

!$ finish=OMP_GET_WTIME()
!$ Write(*,*) ii,'Time cost per step', finish-start   

! THERE IS ALSO OTHER WORK HERE

 End DO 

 End program  
DaMi
  • Parallel I/O is hard. You will need hardware capable of supporting multiple simultaneous reads/writes to have any hope of success, and that is not the case on a laptop or most workstations; you need Lustre, GPFS or similar. Even if you have access to that, basic Fortran I/O is (to my knowledge) not guaranteed to be thread safe, so simple OpenMP won't help you. To make sure it works you may have to use MPI-IO or one of the I/O libraries derived from it, such as HDF5 or netCDF-4. – Ian Bush Jun 03 '19 at 14:22
  • @IanBush Thanks for your comments. At this stage, I do not want to do multiple simultaneous reads/writes, which could possibly be done with MPI (MPI_FILE_WRITE_ALL). All I want is the following: one thread reads the data of the next time step while the other threads do the rest of the work for the current time step, using task or section constructs. Any suggestions in this direction? – DaMi Jun 03 '19 at 15:10
  • @HighPerformanceMark These statements actually work fine with gfortran and Intel Fortran, so I guess they are OK, although I admit they could be improved. – DaMi Jun 03 '19 at 15:17
  • @HighPerformanceMark I see. If ii .ge. 600000, then 1914 format('../../../volume5/60_70/WI_Inst3Dsub_UVWP',I7.7) is the one that gets executed. I am pretty sure about this point. – DaMi Jun 03 '19 at 15:27
  • You can use both OpenMP and MPI on any hardware. But the I/O will NOT be more efficient unless you have a parallel filesystem or you make your own special effort to overlap I/O and computation. The *"All I expected is as follows: one thread read the data of the next time step and the other threads could do the rest of the work of the current time step by using task or section constructs"* will not happen automagically; you have to implement that yourself. – Vladimir F Героям слава Jun 03 '19 at 19:59
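
One way to implement the overlap by hand is double buffering with OpenMP tasks: one task prefetches the next time step into an idle buffer while the remaining tasks unpack the buffer that was read earlier. The following is only a minimal sketch under stated assumptions, not a tuned solution. The array sizes, the step count, the `step_NNNN.dat` file names, and the helper routines `read_step` and `write_step` are all hypothetical; the dummy files are written first only so that the sketch runs on its own. Because a single task performs I/O inside the parallel region, no two threads touch the Fortran I/O library at the same time.

```fortran
program overlap_read_compute
  ! Sketch only: sizes, file names, and step count are hypothetical.
  implicit none
  integer, parameter :: nx = 8, ny = 4, nz = 6, nsteps = 4
  real(4), allocatable :: buf(:,:,:,:)        ! two read buffers: buf(:,:,:,1:2)
  real(4), dimension(nx,ny,nz) :: ui, vi, wi, pi
  integer :: step, k, cur, nxt, tmp

  allocate(buf(4*nx, ny, nz, 2))

  ! Create small dummy input files so the sketch is self-contained.
  do step = 1, nsteps
     buf(:,:,:,1) = real(step)
     call write_step(step, buf(:,:,:,1))
  end do

  cur = 1; nxt = 2
  call read_step(1, buf(:,:,:,cur))           ! prime the pipeline

  do step = 1, nsteps
     !$omp parallel default(shared)
     !$omp single
     ! One task prefetches the next step into the idle buffer.
     if (step < nsteps) then
        !$omp task
        call read_step(step + 1, buf(:,:,:,nxt))
        !$omp end task
     end if
     ! One task per k-plane unpacks the current buffer (the "Part 2" work).
     do k = 1, nz
        !$omp task firstprivate(k)
        ui(:,:,k) = buf(       1:  nx, :, k, cur)
        vi(:,:,k) = buf(  nx + 1:2*nx, :, k, cur)
        wi(:,:,k) = buf(2*nx + 1:3*nx, :, k, cur)
        pi(:,:,k) = buf(3*nx + 1:4*nx, :, k, cur)
        !$omp end task
     end do
     !$omp end single
     ! The implicit barrier above guarantees that all tasks have finished.
     !$omp end parallel

     print *, 'step', step, 'ui(1,1,1) =', ui(1,1,1)

     tmp = cur; cur = nxt; nxt = tmp          ! swap buffer roles
  end do

contains

  subroutine read_step(n, a)
    ! Read one direct-access record holding the whole array.
    integer, intent(in)  :: n
    real(4), intent(out) :: a(:,:,:)
    character(len=32) :: fname
    integer :: rl, u
    write(fname, '("step_",I4.4,".dat")') n
    inquire(iolength=rl) a
    open(newunit=u, file=trim(fname), form='unformatted', &
         access='direct', recl=rl, status='old')
    read(u, rec=1) a
    close(u)
  end subroutine read_step

  subroutine write_step(n, a)
    ! Write a dummy file in the same layout that read_step expects.
    integer, intent(in) :: n
    real(4), intent(in) :: a(:,:,:)
    character(len=32) :: fname
    integer :: rl, u
    write(fname, '("step_",I4.4,".dat")') n
    inquire(iolength=rl) a
    open(newunit=u, file=trim(fname), form='unformatted', &
         access='direct', recl=rl, status='replace')
    write(u, rec=1) a
    close(u)
  end subroutine write_step

end program overlap_read_compute
```

Built with OpenMP enabled (for example `gfortran -fopenmp`), the value printed for each step should equal the step number, since each dummy file is filled with its own index. The overlap can hide at most the shorter of the two phases, so it only pays off when the work per step takes a time comparable to the read; if the unpacking is much cheaper than the I/O, the run time stays dominated by the disk.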

0 Answers