I am running into stack overflow errors while using large high-dimensional arrays in an Abaqus user material (UMAT) subroutine written in modern Fortran. To give a sense of scale, there are about 15 4-D and 5-D double-precision arrays and derived types with sizes such as (100,12,52,10) and (6,7,100,100,6), along with many other double-precision and integer variables local to the subprogram units (smaller subroutines contained in modules). I am testing on a single-element model, which results in only 32 calls to the UMAT subroutine, so the workload itself is not intensive. Below is the error message from the Abaqus/Standard solver.
*** Error: Runtime stack limit has been exceeded.
This may be caused by user subroutines with large data structures allocated on
the stack or recursion. For suggestions on how to resolve this problem, please
refer to the chapter "Ensuring thread safety" in the ABAQUS documentation.
*** ERROR CATEGORY: ELEMENT LOOP
I learnt that automatic arrays could be the cause (see "Stack overflow in Fortran 90" and "Anything helpful about Fortran stack overflow?"), so I looked at memory management approaches that use heap memory instead of the stack. So far I have tried dynamic allocation and deallocation using Fortran allocatable arrays and the Abaqus-specific thread-safe allocatable arrays (https://abaqus-docs.mit.edu/2017/English/SIMACAESUBRefMap/simasub-c-localarrays.htm). The solver runs successfully with smaller arrays (both dynamic and static), e.g. (20,12,52,3) and (6,7,20,20,6), but neither approach resolved the issue with the large arrays.
I don't have much background in memory and process management, and I have tried every option I could find online. I am unable to post the code because it is large; I hope the information above is enough to give a reasonable picture. What stack size would the array sizes above require? Could the other double-precision variables, or the intermediate calculations performed with these arrays, still be enough to exceed the stack limit within 32 calls to the subroutine? What else could cause the stack memory bottleneck? Any suggestions for resolving this would be appreciated.
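For scale, here is a rough estimate of the raw storage behind the example shapes quoted above. It is only a back-of-the-envelope sketch, written in Python purely for the arithmetic. It assumes 8 bytes per double-precision element and ignores derived-type padding, array descriptors, and any compiler-generated temporaries. The helper name array_bytes and the labels are illustrative and not part of the UMAT code.

```python
from math import prod

# Assumption: double precision occupies 8 bytes per element.
BYTES_PER_DOUBLE = 8


def array_bytes(shape, bytes_per_element=BYTES_PER_DOUBLE):
    """Return the raw storage in bytes of a dense array with the given shape."""
    return prod(shape) * bytes_per_element


# Example shapes taken from the question; the labels are illustrative only.
shapes = {
    "large 4-D": (100, 12, 52, 10),
    "large 5-D": (6, 7, 100, 100, 6),
    "small 4-D": (20, 12, 52, 3),
    "small 5-D": (6, 7, 20, 20, 6),
}

for label, shape in shapes.items():
    size = array_bytes(shape)
    # 2**20 bytes = 1 MiB
    print(f"{label} {shape}: {size} bytes ({size / 2**20:.2f} MiB)")
```

Under these assumptions the two large example shapes alone come to about 24 MiB, versus roughly 1 MiB for the two smaller shapes that run successfully.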