
I have an application that uses Open MPI, and I launch it on both Windows and Linux. The Windows version works fine; however, running it on Linux causes a memory allocation error. The problem occurs with certain application arguments that require more calculations. To rule out memory leaks, I checked the Linux version of the application with Valgrind and got some output. I then searched for information about that output and found some posts on Stack Overflow and GitHub (not enough reputation to attach links). After that, I updated Open MPI to 2.0.2 and checked the application again; the new output is shown below. Are these memory leaks in Open MPI, or am I doing something wrong?

A piece of the output:

==16210== 4 bytes in 1 blocks are definitely lost in loss record 5 of 327
==16210==    at 0x4C2DBB6: malloc (vg_replace_malloc.c:299)
==16210==    by 0x5657A59: strdup (strdup.c:42)
==16210==    by 0x51128E6: opal_basename (in /home/vshmelev/OMPI_2.0.2/lib/libopen-pal.so.20.2.0)
==16210==    by 0x7DDECA9: ???
==16210==    by 0x7DDEDD4: ???
==16210==    by 0x6FBFF84: ???
==16210==    by 0x4E4EA9E: orte_init (in /home/vshmelev/OMPI_2.0.2/lib/libopen-rte.so.20.1.0)
==16210==    by 0x4041FD: orterun (orterun.c:818)
==16210==    by 0x4034E5: main (main.c:13)

Open MPI version: 2.0.2
Valgrind version: 3.12.0
Virtual machine: Ubuntu 16.04 LTS x64

When using MPICH instead, the Valgrind output is:

==87863== HEAP SUMMARY:
==87863==     in use at exit: 131,120 bytes in 2 blocks
==87863==   total heap usage: 2,577 allocs, 2,575 frees, 279,908 bytes allocated
==87863== 
==87863== 131,120 bytes in 2 blocks are still reachable in loss record 1 of 1
==87863==    at 0x4C2DBB6: malloc (vg_replace_malloc.c:299)
==87863==    by 0x425803: alloc_fwd_hash (sock.c:332)
==87863==    by 0x425803: HYDU_sock_forward_stdio (sock.c:376)
==87863==    by 0x432A99: HYDT_bscu_stdio_cb (bscu_cb.c:19)
==87863==    by 0x42D9BF: HYDT_dmxu_poll_wait_for_event (demux_poll.c:75)
==87863==    by 0x42889F: HYDT_bscu_wait_for_completion (bscu_wait.c:60)
==87863==    by 0x42863C: HYDT_bsci_wait_for_completion (bsci_wait.c:21)
==87863==    by 0x40B123: HYD_pmci_wait_for_completion (pmiserv_pmci.c:217)
==87863==    by 0x4035C5: main (mpiexec.c:343)
==87863== 
==87863== LEAK SUMMARY:
==87863==    definitely lost: 0 bytes in 0 blocks
==87863==    indirectly lost: 0 bytes in 0 blocks
==87863==      possibly lost: 0 bytes in 0 blocks
==87863==    still reachable: 131,120 bytes in 2 blocks
==87863==         suppressed: 0 bytes in 0 blocks
==87863== 
==87863== For counts of detected and suppressed errors, rerun with: -v
==87863== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) 
  • Links to related issues: [link 1](https://github.com/open-mpi/ompi/issues/2166) and [link 2](http://stackoverflow.com/questions/11218056/can-someone-explain-this-valgrind-error-with-open-mpi) – Vadim Shmelev Feb 16 '17 at 08:43
  • Valgrind [output](https://drive.google.com/file/d/0B871wCRylUoWQlMtTVI5WV9OWTg/view?usp=sharing) when using MPICH version 3.2. – Vadim Shmelev Feb 16 '17 at 13:16
  • [Link](https://drive.google.com/file/d/0B871wCRylUoWT2RvV0ZuZ3ZzVFk/view?usp=sharing) to sources – Vadim Shmelev Feb 16 '17 at 13:24
  • See here https://www.open-mpi.org/faq/?category=debugging#valgrind_clean – alfC Jul 22 '20 at 01:59

2 Answers


These reports point to memory leaked inside the MPI libraries themselves, not in your application code. You can safely ignore them.

More specifically, these leaks come from the launchers: ORTE is Open MPI's runtime environment, responsible for launching and managing MPI processes, and Hydra is MPICH's launcher and process manager.
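
If you want Valgrind to be quieter about these, the Open MPI FAQ linked in the comments above describes a suppression file that ships with Open MPI (typically installed as `$PREFIX/share/openmpi/openmpi-valgrind.supp`; the exact path depends on your installation prefix). You can pass it to Valgrind with something like `mpirun -np 2 valgrind --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp ./your_app` to hide the launcher-internal allocations.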

Sourav

The term "definitely lost" means the main function of your program at line 13 (As far as i see in the output) is leaking memory directly or calls some other function (orterun) which causes memory leak . you must fix those leaks or provide some more of your code.

Take a look here before anything else.

Rezaeimh7