While running some hadoop commands I noticed that running hadoop
binary from some machines (but not others!) takes a whole lot of time. E.g., a simple fs -ls /tmp
is ~20min (otherwise this was seconds). When stack-trace-dumping the hadoop process, I found that they systematically hang at e.g.,
futex(0x7facd6asb9d0, FUTEX_WAIT, 18650, NULL)
Not sure how to further debug this/what are suggested steps here. It seems that there is some kind of deadlock/mutex in place, but it makes little sense this works on some machines and not others (that slow). Any ideas?