
I saw "How do I customize nvidia-smi's output to show PID username?", but it doesn't do what I want. I want the output to look like this:

USER        GPU PID    hostname %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
brando9       0 1234   ampere3  ... etc... whatever don't really care

but instead I see:

(metalearning_gpu) brando9~ $ nvidia-smi; ps -up `nvidia-smi -q -x | grep pid | sed -e 's/<pid>//g' -e 's/<\/pid>//g' -e 's/^[[:space:]]*//'`; hostname
Mon Feb  6 19:19:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   31C    P0    67W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   28C    P0    61W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  On   | 00000000:44:00.0 Off |                    0 |
| N/A   29C    P0    63W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  On   | 00000000:4A:00.0 Off |                    0 |
| N/A   32C    P0    65W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM...  On   | 00000000:84:00.0 Off |                    0 |
| N/A   33C    P0    65W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   30C    P0    71W / 400W |  66729MiB / 81920MiB |     14%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM...  On   | 00000000:C0:00.0 Off |                    0 |
| N/A   30C    P0    62W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM...  On   | 00000000:C3:00.0 Off |                    0 |
| N/A   32C    P0    64W / 400W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    5   N/A  N/A     49854      C   .../envs/a100_env/bin/python    66727MiB |
+-----------------------------------------------------------------------------+
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
kexinh     49854  359  0.3 130510112 6749364 ?   Rsl  18:16 226:30 /dfs/user/kexinh/miniconda3/envs/a100_env/bin/python -m ipykernel_launcher -f /afs/cs.stanford.edu/u/kexinh/.local/share/jupyter/runtime/kernel-bbc9f45e-4513-4643-82c3-0f67dde751
ampere3

How do I add columns so that I can easily see the GPU ID, PID, and username together in bash/the terminal?

Even a command using Python is fine, e.g.:

python -c 'some one liner python script that works'



Charlie Parker

1 Answer


Here is the answer:

(echo "GPU_ID PID UID APP" ; for GPU in $(nvidia-smi --query-gpu=index --format=csv,noheader) ; do for PID in $( nvidia-smi -q --id=${GPU} --display=PIDS | awk '/Process ID/{print $NF}') ; do echo -n "${GPU} ${PID} " ; ps -up ${PID} | awk 'NR-1 {print $1,$NF}' ; done ; done) | column -t

credit: https://www.reddit.com/r/HPC/comments/10x9w6x/comment/j7sg7w2/?utm_source=share&utm_medium=web2x&context=3

This solves my issue: https://stackoverflow.com/a/75403918/1601580
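To make the two `awk` programs in the one-liner easier to read, here is a rough Python equivalent of what each one extracts. The function names and the sample text are made up for illustration. `awk '/Process ID/{print $NF}'` keeps the last token of every `Process ID` line printed by `nvidia-smi -q --display=PIDS`, and `awk 'NR-1 {print $1,$NF}'` skips the `ps -up` header (on line 1, `NR-1` is 0, i.e. false) and prints the first column (USER) and the last whitespace-separated token. That last token is the final argument of the command line rather than the program name, which is why entries such as `--wandb` show up in the APP column of the sample output below.

```python
def pids_from_query(text):
    """Like awk '/Process ID/{print $NF}' on `nvidia-smi -q --display=PIDS` output."""
    return [line.split()[-1] for line in text.splitlines() if "Process ID" in line]


def user_and_last_token(ps_text):
    """Like awk 'NR-1 {print $1,$NF}' on `ps -up PID` output.

    NR-1 is 0 (false) only on line 1, so the header is skipped; $1 is USER and
    $NF is the last whitespace-separated token of the COMMAND column.
    """
    pairs = []
    for line in ps_text.splitlines()[1:]:
        fields = line.split()
        if fields:  # skip blank lines (awk would print an empty pair instead)
            pairs.append((fields[0], fields[-1]))
    return pairs


if __name__ == "__main__":
    # Made-up sample text shaped like the real command outputs
    query = "        Process ID                        : 49854\n"
    ps = ("USER PID %CPU COMMAND\n"
          "kexinh 49854 359 python -m ipykernel_launcher -f kernel.json\n")
    print(pids_from_query(query))   # ['49854']
    print(user_and_last_token(ps))  # [('kexinh', 'kernel.json')]
```

If you want the program name instead of the last argument, `ps -o comm= -p PID` prints just the executable name.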


This variant is also nice; it adds GPU and memory utilization columns:

(echo "GPU_ID PID MEM% UTIL% UID APP" ; for GPU in $(nvidia-smi --query-gpu=index --format=csv,noheader) ; do for PID in $( nvidia-smi -q --id=${GPU} --display=PIDS | awk '/Process ID/{print $NF}') ; do echo -n "${GPU} ${PID} " ; nvidia-smi -q --id=${GPU} --display=UTILIZATION | grep -A4 -E '^[[:space:]]*Utilization' | awk 'BEGIN{gut=0; mut=0} $1=="Gpu"{gut=$3} $1=="Memory"{mut=$3} END{printf "%s %s ",mut,gut}' ; ps -up ${PID} | awk 'NR-1 {print $1,$NF}' ; done ; done) | column -t

output:

GPU_ID  PID      MEM%  UTIL%  UID      APP
0       319310   16    58     minkai   exp_cond_lumo_latent1
1       320206   11    38     minkai   exp_cond_mu_latent1
3       59140    0     0      kexinh   --wandb
3       1202222  0     0      brando9  5CNN_opt_as_model_for_few_shot
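Since a Python script is also acceptable, here is a minimal sketch of the same table built from nvidia-smi's CSV query mode instead of parsing the `-q` text. It is a swapped-in approach, not the one from the Reddit answer: `--query-gpu=index,uuid,utilization.memory,utilization.gpu` lists every GPU (so no GPU ids are hard-coded), `--query-compute-apps=gpu_uuid,pid` lists compute processes, and the two are joined on the GPU UUID. The username is read from `/proc/PID/status`, so this assumes Linux and that the script sees the same PIDs as nvidia-smi (inside containers it may not). Note that `MEM%` and `UTIL%` are device-wide figures, so every process on the same GPU gets the same two numbers, and nvidia-smi's memory utilization is the share of time device memory was being read or written, not the fraction of GPU memory in use. The helper names are hypothetical.

```python
import pwd
import shutil
import subprocess


def run(cmd):
    """Run a command and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def csv_rows(text):
    """Split `--format=csv,noheader` output into lists of stripped fields."""
    return [[field.strip() for field in line.split(",")]
            for line in text.splitlines() if line.strip()]


def build_rows(gpu_csv, apps_csv, user_of):
    """Join compute apps to their GPU's index and device-wide utilization.

    gpu_csv columns:  index, uuid, utilization.memory, utilization.gpu
    apps_csv columns: gpu_uuid, pid
    """
    gpus = {uuid: (index, mem, util) for index, uuid, mem, util in csv_rows(gpu_csv)}
    rows = [("GPU_ID", "PID", "MEM%", "UTIL%", "USER")]
    for uuid, pid in csv_rows(apps_csv):
        index, mem, util = gpus.get(uuid, ("?", "?", "?"))
        rows.append((index, pid, mem, util, user_of(pid)))
    return rows


def proc_user(pid):
    """Username of PID's effective UID from /proc/PID/status (Linux only); '?' if unknown."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("Uid:"):  # Uid: real effective saved fs
                    return pwd.getpwuid(int(line.split()[2])).pw_name
    except (OSError, KeyError):  # process exited, or UID has no passwd entry
        pass
    return "?"


def format_table(rows):
    """Left-align columns, roughly like piping through `column -t`."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join("  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
                     for row in rows)


def main():
    gpu_csv = run(["nvidia-smi",
                   "--query-gpu=index,uuid,utilization.memory,utilization.gpu",
                   "--format=csv,noheader,nounits"])
    apps_csv = run(["nvidia-smi", "--query-compute-apps=gpu_uuid,pid",
                    "--format=csv,noheader"])
    print(format_table(build_rows(gpu_csv, apps_csv, proc_user)))


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        main()
```

Saved under any name (say, `gpu_users.py`), it runs with `python gpu_users.py` and prints one row per compute process.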
Charlie Parker