
I have been trying to run the DGA detection streamz on the RAPIDS CLX streamz Docker container for the last few days without any resolution. I'm following the instructions on the RAPIDS website: https://docs.rapids.ai/api/clx/legacy/intro-clx-streamz.html. I'm able to build the Docker container from the Dockerfile and run the container, but when I try to run the DGA streamz it fails with the following error:

/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_cuda/utils.py:257: UserWarning: Cannot get CPU affinity for device with index 0, setting default affinity
  warnings.warn(
Creating local cuda cluster as no dask scheduler is provided.
2023-05-02 20:22:18,194 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-05-02 20:22:18,194 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-05-02 20:22:20,839 - distributed.worker - WARNING - Run Failed
Function: worker_init
args:     ()
kwargs:   {}
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/worker.py", line 3233, in run
    result = function(*args, **kwargs)
  File "/opt/clx_streamz/python/dga_detection.py", line 37, in worker_init
    worker = dask.distributed.get_worker()
AttributeError: module 'dask' has no attribute 'distributed'
<Client: 'tcp://127.0.0.1:33711' processes=1 threads=1, memory=15.25 GiB>
Traceback (most recent call last):
  File "/opt/clx_streamz/python/dga_detection.py", line 53, in <module>
    dga_detection.start()
  File "/opt/clx_streamz/python/clx_streamz_tools/streamz_workflow.py", line 141, in start
    client.run(self.worker_init)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/client.py", line 2916, in run
    return self.sync(
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/utils.py", line 338, in sync
    return sync(
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/utils.py", line 405, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/utils.py", line 378, in f
    result = yield future
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/tornado/gen.py", line 769, in run
    value = future.result()
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/client.py", line 2821, in _run
    raise exc
  File "/opt/clx_streamz/python/dga_detection.py", line 37, in worker_init
    worker = dask.distributed.get_worker()
AttributeError: module 'dask' has no attribute 'distributed'
2023-05-02 20:22:20,849 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:37355'.
2023-05-02 20:22:20,849 - distributed.worker - ERROR - Failed to communicate with scheduler during heartbeat.
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/comm/tcp.py", line 225, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/worker.py", line 1215, in heartbeat
    response = await retry_operation(
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/utils_comm.py", line 419, in retry_operation
    return await retry(
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/utils_comm.py", line 404, in retry
    return await coro()
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/core.py", line 1221, in send_recv_from_rpc
    return await send_recv(comm=comm, op=key, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/core.py", line 986, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/comm/tcp.py", line 241, in read
    convert_stream_closed_error(self, e)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/distributed/comm/tcp.py", line 144, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) ConnectionPool.heartbeat_worker local=tcp://127.0.0.1:52698 remote=tcp://127.0.0.1:33711>: Stream is closed

Here are my GPU specs:

nvidia-smi
Tue May  2 16:31:18 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50                 Driver Version: 531.79       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 T...    On | 00000000:01:00.0  On |                  N/A |
| N/A   39C    P8               11W /  N/A|    464MiB / 16384MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

The docker image that I'm using is:

docker.io/rapidsai/rapidsai-clx:23.02-cuda11.8-runtime-ubuntu20.04-py3.10
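In case the library versions matter here, this is a generic snippet (not from the CLX code) that can be run inside the container to report the dask and distributed versions bundled with the image:

import dask
import distributed

# Print the versions shipped in the container, since the failure
# happens inside distributed's worker startup.
print("dask:", dask.__version__)
print("distributed:", distributed.__version__)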
1 Answer


I would suggest trying NVIDIA Morpheus instead of RAPIDS CLX, as the latter is being deprecated. We have a Morpheus experimental repository for DGA detection that uses AppShield plugin data as input; it can give you a good idea of Morpheus's capabilities.

Here are the instructions for setting up Morpheus.

Here are more examples for getting started with Morpheus.

In addition, I will take a closer look at the Docker container to see how to fix the issue.
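In the meantime, the traceback suggests a plain import issue: dga_detection.py appears to rely on "import dask" alone making dask.distributed available, which newer dask releases no longer guarantee. Below is a minimal sketch of the likely workaround; it is untested against the CLX code and assumes worker_init only needs get_worker:

# Sketch of a possible patch for /opt/clx_streamz/python/dga_detection.py
# (hypothetical; the real worker_init body is not reproduced here).
# "import dask" does not implicitly import the dask.distributed submodule,
# so dask.distributed.get_worker() raises AttributeError on recent dask.
from dask.distributed import get_worker  # explicit import instead

def worker_init():
    # previously: worker = dask.distributed.get_worker()
    worker = get_worker()
    # ... rest of the original worker initialization ...

Equivalently, adding "import dask.distributed" at the top of the file should make the original attribute access resolve.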

  • "We have"—are you affiliated with this project? If so, you _must_ disclose that fact. Please read [How to not be a spammer](https://stackoverflow.com/help/promotion) – ChrisGPT was on strike May 03 '23 at 21:39