
I have a Jupyter Notebook running on AWS SageMaker. One of the cells in the notebook was reading data row by row from a large (~5m rows) datastore.

I ran the cell and then stopped it after confirming that it was reading the data.

The code is using a while loop (sample code from the docs):

import pulsar

client = pulsar.Client('pulsar://localhost:6650')

consumer = client.subscribe('my-topic', 'my-subscription')

while True:
    msg = consumer.receive()
    try:
        print("Received message '{}' id='{}'".format(msg.data(), msg.message_id()))
        # Acknowledge successful processing of the message
        consumer.acknowledge(msg)
    except Exception:
        # Message failed to be processed
        consumer.negative_acknowledge(msg)

# Note: the loop above never breaks, so this line is only reached
# when the cell is interrupted or an exit condition is added to the loop.
client.close()
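With ~5M rows, printing every message is stored in the cell's output in the .ipynb file, which is the usual cause of a notebook ballooning to hundreds of MB. A minimal sketch of throttling the prints, using a pure-Python stand-in for the consumer loop (the `process_stream` function and `report_every` parameter are made up for illustration, not part of the Pulsar API):

```python
# Stand-in for the consumer loop: report progress periodically
# instead of printing every message, so the cell's stored output
# stays small. `report_every` is a hypothetical parameter.
def process_stream(messages, report_every=100_000):
    processed = 0
    for msg in messages:
        # ... handle msg here (acknowledge, write to storage, etc.) ...
        processed += 1
        if processed % report_every == 0:
            print(f"processed {processed} messages")
    return processed

total = process_stream(range(250_000), report_every=100_000)
# prints two progress lines instead of 250,000 message lines
```

The same idea applies inside the `while True` Pulsar loop: keep a counter and only `print` every N-th message.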

I am unable to open the notebook, despite having enough memory (32 GB), and cannot clear the output from the notebook's memory / disk / kernel. The notebook size is now >350 MB, up from a few kB. How do I clear the output / free the disk space and optimize my code for better performance?
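One way to shrink the notebook without opening it is to strip the stored cell outputs directly from the .ipynb file, which is plain JSON (`cells`, `outputs`, and `execution_count` are standard notebook-format keys). A minimal sketch, demonstrated here on a tiny synthetic notebook (on SageMaker you would point it at the real .ipynb path instead):

```python
import json

def strip_outputs(path):
    """Remove all stored cell outputs from a notebook file in place."""
    with open(path) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []           # drop the accumulated print output
            cell["execution_count"] = None
    with open(path, "w") as f:
        json.dump(nb, f, indent=1)

# Demo on a tiny notebook written to disk; "demo.ipynb" is a
# placeholder filename for illustration.
demo = {
    "cells": [{
        "cell_type": "code",
        "execution_count": 1,
        "source": ["print('hi')"],
        "outputs": [{"output_type": "stream", "text": ["hi\n"]}],
    }],
    "nbformat": 4,
    "nbformat_minor": 5,
}
with open("demo.ipynb", "w") as f:
    json.dump(demo, f)

strip_outputs("demo.ipynb")
```

Alternatively, recent nbconvert versions can do the same from a terminal with `jupyter nbconvert --clear-output --inplace <notebook>.ipynb`.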

Memory snapshot from the instance:

free -h
              total        used        free      shared  buff/cache   available
Mem:           7.7G        1.0G        4.6G        688K        2.0G        6.4G
Swap:            0B          0B          0B

https://pulsar.apache.org/docs/2.2.1/client-libraries-python/

The Python code looks straightforward and has no apparent inefficiencies or memory leaks. The doc link you posted is from a relatively old version of Pulsar (2.2). Can you try upgrading to 2.11 and reloading the notebook? – David Kjerrumgaard Feb 17 '23 at 17:32

0 Answers