
I'm working with Faust and would like to use its concurrency feature. The example listed doesn't quite demonstrate how concurrency is used.

What I would like to do is read messages from Kafka and unnest the JSON. The shipments are then sent to a process that calculates billing and so on. I want to send 10 shipments at a time to a function that does the calculation, so I'm using concurrency to let 10 shipments be calculated concurrently.

import faust
import time
import json
from typing import List
import asyncio

class Items(faust.Record):
    name: str
    billing_unit: str
    billing_qty: int


class Shipments(faust.Record, serializer="json"):
    shipments: List[Items]
    ship_type: str
    shipping_service: str
    shipped_at: str


app = faust.App('ships_app', broker='kafka://localhost:9092', )
ship_topic = app.topic('test_shipments', value_type=Shipments)


@app.agent(value_type=str, concurrency=10)
async def mytask(records):
    # task that does some other activity
    async for record in records:
        print(f'received....{record}')
        time.sleep(5)


@app.agent(ship_topic)
async def process_shipments(shipments):
    # async for ships in stream.take(100, within=10):
    async for ships in shipments:
        data = ships.shipments  # the list field on Shipments is named 'shipments'
        uid = faust.uuid()
        for item in data:
            item_uuid = faust.uuid()
            print(f'{uid}, {item_uuid}, {ships.ship_type}, {ships.shipping_service}, {ships.shipped_at}, {item.name}, {item.billing_unit}, {item.billing_qty}')
            await mytask.send(value=("{} -- {}".format(uid, item_uuid)))

            # time.sleep(2)
        # time.sleep(10)


if __name__ == '__main__':
    app.main()

user3327034

1 Answer

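OK, I figured out how it works. The problem with the example you gave was the time.sleep call, not the concurrency setting: time.sleep is a blocking call that stalls the whole event loop, so none of the agent's concurrent instances can make progress while it runs, whereas await asyncio.sleep hands control back to the loop. Below are two silly examples that show how an agent behaves with and without concurrency.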

import faust
import asyncio

app = faust.App(
    'example_app',
    broker="kafka://localhost:9092",
    value_serializer='raw',
)

t = app.topic('topic_1')

# @app.agent(t, concurrency=1)
# async def my_task(tasks):
#   async for my_task in tasks:
#       val = my_task.decode('utf-8')
#       if (val == "Meher"):
#           # This prints second because there is only one agent instance.
#           # It appears about 5 seconds in, right after the Waldo message.
#           print("Meher's a jerk.")
#       else:
#           await asyncio.sleep(5)
#           # With only one instance running, this sleep effectively
#           # holds up the agent until it finishes.
#           print(f"Where did {val} go?")

@app.agent(t, concurrency=2)
async def my_task2(tasks):
    async for my_task in tasks:
        val = my_task.decode('utf-8')
        if (val == "Meher"):
            # This prints first even though the Meher message is
            # received second.
            print("Meher's a jerk.")
        else:
            await asyncio.sleep(5)
            # The first instance is sleeping here, so the second instance is
            # free to take the next message (these are asyncio tasks, not threads).
            print(f"Where did {val} go?")

# ===============================
# In another process, run:

from kafka import KafkaProducer

p = KafkaProducer()
p.send('topic_1', b'Waldo')
p.send('topic_1', b'Meher')
p.flush()  # send() is asynchronous; flush so both messages go out before exit
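
To see the blocking effect in isolation, here is a minimal sketch that needs neither Kafka nor Faust. It assumes plain asyncio workers reading from a shared queue as a rough stand-in for concurrent agent instances. The names `worker`, `run`, and `demo` are made up for illustration, and this is an analogy rather than a description of how Faust is implemented internally.

```python
import asyncio
import time


async def worker(queue, results, blocking):
    # One worker stands in for one concurrent agent instance (hypothetical analogy).
    while True:
        item = await queue.get()
        if blocking:
            time.sleep(0.2)           # blocks the whole event loop
        else:
            await asyncio.sleep(0.2)  # yields, so other workers can run meanwhile
        results.append(item)
        queue.task_done()


async def run(concurrency, blocking, n_items=4):
    queue = asyncio.Queue()
    results = []
    for i in range(n_items):
        queue.put_nowait(i)

    workers = [
        asyncio.create_task(worker(queue, results, blocking))
        for _ in range(concurrency)
    ]

    start = time.perf_counter()
    await queue.join()                # wait until every item has been processed
    elapsed = time.perf_counter() - start

    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return elapsed, results


def demo():
    # Same number of workers in both runs; only the kind of sleep differs.
    blocking_elapsed, _ = asyncio.run(run(concurrency=4, blocking=True))
    yielding_elapsed, _ = asyncio.run(run(concurrency=4, blocking=False))
    return blocking_elapsed, yielding_elapsed


if __name__ == "__main__":
    blocking_elapsed, yielding_elapsed = demo()
    print(f"time.sleep:    {blocking_elapsed:.2f}s")
    print(f"asyncio.sleep: {yielding_elapsed:.2f}s")
```

With four workers and four items, the time.sleep run should take roughly four times as long as the asyncio.sleep run, because the blocking sleeps happen one after another no matter how many workers exist.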

BWStearns
  • Interestingly, this doesn't work for me with concurrency=2 ... higher values do work though. minor hiccup :) – chad Oct 06 '21 at 20:53
  • Weird. I haven't been using faust recently but what's happening with `concurrency=2`? – BWStearns Oct 06 '21 at 21:52