Summary

I'm building a client service that interacts with another microservice server.

The client fetches data from the server over a gRPC channel. The connection is used frequently and the participants are fixed, so I reuse the gRPC Channel and Stub to avoid the cost of creating a channel for every request.

The service works well for individual requests, and each test function passes when it is run on its own. However, when I run several tests together, only one test succeeds; the others either fail with a TimeoutError (gRPC status DEADLINE_EXCEEDED) or hang.

Interestingly, the problem goes away when I remove the channel caching (@lru_cache), or when I override the pytest event_loop fixture with session (or module) scope. I found the second method in this question/answer.

Why does this happen? What makes my tests hang or fail? I guess it is related to the event loop, but I don't know the details.


Minimal Reproducible Example (MRE)

# mre.py
from functools import lru_cache

from grpclib.client import Channel

from config.config import current_config
from custom.protobuf.service import OtherServiceStub


@lru_cache(maxsize=None)
def get_cached_client() -> OtherServiceStub:
    host, port = '127.0.0.1', 50051
    channel = Channel(host, port)
    cached_client = OtherServiceStub(channel)
    return cached_client

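async def get_data(arg1: str = None):
    client = get_cached_client()
    # Await the RPC and pass the caller's argument through.
    data = await client.get_detail_data(arg1=arg1)
    return data


# test_mre.py
import pytest

from mre import get_cached_client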

@pytest.mark.asyncio
async def test_1(): # SUCCEED
    client = get_cached_client()
    await client.get_data(arg1='foo')


@pytest.mark.asyncio
async def test_2(): # FAIL(or STOP)
    client = get_cached_client()
    await client.get_data(arg1='bar')

@pytest.mark.asyncio
async def test_3(): # FAIL(or STOP)
    client = get_cached_client()
    await client.get_data(arg1='something')


# Solved if (1): the client is not cached
def get_cached_client() -> OtherServiceStub:
    host, port = '127.0.0.1', 50051
    channel = Channel(host, port)
    cached_client = OtherServiceStub(channel)
    return cached_client

# Solved if (2): override the event_loop fixture with a wider scope
import asyncio

import pytest


@pytest.fixture(scope="session")
def event_loop(request):
    """Create one event loop that is shared by every test in the session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()

Environment

pytest==6.2.4
pytest-asyncio==0.15.1
grpcio==1.37.1
grpcio-tools==1.37.1
grpclib==0.4.1
protobuf==3.15.8
betterproto==1.2.5

1 Answer

I found that this problem comes from the combination of how the gRPC Channel is implemented and how the pytest-asyncio event loop fixture works.

lru_cache returns the cached result when the function is called again with the same arguments. The cached function in the question takes no arguments, so every call after the first returns exactly the same object as the first call. As a result, the gRPC channel in all of your tests is the very same channel.
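The caching behaviour can be checked in isolation. The following is a minimal sketch that uses a hypothetical `FakeStub` class in place of `OtherServiceStub`, so it needs no server and no gRPC library:

```python
from functools import lru_cache


class FakeStub:
    """Hypothetical stand-in for OtherServiceStub; it does no networking."""


@lru_cache(maxsize=None)
def get_cached_stub() -> FakeStub:
    # With no arguments there is only one cache key, so this body runs once.
    return FakeStub()


first = get_cached_stub()
second = get_cached_stub()

print(first is second)               # True: both names refer to one object
print(get_cached_stub.cache_info())  # one miss (first call), one hit (second call)
```

Whatever state the first call attaches to the object is therefore carried into every later caller.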

# test_mre.py

@pytest.mark.asyncio
async def test_1(): 
    client = get_cached_client() # FIRST CALL - Create Channel & Stub object in here
    await client.get_data(arg1='foo')


@pytest.mark.asyncio
async def test_2(): # FAIL(or STOP)
    client = get_cached_client() # NOT FIRST CALL - Reuse object which is created in test_1
    await client.get_data(arg1='bar')

@pytest.mark.asyncio
async def test_3(): # FAIL(or STOP)
    client = get_cached_client() # NOT FIRST CALL - Reuse object which is created in test_1
    await client.get_data(arg1='something')

Then why can't the reused channel be used properly? The problem lies in how pytest-asyncio provides event loops.

@pytest.mark.asyncio runs each test function it is applied to in a new event loop and closes that loop when the function is done. The default scope of the event loop fixture is function. You can see this in the implementation of the event_loop fixture in pytest-asyncio.

The Python gRPC Channel object binds to the event loop in which it is created, and the Channel is closed when that event loop is closed. In the example from the question, that is the event loop of test_1. When you get the same channel and try to use it in test_2, test_1's event loop has already been closed, so the channel is closed as well (running=False, closed=True). This means the awaited request will never get a response.
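The interaction between the cache and per-test loops can be reproduced with the standard library alone. This is only a sketch under stated assumptions: `LoopBoundResource` is a hypothetical stand-in that merely remembers its loop (it is not the grpclib Channel), and `run_in_fresh_loop` imitates a function-scoped loop rather than calling pytest-asyncio.

```python
import asyncio
from functools import lru_cache


class LoopBoundResource:
    """Hypothetical stand-in for a channel; it only remembers its loop."""

    def __init__(self) -> None:
        # Capture the event loop that is running at creation time.
        self.loop = asyncio.get_running_loop()


@lru_cache(maxsize=None)
def get_cached_resource() -> LoopBoundResource:
    return LoopBoundResource()


async def fake_test() -> LoopBoundResource:
    return get_cached_resource()


def run_in_fresh_loop(coro_fn):
    """Imitate a function-scoped loop: new loop per call, closed afterwards."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro_fn())
    finally:
        loop.close()


res_1 = run_in_fresh_loop(fake_test)  # creates the resource inside loop 1
res_2 = run_in_fresh_loop(fake_test)  # returns the same resource inside loop 2

same_object = res_1 is res_2
loop_closed = res_2.loop.is_closed()

try:
    # Scheduling work on the remembered loop is no longer possible.
    res_2.loop.call_soon(print, "never scheduled")
    error_message = ""
except RuntimeError as exc:
    error_message = str(exc)

print(same_object, loop_closed, error_message)
```

The second "test" receives an object whose loop is already closed, which is the situation test_2 and test_3 are in.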

@pytest.mark.asyncio
async def test_1(): 
    client = get_cached_client()
    await client.get_data(arg1='foo')
    # At this point, the event loop and the channel are closed.


@pytest.mark.asyncio
async def test_2(): 
    client = get_cached_client() # Calling closed channel
    await client.get_data(arg1='bar')

@pytest.mark.asyncio
async def test_3(): 
    client = get_cached_client() # Calling closed channel
    await client.get_data(arg1='something')

So this is why the first test succeeds but the other tests fail: the channel is alive only in the first event loop. If you set a timeout argument, the test fails because no response from the gRPC server can arrive within the timeout, no matter how generous it is. If you don't, the remaining tests simply hang, because the Python gRPC Channel has no default timeout.
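The difference between failing and hanging can be seen with plain asyncio. In this sketch, a future that nobody resolves models a request whose reply never arrives; `asyncio.wait_for` is used here as a generic deadline and is an assumption for illustration, not the timeout mechanism of the gRPC library itself.

```python
import asyncio


async def call_that_never_answers() -> str:
    # A future nobody resolves models a request whose reply never arrives.
    await asyncio.get_running_loop().create_future()
    return "unreachable"


async def main() -> str:
    try:
        # With a deadline, the wait ends in a timeout error.
        return await asyncio.wait_for(call_that_never_answers(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"


outcome = asyncio.run(main())
print(outcome)  # timed out
```

Awaiting `call_that_never_answers()` directly, without a deadline, would block forever, which matches the tests that stop instead of failing.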

Both of your solutions fix this issue. First, if the Channel object is not cached, each test function creates its own channel, so the event loop issue disappears. Second, if you define the event_loop fixture with session scope, all test functions share the same event loop, so the Channel object is never closed (because its event loop is not closed).
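Both fixes can be modelled without gRPC or pytest. The sketch below assumes a hypothetical `LoopBoundResource` that counts as usable only on the loop it was created in; `asyncio.run` stands in for a function-scoped loop and a single manually created loop stands in for the session-scoped fixture.

```python
import asyncio
from functools import lru_cache


class LoopBoundResource:
    """Hypothetical stand-in for a channel; it only remembers its loop."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()

    def usable(self) -> bool:
        # In this model, usable only while running on the creating loop.
        return self.loop is asyncio.get_running_loop()


def get_uncached() -> LoopBoundResource:
    return LoopBoundResource()


@lru_cache(maxsize=None)
def get_cached() -> LoopBoundResource:
    return LoopBoundResource()


async def check(factory) -> bool:
    return factory().usable()


# Fix 1: no cache, a fresh loop per "test" (asyncio.run creates and closes one).
fix_1 = [asyncio.run(check(get_uncached)) for _ in range(3)]

# Fix 2: cached object, one loop shared by every "test".
shared_loop = asyncio.new_event_loop()
try:
    fix_2 = [shared_loop.run_until_complete(check(get_cached)) for _ in range(3)]
finally:
    shared_loop.close()

print(fix_1, fix_2)
```

In both variants the object and the running loop always match, either because the object is recreated per loop or because the loop outlives every test.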