I'm attempting to use request logs to simulate costs and resources used by auto-scaling a dynamic number of backend service pools. My use case is specific to Google Cloud Run, but there are similar models in use by systems like Kubernetes.
In a fully on-demand Cloud Run service deployment, a load balancer distributes requests to a pool of containers that may have as few as zero active instances, up to a configurable maximum. Each container in the service pool is identical (in terms of resources, configuration, etc.) and has a fixed maximum number of concurrent requests. Containers are active for whole-number multiples of 100ms intervals.
In reality those 100ms intervals are not aligned to any fixed clock boundary, but clock-aligned windows are the best I've managed so far with my limited Pandas knowledge. The issue is that this overcounts when, say, a request arrives at t=50ms and lasts 90ms: it should use one container for a single 100ms interval, but my approach treats the container as active for both the 0-100ms and 100-200ms windows, since the request straddles both.
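To make the overcount concrete, here's that example's arithmetic in plain Python (no Pandas), comparing the clock-aligned window count against the duration simply rounded up to 100ms:
import math

# The example request: arrives at t=50ms, runs for 90ms.
start_ms, duration_ms = 50, 90
end_ms = start_ms + duration_ms                             # 140ms

# Clock-aligned model: every 100ms boundary the request touches becomes a window.
first_window = (start_ms // 100) * 100                      # 0ms
last_window = ((end_ms - 1) // 100) * 100                   # 100ms
aligned_windows = (last_window - first_window) // 100 + 1   # 2 windows

# What the billing model should charge: the duration rounded up to 100ms.
billed_windows = math.ceil(duration_ms / 100)               # 1 window

print(aligned_windows, billed_windows)                      # 2 vs 1 -> 100% overcount here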
My logs have a starting timestamp and duration, which I start by transforming into absolute starting and ending timestamps:
import numpy as np
import pandas as pd

# Derive an absolute end timestamp from each request's start timestamp and duration (ms).
time_boundaries_df = pd.read_csv(request_log_filename, index_col=None, header=0,
                                 dtype={'duration': 'int64'})
time_boundaries_df['start_time'] = pd.to_datetime(time_boundaries_df['start_time'])
time_boundaries_df['duration'] = pd.to_timedelta(time_boundaries_df['duration'], unit='ms')
time_boundaries_df['end_time'] = time_boundaries_df['start_time'] + time_boundaries_df['duration']
I'm aware that the date parsing here is slow, but the timestamp format in the logs isn't consistent enough to pass format=DATE_FORMAT.
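Since I can't share real logs, here's a hand-built stand-in with the shape my code expects (the column names match the code in this question; the values are made up):
import io
import pandas as pd

# Made-up sample rows in the shape of the real log: one row per request, with the
# service pool identified by a site_id/env_id pair, a start timestamp, and a
# duration in milliseconds.
sample_log = io.StringIO("""site_id,env_id,start_time,duration
site-a,prod,2024-01-01 00:00:00.050,90
site-a,prod,2024-01-01 00:00:00.190,50
site-b,prod,2024-01-01 00:00:00.060,50
""")
sample_df = pd.read_csv(sample_log, dtype={'duration': 'int64'})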
Once I have the absolute start/end of each request, I explode each request into clock-aligned 100ms windows. At this point my model overcounts many requests because they're misaligned with those windows (the start=50ms, duration=90ms example above is the contrived worst case, with a 100% overcount).
# parallel_apply comes from the pandarallel package; a plain .apply() works too, just slower.
from pandarallel import pandarallel
pandarallel.initialize()

active_windows_df = time_boundaries_df.copy()
active_windows_df['active_window'] = active_windows_df.parallel_apply(
    lambda row: pd.date_range(row['start_time'], row['end_time'], freq='100ms'), axis=1)
active_windows_df = active_windows_df.explode('active_window', ignore_index=True)
active_windows_df['active_window'] = active_windows_df['active_window'].dt.floor('100ms')
active_windows_df = active_windows_df.drop(columns=['start_time', 'duration', 'end_time'])
Finally, I count per-window request concurrency for each service pool and convert that to container slices:
# Count request concurrency in each aligned time window for each service pool
# (with a pool defined as a unique site_id/env_id pair).
container_concurrency_df = active_windows_df.groupby(['site_id', 'env_id', 'active_window']) \
    .size().reset_index(name='concurrency')
# Convert request concurrency to container concurrency by dividing by the number
# of concurrent requests each container supports and taking the ceiling.
container_concurrency_df['concurrency'] = np.ceil(container_concurrency_df['concurrency'] \
    .div(CONTAINER_CONCURRENCY))
# Finally, total the 100ms slices each service pool would have used
# over the entire simulation.
container_concurrency_df = container_concurrency_df.groupby(['site_id', 'env_id'])['concurrency'] \
    .sum().reset_index(name='container_slices')
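Downstream of this, the cost side of the simulation is just arithmetic on container_slices. A minimal sketch, assuming a hypothetical per-container-second price (the real Cloud Run rate depends on the CPU/memory configuration and region, so treat the constant as a placeholder):
# Placeholder price per container-second; substitute the actual rate for the
# service's CPU/memory configuration and region.
PRICE_PER_CONTAINER_SECOND = 0.000024

# Each slice is 100ms of one container, i.e. 0.1 container-seconds.
container_concurrency_df['container_seconds'] = container_concurrency_df['container_slices'] * 0.1
container_concurrency_df['estimated_cost'] = (
    container_concurrency_df['container_seconds'] * PRICE_PER_CONTAINER_SECOND)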
Because any single request can misalign to add at most one extra 100ms slot, and most of my requests run 400ms+ (at least four slots when perfectly aligned, so at most five when misaligned), the misalignment overcounting is probably bounded at around 25%, but I'd still like to solve it.
The Real Question
Is it possible in Pandas to count overlaps using rolling, dynamically anchored 100ms windows rather than the clock-aligned model I'm using now?
Finding a Correct Solution
A correct solution should fix the start=50ms, duration=90ms case by using only one container for a single 100ms slice, and that container should also count as handling other requests (up to its concurrency limit) while it's active. (The cases below are also gathered into a small test fixture after the last example.)
A correct solution should also only use one container for 300ms when:
- Request A: start=50ms, duration=90ms
- Request B: start=190ms, duration=50ms
A correct solution should also only use one container (assuming concurrency at least two) for 100ms when:
- Request A: start=50ms, duration=90ms
- Request B: start=60ms, duration=50ms
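To make those acceptance criteria easy to test against any proposed approach, here are the same cases as data, with the expected container slice counts taken from the numbers above (the site_id/env_id values and timestamps are arbitrary):
import pandas as pd

# Each entry is (requests, expected number of 100ms container slices).
# Durations are in milliseconds; site_id/env_id just identify one pool.
acceptance_cases = [
    # Single misaligned request -> 1 slice (100ms).
    (pd.DataFrame({'site_id': ['s'], 'env_id': ['e'],
                   'start_time': pd.to_datetime(['2024-01-01 00:00:00.050']),
                   'duration': [90]}), 1),
    # Two requests served back-to-back by one container -> 3 slices (300ms).
    (pd.DataFrame({'site_id': ['s', 's'], 'env_id': ['e', 'e'],
                   'start_time': pd.to_datetime(['2024-01-01 00:00:00.050',
                                                 '2024-01-01 00:00:00.190']),
                   'duration': [90, 50]}), 3),
    # Two overlapping requests with concurrency >= 2 -> 1 slice (100ms).
    (pd.DataFrame({'site_id': ['s', 's'], 'env_id': ['e', 'e'],
                   'start_time': pd.to_datetime(['2024-01-01 00:00:00.050',
                                                 '2024-01-01 00:00:00.060']),
                   'duration': [90, 50]}), 1),
]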
Google's own visualization of billable container time may be helpful. I'm simulating the "CPU only during requests" model.