Add worker resource and queue metrics to OTel instrumentation #613
Open
Mohammed0tarek wants to merge 3 commits into taskiq-python:master from
Add counters and histograms to OpenTelemetryMiddleware:
- tasks_sent: producer-side counter per task name
- task_success / task_errors: consumer-side counters with retry_error attribute
- task_execution_time: histogram using result.execution_time
- task_wait_time: histogram measuring queue time from send to receive via UTC timestamps in labels

Add tests covering all instruments, retry_error attribute paths, and queue time correctness.

- Add worker_active_tasks UpDownCounter driven by pre/post_execute hooks
- Add worker_prefetched_tasks UpDownCounter via on_prefetch_queue_add/remove hooks in receiver
- Add worker_cpu_utilization and worker_memory_utilization observable gauges (worker process only)
- Add tests for all new metrics
What this adds
This extends the existing OpenTelemetryMiddleware with four new metrics that give visibility into what a worker process is doing at runtime, not just what tasks it has finished.
Worker resource utilization (observable gauges, worker process only):
- worker_cpu_utilization: CPU usage percentage of the worker process
- worker_memory_utilization: RSS memory in bytes

Both gauges are guarded by broker.is_worker_process so they stay silent in client/producer processes.
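The guard described above can be sketched without the OTel SDK. This is a minimal illustration, not the code in this PR: FakeBroker, make_guarded_callback and read_value are hypothetical names, and the callback returns plain floats, whereas a real OTel observable-gauge callback receives CallbackOptions and returns an iterable of Observation objects.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeBroker:
    """Hypothetical stand-in exposing only the flag the guard reads."""

    is_worker_process: bool = False


def make_guarded_callback(
    broker: FakeBroker,
    read_value: Callable[[], float],
) -> Callable[[], List[float]]:
    """Wrap a reading so it is reported only from a worker process."""

    def callback() -> List[float]:
        # Client/producer processes yield no observations at all,
        # so the gauge stays silent there instead of reporting zeros.
        if not broker.is_worker_process:
            return []
        return [read_value()]

    return callback
```

Checking the flag inside the callback, rather than when the gauge is registered, means the guard reflects the flag's value at collection time, which matters if is_worker_process is set after the middleware is constructed (an assumption about startup order, not something the PR states).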
Worker queue depth (UpDownCounters):
- worker_active_tasks: number of tasks currently executing, incremented in pre_execute and decremented in post_execute. Carries a task_name attribute so you can see per-task concurrency.
- worker_prefetched_tasks: number of tasks sitting in the internal prefetch queue (fetched from the broker, waiting for a worker slot). Driven by two new hooks, on_prefetch_queue_add and on_prefetch_queue_remove, called from the Receiver at the right points: after a real message is enqueued, and after it is dequeued, but only once the QUEUE_DONE sentinel has been handled, so it never fires on shutdown signals.

Why UpDownCounter for the queue metrics
I didn't want to change how the queues are defined (they are local variables), so gauges were not the right fit: a gauge has to poll the queue, and the queues are not attributes of the receiver class.
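The event-driven bookkeeping argued for here can be sketched as follows. It is an illustration under stated assumptions, not the Receiver code from this PR: FakeUpDownCounter, prefetch, run and demo are hypothetical names, and the QUEUE_DONE object below is a local stand-in for the sentinel mentioned above. The point is that the queue can stay a local variable, because the counter is adjusted at the enqueue and dequeue sites instead of being read by a poller.

```python
import asyncio
from typing import List, Tuple

QUEUE_DONE = object()  # local stand-in for the shutdown sentinel


class FakeUpDownCounter:
    """Hypothetical stand-in that mimics only UpDownCounter.add()."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, amount: int) -> None:
        self.value += amount


async def prefetch(
    messages: List[bytes],
    queue: asyncio.Queue,
    counter: FakeUpDownCounter,
) -> None:
    for message in messages:
        await queue.put(message)
        counter.add(1)  # count only real messages, after they are enqueued
    await queue.put(QUEUE_DONE)  # the sentinel is never counted


async def run(queue: asyncio.Queue, counter: FakeUpDownCounter) -> List[bytes]:
    handled = []
    while True:
        item = await queue.get()
        if item is QUEUE_DONE:
            break  # shutdown signal: leave the counter untouched
        counter.add(-1)  # decrement only after the sentinel check
        handled.append(item)
    return handled


def demo() -> Tuple[int, int, List[bytes]]:
    async def main() -> Tuple[int, int, List[bytes]]:
        queue: asyncio.Queue = asyncio.Queue()  # local, as in the text
        counter = FakeUpDownCounter()
        await prefetch([b"a", b"b", b"c"], queue, counter)
        peak = counter.value
        handled = await run(queue, counter)
        return peak, counter.value, handled

    return asyncio.run(main())
```

If the decrement ran before the sentinel check, every shutdown would push the counter one below zero, which is the failure the hook placement avoids.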
Tests
Added tests for all four metrics to tests/opentelemetry/test_metrics.py:
- test_active_tasks_counter: verifies net-zero value after task completion and task_name attribute correctness
- test_prefetch_queue_counter: calls the hooks directly (the Receiver is not exercised by InMemoryBroker.kiq()) and asserts the counter value
- test_worker_resource_metrics_when_worker_process: sets is_worker_process = True and triggers an OTel collection cycle to verify the gauges yield observations. I feel like this is a slight workaround but I hope it is fine.
- test_metrics_exist: updated to include worker_active_tasks
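The net-zero property that test_active_tasks_counter checks can be illustrated with a small stand-in. This is a sketch only: FakeUpDownCounter and ActiveTaskTracker are hypothetical names, the hook signatures are simplified to take a task name, and the real test reads values through the OTel SDK instead of a dictionary.

```python
from collections import defaultdict
from typing import DefaultDict, Dict


class FakeUpDownCounter:
    """Hypothetical stand-in that keeps one running total per task_name."""

    def __init__(self) -> None:
        self.values: DefaultDict[str, int] = defaultdict(int)

    def add(self, amount: int, attributes: Dict[str, str]) -> None:
        self.values[attributes["task_name"]] += amount


class ActiveTaskTracker:
    """Mirrors the pre_execute/post_execute pairing described above."""

    def __init__(self, counter: FakeUpDownCounter) -> None:
        self.counter = counter

    def pre_execute(self, task_name: str) -> None:
        # A task starts running: one more active task under its name.
        self.counter.add(1, {"task_name": task_name})

    def post_execute(self, task_name: str) -> None:
        # The same attribute must be used, or the series never nets to zero.
        self.counter.add(-1, {"task_name": task_name})
```

The totals net to zero only if both hooks use identical attributes, which is why the test checks the task_name attribute as well as the value.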