feat: add periodic WARNING metrics to assist in debugging#12976
feat: add periodic WARNING metrics to assist in debugging#12976agrawal-siddharth wants to merge 1 commit intogoogleapis:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a HealthCheckMetrics system within ConnectionWorker to monitor and log stream health, including latency, retry counts, and error responses. It also adds corresponding unit tests to verify metric gathering and threshold logic. The review feedback highlights several critical thread-safety concerns, recommending synchronization for metric updates and ensuring that shared state, such as the retry count and request queues, is accessed within existing locks to prevent race conditions. Additionally, there are suggestions to improve resource efficiency by caching the Gson instance and declaring constants as static final.
| HealthCheckMetrics.HealthCheckFields gatherHealthCheckMetrics() { | ||
| HealthCheckFields healthCheckFields = healthCheckMetrics.new HealthCheckFields(); | ||
| healthCheckFields.responseCodes = new HashMap<>(); | ||
| healthCheckMetrics.gatherHealthCheckMetrics(healthCheckFields); | ||
| healthCheckMetrics.emitHealthCheckMetrics(healthCheckFields); | ||
| return healthCheckFields; | ||
| } |
There was a problem hiding this comment.
This method accesses ConnectionWorker fields (like waitingRequestQueue and inflightRequestQueue) that are guarded by this.lock. It must acquire the lock before calling gatherHealthCheckMetrics to avoid race conditions and potential ConcurrentModificationException during tests.
HealthCheckMetrics.HealthCheckFields gatherHealthCheckMetrics() {
this.lock.lock();
try {
HealthCheckFields healthCheckFields = healthCheckMetrics.new HealthCheckFields();
healthCheckFields.responseCodes = new HashMap<>();
healthCheckMetrics.gatherHealthCheckMetrics(healthCheckFields);
healthCheckMetrics.emitHealthCheckMetrics(healthCheckFields);
return healthCheckFields;
} finally {
this.lock.unlock();
}
}References
- Ensure that access to shared state is protected by a lock when accessed concurrently by multiple threads.
| inflightRetryCount.set(0L); | ||
| this.lock.lock(); |
There was a problem hiding this comment.
inflightRetryCount.set(0L) should be moved inside the lock. This prevents a race condition where a concurrent requestCallback decrements the counter after it has been reset to 0 by the cleanup thread, leading to a negative value.
| inflightRetryCount.set(0L); | |
| this.lock.lock(); | |
| this.lock.lock(); | |
| inflightRetryCount.set(0L); | |
| try { |
References
- Ensure that access to shared state is protected by a lock when accessed concurrently by multiple threads.
| Gson gson = new Gson(); | ||
| log.warning(gson.toJson(healthCheckFields)); |
There was a problem hiding this comment.
Use a reusable GSON instance instead of creating a new one on every call to improve efficiency.
| Gson gson = new Gson(); | |
| log.warning(gson.toJson(healthCheckFields)); | |
| log.warning(GSON.toJson(healthCheckFields)); |
References
- To improve resource efficiency, cache and share resource-intensive clients or objects instead of creating a new instance for every call.
f2bf77d to
4b9d050
Compare
| @@ -249,6 +257,7 @@ class ConnectionWorker implements AutoCloseable { | |||
|
|
|||
| private final RequestProfiler.RequestProfilerHook requestProfilerHook; | |||
| private final TelemetryMetrics telemetryMetrics; | |||
| private final HealthCheckMetrics healthCheckMetrics; | |||
There was a problem hiding this comment.
can we mark the whole healthCheckMetrics guarded by lock ?
There was a problem hiding this comment.
Added the GuardedBy annotation.
However, at present this does not seem to be enforced on this project.
| /* | ||
| * Dump health check metrics as WARNING log. | ||
| */ | ||
| private void emitHealthCheckMetrics(HealthCheckFields healthCheckFields) { |
There was a problem hiding this comment.
why do we only expose three metrics here?
There was a problem hiding this comment.
Actually the populated HealthCheckFields are passed in to this method. I have clarified this in the comments. The modifications being made to 3 fields are only for purposes of rounding to 2 decimal places.
4b9d050 to
ac34cb6
Compare
No description provided.