The ICMP monitor can fail, even if valid responses are received before the timeout

Description

This can happen on heavily loaded systems, and is more likely to occur during full garbage collections (GCs).

The ICMP replies and timeout callbacks are handled on separate threads, so in certain cases the timeout callback can be made after a response has been received but before it has been processed.

Acceptance / Success Criteria

None

Activity

Ronny Trommer April 1, 2016 at 8:13 PM

Seems fixed. Can we delete the associated branch? https://github.com/OpenNMS/opennms/tree/jira/NMS-7974-TS

Jesse White January 28, 2016 at 9:48 AM

Jesse White November 16, 2015 at 3:09 PM
Edited

Jesse White November 16, 2015 at 2:41 PM

This was caused by an issue in the RequestTracker code used to keep track of retries, timeouts and responses.

In v0.7 the RequestTracker has been updated to make all of the callbacks on a single thread, ensuring that any queued responses are handled before the associated timeouts.
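
For illustration only, the single-thread approach can be sketched roughly as below. This is not the actual RequestTracker code: the class `SingleThreadedCallbacks` and its methods `send`, `onReplyReceived`, and `onTimeoutExpired` are hypothetical names, and the sketch assumes that the receiver and timer threads only enqueue work while one executor thread runs every callback in submission order. Under that assumption, a reply that was queued before its timeout is always processed first, and the later timeout becomes a no-op.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; names and structure are assumptions, not the RequestTracker API. */
class SingleThreadedCallbacks {
    // Every callback runs on this one thread, in the order it was submitted.
    private final ExecutorService callbackThread = Executors.newSingleThreadExecutor();
    // IDs of requests not yet completed by either a response or a timeout.
    private final Set<Integer> pending = ConcurrentHashMap.newKeySet();
    // Record of the callbacks that actually fired, for the demo.
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    void send(int id) {
        pending.add(id);
    }

    // Would be called from the receiver thread: only enqueues the callback.
    void onReplyReceived(int id) {
        callbackThread.execute(() -> {
            if (pending.remove(id)) {
                events.add("response:" + id);
            }
        });
    }

    // Would be called from the timer thread: only enqueues the callback.
    void onTimeoutExpired(int id) {
        callbackThread.execute(() -> {
            // If the response was handled first, the request is no longer pending.
            if (pending.remove(id)) {
                events.add("timeout:" + id);
            }
        });
    }

    List<String> shutdownAndGetEvents() {
        callbackThread.shutdown();
        try {
            callbackThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(events);
    }

    static List<String> demo() {
        SingleThreadedCallbacks tracker = new SingleThreadedCallbacks();
        tracker.send(1);
        tracker.send(2);
        tracker.onReplyReceived(1);  // reply is queued first...
        tracker.onTimeoutExpired(1); // ...so this timeout finds nothing pending
        tracker.onTimeoutExpired(2); // no reply was queued, so the timeout fires
        return tracker.shutdownAndGetEvents();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [response:1, timeout:2]
    }
}
```

With separate callback threads, the timeout for request 1 could run while its reply was still waiting to be processed. Funnelling both through one queue removes that window.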

Fixed

Details

Created November 9, 2015 at 4:19 PM
Updated April 1, 2016 at 8:13 PM
Resolved November 16, 2015 at 7:35 PM