IllegalMonitorStateException in Poller ReentrantLock causes polling to stop
Description
I've configured a pseudo node on my OpenNMS instance for checking internet connectivity and latency. It has a public DNS IPv4 and IPv6 address assigned as interface addresses. After upgrading to 20.0.0, the collection of IPv4 response times stops working after several hours (~9h on my server). Since the node and its interfaces stay up and no event is sent, I think the problem is limited to the collection of response times. Curiously, the IPv6 response times are still being collected and persisted.
I've not configured anything related to ICMP in my opennms.properties.
Also, the poller.log is full of exceptions like this. So, maybe this is related to issue .
Environment
Ubuntu 16.04, 97 Nodes, 140 Interfaces, IPv4 and IPv6
I can no longer reproduce the problem with the patch. I've also attached a thread dump from an existing system running 20.0.0 that experiences the reported problem.
Seth Leger June 29, 2017 at 4:51 PM
I've created a PR that fixes the IllegalMonitorStateException problems:
If somebody can attach a thread dump of a 20.0.0 system experiencing this problem so that I can verify that this is the only issue, that would be appreciated.
Seth Leger June 28, 2017 at 4:50 PM
I think I see what's going on here... the semantics of the locks are the same, but the PollableElement.withTreeLock(Callable<T>, long) method can try to release a lock (in the finally block) that it failed to obtain. This throws an IllegalMonitorStateException instead of the expected LockUnavailable exception. That explains why the exception is being thrown; however, it might not explain why all polls stop for the service.
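To illustrate the failure mode described above, here is a minimal sketch using a plain java.util.concurrent.locks.ReentrantLock. It is not the actual OpenNMS code: the class TreeLockSketch, the methods withLockBuggy, withLockFixed, and outcomeWhileContended, and the LockUnavailableException class are hypothetical names made up for this example. It only assumes that the real method follows the same tryLock-then-unlock-in-finally pattern.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names come from the OpenNMS code base.
class TreeLockSketch {

    // Stand-in for the "LockUnavailable" exception mentioned in the comment.
    static class LockUnavailableException extends RuntimeException {
        LockUnavailableException(String message) {
            super(message);
        }
    }

    // Buggy pattern: the finally block runs even when tryLock() failed, so
    // unlock() is called by a thread that does not hold the lock.
    static <T> T withLockBuggy(ReentrantLock lock, Callable<T> c, long timeoutMs) throws Exception {
        try {
            if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new LockUnavailableException("could not obtain lock");
            }
            return c.call();
        } finally {
            // Throws IllegalMonitorStateException when the lock is not held,
            // which replaces the LockUnavailableException already in flight.
            lock.unlock();
        }
    }

    // Fixed pattern: only enter the try/finally once the lock is actually held.
    static <T> T withLockFixed(ReentrantLock lock, Callable<T> c, long timeoutMs) throws Exception {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new LockUnavailableException("could not obtain lock");
        }
        try {
            return c.call();
        } finally {
            lock.unlock();
        }
    }

    // Runs one variant while another thread holds the lock and returns the
    // simple class name of the exception the caller sees.
    static String outcomeWhileContended(boolean buggy) throws Exception {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        held.await(); // wait until the other thread really owns the lock
        try {
            if (buggy) {
                withLockBuggy(lock, () -> "ok", 50);
            } else {
                withLockFixed(lock, () -> "ok", 50);
            }
            return "no exception";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        } finally {
            release.countDown();
            holder.join();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("buggy: " + outcomeWhileContended(true));
        System.out.println("fixed: " + outcomeWhileContended(false));
    }
}
```

In the buggy variant, the caller sees the IllegalMonitorStateException raised by unlock() rather than the lock-timeout exception, because an exception thrown from a finally block supersedes the one thrown in the try block. Moving the tryLock() check outside the try block is one common way to avoid this; whether the actual fix takes the same shape is not stated here.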