Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Configuration
Affects Version/s: 24.1.3
Fix Version/s: None
Component/s: Notifications / Actions
Security Level: Default (Default Security Scheme)
Labels: None
Environment: RHEL 7, OpenJDK 8, PostgreSQL 9.2
HB Backlog Status: HB
Description
Some of my services have frequent, short SNMP outages (down for ~30 seconds), and not all of the resulting notifications are getting auto-acknowledged. So, despite having a 2m initial delay on the destination path, I'm getting notified about outages that are already resolved, and I'm never told that they have been resolved.
An example of a recent outage's timing:
Lost Service Time | Regained Service Time |
---|---|
2020-07-08T10:05:36-04:00 | 2020-07-08T10:06:07-04:00 |
But, the notification related to that (linked to the same outage event):
Notification Time | Time Replied |
---|---|
2020-07-08T10:05:37-04:00 |   |
Users Notified
Sent To | Sent At | Media | Contact Info |
---|---|---|---|
mkelly | 2020-07-08T10:07:51-04:00 | javaEmail |   |
So the notification was not acked when service was regained at 10:06:07, and I got an email at 10:07:51 even though the outage had already been resolved for well over a minute.
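To make the timeline concrete, here is a small sketch that only does the arithmetic on the four timestamps quoted above. The variable names are mine and nothing here touches OpenNMS itself.

```python
from datetime import datetime

# Timestamps copied from the outage and notification details above.
lost = datetime.fromisoformat("2020-07-08T10:05:36-04:00")
regained = datetime.fromisoformat("2020-07-08T10:06:07-04:00")
notified = datetime.fromisoformat("2020-07-08T10:05:37-04:00")
sent = datetime.fromisoformat("2020-07-08T10:07:51-04:00")

# How long the service was actually down.
outage_seconds = (regained - lost).total_seconds()

# How long the notification stayed pending after the outage had cleared.
pending_after_regain_seconds = (sent - regained).total_seconds()

# Total time between the notification being created and the email going out.
delay_before_send_seconds = (sent - notified).total_seconds()

print(f"outage lasted {outage_seconds:.0f}s")
print(f"email sent {pending_after_regain_seconds:.0f}s after service was regained")
print(f"email sent {delay_before_send_seconds:.0f}s after the notification was created")
```

In other words, the outage lasted 31 seconds, and there were 104 seconds between the service coming back and the email being sent in which an auto-acknowledge could have cancelled it.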
—
My config for notifications:
- Using auto-acknowledge-alarm:
<auto-acknowledge-alarm resolution-prefix="RESOLVED: ">
  <uei>uei.opennms.org/nodes/serviceResponsive</uei>
  <uei>uei.opennms.org/nodes/nodeRegainedService</uei>
  <uei>uei.opennms.org/nodes/interfaceUp</uei>
  <uei>uei.opennms.org/nodes/nodeUp</uei>
  <uei>uei.opennms.org/correlation/remote/wideSpreadOutageResolved</uei>
  <!-- omit a few custom alarms for our environment -->
  <uei>uei.opennms.org/threshold/highThresholdRearmed</uei>
  <uei>uei.opennms.org/threshold/lowThresholdRearmed</uei>
  <uei>uei.opennms.org/internal/importer/importSuccessful</uei>
</auto-acknowledge-alarm>
- Default queue handler stuff:
<queue>
  <queue-id>default</queue-id>
  <interval>20s</interval>
  <handler-class>
    <name>org.opennms.netmgt.notifd.DefaultQueueHandler</name>
  </handler-class>
</queue>
- Destination path:
<path name="Email-Servers" initial-delay="2m">
  <target>
    <name>Servers_OnCall</name>
    <autoNotify>on</autoNotify>
    <command>javaEmail</command>
  </target>
</path>
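As a sanity check that the email send time is consistent with this config, here is a rough sketch. It assumes (my assumption, not something confirmed from the notifd source) that a notice becomes deliverable once the path's `initial-delay` has elapsed and is then picked up at the next queue poll, so it should go out within one `interval` of that point. The `parse_duration` helper is hypothetical and only handles simple values like `2m` and `20s`.

```python
import re
from datetime import datetime, timedelta

def parse_duration(text):
    """Hypothetical helper: turn a value like '2m' or '20s' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    units = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
    value, unit = match.groups()
    return timedelta(**{units[unit]: int(value)})

# Values taken from the config above.
initial_delay = parse_duration("2m")    # <path ... initial-delay="2m">
queue_interval = parse_duration("20s")  # <interval>20s</interval>

# Values taken from the notification details.
notification_time = datetime.fromisoformat("2020-07-08T10:05:37-04:00")
sent_at = datetime.fromisoformat("2020-07-08T10:07:51-04:00")

# Assumed send window: after the initial delay, within one queue interval.
earliest = notification_time + initial_delay
latest = earliest + queue_interval
in_window = earliest <= sent_at <= latest

print(f"expected send window: {earliest.isoformat()} to {latest.isoformat()}")
print(f"actual send time {sent_at.isoformat()} inside window: {in_window}")
```

Under that assumption the delay and queue behaved as configured: the email went out 14 seconds into the expected window. The part that did not happen is the auto-acknowledge during the two minutes before it.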
—
Right now, nothing shows up for this whole month in notifd.log (I assume the default logging level isn't going to show me anything), but I do see this in alarmd for the specific alarm in question:
2020-07-08 10:10:31,613 WARN [alarmd-Thread-4-of-4] o.o.n.a.d.DroolsAlarmContext: Failed to acquire Drools session lock within 20000ms. Add or update for alarm with id=6059035 and reduction-key=uei.opennms.org/nodes/nodeLostService::2751:10.xx.xx.xx:SNMP will not be immediately reflected in the context.