Configuration
Details
Assignee: Unassigned
Reporter: Mike Kelly
HB Grooming Date: Jul 21, 2020
HB Backlog Status: HB
Components: (none)
Affects versions: (none)
Priority: Minor
Created July 8, 2020 at 2:30 PM
Updated January 6, 2021 at 12:36 PM
Resolved January 6, 2021 at 12:36 PM
I have some services that are having frequent, brief SNMP outages (down for ~30 seconds), and not all of the notifications are getting auto-acked. So, despite having a 2m delay on the destination path, I'm getting notified about issues that are already resolved, and I'm never told that they are resolved.
An example of a recent outage's timing:
Lost Service Time: 2020-07-08T10:05:36-04:00
Regained Service Time: 2020-07-08T10:06:07-04:00
But the notification related to that (linked to the same outage event):
Notification Time: 2020-07-08T10:05:37-04:00
Time Replied: (empty)
Users Notified:
Sent To: mkelly
Sent At: 2020-07-08T10:07:51-04:00
Media: javaEmail
Contact Info: (empty)
So the notification didn't get acked when service was regained at 10:06:07, and I got an email at 10:07:51, even though the outage was already resolved.
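To make the gap concrete, here is a small sketch that does the arithmetic on the four timestamps quoted above. The variable names are mine; only the timestamps come from the outage and notification records.

```python
from datetime import datetime

# Timestamps copied from the outage and notification records in this report.
lost = datetime.fromisoformat("2020-07-08T10:05:36-04:00")
regained = datetime.fromisoformat("2020-07-08T10:06:07-04:00")
notified = datetime.fromisoformat("2020-07-08T10:05:37-04:00")
emailed = datetime.fromisoformat("2020-07-08T10:07:51-04:00")

# How long the service was actually down.
outage_seconds = (regained - lost).total_seconds()
# How long the notification sat un-acked after service came back.
ack_window_seconds = (emailed - regained).total_seconds()
# How long after the notification was created the email went out.
send_delay_seconds = (emailed - notified).total_seconds()

print(f"outage lasted {outage_seconds:.0f}s")
print(f"email sent {ack_window_seconds:.0f}s after service was regained")
print(f"email sent {send_delay_seconds:.0f}s after the notification was created")
```

The outage lasted 31 seconds, and the email went out 104 seconds after service was regained, so there was plenty of time for an auto-ack to cancel it.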
—
My config for notifications:
Using auto-acknowledge-alarm:
<auto-acknowledge-alarm resolution-prefix="RESOLVED: ">
    <uei>uei.opennms.org/nodes/serviceResponsive</uei>
    <uei>uei.opennms.org/nodes/nodeRegainedService</uei>
    <uei>uei.opennms.org/nodes/interfaceUp</uei>
    <uei>uei.opennms.org/nodes/nodeUp</uei>
    <uei>uei.opennms.org/correlation/remote/wideSpreadOutageResolved</uei>
    <!-- omit a few custom alarms for our environment -->
    <uei>uei.opennms.org/threshold/highThresholdRearmed</uei>
    <uei>uei.opennms.org/threshold/lowThresholdRearmed</uei>
    <uei>uei.opennms.org/internal/importer/importSuccessful</uei>
</auto-acknowledge-alarm>
Default queue handler stuff:
<queue>
    <queue-id>default</queue-id>
    <interval>20s</interval>
    <handler-class>
        <name>org.opennms.netmgt.notifd.DefaultQueueHandler</name>
    </handler-class>
</queue>
Destination path:
<path name="Email-Servers" initial-delay="2m">
    <target>
        <name>Servers_OnCall</name>
        <autoNotify>on</autoNotify>
        <command>javaEmail</command>
    </target>
</path>
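As a sanity check that the observed send time matches this config, the sketch below reads `initial-delay` and the queue `interval` from the snippets above and compares them with the timestamps from the example outage. It rests on two assumptions I have not confirmed: that the initial delay is counted from the notification time, and that the queue handler polls once per interval. `parse_duration` is a hypothetical helper of mine, not an OpenNMS API, and it only handles single-unit values like `2m` or `20s`.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Config fragments copied from this report.
PATH_XML = """<path name="Email-Servers" initial-delay="2m">
  <target>
    <name>Servers_OnCall</name>
    <autoNotify>on</autoNotify>
    <command>javaEmail</command>
  </target>
</path>"""

QUEUE_XML = """<queue>
  <queue-id>default</queue-id>
  <interval>20s</interval>
  <handler-class>
    <name>org.opennms.netmgt.notifd.DefaultQueueHandler</name>
  </handler-class>
</queue>"""

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Hypothetical helper: turn a single-unit string such as '2m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=int(match.group(1)) * UNIT_SECONDS[match.group(2)])


initial_delay = parse_duration(ET.fromstring(PATH_XML).get("initial-delay"))
queue_interval = parse_duration(ET.fromstring(QUEUE_XML).findtext("interval"))

# Timestamps from the example outage.
notified = datetime.fromisoformat("2020-07-08T10:05:37-04:00")
regained = datetime.fromisoformat("2020-07-08T10:06:07-04:00")
emailed = datetime.fromisoformat("2020-07-08T10:07:51-04:00")

# Assumption: the delay starts at the notification time, and the queue
# handler picks the notification up within one polling interval after that.
earliest_send = notified + initial_delay
latest_send = earliest_send + queue_interval

print("service regained before the delay expired:", regained < earliest_send)
print("email sent within one queue interval of the delay:",
      earliest_send <= emailed <= latest_send)
```

Under those assumptions the email at 10:07:51 falls inside the expected send window, and service was regained 90 seconds before that window opened. So the delay and queue appear to behave as configured, and the missing piece is the auto-acknowledge.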
—
Right now, nothing shows up for this whole month in notifd.log (I assume the default logging level isn't going to show me anything), but I do see this in alarmd for the specific alarm in question: