Cleared alarms with closed ticket state not removed when using a hybrid approach

Description

When I created NMS-13189, I described the four use cases I expected users to need.

To recapitulate, here are the customer requirements:

  • The customer doesn't want the Tickets to be automatically created by OpenNMS. An operator will create them manually from the alarm details page.

  • The customer doesn't want the Tickets to be automatically closed for cleared alarms.

  • The customer doesn't want the Alarms to be automatically cleared for closed tickets.

  • The customer wants the ticket state to be automatically updated for cleared alarms, and wants those alarms to be cleared.

Beyond what was explained in that other Jira issue, it turns out that the customer who triggered that effort wants to use the Ticketing Integration differently.

They are interested in a hybrid approach. They want to create tickets manually from the alarm page in the OpenNMS WebUI, but they still want Alarmd to update the ticket state automatically based on what happens in the TTicket Implementation (in their case, Remedy). That is, if they close the ticket, the alarm should be updated automatically. The cleared alarms with a closed ticket state must then be removed from the database automatically.

My initial thought was to apply the following change to the "updateTickets" rule in "alarmd.drl":

That solves part of the problem: the alarm is updated with the ticket state no matter what.

However, the cleared alarms with a closed ticket state are never removed from the database, although the condition triggers either the "cleanUp" or the "fullCleanUp" rule in "alarmd.drl".

I tested in the following two scenarios, using JIRA:

1) Close the ticket before receiving the resolving event.

1.1) Generate a nodeLostService
1.2) Manually create the ticket from the alarm page in the OpenNMS WebUI
1.3) Close the ticket on JIRA
1.4) Wait until the alarm is updated (i.e., the ticket state should change from OPEN to CLOSED)
1.5) Generate a nodeRegainedService
1.6) Verify that the trigger alarm is now cleared (and the ticket state remains as CLOSED)
1.7) Wait until the alarm is removed from the database, which should happen according to the "cleanUp" rule (not happening).

2) Close the ticket after receiving the resolving event.

2.1) Generate a nodeLostService
2.2) Manually create the ticket from the alarm page in the OpenNMS WebUI
2.3) Generate a nodeRegainedService
2.4) Verify that the trigger alarm is now cleared (and the ticket state remains OPEN)
2.5) Close the ticket on JIRA
2.6) Wait until the alarm is updated (i.e., the ticket state should change from OPEN to CLOSED)
2.7) Wait until the alarm is removed from the database, which should happen according to the "cleanUp" rule (not happening).
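
The expected outcome of both scenarios can be restated as a small model. This is only a sketch of the desired end state, not of how Alarmd or the Drools rules work internally; the `Alarm` class, the step names, and `should_be_removed` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    """Hypothetical stand-in for an alarm; not the OpenNMS alarm model."""
    cleared: bool = False
    ticket_state: Optional[str] = None  # None, "OPEN" or "CLOSED"


def should_be_removed(alarm: Alarm) -> bool:
    # Expected behavior from this issue: a cleared alarm whose ticket
    # is closed is eligible for removal from the database.
    return alarm.cleared and alarm.ticket_state == "CLOSED"


def apply_step(alarm: Alarm, step: str) -> None:
    # Each step mirrors one action from the test scenarios.
    if step == "create_ticket":      # operator creates the ticket manually
        alarm.ticket_state = "OPEN"
    elif step == "close_ticket":     # ticket closed in the ticketing system
        alarm.ticket_state = "CLOSED"
    elif step == "resolve":          # resolving event (nodeRegainedService)
        alarm.cleared = True
    else:
        raise ValueError(f"unknown step: {step}")


def run(steps):
    alarm = Alarm()
    for step in steps:
        apply_step(alarm, step)
    return alarm


# Scenario 1: ticket closed before the resolving event.
scenario_1 = run(["create_ticket", "close_ticket", "resolve"])
# Scenario 2: ticket closed after the resolving event.
scenario_2 = run(["create_ticket", "resolve", "close_ticket"])
```

In this model the order of the two events does not matter: both scenarios end with a cleared alarm and a CLOSED ticket, which is the state the "cleanUp" rule is expected to act on.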

Acceptance / Success Criteria

None

Activity

Chandra Gorantla May 6, 2021 at 7:22 PM

Added comment with this commit

Alejandro Galue May 5, 2021 at 6:33 PM

I was able to verify that if the window-time of the cleanUp rule is smaller than the window-time of the updateTickets rule, both use cases described here work, with the configuration changes outlined on this JIRA issue.

However, this inter-rule dependency is not intuitive, and it defeats the purpose of using Drools. Still, if that's the only solution we can have here, I think we must add a comment on the updateTickets rule so future users never forget about this hidden dependency.
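
One way to keep this hidden dependency from being forgotten would be a small sanity check run against whatever window values are configured. The sketch below is an assumption-laden illustration: the function names are hypothetical, the durations are passed in by hand in a simple single-unit style such as "5m" (nothing here parses alarmd.drl), and the example values are not OpenNMS defaults.

```python
import re

# Seconds per supported unit suffix.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(window: str) -> int:
    """Convert a single-unit duration such as '5m' or '30s' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", window.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {window!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def windows_are_compatible(clean_up_window: str, update_tickets_window: str) -> bool:
    # Encodes the observation above: both use cases worked when the
    # cleanUp window was strictly smaller than the updateTickets window.
    return to_seconds(clean_up_window) < to_seconds(update_tickets_window)


# Illustrative values only.
if not windows_are_compatible("5m", "10m"):
    raise SystemExit("cleanUp window must be smaller than updateTickets window")
```

A check like this does not remove the dependency; it only makes a violation visible when the windows are changed.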

Chandra Gorantla May 5, 2021 at 3:36 AM

Using the updateTickets rule that you modified, with a window shorter than the cleanUp rule's, should resolve the issue.

Chandra Gorantla May 4, 2021 at 8:32 PM

I took the default alarmd.drl, enabled the updateTickets and clearAlarmsForClosedTickets rules, and could not reproduce the issue.

Fixed

Details

Created April 12, 2021 at 5:43 PM
Updated May 11, 2021 at 1:56 PM
Resolved May 6, 2021 at 7:22 PM