Reloading the BSM daemon resets the state of the serviceProblem alarm

Description

While troubleshooting a customer's issue, we found that reloading the BSM daemon from the WebUI resets the severity of the serviceProblem alarm. After the reload, the alarm severity no longer matches the severity the business service should have.

The issue is also reproducible in Horizon 31 in a controlled local lab environment. Here are my observations from reproducing it:

  • The business service is associated with multiple ICMP service edges on different nodes.

  • The business service severity is set to match the highest severity of its edges.

  • The business service is affected by multiple nodes.
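To make the expected behavior concrete: under a "highest severity" rule, the business service should report the maximum severity among its edges, and reloading the daemon should not change that value. The following is a minimal Python sketch under stated assumptions. `Severity` and `reduce_highest` are hypothetical names, not the OpenNMS BSM API. The numeric ordering is only modeled on OpenNMS severity levels, and the NORMAL fallback for a service with no edges is an assumption.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ordering modeled on OpenNMS severity levels, lowest to highest
    INDETERMINATE = 1
    CLEARED = 2
    NORMAL = 3
    WARNING = 4
    MINOR = 5
    MAJOR = 6
    CRITICAL = 7


def reduce_highest(edge_severities):
    """Return the business service severity under a 'highest severity' rule.

    Assumption for this sketch: with no edges, the service is NORMAL.
    """
    return max(edge_severities, default=Severity.NORMAL)


# Example: ICMP service edges on three different nodes
edges = {
    "node-a": Severity.NORMAL,
    "node-b": Severity.MAJOR,
    "node-c": Severity.MINOR,
}
expected = reduce_highest(edges.values())
print(expected.name)  # MAJOR
```

If the serviceProblem alarm severity after a reload differs from this recomputed value even though no edge changed state, that matches the reset described in this issue.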

Attached are a video of the reproduction, alarmd.log at DEBUG level, and the events related to BSM states from PostgreSQL.

Acceptance / Success Criteria

None

Attachments

6

Activity

Jeff Gehlbach January 20, 2023 at 7:49 PM

A patch JAR is possible (the changes are all isolated to one or maybe two POMs), but it might be messy for the customer to handle since they're doing containerized deployments. Do you think they would be open to trying the 31.0.4-SNAPSHOT tag from Docker Hub? Or would they prefer handling patch JARs?

David Hustace January 19, 2023 at 4:58 PM

Just a quick comment, then a suggestion:

  • The video is a really nice capture of the problem. Thanks for making the effort.

  • Since we missed the release, can we send them a patch?

Mark Mahacek January 18, 2023 at 11:45 PM

The commit for this fix does appear to be in the 2022.1.31, 2021.1.23, and 2022.1.12 tagged branches. It just didn’t make it downstream to 31.0.3 in time.

Mark Mahacek January 18, 2023 at 10:17 PM

It might be worth adding a special note to the next release notes indicating that this issue was incorrectly tagged for the previous version.

Jeff Gehlbach January 18, 2023 at 10:11 PM

Yes, that’s what happened. Moving and re-resolving.

Fixed

Details

Created December 8, 2022 at 3:15 PM
Updated April 20, 2023 at 3:50 PM
Resolved January 18, 2023 at 10:12 PM