OpenNMS / NMS-12412

Alarmd fails intermittently and OOMs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 25.0.0, Meridian-2019.1.0, 25.1.0
    • Fix Version/s: Meridian-2019.1.1, 25.1.1
    • Component/s: Alarms
    • Security Level: Default (Default Security Scheme)
    • Labels: None
    • Sprint: Horizon 2019 - November 20th, Horizon 2019 - November 27th

    Description

      While investigating reports of memory leaks and odd behavior with alarms, we found several problems with the current implementation of Drools in alarmd:

      1. Intermittent run-time errors and flapping tests, both caused by issues with transaction management
      2. The fire thread does not properly recover from exceptions. Alarms then keep being queued but are never un-queued, which can subsequently lead to OOMs

       

      #1 has been a problem since the re-write, but we had not been able to track it down until now.

      #2 is a regression from NMS-12322, which aimed to fix a memory leak, but inadvertently introduced another one.
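
      The failure mode in #2 can be illustrated with a minimal Java sketch. This is not the alarmd code: the class `ResilientFireLoop`, its methods, and the string items are hypothetical stand-ins chosen for illustration. It shows a processing loop that catches exceptions per item and so keeps draining its queue. Without the catch, a single exception would stop consumption while producers keep adding items, and the backlog would grow without bound.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical illustration only; names and structure are assumptions, not alarmd's implementation. */
final class ResilientFireLoop {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Consumer<String> handler;
    private final AtomicInteger failures = new AtomicInteger();

    ResilientFireLoop(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Producers enqueue work here; this never blocks on an unbounded queue. */
    void submit(String item) {
        pending.add(item);
    }

    /** Number of items still waiting to be processed. */
    int backlog() {
        return pending.size();
    }

    /** Number of items whose handler threw an exception. */
    int failures() {
        return failures.get();
    }

    /** Processes everything currently queued; a failing item is counted and skipped. */
    void drain() {
        String item;
        while ((item = pending.poll()) != null) {
            try {
                handler.accept(item);
            } catch (RuntimeException e) {
                // Record the failure and continue, so the queue keeps being consumed.
                failures.incrementAndGet();
            }
        }
    }
}
```

      In a long-running service, a dedicated thread would call something like `drain()` repeatedly. The design point is only that the exception is handled inside the loop, so the consumer outlives a bad item.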

       

          People

            Assignee: Jesse White (j-white)
            Reporter: Jesse White (j-white)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
