OpenNMS / NMS-12412

Alarmd fails intermittently and OOMs


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 25.0.0, Meridian-2019.1.0, 25.1.0
    • Fix Version/s: Meridian-2019.1.1, 25.1.1
    • Component/s: Alarms
    • Security Level: Default (Default Security Scheme)
    • Labels: None
    • Sprint: Horizon 2019 - November 20th, Horizon 2019 - November 27th

      Description

      While investigating reports of memory leaks and unexpected alarm behavior, we found two problems with the current Drools implementation in alarmd:

      1. Intermittent run-time errors and flapping (intermittently failing) tests, caused by issues with transaction management.
      2. The fire thread does not properly recover from exceptions. Alarms continue to be queued afterwards but are never dequeued, which can eventually cause out-of-memory errors (OOMs).


      #1 has been a problem since the rewrite, but we had not been able to track it down until now.

      #2 is a regression introduced by NMS-12322, which aimed to fix a memory leak but inadvertently introduced another.
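The failure mode in #2 can be illustrated with a minimal sketch. This is not the actual alarmd fix or the Drools API: `ResilientFireLoop`, `enqueue`, and the `fireRules` callback are hypothetical names, and a single consumer thread draining an unbounded `BlockingQueue` is an assumption. The point it shows is that exceptions are caught per item inside the loop, so one failed rule firing cannot terminate the thread and leave producers filling a queue that nothing drains.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, NOT the OpenNMS alarmd or Drools API: a single "fire"
// thread drains queued alarm updates and passes each one to a rule-firing callback.
final class ResilientFireLoop implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Consumer<String> fireRules; // hypothetical stand-in for firing rules
    private final List<Throwable> errors = Collections.synchronizedList(new ArrayList<>());
    private volatile boolean running = true;

    ResilientFireLoop(Consumer<String> fireRules) {
        this.fireRules = fireRules;
    }

    // Called by producers. The queue is unbounded, so it only stays small
    // as long as the fire thread keeps draining it.
    void enqueue(String alarm) {
        queue.add(alarm);
    }

    @Override
    public void run() {
        while (running) {
            String alarm;
            try {
                // Poll with a timeout so that stop() is noticed promptly.
                alarm = queue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag and exit
                return;
            }
            if (alarm == null) {
                continue; // nothing queued yet
            }
            try {
                fireRules.accept(alarm);
            } catch (RuntimeException e) {
                // Record the failure and keep looping. If this exception escaped
                // run(), the thread would die and producers would keep filling
                // the queue with nothing left to drain it.
                errors.add(e);
            }
        }
    }

    void stop() { running = false; }

    int pending() { return queue.size(); }

    List<Throwable> errors() { return errors; }
}
```

For example, enqueuing "a", an item whose rule firing throws, and then "b" should leave "a" and "b" processed, one recorded error, and an empty queue. The sketch deliberately catches only `RuntimeException`, so `Error`s such as `OutOfMemoryError` still terminate the thread; a `Thread.UncaughtExceptionHandler` could be used to log that case or restart the loop.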



            People

            • Assignee: Jesse White (j-white)
            • Reporter: Jesse White (j-white)
            • Votes: 0
            • Watchers: 1
