  1. OpenNMS
  2. NMS-12767

Race condition when enabling the Situations Feedback feature

    Details

    • Sprint:
      Horizon 2021 - Apr 14 - Apr 28, Horizon 2021 - Apr 28 - May 12
    • HB Backlog Status:
      Backlog

      Description

      OpenNMS has multiple optional features installed via Karaf that require configuration to work.

      Because I always use automation tools to configure solutions for customers and to build test environments, an unattended script is usually responsible for configuring everything.

      All the features I have tried in the past can be configured this way, and I never had issues with them.

      Unfortunately, that is not the case with Situations Feedback. After a clean installation (i.e., nothing inside /opt/opennms/data), the feature tries to start but it doesn't seem to work, and I can see the following on the Karaf shell:

      admin@opennms> health-check
      Verifying the health of the container
      
      Verifying installed bundles                                       [ Success  ]
      ALEC :: Driver                                                    [ Success  ] => Tick duration (99 percentile): 22 ms
      Connecting to ElasticSearch ReST API (Flows)                      [ Success  ] => Not configured
      Number of active alarms stored in Elasticsearch (Alarm History)   [ Success  ] => Found 3 alarms.
      Connecting to ElasticSearch ReST API (Situation Feedback)         [ Timeout  ] => Health Check did not finish within 5000 ms
      

      The Situations Feedback plugin is configured very similarly to the Elasticsearch forwarders for Events and Alarm History. Those two always work when configured automatically (I never had issues with them or any other feature), but I haven't found a way to start Situations Feedback in a similar way that works reliably.

      The following is the easiest fix I found that always works for me:

      #!/bin/bash
      # Inner double quotes are escaped so they reach the Karaf shell intact;
      # every command is terminated with a semicolon.
      ssh -p 8101 admin@localhost "\
      config:edit org.opennms.features.situation-feedback.persistence.elastic;
      config:property-set elasticUrl \"https://elastic:9200\";
      config:property-set globalElasticUser \"elastic\";
      config:property-set globalElasticPassword \"0p3nNM5\";
      config:property-set indexPrefix \"dc1-\";
      config:property-set elasticIndexStrategy \"daily\";
      config:property-set connTimeout 30000;
      config:property-set readTimeout 300000;
      config:update;
      config:list '(service.pid=org.opennms.features.situation-feedback.persistence.elastic)'
      "
      

      The above script reconfigures the feature. Internally, that triggers the reload of all the dependent bundles, and after that, the plugin works as intended. Most of the time, after doing this, the feature survives an OpenNMS restart. But, if the user removes the content of the data directory prior to starting OpenNMS (like on an upgrade), the above script must be executed one more time to fix the problem.
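
For unattended setups, one way to decide whether the reconfiguration is needed is to inspect the health-check output for the Situation Feedback line. The following is only a minimal sketch: the function names `situation_feedback_ok` and `karaf_health_check` are hypothetical helpers, and it assumes the health-check output keeps the format shown earlier in this report.

```shell
#!/bin/bash
# Sketch only: detect a failing Situation Feedback check in health-check output.

# Reads health-check output from stdin.
# Returns 0 only when the "Situation Feedback" line reports "[ Success";
# returns non-zero on Timeout, any other status, or when the line is missing.
situation_feedback_ok() {
  grep -F 'Situation Feedback' | grep -q -F '[ Success'
}

# Hypothetical wrapper: runs health-check through the Karaf SSH port
# used elsewhere in this report (8101, user admin).
karaf_health_check() {
  ssh -p 8101 admin@localhost health-check
}
```

A provisioning script could then run `karaf_health_check | situation_feedback_ok` and re-apply the configuration only when that pipeline fails. This does not address the underlying race condition; it only automates the workaround.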

              People

              Assignee:
              swachter Stefan Wachter
              Reporter:
              agalue Alejandro Galue