OpenNMS / NMS-14331

Grafana Panel Internal Server Error when lasteventid is Null for an Alarm when Using HELM

Details

    • 3
    • Horizon 22 - Jun 23 - Jul 7
    • Backlog
    • 1143

    Description

      A customer reported the issue below on their H29.0.9 setup.

      Setup
      Grafana - v7.5.5
      ONMS - 29.0.9
      Helm - 7.1.0

      Issue:
      The customer gets a 500 Internal Server Error in Grafana (using the HELM plugin for Alarms/Situations) for certain time frames.
      After narrowing down the time frame, we found that the error occurs when the selected range includes alarms whose lasteventid is null.
      The /opennms/rest/alarms endpoint works fine when an alarm's lasteventid is null, but the /opennms/api/v2/alarms endpoint fails in that case, which is why Grafana errors out with Internal Server Error{}
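A minimal sketch for checking the difference between the two endpoints described above. The endpoint paths come from this report; the base URL, the credentials, and the helper names (`http_status`, `compare_endpoints`, `shows_reported_symptom`) are assumptions made for illustration and are not part of OpenNMS or HELM.

```python
import base64
import urllib.error
import urllib.request

# Endpoint paths taken from the report; everything else is a placeholder.
ENDPOINTS = {
    "v1": "/opennms/rest/alarms",
    "v2": "/opennms/api/v2/alarms",
}


def http_status(url, user, password):
    """Return the HTTP status code of a GET request sent with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; only the code matters here.
        return err.code


def compare_endpoints(base_url, get_status):
    """Map each endpoint label to the status returned by get_status(url)."""
    return {
        label: get_status(base_url.rstrip("/") + path)
        for label, path in ENDPOINTS.items()
    }


def shows_reported_symptom(statuses):
    """True when v1 answers 200 while v2 answers 500, as in this report."""
    return statuses["v1"] == 200 and statuses["v2"] == 500
```

Against a real instance this could be called as `compare_endpoints("http://<host>:8980", lambda url: http_status(url, "<user>", "<password>"))`, where the host, port, and credentials are placeholders. Passing the status lookup as a callable keeps the comparison logic usable without network access.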

      I tested and confirmed this by taking one of the situations/alarms and setting its lasteventid to null in the alarms table.

      Original Situation

      Updated the lasteventid

      api/v2/alarms Erroring out

      opennms/rest/alarms endpoint works just fine during this issue

      Example of an alarm with a missing lasteventid from the customer:

      opennms=> select alarmid from alarms where lasteventid is null;
       alarmid
      ----------
       36424320
      
      
      
      <alarm type="1" count="88" id="36424320" severity="MAJOR">
      <description>
      <p>The BGPBackwardTransition Event is generated when the BGP FSM moves from a higher numbered state to a lower numbered state.</p><table> <tr><td><b> bgpPeerRemoteAddr</b></td><td> *.*.*.*;</td><td><p></p></td></tr> <tr><td><b> bgpPeerLastError</b></td><td> 0x0000;</td><td><p></p></td></tr> <tr><td><b> bgpPeerState</b></td><td> 1;</td><td><p> idle(1) connect(2) active(3) opensent(4) openconfirm(5) established(6) </p></td></tr></table>
      </description>
      <firstEventTime>2022-05-20T23:22:26.651-05:00</firstEventTime>
      <ipAddress>*.*.*.*</ipAddress>
      <lastEventTime>2022-05-23T10:15:10.660-05:00</lastEventTime>
      <logMessage>
      <p> bgpBackwardTransition trap received bgpPeerRemoteAddr=*.*.*.* bgpPeerLastError=0x0000 bgpPeerState=1</p>
      </logMessage>
      <managedObjectInstance>413</managedObjectInstance>
      <managedObjectType>node</managedObjectType>
      <nodeId>413</nodeId>
      <nodeLabel>redacted-name</nodeLabel>
      <reductionKey>
      uei.opennms.org/standard/rfc1269/traps/bgpBackwardTransition::413:*.*.*.*
      </reductionKey>
      <suppressedTime>2022-05-20T23:22:26.651-05:00</suppressedTime>
      <suppressedUntil>2022-05-20T23:22:26.651-05:00</suppressedUntil>
      <uei>
      uei.opennms.org/standard/rfc1269/traps/bgpBackwardTransition
      </uei>
      <x733ProbableCause>0</x733ProbableCause>
      </alarm> 
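A rough sketch for spotting such alarms in XML returned by /opennms/rest/alarms. It assumes that a healthy alarm carries a `lastEvent` child element, which is absent from the alarm above; that element name, the wrapping `<alarms>` element, and the second alarm in the sample are assumptions for illustration, not taken from the customer data.

```python
import xml.etree.ElementTree as ET


def alarms_missing_last_event(xml_text, child_tag="lastEvent"):
    """Return the ids of <alarm> elements that have no child named child_tag."""
    root = ET.fromstring(xml_text)
    # Accept either a single <alarm> document or a wrapper containing alarms.
    alarms = [root] if root.tag == "alarm" else list(root.iter("alarm"))
    return [alarm.get("id") for alarm in alarms if alarm.find(child_tag) is None]


# The first alarm is trimmed from the report; the second one is hypothetical.
sample = """
<alarms>
  <alarm type="1" count="88" id="36424320" severity="MAJOR">
    <lastEventTime>2022-05-23T10:15:10.660-05:00</lastEventTime>
    <nodeId>413</nodeId>
  </alarm>
  <alarm type="1" count="1" id="1" severity="MINOR">
    <lastEvent id="100"/>
    <lastEventTime>2022-05-23T10:15:10.660-05:00</lastEventTime>
  </alarm>
</alarms>
"""

print(alarms_missing_last_event(sample))  # ['36424320']
```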

      This also seems to be an Alarmd issue: "lasteventid" is stored as null, which should not happen, since every alarm should have an associated lasteventid.

      I'm not sure why lasteventid and firsteventtime have no NOT NULL constraint, but this could also be contributing to the issue.

      opennms=> \d alarms;
                                     Table "public.alarms"
              Column         |           Type           | Collation | Nullable | Default
      -----------------------+--------------------------+-----------+----------+---------
       alarmid               | integer                  |           | not null |
       eventuei              | character varying(256)   |           | not null |
       nodeid                | integer                  |           |          |
       ipaddr                | text                     |           |          |
       serviceid             | integer                  |           |          |
       reductionkey          | text                     |           |          |
       alarmtype             | integer                  |           |          |
       counter               | integer                  |           | not null |
       severity              | integer                  |           | not null |
      
       lasteventid           | integer                  |           |          |
       firsteventtime        | timestamp with time zone |           |          |
       lasteventtime         | timestamp with time zone |           |          |
      

          People

            aramos-vizcarra Alberto
            Sriraag Sridhar