OpenNMS / NMS-10027

The JMX-Cassandra service goes down on every cluster member when a single instance is down.


    Details

      Description

      In my lab (tested on the latest develop branch and on Meridian 2017), I found that when every Cassandra instance in a cluster is monitored and one instance goes down, OpenNMS generates nodeLostService events for the JMX-Cassandra service on every cluster member, not just on the one that actually went down.


      Here is how the service is defined in the default configuration:

      <service name="JMX-Cassandra" interval="300000" user-defined="false" status="on">
        <parameter key="port" value="7199"/>
        <parameter key="retry" value="2"/>
        <parameter key="timeout" value="3000"/>
        <parameter key="protocol" value="rmi"/>
        <parameter key="urlPath" value="/jmxrmi"/>
        <parameter key="rrd-base-name" value="jmx-cassandra"/>
        <parameter key="ds-name" value="jmx-cassandra"/>
        <parameter key="thresholding-enabled" value="true"/>
        <parameter key="factory" value="PASSWORD-CLEAR"/>
        <parameter key="username" value="cassandra"/>
        <parameter key="password" value="cassandra"/>
        <parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
        <parameter key="beans.storage" value="org.apache.cassandra.db:type=StorageService"/>
        <parameter key="tests.operational" value="storage.OperationMode == 'NORMAL'"/>
        <parameter key="tests.joined" value="storage.Joined"/>
        <parameter key="tests.unreachables" value="empty(storage.UnreachableNodes)"/>
      </service>

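      The last entry, tests.unreachables, is the problem. The UnreachableNodes attribute of the StorageService MBean lists the nodes that the queried instance's own failure detector considers down. When any member of the cluster stops, every healthy member reports it in UnreachableNodes, so the test fails on all of them, not only on the instance that is actually down.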

      After removing it from the configuration, the service behaves as expected: it goes down only for the instance that is not working.

      Therefore, the following line should not be part of the default configuration:

      <parameter key="tests.unreachables" value="empty(storage.UnreachableNodes)"/>
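      For illustration, a minimal sketch of how the service definition looks once that line is dropped. Only the StorageService checks are spelled out; the XML comments are illustrative, and all other parameters are assumed to stay exactly as in the definition above. Both remaining tests evaluate only the state of the instance being polled, which is why a failure is attributed to the right node.

```xml
<service name="JMX-Cassandra" interval="300000" user-defined="false" status="on">
  <!-- port, retry, timeout, protocol, urlPath, credentials, and RRD parameters unchanged -->
  <parameter key="beans.storage" value="org.apache.cassandra.db:type=StorageService"/>
  <!-- Both remaining tests read only the polled instance's own state -->
  <parameter key="tests.operational" value="storage.OperationMode == 'NORMAL'"/>
  <parameter key="tests.joined" value="storage.Joined"/>
  <!-- tests.unreachables removed: UnreachableNodes reflects the health of the other members -->
</service>
```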


            People

            • Assignee: Alejandro Galue (agalue)
              Reporter: Alejandro Galue (agalue)
            • Votes: 0
              Watchers: 1
