OpenNMS / NMS-10027

The JMX-Cassandra service goes down for the whole cluster when a single instance is down.

Details

    Description

      In my lab (tested on the latest develop and Meridian 2017), I found that when you monitor every Cassandra instance of a cluster and one instance goes down, OpenNMS generates nodeLostService events for the JMX-Cassandra service on every cluster member, not just the one that actually went down.

      Here is how that service is defined:

      <service name="JMX-Cassandra" interval="300000" user-defined="false" status="on">
        <parameter key="port" value="7199"/>
        <parameter key="retry" value="2"/>
        <parameter key="timeout" value="3000"/>
        <parameter key="protocol" value="rmi"/>
        <parameter key="urlPath" value="/jmxrmi"/>
        <parameter key="rrd-base-name" value="jmx-cassandra"/>
        <parameter key="ds-name" value="jmx-cassandra"/>
        <parameter key="thresholding-enabled" value="true"/>
        <parameter key="factory" value="PASSWORD-CLEAR"/>
        <parameter key="username" value="cassandra"/>
        <parameter key="password" value="cassandra"/>
        <parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
        <parameter key="beans.storage" value="org.apache.cassandra.db:type=StorageService"/>
        <parameter key="tests.operational" value="storage.OperationMode == 'NORMAL'"/>
        <parameter key="tests.joined" value="storage.Joined"/>
        <parameter key="tests.unreachables" value="empty(storage.UnreachableNodes)"/>
      </service>

      The last entry, tests.unreachables, is the problem.

      If I remove it from the configuration, the service behaves as expected: it goes down only for the instance that is not working.

      That means the following line should not be part of the default configuration:

      <parameter key="tests.unreachables" value="empty(storage.UnreachableNodes)"/>
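
The behaviour described above can be illustrated with a small Python toy model. It rests on two assumptions that are not confirmed in this report: that every live Cassandra node lists a down peer in storage.UnreachableNodes, and that the monitor marks the service down as soon as any tests.* expression evaluates to false. The node names and helper functions are hypothetical; this is a sketch of the suspected mechanism, not OpenNMS code.

```python
# Toy model (not OpenNMS code): each node is polled separately and the
# tests.* expressions are evaluated against that node's own view.

def unreachable_nodes(node, cluster, down):
    """Assumption: a live node reports every down peer as unreachable."""
    return sorted(peer for peer in cluster if peer in down and peer != node)


def service_is_up(node, cluster, down, check_unreachables=True):
    """Return True if JMX-Cassandra would be considered up on `node`."""
    if node in down:
        return False  # the JMX connection to a dead instance fails outright
    tests = {
        "operational": True,  # stands in for storage.OperationMode == 'NORMAL'
        "joined": True,       # stands in for storage.Joined
    }
    if check_unreachables:
        # stands in for empty(storage.UnreachableNodes)
        tests["unreachables"] = not unreachable_nodes(node, cluster, down)
    # Assumption: the service is up only if every test passes.
    return all(tests.values())


cluster = ["cassandra1", "cassandra2", "cassandra3"]  # hypothetical node names
down = {"cassandra2"}

with_test = {n: service_is_up(n, cluster, down) for n in cluster}
without_test = {
    n: service_is_up(n, cluster, down, check_unreachables=False) for n in cluster
}

print("with tests.unreachables:   ", with_test)
print("without tests.unreachables:", without_test)
```

In this model the service is reported down on all three nodes while tests.unreachables is present, and only on cassandra2 once it is removed, which matches what I observed in the lab.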

          People

            agalue Alejandro Galue