OpenNMS / NMS-12274

Improve robustness of CassandraBlobStore for async operations



    • Type: Enhancement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Meridian-2019.1.0, 25.1.0
    • Component/s: None
    • Security Level: Default (Default Security Scheme)
    • Labels: None
    • Sprint: Horizon 2019 - September 4th, Horizon 2019 - September 25th, Horizon 2019 - October 2nd


      Currently, the CassandraBlobStore can overwhelm the Cassandra cluster when more async requests are in flight than the cluster allows concurrently (this looks like 250 for a single-node cluster by default).

      When this happens, inspecting the operation's result future throws an exception indicating that the operation was not processed.

      To avoid this, we could add logic to the CassandraBlobStore that allows only a certain number of requests to be in flight at once.

      The main situation in which this is a problem is managing thresholding states, specifically clearing all thresholding states: we attempt an async delete for each state, and there may be many thousands of them.

      We should probably use a global gate of some sort (resilience4j?) that allows at most X async requests in flight at once, regardless of operation, so that we never overwhelm the Cassandra connection pool. This limit should be configurable.
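
      As a rough illustration of the gate idea, here is a minimal sketch that uses a plain java.util.concurrent.Semaphore rather than resilience4j. The class name AsyncRequestGate, its submit method, and the maxInFlight parameter are hypothetical and are not part of the existing CassandraBlobStore; the operation passed in stands in for whatever async call the store makes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical global gate: at most maxInFlight async operations may be
 * outstanding at once, regardless of which operation they are.
 */
class AsyncRequestGate {
    private final Semaphore permits;

    AsyncRequestGate(int maxInFlight) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be at least 1");
        }
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Blocks the caller until a permit is free, starts the async operation,
     * and releases the permit when the returned future completes
     * (successfully or exceptionally).
     */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> operation) {
        permits.acquireUninterruptibly();
        try {
            return operation.get()
                    .whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            // The operation never started, so hand the permit back.
            permits.release();
            throw e;
        }
    }

    /** Number of additional requests that could be started right now. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

      With a gate like this, clearing thousands of thresholding states would submit each async delete through submit(...), so the caller slows down to the rate at which Cassandra completes requests instead of queueing everything at once. The limit would come from configuration and be kept below the cluster's connection limit.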

      The problem can be reproduced using the benchmark command with an appropriately large number of async requests:

      opennms-kv-blob:benchmark -a 1024 10000




            mbrooks Matthew Brooks (Inactive)