OpenNMS / NMS-12274

Improve robustness of CassandraBlobStore for async operations

    Details

    • Type: Enhancement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Meridian-2019.1.0, 25.1.0
    • Component/s: None
    • Security Level: Default (Default Security Scheme)
    • Labels: None
    • Sprint: Horizon 2019 - September 4th, Horizon 2019 - September 25th, Horizon 2019 - October 2nd

      Description

      Currently the CassandraBlobStore can overwhelm the Cassandra cluster with async requests when more requests are in flight than the cluster allows async connections for (the default looks like 250 for a single-node cluster).

      When this happens, inspecting the operation's result future throws an exception indicating that the operation was not processed.

      To avoid this, we could add logic to the CassandraBlobStore that allows only a certain number of requests to be in flight at once.

      This is mainly a problem when managing thresholding states, specifically when clearing all of them: we attempt an async delete on each state, and there may be many thousands.

      We should probably use a global gate of some sort (resilience4j?) that allows only X in-flight async requests at once, regardless of operation, so that we never overwhelm the Cassandra connection pool. This number should be configurable.
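      As a rough illustration of the gate idea, here is a minimal sketch using a plain java.util.concurrent.Semaphore rather than resilience4j. The class name AsyncRequestGate, its submit method, and the maxInFlight parameter are all hypothetical and are not taken from the OpenNMS code base. The sketch assumes each async operation can be exposed as a CompletableFuture, and it blocks the calling thread until a slot is free.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical global gate that caps the number of in-flight async requests.
 * Illustration only; this is not the actual CassandraBlobStore implementation.
 */
final class AsyncRequestGate {
    private final Semaphore permits;

    /** @param maxInFlight configurable upper bound on concurrent async requests */
    AsyncRequestGate(int maxInFlight) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be at least 1");
        }
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Waits for a free slot, starts the operation, and frees the slot once the
     * returned future completes, whether normally or exceptionally.
     */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> operation) {
        // Block the caller (ignoring interrupts) until a permit is available.
        permits.acquireUninterruptibly();
        final CompletableFuture<T> future;
        try {
            future = Objects.requireNonNull(operation.get(), "operation returned null");
        } catch (RuntimeException e) {
            // The operation never started, so hand the permit back immediately.
            permits.release();
            throw e;
        }
        // Release the permit when the request finishes, on success or failure.
        future.whenComplete((result, error) -> permits.release());
        return future;
    }

    /** Number of additional requests that could start right now. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

      With a gate like this shared by every async operation, clearing many thousands of thresholding states would queue at the gate instead of flooding the connection pool. A semaphore-based bulkhead from a library such as resilience4j could presumably fill the same role.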

      The problem can be reproduced using the benchmark command with an appropriately large number of async requests:

      opennms-kv-blob:benchmark -a 1024 10000
      

            People

            • Assignee: Matthew Brooks (mbrooks)
            • Reporter: Matthew Brooks (mbrooks)