OpenNMS / NMS-13859

Very large node caches can cause telemetry adapters to fail on Sentinel

Details

    • Bug
    • Status: Closed
    • Medium
    • Resolution: Fixed
    • 28.0.0, 29.0.4
    • 29.0.5
    • Sentinel, Telemetry
    • Security Level: Default (Default Security Scheme)
    • 29.0.4
    • Horizon 22 - Jan 5 - Jan 19
    • 868
    • Large node caches no longer block startup

    Description

      dataSourceSync: initialized list of managed IP addresses with 334146 members

      While initializing this cache, this code block takes roughly two minutes, and the ElasticFlowRepository takes approximately three minutes before its bundles are fully started. That is long enough for the bundles that depend on them to time out:

       2021-12-14T10:13:57,739 | ERROR | CM Configuration Updater (ManagedServiceFactory Update: factoryPid=[org.opennms.features.telemetry.adapters]) | AdapterManager | 260 - org.opennms.features.telemetry.distributed.sentinel - 28.0.0 | Failed to create class org.opennms.netmgt.telemetry.daemon.TelemetryMessageConsumer
       java.lang.Exception: No adapter found for class: org.opennms.netmgt.telemetry.protocols.sflow.adapter.SFlowAdapter
       at org.opennms.netmgt.telemetry.daemon.TelemetryMessageConsumer.init(TelemetryMessageConsumer.java:97) ~[!/:?]
       at org.opennms.netmgt.telemetry.distributed.sentinel.AdapterManager.updated(AdapterManager.java:118) [!/:?]
       at org.apache.felix.cm.impl.helper.ManagedServiceFactoryTracker.updated(ManagedServiceFactoryTracker.java:159) [!/:?]
       at org.apache.felix.cm.impl.helper.ManagedServiceFactoryTracker.provideConfiguration(ManagedServiceFactoryTracker.java:93) [!/:?]
       at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceFactoryUpdate.provide(ConfigurationManager.java:1264) [!/:?]
       at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceFactoryUpdate.run(ConfigurationManager.java:1208) [!/:?]
       at org.apache.felix.cm.impl.UpdateThread.run0(UpdateThread.java:122) [!/:?]
       at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:84) [!/:?]
       at java.lang.Thread.run(Thread.java:829) [?:?]
       2021-12-14T10:13:57,745 | INFO | CM Configuration Updater (ManagedServiceFactory Update: factoryPid=[org.opennms.features.telemetry.adapters]) | AdapterManager | 260 - org.opennms.features.telemetry.distributed.sentinel - 28.0.0 | Creating new consumer for pid: org.opennms.features.telemetry.adapters.7f39bb96-c641-45ab-a0b7-fcdfd88836b7
       2021-12-14T10:14:09,002 | ERROR | Blueprint Extender: 1 | BlueprintContainerImpl | 19 - org.apache.aries.blueprint.core - 1.10.3 | Unable to start container for blueprint bundle org.opennms.features.telemetry.protocols.netflow.adapter/28.0.0 due to unresolved dependencies [(objectClass=org.opennms.netmgt.flows.api.FlowRepository)]
       java.util.concurrent.TimeoutException: null
       at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:393) [!/:1.10.3]
       at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:45) [!/:1.10.3]
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
       at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
       at java.lang.Thread.run(Thread.java:829) [?:?]
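
      The failure mode above can be illustrated with a minimal Java sketch. This is not the OpenNMS code and is not claimed to be how the fix was implemented. It assumes a hypothetical BackgroundCache class (all names here are invented for illustration) that starts its slow load on a background thread, so the component exposing it becomes available immediately instead of holding up dependent bundles until their wait times out.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical illustration only; not part of OpenNMS. */
public class BackgroundCache {
    // Holds the eventual cache contents; completed by a background thread.
    private final CompletableFuture<List<String>> members;

    public BackgroundCache(Supplier<List<String>> slowLoader) {
        // Kick off the slow load asynchronously so the constructor
        // (and therefore component startup) returns right away.
        this.members = CompletableFuture.supplyAsync(slowLoader);
    }

    /** True once the background load has finished without an error. */
    public boolean isReady() {
        return members.isDone() && !members.isCompletedExceptionally();
    }

    /** Blocks only the calling thread until the data is loaded or the timeout elapses. */
    public List<String> await(long timeout, TimeUnit unit) throws Exception {
        return members.get(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        BackgroundCache cache = new BackgroundCache(() -> {
            try {
                Thread.sleep(500); // stand-in for a slow database query
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return List.of("192.0.2.1", "192.0.2.2"); // documentation-range addresses
        });
        System.out.println("ready right after construction: " + cache.isReady());
        List<String> loaded = cache.await(5, TimeUnit.SECONDS);
        System.out.println("loaded " + loaded.size() + " members, ready: " + cache.isReady());
    }
}
```

      In this sketch, a consumer that really needs the data calls await with its own timeout, so a slow load delays only that consumer rather than every bundle waiting on the service to appear.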

       

      The current workaround for this issue is to restart the bundles:

       

      265 │ Failure │ 80 │ 28.0.0 │ OpenNMS :: Features :: Telemetry :: Protocols :: Netflow :: Adapter
      266 │ Active │ 80 │ 28.0.0 │ OpenNMS :: Features :: Telemetry :: Protocols :: Netflow :: Transport
      267 │ Failure │ 80 │ 28.0.0 │ OpenNMS :: Features :: Telemetry :: Protocols :: SFlow :: Adapter

      This is done while moving the telemetry configs away and back again.

          People

            cgorantla Chandra Gorantla
            wkeaney Will Keaney