  1. Architecture for Learning Enabled Correlation (ALEC)
  2. ALEC-88

Tick errors in karaf.log that might be preventing ALEC from performing correlations

    Details

    • Sprint:
      Horizon 2020 - June 24, Horizon 2020 - July 8
    • Backlog Status:
      Backlog NG

      Description

      While helping a customer deploy ALEC in two of their data centers, I found the following error in karaf.log:

      2020-07-06T13:44:15,687 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Graph does not contain source vertex CEVertex[id=4412, resourceKey=[snmp-interface, 884:32]]
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2207) ~[?:?]
              at com.google.common.cache.LocalCache.get(LocalCache.java:3953) ~[?:?]
              at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957) ~[?:?]
              at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine.getSpatialDistanceBetween(AbstractClusterEngine.java:771) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine.getDiagnosticTextForSituation(AbstractClusterEngine.java:556) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine.lambda$mapClusterToSituations$11(AbstractClusterEngine.java:514) ~[?:?]
              at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine.mapClusterToSituations(AbstractClusterEngine.java:513) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine.lambda$onTick$8(AbstractClusterEngine.java:374) ~[?:?]
              at org.opennms.alec.engine.cluster.GraphManager.withGraph(GraphManager.java:327) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine.onTick(AbstractClusterEngine.java:330) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine.tick(AbstractClusterEngine.java:158) ~[?:?]
              at org.opennms.alec.driver.main.Driver$2.run(Driver.java:199) [430:org.opennms.alec.driver.main:1.1.0.SNAPSHOT]
              at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
              at java.util.TimerThread.run(Timer.java:506) [?:?]
      Caused by: java.lang.RuntimeException: Graph does not contain source vertex CEVertex[id=4412, resourceKey=[snmp-interface, 884:32]]
              at org.opennms.alec.engine.cluster.DijkstraSolvableGraph.getDistance(DijkstraSolvableGraph.java:177) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine$2.load(AbstractClusterEngine.java:794) ~[?:?]
              at org.opennms.alec.engine.cluster.AbstractClusterEngine$2.load(AbstractClusterEngine.java:779) ~[?:?]
              at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542) ~[?:?]
              at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323) ~[?:?]
              at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286) ~[?:?]
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[?:?]
              ... 15 more
      

      In one of the data centers (DCs), the error recurs back to back at 30-second intervals, which matches the period at which each Tick runs:

      15:29:27 # date
      Mon Jul  6 15:29:33 CDT 2020
      
      15:29:33 # grep "ERROR.*ALEC Driver" karaf* karaf.log | tail
      karaf.log:2020-07-06T15:24:45,430 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:25:15,436 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:25:45,415 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:26:15,439 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:26:45,536 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:27:15,498 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:27:45,489 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:28:15,550 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:28:45,549 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
      karaf.log:2020-07-06T15:29:15,400 | ERROR | ALEC Driver Tick | Driver                           | 430 - org.opennms.alec.driver.main - 1.1.0.SNAPSHOT | Tick failed with exception.
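To check the 30-second spacing across a whole log rather than by eye, the gaps between consecutive failures can be computed from the timestamps. This is a minimal sketch: `tick_gaps` is a hypothetical helper name, it assumes the timestamp is the first field of each line (as when reading karaf.log directly, without the `karaf.log:` prefix that grep adds for multiple files), and it ignores milliseconds.

```shell
# Hypothetical helper: print the gap, in whole seconds, between
# consecutive "Tick failed with exception" entries in one log file.
# Assumes lines start with a timestamp such as 2020-07-06T15:24:45,430.
tick_gaps() {
  awk '/ALEC Driver Tick .*Tick failed with exception/ {
    split($1, dt, "T")            # dt[2] holds HH:MM:SS,mmm
    split(dt[2], t, /[:,]/)       # t[1]=HH, t[2]=MM, t[3]=SS
    now = t[1] * 3600 + t[2] * 60 + t[3]
    if (seen) {
      gap = now - prev
      if (gap < 0) gap += 86400   # the log crossed midnight
      print gap
    }
    prev = now
    seen = 1
  }' "$1"
}
```

Running `tick_gaps karaf.log | sort -n | uniq -c` then summarizes how often each gap occurs; a log in which every Tick fails should show almost nothing but 30.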
      

      Not all the errors are associated with the same Vertex:

      16:03:28 # grep 'Graph does not contain source vertex' karaf.log | sed 's/.* CEVertex//' | wc -l
      2762
      
      16:03:34 # grep 'Graph does not contain source vertex' karaf.log | sed 's/.* CEVertex//' | sort -u | wc -l
      24
      
      16:03:40 # grep 'Graph does not contain source vertex' karaf.log | sed 's/.* CEVertex//' | sort -u
      [id=10221, resourceKey=[snmp-interface, 186:527040512]]
      [id=13073, resourceKey=[snmp-interface, 5:437796864]]
      [id=13126, resourceKey=[snmp-interface, 143:527040896]]
      [id=13324, resourceKey=[snmp-interface, 692:526910272]]
      [id=17399, resourceKey=[snmp-interface, 441:526845184]]
      [id=23284, resourceKey=[snmp-interface, 691:526910272]]
      [id=24424, resourceKey=[snmp-interface, 432:526648704]]
      [id=26055, resourceKey=[snmp-interface, 125:526844544]]
      [id=26793, resourceKey=[snmp-interface, 212:526909504]]
      [id=27329, resourceKey=[snmp-interface, 126:527238080]]
      [id=28780, resourceKey=[snmp-interface, 353:527368768]]
      [id=28972, resourceKey=[snmp-interface, 316:526647296]]
      [id=30033, resourceKey=[snmp-interface, 692:526977152]]
      [id=32093, resourceKey=[snmp-interface, 323:527761408]]
      [id=34417, resourceKey=[snmp-interface, 131:526647296]]
      [id=34537, resourceKey=[snmp-interface, 143:527106944]]
      [id=38085, resourceKey=[snmp-interface, 103:527171584]]
      [id=4412, resourceKey=[snmp-interface, 884:32]]
      [id=4918, resourceKey=[snmp-interface, 261:19]]
      [id=52695, resourceKey=[snmp-interface, 149:527172288]]
      [id=5315, resourceKey=[snmp-interface, 156:527106880]]
      [id=5472, resourceKey=[snmp-interface, 154:526909952]]
      [id=7631, resourceKey=[snmp-interface, 432:526912064]]
      [id=9140, resourceKey=[snmp-interface, 222:2922]]
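A per-vertex breakdown may help show whether a few vertices account for most of the failures. The sketch below is one way to get it; `count_by_vertex` is a hypothetical helper name. It matches only the `Caused by:` lines because, in the trace shown earlier, the message appears twice per failure (once on the wrapping UncheckedExecutionException and once on the cause), so a plain grep for the message would likely count each failed Tick twice, assuming every failure logs the full trace.

```shell
# Hypothetical helper: count failures per vertex, most frequent first.
# Only "Caused by:" lines are matched so each stack trace counts once.
# Output format: <count><TAB><vertex description>
count_by_vertex() {
  grep 'Caused by: .*Graph does not contain source vertex' "$1" \
    | sed 's/.* CEVertex//' \
    | awk '{ c[$0]++ } END { for (v in c) printf "%d\t%s\n", c[v], v }' \
    | sort -rn
}
```

Usage: `count_by_vertex karaf.log | head` lists the vertices that fail most often.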
      

      The other DC shows the same behavior, though less severely than the one above.

      This customer is running Horizon 26.1.1 with ALEC 1.1.0-SNAPSHOT, because 1.0.2 unfortunately failed to start in their environment for a reason not yet identified.

            People

            Assignee:
            mbrooks Matthew Brooks
            Reporter:
            agalue Alejandro Galue