  1. OpenNMS
  2. NMS-10697

Elasticsearch forwarding fails to recover after outage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 24.0.0
    • Fix Version/s: Meridian-2019.1.0, 25.1.0
    • Component/s: None
    • Security Level: Default (Default Security Scheme)
    • Labels:
      None
    • Sprint:
      Horizon 2019 - September 18th, Horizon 2019 - September 25th, Horizon 2019 - October 2nd

      Description

      The Elasticsearch service on my system died due to an out-of-memory (OOM) condition. OpenNMS was configured to forward flows, alarms, and events to Elasticsearch.

      After restarting Elasticsearch, the forwarders failed to recover and I found the following exceptions in the logs:

      2019-05-14T03:17:17,388 ERROR org.opennms.features.jest.client:25.0.0.SNAPSHOT(248) [ElasticAlarmIndexer] org.opennms.plugins.elasticsearch.rest.template.DefaultTemplateInitializer: An error occurred while initializing template alarms: Connection pool shut down.
      java.lang.IllegalStateException: Connection pool shut down
              at org.apache.http.util.Asserts.check(Asserts.java:34) ~[?:?]
              at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:184) ~[?:?]
              at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:251) ~[?:?]
              at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:175) ~[?:?]
              at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[?:?]
              at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[?:?]
              at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[?:?]
              at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[?:?]
              at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[?:?]
              at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) ~[?:?]
              at io.searchbox.client.http.JestHttpClient.executeRequest(JestHttpClient.java:133) ~[?:?]
              at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:67) ~[?:?]
              at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:60) ~[?:?]
              at org.opennms.plugins.elasticsearch.rest.executors.DefaultRequestExecutor.execute(DefaultRequestExecutor.java:60) ~[248:org.opennms.features.jest.client:25.0.0.SNAPSHOT]
              at org.opennms.plugins.elasticsearch.rest.OnmsJestClient.execute(OnmsJestClient.java:55) ~[256:org.opennms.features.opennms-es-rest:25.0.0.SNAPSHOT]
              at org.opennms.plugins.elasticsearch.rest.template.DefaultTemplateInitializer.getServerVersion(DefaultTemplateInitializer.java:125) ~[248:org.opennms.features.jest.client:25.0.0.SNAPSHOT]
              at org.opennms.plugins.elasticsearch.rest.template.DefaultTemplateInitializer.doInitialize(DefaultTemplateInitializer.java:110) ~[248:org.opennms.features.jest.client:25.0.0.SNAPSHOT]
              at org.opennms.plugins.elasticsearch.rest.template.DefaultTemplateInitializer.initialize(DefaultTemplateInitializer.java:81) [248:org.opennms.features.jest.client:25.0.0.SNAPSHOT]
              at org.opennms.features.alarms.history.elastic.ElasticAlarmIndexer.run(ElasticAlarmIndexer.java:201) [218:org.opennms.features.alarms.history.elastic:25.0.0.SNAPSHOT]
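
The trace suggests that the retry path keeps calling a client whose connection pool has already been shut down, so no number of retries can succeed. The following is only an illustrative sketch of that failure mode and one possible way around it; it is not the OpenNMS code or the fix that was applied. `FakeClient`, `RecreatingExecutor`, and `RecoveryDemo` are hypothetical names, and `FakeClient` merely imitates the exception message seen above.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an HTTP client whose connection pool can be shut down. */
class FakeClient {
    private boolean closed = false;

    void close() {
        closed = true;
    }

    String execute(String request) {
        if (closed) {
            // Imitates the message from the stack trace in this report.
            throw new IllegalStateException("Connection pool shut down");
        }
        return "ok:" + request;
    }
}

/** Hypothetical executor that replaces its client when the client reports a closed pool. */
class RecreatingExecutor {
    private final Supplier<FakeClient> factory;
    private FakeClient client;

    RecreatingExecutor(Supplier<FakeClient> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    synchronized FakeClient currentClient() {
        return client;
    }

    synchronized String execute(String request, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IllegalStateException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return client.execute(request);
            } catch (IllegalStateException e) {
                // Retrying on the same closed client can never succeed,
                // so build a fresh one before the next attempt.
                last = e;
                client = factory.get();
            }
        }
        throw last;
    }
}

class RecoveryDemo {
    public static void main(String[] args) {
        RecreatingExecutor executor = new RecreatingExecutor(FakeClient::new);
        executor.currentClient().close();                    // simulate the pool being shut down
        System.out.println(executor.execute("GET /", 2));    // prints "ok:GET /"
    }
}
```

In this sketch the first attempt fails on the closed client and the second succeeds on a new one. A retry loop that holds on to the original client would instead fail on every attempt.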
      

      Other threads were also stuck waiting for initialization:

      "Camel (sinkServer) thread #111 - JmsConsumer[OpenNMS.Sink.Telemetry-Netflow-5]" #1540 daemon prio=5 os_prio=0 cpu=20919.48ms elapsed=97004.29s tid=0x00007f032c1af000 nid=0x3803 waiting on condition  [0x00007f023f05e000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
              at java.lang.Thread.sleep(java.base@11.0.3/Native Method)
              at org.opennms.plugins.elasticsearch.rest.template.DefaultTemplateInitializer.waitBeforeRetrying(DefaultTemplateInitializer.java:102)
              at org.opennms.plugins.elasticsearch.rest.template.DefaultTemplateInitializer.initialize(DefaultTemplateInitializer.java:87)
              - locked <0x0000000609ee8d70> (a org.opennms.netmgt.flows.elastic.ElasticFlowRepositoryInitializer)
              at org.opennms.netmgt.flows.elastic.InitializingFlowRepository.ensureInitialized(InitializingFlowRepository.java:106)
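
The thread dump shows a thread sleeping between retries while holding the initializer's monitor, which would block any other caller that needs the same lock. As a rough sketch only, and not the actual OpenNMS implementation or fix, an initializer could bound its attempts and sleep outside the lock. `BoundedInitializer` and its parameters are hypothetical names chosen for illustration.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical initializer that retries a bounded number of times and sleeps outside the lock. */
class BoundedInitializer {
    private final BooleanSupplier attempt; // returns true once initialization succeeded
    private final long retryDelayMs;
    private final int maxAttempts;
    private volatile boolean initialized = false;

    BoundedInitializer(BooleanSupplier attempt, long retryDelayMs, int maxAttempts) {
        this.attempt = attempt;
        this.retryDelayMs = retryDelayMs;
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if initialized, false if all attempts failed or the thread was interrupted. */
    boolean ensureInitialized() {
        for (int i = 0; i < maxAttempts; i++) {
            synchronized (this) {
                // Hold the monitor only for the attempt itself.
                if (initialized) {
                    return true;
                }
                if (attempt.getAsBoolean()) {
                    initialized = true;
                    return true;
                }
            }
            if (i + 1 < maxAttempts) {
                try {
                    // Sleep without holding the monitor so other callers are not blocked.
                    Thread.sleep(retryDelayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return initialized;
    }
}
```

A caller that gets `false` back can report the failure or drop the work item instead of waiting indefinitely. Whether that trade-off suits flow persistence depends on requirements this report does not state.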
      

            People

            • Assignee:
              mvr Markus von Rüden
              Reporter:
              j-white Jesse White
            • Votes:
              0
              Watchers:
              2