Architecture for Learning Enabled Correlation (ALEC): ALEC-74

Kafka Streams dies and does not recover from error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Labels:
      None

      Description

      Found the following errors in the logs of a deployment:

      2019-07-03T06:03:50,761 | INFO  | alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26 | StreamThread                     | 123 - org.apache.servicemix.bundles.kafka-clients - 2.0.0.1 | stream-thread [alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26] State transition from PENDING_SHUTDOWN to DEAD
      2019-07-03T06:03:50,762 | INFO  | alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26 | KafkaStreams                     | 123 - org.apache.servicemix.bundles.kafka-clients - 2.0.0.1 | stream-client [alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a] State transition from REBALANCING to ERROR
      2019-07-03T06:03:50,762 | WARN  | alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26 | KafkaStreams                     | 123 - org.apache.servicemix.bundles.kafka-clients - 2.0.0.1 | stream-client [alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a] All stream threads have died. The instance will be in error state and should be closed.
      2019-07-03T06:03:50,762 | INFO  | alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26 | StreamThread                     | 123 - org.apache.servicemix.bundles.kafka-clients - 2.0.0.1 | stream-thread [alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26] Shutdown complete
      2019-07-03T06:03:50,763 | ERROR | alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26 | OpennmsDatasource                | 132 - org.opennms.alec.datasource.opennms-kafka - 1.0.2.SNAPSHOT | Stream error on thread: alec-datasource-ce03aa38-fa0a-4289-a671-acc28c26da5a-StreamThread-26
      org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_10, processor=KSTREAM-SOURCE-0000000000, topic=alarms, partition=10, offset=2056351
              at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:304) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:409) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:957) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:832) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
      Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_10] Abort sending since an error caught with a previous record (key alarm:uei.opennms.org/vendor/cisco/syslog/ifDown:00000000-0000-0000-0000-000000000000:3571:Ethernet106/1/8 value null timestamp 1562151138432) to topic alec-inventory due to org.apache.kafka.common.errors.TimeoutException: Expiring 19 record(s) for alec-inventory-0: 36128 ms has passed since batch creation plus linger time
      You can increase producer parameter `retries` and `retry.backoff.ms` to avoid this error.
              at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:130) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:50) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:189) ~[124:org.apache.servicemix.bundles.kafka-streams:2.0.0.1]
              at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1235) ~[123:org.apache.servicemix.bundles.kafka-clients:2.0.0.1]
              at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204) ~[123:org.apache.servicemix.bundles.kafka-clients:2.0.0.1]
              at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187) ~[123:org.apache.servicemix.bundles.kafka-clients:2.0.0.1]
              at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:635) ~[123:org.apache.servicemix.bundles.kafka-clients:2.0.0.1]
              at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:290) ~[123:org.apache.servicemix.bundles.kafka-clients:2.0.0.1]
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238) ~[123:org.apache.servicemix.bundles.kafka-clients:2.0.0.1]
              at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163) ~[123:org.apache.servicemix.bundles.kafka-clients:2.0.0.1]
              at java.lang.Thread.run(Thread.java:745) [?:?]
      Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 19 record(s) for alec-inventory-0: 36128 ms has passed since batch creation plus linger time
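
The exception text above suggests raising the producer parameters `retries` and `retry.backoff.ms`. The following is a minimal sketch of how that could look, assuming the settings are handed to Kafka Streams as plain properties carrying the `producer.` prefix (the value of `StreamsConfig.PRODUCER_PREFIX`). The class `ProducerRetrySettings`, the helper `withProducerRetries`, the `application.id`, the broker address, and the numeric values are all hypothetical and chosen for illustration; this is not necessarily the change that went into 1.1.0.

```java
import java.util.Properties;

// Sketch only: builds Kafka Streams properties with producer retry overrides.
// Uses plain string keys so it runs without the Kafka libraries on the classpath.
final class ProducerRetrySettings {

    // Same value as StreamsConfig.PRODUCER_PREFIX; settings with this prefix
    // are passed to the producer that Kafka Streams creates internally.
    static final String PRODUCER_PREFIX = "producer.";

    // Returns a copy of 'base' with the two parameters named in the log message added.
    static Properties withProducerRetries(Properties base, int retries, long retryBackoffMs) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(PRODUCER_PREFIX + "retries", Integer.toString(retries));
        props.setProperty(PRODUCER_PREFIX + "retry.backoff.ms", Long.toString(retryBackoffMs));
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("application.id", "alec-datasource");   // assumed from the thread names
        base.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address

        // Illustrative values, not tuned recommendations.
        Properties tuned = withProducerRetries(base, 10, 1000L);
        tuned.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` object would then be passed to the `KafkaStreams` constructor in place of the original configuration.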
      

      Restarting the datasource and driver bundles allowed Kafka Streams to recover.
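
Since a manual restart was what restored processing, one possible mitigation is to replace a dead client automatically. The sketch below illustrates that idea in plain Java. `StreamsHandle`, `RestartingSupervisor`, and `onError` are hypothetical names, and the wiring to a real error signal (for example a listener registered with `KafkaStreams#setStateListener`) is left out. It assumes a fresh client has to be built for each restart, because a closed `KafkaStreams` instance cannot be started again. This is not a description of the fix that was actually applied.

```java
import java.util.function.Supplier;

// Sketch only: closes a failed streams client and starts a new one,
// up to a fixed number of restarts.
final class RestartingSupervisor {

    // Hypothetical stand-in for the small part of a streams client used here.
    interface StreamsHandle {
        void start();
        void close();
    }

    private final Supplier<StreamsHandle> factory;
    private final int maxRestarts;
    private StreamsHandle current;
    private int restarts;

    RestartingSupervisor(Supplier<StreamsHandle> factory, int maxRestarts) {
        this.factory = factory;
        this.maxRestarts = maxRestarts;
    }

    // Creates and starts the first client.
    synchronized void start() {
        current = factory.get();
        current.start();
    }

    // To be called when the client reports a terminal error state.
    // Always closes the failed client; returns true if a replacement was started.
    synchronized boolean onError() {
        if (current != null) {
            current.close();
            current = null;
        }
        if (restarts >= maxRestarts) {
            return false; // restart budget used up, stay stopped
        }
        restarts++;
        current = factory.get();
        current.start();
        return true;
    }

    synchronized int restartCount() {
        return restarts;
    }

    public static void main(String[] args) {
        int[] created = {0};
        RestartingSupervisor supervisor = new RestartingSupervisor(() -> {
            int id = ++created[0];
            return new StreamsHandle() {
                public void start() { System.out.println("client " + id + " started"); }
                public void close() { System.out.println("client " + id + " closed"); }
            };
        }, 2);

        supervisor.start();
        System.out.println("restarted: " + supervisor.onError());
    }
}
```

Capping the number of restarts keeps a persistent failure, such as an unreachable broker, from turning into an endless restart loop.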

            People

            Assignee:
            patrick.schweizer Patrick Schweizer
            Reporter:
            j-white Jesse White
            Votes:
            0
            Watchers:
            2
