OpenNMS / NMS-13539

Minion stops processing flows with "Invalid packet: null" until restart

Details

    • Bug
    • Status: Resolved (View Workflow)
    • Minor
    • Resolution: Fixed
    • 28.0.2
    • 29.0.0
    • None
    • Security Level: Default (Default Security Scheme)
    • 28.0.2 + 27b561bf527bcff69155551503d5058c238fc52 cherry-picked
    • 5
    • Horizon 2021 - Sep 1 - 15, Horizon 2021 - Oct 13 - 27
    • Backlog
    • 741
    • Minion will process flows without throwing exceptions.

    Description

      Some of our minions will process flows fine for a few hours or days, then suddenly stop.
      When I tail their logs, there are many messages saying "Invalid packet: null".
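The symptom above can be checked mechanically across many minions. The following is a minimal Python sketch, assuming each minion's log has already been collected as a list of lines; the helper names `looks_stuck` and `count_stuck`, the sample log lines, and the threshold of 100 matching lines are all hypothetical and not part of OpenNMS.

```python
from typing import Dict, Iterable, List

# The symptom text quoted in this issue.
SYMPTOM = "Invalid packet: null"


def looks_stuck(log_lines: Iterable[str], threshold: int = 100) -> bool:
    """Return True if the symptom appears in at least `threshold` lines.

    The default threshold is an arbitrary assumption, not an OpenNMS value.
    """
    return sum(SYMPTOM in line for line in log_lines) >= threshold


def count_stuck(logs_by_minion: Dict[str, List[str]], threshold: int = 100) -> int:
    """Count how many minions' log excerpts look stuck."""
    return sum(looks_stuck(lines, threshold) for lines in logs_by_minion.values())


# Hypothetical log excerpts keyed by minion name.
logs = {
    "minion-a": ["WARN Invalid packet: null"] * 150,
    "minion-b": ["INFO flow processed"] * 150,
}
print(count_stuck(logs))  # 1
```

A plain count over the whole log is crude; a real check would probably look only at recent lines, for example those written since the last restart.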

       
      Given this bundle list:

      ID  │ State  │ Lvl │ Version │ Name
      354 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: API
      355 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Common
      356 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Config :: API
      357 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Config :: JAXB
      358 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Distributed :: Common
      359 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Distributed :: Minion
      360 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Listeners
      361 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Protocols :: BMP :: Parser
      362 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Protocols :: BMP :: Transport
      363 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Protocols :: Common
      364 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Protocols :: Netflow :: Parser
      365 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Protocols :: Netflow :: Transport
      366 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Protocols :: SFlow :: Parser
      367 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Registry
      368 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Shell
      

      Restarting bundles 358, 360, 364, and 365 has no effect on this issue.
      Restarting bundle 359 (OpenNMS :: Features :: Telemetry :: Distributed :: Minion) does allow flow processing to resume.
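Bundle IDs are assigned per installation, so 359 is not necessarily the Distributed :: Minion bundle on every minion. The following Python sketch, assuming the `bundle:list` output has been captured as text with Karaf's usual `│` column separator, looks the bundle up by name and builds the matching Karaf `bundle:restart` command; `find_bundle_id` and `restart_command` are hypothetical helper names, not OpenNMS tooling.

```python
from typing import Optional


def find_bundle_id(bundle_list_output: str, name: str) -> Optional[int]:
    """Return the ID of the bundle whose Name column equals `name`, or None.

    Assumes the column layout ID │ State │ Lvl │ Version │ Name.
    """
    for line in bundle_list_output.splitlines():
        columns = [c.strip() for c in line.split("│")]
        # Only consider data rows: five columns with a numeric ID first.
        if len(columns) == 5 and columns[0].isdigit() and columns[4] == name:
            return int(columns[0])
    return None


def restart_command(bundle_id: int) -> str:
    """Build the Karaf console command that restarts a single bundle."""
    return f"bundle:restart {bundle_id}"


# Two rows taken from the bundle list in this issue.
sample = """\
358 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Distributed :: Common
359 │ Active │  80 │ 28.0.2  │ OpenNMS :: Features :: Telemetry :: Distributed :: Minion
"""
bundle_id = find_bundle_id(sample, "OpenNMS :: Features :: Telemetry :: Distributed :: Minion")
print(restart_command(bundle_id))  # bundle:restart 359
```

Matching on the full bundle name rather than a substring avoids picking up the similarly named Distributed :: Common bundle.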

       
      All of the minions were restarted on Friday (8/20).
      This morning (8/23), 168 of them were in this state.
       
      In addition, this code leads me to believe that at DEBUG log level, the entire exception should be written to the log, but this does not appear to be the case.
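For comparison, here is a minimal sketch of the logging behavior the reporter expected, written with Python's standard `logging` module rather than the OpenNMS (Java) code the report refers to: a one-line "Invalid packet" warning is always emitted, and the full traceback is emitted only when DEBUG is enabled. The `parse`, `process`, and `run` functions are hypothetical.

```python
import io
import logging


def parse(packet: bytes) -> None:
    """Hypothetical parser that fails with a message-less exception."""
    raise ValueError()


def process(logger: logging.Logger, packet: bytes) -> None:
    try:
        parse(packet)
    except Exception as e:
        # Always emitted; str() of a message-less exception is "", so this
        # line carries no detail about the failure.
        logger.warning("Invalid packet: %s", e)
        # Emitted only when DEBUG is enabled; exc_info=True appends the traceback.
        logger.debug("Invalid packet details", exc_info=True)


def run(level: int) -> str:
    """Process one packet at the given log level and return the log output."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.{level}")
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(level)
    logger.propagate = False
    process(logger, b"\x00")
    return stream.getvalue()


print(run(logging.WARNING))
print(run(logging.DEBUG))
```

At WARNING only the short message appears; at DEBUG the traceback follows it.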
       
      Attached are debug logs, packet captures, and thread dumps.
       

            People

              cpape Christian Pape
              wkeaney Will Keaney
              Votes: 0
              Watchers: 4
