OpenNMS / NMS-4846

Provisiond leaks file handles, eventually causing "Too many open files" crashes

      Description

      The attached patch attempts to solve this issue by keeping only a single NioSocketConnector per timeout value and disposing of each one as soon as possible. It has been soak testing for a week on our problematic test server with no crashes, so I believe it is at least making a difference in file handle consumption.
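The "one connector per timeout value, disposed as soon as possible" idea can be sketched roughly as below. This is not the attached patch: PerTimeoutPool and its methods are hypothetical names, and the connector type is left generic so the sketch runs without Mina on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.LongFunction;

/** Hypothetical pool: at most one live connector per timeout value. */
public class PerTimeoutPool<C> {

    /** A connector plus the number of detect attempts currently using it. */
    private static final class Entry<C> {
        final C connector;
        int users;
        Entry(C connector) { this.connector = connector; }
    }

    private final Map<Long, Entry<C>> byTimeout = new HashMap<>();
    private final LongFunction<C> factory;   // builds a connector for a timeout
    private final Consumer<C> disposer;      // frees the connector's descriptors

    public PerTimeoutPool(LongFunction<C> factory, Consumer<C> disposer) {
        this.factory = factory;
        this.disposer = disposer;
    }

    /** Returns the shared connector for this timeout, creating it on first use. */
    public synchronized C acquire(long timeoutMillis) {
        Entry<C> entry = byTimeout.get(timeoutMillis);
        if (entry == null) {
            entry = new Entry<>(factory.apply(timeoutMillis));
            byTimeout.put(timeoutMillis, entry);
        }
        entry.users++;
        return entry.connector;
    }

    /** Disposes the connector as soon as its last user releases it. */
    public synchronized void release(long timeoutMillis) {
        Entry<C> entry = byTimeout.get(timeoutMillis);
        if (entry == null) {
            throw new IllegalStateException("no connector for timeout " + timeoutMillis);
        }
        if (--entry.users == 0) {
            byTimeout.remove(timeoutMillis);
            disposer.accept(entry.connector);
        }
    }

    /** Number of connectors currently alive. */
    public synchronized int size() {
        return byTimeout.size();
    }
}
```

With Mina, the factory would presumably create a NioSocketConnector and call setConnectTimeoutMillis() on it, and the disposer would call dispose(). Each detect attempt would pair acquire() with release() in a finally block.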

      Original issue as copy/pasted from opennms-devel post:
      I've been doing a lot of digging around various 'Too many open files' crashes we've been seeing locally, and I think I've pinned down a big leak of file descriptors in provisiond's use of org.apache.mina connectors.

      What it's currently doing in AsyncBasicDetector#isServiceDetected:

      • For each service, create a new NioSocketConnector
      • Configure that connector with a handler, filters etc
      • Make a connection out, check for results etc

      There seem to be two problems with this approach:

      1) Constructing an NioSocketConnector creates a lot of 'anon_inode' and 'pipe' file descriptors (under Linux, at least; I assume there is some equivalent under Windows). On one machine it was 8 and 12 respectively and on another 4 and 8, so I'm not sure quite what accounts for the difference. The actual connect() call only uses one more handle. This means the process runs out of descriptors a lot faster than expected.

      2) If new NioSocketConnector() fails with a "Too many open files" exception, Mina sometimes just falls over dead with "NoClassDefFoundError: Could not initialize class sun.nio.ch.FileDispatcher". This class does exist in my JVM (OpenJDK 6), and if I reflectively inspect it first, that sometimes stops the crashes from happening. I'm pretty baffled there, to be honest. Once it gets into this state, you can't close existing sockets and you can't open new ones; all the anon_inode and pipe FDs just sit there. This tallies with behaviour we've witnessed in OpenNMS instances that have had a "Too many open files" crash: lsof shows a few thousand pipe/anon_inode handles sitting around long after the crash.
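The descriptor cost described in point 1 can be observed from inside the JVM. The sketch below is an illustration, not a measurement of Mina itself: it assumes the connector's overhead comes from the java.nio selectors it opens internally, and so counts the descriptors held by plain Selector instances. FdProbe and its methods are hypothetical names, and the count is only available on Unix-like JVMs that expose com.sun.management.UnixOperatingSystemMXBean.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class FdProbe {

    /** Open descriptor count for this JVM, or -1 if the platform does not report it. */
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Opens n selectors, reports how many descriptors they hold, then closes them. */
    static long fdsHeldBySelectors(int n) throws IOException {
        Selector.open().close();            // warm-up so one-time setup is not counted
        List<Selector> selectors = new ArrayList<>();
        long before = openFds();
        try {
            for (int i = 0; i < n; i++) {
                selectors.add(Selector.open());
            }
            return openFds() - before;
        } finally {
            for (Selector selector : selectors) {
                selector.close();           // releases the descriptors again
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("descriptors held by 10 selectors: " + fdsHeldBySelectors(10));
    }
}
```

The exact number per selector depends on the operating system and JDK version, so the output is best compared against lsof on the same machine rather than against the figures quoted above.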

      What I think Mina wants you to do is create a single NioSocketConnector to reuse everywhere, and use the optional IoSessionInitializer argument to connect() to configure filters and attach state objects to the IoSession. This would take a moderate overhaul of AsyncBasicDetector: the handler would need to be rewritten as a singleton that reads its state through IoSession.getAttribute/setAttribute rather than having one handler per service detection attempt, and it would probably mean a fair chunk of refactoring at the same time.
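The shape of that refactoring can be sketched without Mina. Every type below is a hypothetical stand-in: Session models only the attribute map of IoSession, the Consumer passed to connect() plays the role of IoSessionInitializer, and SharedHandler is the singleton handler. No sockets are opened; the point is only to show per-attempt state moving from the handler into the session.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class SharedHandlerSketch {

    /** Hypothetical stand-in for Mina's IoSession; only attributes are modelled. */
    static final class Session {
        private final Map<String, Object> attributes = new ConcurrentHashMap<>();
        Object getAttribute(String key) { return attributes.get(key); }
        void setAttribute(String key, Object value) { attributes.put(key, value); }
    }

    /** Per-attempt state that previously lived in a per-attempt handler instance. */
    static final class DetectState {
        final String expectedBanner;
        volatile boolean detected;
        DetectState(String expectedBanner) { this.expectedBanner = expectedBanner; }
    }

    /** Stateless singleton handler: everything it needs comes from the session. */
    static final class SharedHandler {
        static final String STATE_KEY = "detectState";

        void messageReceived(Session session, String message) {
            DetectState state = (DetectState) session.getAttribute(STATE_KEY);
            if (state != null && message.startsWith(state.expectedBanner)) {
                state.detected = true;
            }
        }
    }

    /** Stand-in for the one shared connector; connect() runs a per-session initializer. */
    static final class SharedConnector {
        final SharedHandler handler = new SharedHandler();

        Session connect(Consumer<Session> initializer) {
            Session session = new Session();   // a real connector would open a socket here
            initializer.accept(session);       // plays the role of IoSessionInitializer
            return session;
        }
    }

    public static void main(String[] args) {
        SharedConnector connector = new SharedConnector();
        DetectState ssh = new DetectState("SSH-");
        DetectState smtp = new DetectState("220 ");
        Session s1 = connector.connect(s -> s.setAttribute(SharedHandler.STATE_KEY, ssh));
        Session s2 = connector.connect(s -> s.setAttribute(SharedHandler.STATE_KEY, smtp));
        connector.handler.messageReceived(s1, "SSH-2.0-OpenSSH");
        connector.handler.messageReceived(s2, "HTTP/1.1 400 Bad Request");
        System.out.println(ssh.detected + " " + smtp.detected);
    }
}
```

Because the handler holds no fields of its own, one instance can serve any number of concurrent detect attempts, which is what allows a single connector to be shared.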

              People

              • Assignee:
                seth Seth Leger
                Reporter:
                duncanm Duncan Mackintosh
