  1. OpenNMS
  2. NMS-7516

XML Collector is not working as expected for node-level resources

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 15.0.1, Meridian-2015.1.0, 16.0.0
    • Fix Version/s: 16.0.1, Meridian-2015.1.1, 17.0.0
    • Component/s: Data Collection - XML
    • Security Level: Default (Default Security Scheme)
    • Labels:
    • Environment:
      Linux Redhat, Oracle Java version 1.7.0_72
    • Sprint:
      Finalize 16.0.1

      Description

      XML data collection resolves the wrong RRD storage directory for a node.
      As a side effect, data is stored in the wrong directory, which leads to concurrent write access to the same RRD files.
      The following exception appears in the log:

      org.opennms.netmgt.collection.api.CollectionException: An undeclared throwable was caught during data collection for interface 27/10.200.19.12/CiscoPorts XML-Collector
      	at org.opennms.netmgt.collectd.CollectableService.doCollection(CollectableService.java:421) ~[opennms-services-15.0.1.jar:?]
      	at org.opennms.netmgt.collectd.CollectableService.doRun(CollectableService.java:322) [opennms-services-15.0.1.jar:?]
      	at org.opennms.netmgt.collectd.CollectableService.access$000(CollectableService.java:70) [opennms-services-15.0.1.jar:?]
      	at org.opennms.netmgt.collectd.CollectableService$1.run(CollectableService.java:300) [opennms-services-15.0.1.jar:?]
      	at org.opennms.core.logging.Logging.withPrefix(Logging.java:66) [org.opennms.core.logging-15.0.1.jar:?]
      	at org.opennms.netmgt.collectd.CollectableService.run(CollectableService.java:296) [opennms-services-15.0.1.jar:?]
      	at org.opennms.netmgt.scheduler.LegacyScheduler$1.run(LegacyScheduler.java:209) [opennms-services-15.0.1.jar:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_72]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_72]
      	at org.opennms.core.concurrent.LogPreservingThreadFactory$3.run(LogPreservingThreadFactory.java:124) [opennms-util-15.0.1.jar:?]
      	at java.lang.Thread.run(Thread.java:745) [?:1.7.0_72]
      Caused by: java.util.ConcurrentModificationException
      	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922) ~[?:1.7.0_72]
      	at java.util.HashMap$KeyIterator.next(HashMap.java:956) ~[?:1.7.0_72]
      	at org.opennms.netmgt.collection.api.AttributeGroup.visit(AttributeGroup.java:107) ~[org.opennms.features.collection.api-15.0.1.jar:?]
      	at org.opennms.netmgt.collection.support.AbstractCollectionResource.visit(AbstractCollectionResource.java:117) ~[org.opennms.features.collection.api-15.0.1.jar:?]
      	at org.opennms.netmgt.collection.support.MultiResourceCollectionSet.visit(MultiResourceCollectionSet.java:69) ~[org.opennms.features.collection.api-15.0.1.jar:?]
      	at org.opennms.netmgt.collectd.CollectableService.doCollection(CollectableService.java:394) ~[opennms-services-15.0.1.jar:?]
      	... 10 more
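
The `Caused by` frames show `AttributeGroup.visit` iterating a `HashMap` key set when the iterator's fail-fast check fires. A minimal sketch of that mechanism follows, assuming the map is modified while it is being visited. The class `SharedAttributeGroupDemo` and the attribute names are hypothetical and are not OpenNMS code. For determinism the modification happens on the same thread; the same check can fire when a second collection thread writes to a shared map.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a group of collected attributes; not OpenNMS code.
class SharedAttributeGroupDemo {

    /** Returns true if visiting the key set fails with a ConcurrentModificationException. */
    static boolean visitWhileModifying() {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("ifInOctets", "100");   // hypothetical attribute names
        attributes.put("ifOutOctets", "200");
        try {
            for (String name : attributes.keySet()) {
                // Simulates another collection adding an attribute to the
                // same group while it is being visited.
                attributes.put("added-during-visit-" + name, "0");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // The iterator noticed a structural change it did not make itself.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME raised: " + visitWhileModifying());
    }
}
```

Running `main` prints `CME raised: true`. This only illustrates why a resource shared between two collections is unsafe; it does not show where OpenNMS shares the map.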
      

      According to the log, the exception was raised on [Collectd-Thread-41-of-50] while it was collecting a node for the XML service CiscoPorts-XML-Collection:

      2015-02-27 11:43:48,876 ERROR [Collectd-Thread-41-of-50] o.o.n.c.CollectableService: An undeclared throwable was caught during data collection for interface 27/10.200.19.12/CiscoPorts XML-Collector
      

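      The log also shows that [Collectd-Thread-27-of-50], which was collecting another node for the same service, resolved an incorrect RRD directory: it was collecting node 12, yet getStorageDir returned dir = 27, the ID of the node that Collectd-Thread-41-of-50 was collecting: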

      2015-02-27 11:43:48,063 INFO  [Collectd-Thread-27-of-50] o.o.n.c.CollectableService: run: starting new collection for 12/10.200.39.12/CiscoPorts XML-Collector/CiscoPorts-XML-Collection
      2015-02-27 11:43:48,804 DEBUG [Collectd-Thread-27-of-50] o.o.n.c.DefaultCollectionAgent: getStorageDir: isStoreByForeignSource = false, foreignSource = null, foreignId = null, dir = 27
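
To find other occurrences in a large collectd.log, the two message types above can be correlated per thread. The sketch below is a hypothetical helper (`StorageDirMismatchCheck` is not part of OpenNMS). It assumes that, when `isStoreByForeignSource = false`, the storage directory should equal the node ID that starts the `nodeId/ipAddress/service` triple, and that the log format matches the two lines quoted in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log checker; the patterns are derived only from the lines quoted in this issue.
class StorageDirMismatchCheck {
    // Captures the thread name and the node ID of a newly started collection.
    private static final Pattern START = Pattern.compile(
        "\\[(Collectd-Thread-[^\\]]+)\\].*starting new collection for (\\d+)/");
    // Captures the thread name and the resolved directory when storing by node ID.
    private static final Pattern STORAGE_DIR = Pattern.compile(
        "\\[(Collectd-Thread-[^\\]]+)\\].*getStorageDir:.*isStoreByForeignSource = false.*dir = (\\S+)");

    /** Returns one entry per getStorageDir line whose dir differs from the thread's current node ID. */
    static List<String> findMismatches(List<String> lines) {
        Map<String, String> nodeByThread = new HashMap<>();
        List<String> mismatches = new ArrayList<>();
        for (String line : lines) {
            Matcher start = START.matcher(line);
            if (start.find()) {
                // Remember which node this thread is collecting right now.
                nodeByThread.put(start.group(1), start.group(2));
                continue;
            }
            Matcher dir = STORAGE_DIR.matcher(line);
            if (dir.find()) {
                String node = nodeByThread.get(dir.group(1));
                if (node != null && !node.equals(dir.group(2))) {
                    mismatches.add(dir.group(1) + ": node " + node + " -> dir " + dir.group(2));
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "2015-02-27 11:43:48,063 INFO  [Collectd-Thread-27-of-50] o.o.n.c.CollectableService: run: starting new collection for 12/10.200.39.12/CiscoPorts XML-Collector/CiscoPorts-XML-Collection",
            "2015-02-27 11:43:48,804 DEBUG [Collectd-Thread-27-of-50] o.o.n.c.DefaultCollectionAgent: getStorageDir: isStoreByForeignSource = false, foreignSource = null, foreignId = null, dir = 27");
        for (String mismatch : findMismatches(sample)) {
            System.out.println(mismatch);
        }
    }
}
```

For the two lines quoted above, `main` prints `Collectd-Thread-27-of-50: node 12 -> dir 27`.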
      

      I have attached the full collectd.log, captured from OpenNMS startup, in the hope that it helps pinpoint the issue.

      Notes:

      • The exact same configuration ran without problems on OpenNMS 1.12.9-2.
      • As far as I can see, only XML data collection is affected; SNMP data collection shows no problem.

        Attachments

        1. collectd.1.log.zip
          7.51 MB
        2. collectd-02062015.zip
          91 kB
        3. NMS-7516-problem.log
          49 kB
        4. opennms.pgdump
          563 kB

            People

            • Assignee:
              agalue Alejandro Galue
            • Reporter:
              reseaux.pri Network Team