  1. CernVM
  2. CVM-1111

Improve failure detection/logging on empty CVMFS_SERVER_URL / unavailable config repo

Details

    • Type: Improvement
    • Status: Closed
    • Priority: High
    • Resolution: Fixed
    • None
    • CernVM-FS 2.3.3
    • CVMFS
    • None
    • x86_64-slc6-gcc48-opt

    Description

      Fermilab has been running cvmfs-2.3.2 on its SL6 worker nodes for a little over a month. Today it was discovered that on a few nodes some repositories return I/O errors reporting a malformed URL. Running cvmfs_talk host info on an affected node at that time shows zero hosts. I went through /var/log/messages and extracted the messages that looked relevant:

      Nov 27 16:52:26 fnpc4623 mount.cvmfs: external location for configuration files does not exist: /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs
      ... that message repeated numerous times ....
      Nov 27 16:52:27 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) another process holds lock.mu2e.opensciencegrid.org, waiting.
      Nov 27 16:52:40 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) lock lock.mu2e.opensciencegrid.org acquired
      Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) GeoAPI request /api/v1.0/geo/@proxy@/,cmsextproxy.cern.ch,cmsextproxy.fnal.gov failed with error 2 [malformed URL]
      Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to retrieve geographic order from stratum 1 servers
      Nov 27 16:52:41 fnpc4623 cvmfs2: (xenon.opensciencegrid.org) CernVM-FS: unmounted /cvmfs/xenon.opensciencegrid.org (xenon.opensciencegrid.org)
      Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to download repository manifest (2 - malformed URL)
      Nov 27 16:52:42 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) CernVM-FS: linking /cvmfs/mu2e.opensciencegrid.org to repository mu2e.opensciencegrid.org
      Nov 27 16:52:46 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) CernVM-FS: unmounted /cvmfs/mu2e.opensciencegrid.org (mu2e.opensciencegrid.org)
      Nov 27 16:55:42 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to download repository manifest (2 - malformed URL)
       
      ... repeated every 5 minutes or so for 2 days until ...
       
      Nov 29 10:07:03 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to fetch /artexternals/g4neutron/v4_5/NULL/G4NDL4.5/Capture/CrossSection/38_84_Strontium.z (hash: 6d0b98139a39338ade4fae31c228dc6467c459bd, error 2 [malformed URL])
      Nov 29 10:07:03 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to open inode: 305524, CAS key 6d0b98139a39338ade4fae31c228dc6467c459bd, error code 2
       
      ... then those are repeated very frequently until cleaned up by cvmfs_config umount...
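
A minimal sketch of how the excerpt above could be summarized across many nodes, assuming syslog lines keep the shape shown here ("Mon DD HH:MM:SS host cvmfs2: (repo) message"). The function name count_malformed_url and the regular expression are illustrative and not part of CernVM-FS; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches syslog lines of the shape seen in the excerpt:
#   "Nov 27 16:52:41 fnpc4623 cvmfs2: (repo) message"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"cvmfs2:\s\((?P<repo>[^)]+)\)\s(?P<msg>.*)$"
)


def count_malformed_url(lines):
    """Count 'malformed URL' messages per (host, repository) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines from other programs (e.g. mount.cvmfs) do not match and are skipped
        if match and "malformed URL" in match.group("msg"):
            counts[(match.group("host"), match.group("repo"))] += 1
    return counts


# Sample lines copied from the log excerpt in this report
sample = [
    "Nov 27 16:52:26 fnpc4623 mount.cvmfs: external location for configuration "
    "files does not exist: /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs",
    "Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) GeoAPI request "
    "/api/v1.0/geo/@proxy@/,cmsextproxy.cern.ch,cmsextproxy.fnal.gov failed with "
    "error 2 [malformed URL]",
    "Nov 27 16:52:41 fnpc4623 cvmfs2: (xenon.opensciencegrid.org) CernVM-FS: "
    "unmounted /cvmfs/xenon.opensciencegrid.org (xenon.opensciencegrid.org)",
    "Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to "
    "download repository manifest (2 - malformed URL)",
]

for (host, repo), n in count_malformed_url(sample).items():
    print(host, repo, n)
```

On a real node the same function could be fed the lines of /var/log/messages instead of the sample list.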

      There was no indication of any activity with config-osg.opensciencegrid.org for quite some time before these messages. The last prior message said it was being linked (that is, mounted), not unmounted.

      One of the nodes I looked at was in this condition for 3 weeks.

      I don't expect this to be enough information to determine the root cause, but it serves as a placeholder. I think the malformed URL on the GeoAPI lookup is a secondary bug. I'm hoping to catch a node in the "external location for configuration files does not exist" condition, which I think will be helpful. I tried to force the condition by killing cvmfs-config but didn't have any luck.
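
To illustrate the kind of early detection the title asks for, here is a hedged sketch of a pre-mount check. It is not the change that went into CernVM-FS; the function name preflight_problems is hypothetical, and how the two inputs are obtained from the client configuration is deliberately left out. It assumes only that CVMFS_SERVER_URL is a ';'-separated list of URLs and that the external configuration location is a directory path such as the one in the log above.

```python
import os


def preflight_problems(server_url, external_config_dir):
    """Return human-readable problems that would leave a mount without hosts."""
    problems = []
    # CVMFS_SERVER_URL is a ';'-separated list; keep only non-empty entries
    hosts = [h.strip() for h in server_url.split(";") if h.strip()]
    if not hosts:
        problems.append("CVMFS_SERVER_URL is empty: no hosts to contact")
    # The external config location lives inside the config repository mount
    if not os.path.isdir(external_config_dir):
        problems.append(
            "external location for configuration files does not exist: "
            + external_config_dir
        )
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(
        "", "/cvmfs/config-osg.opensciencegrid.org/etc/cvmfs"
    ):
        print(problem)
```

Reporting both conditions up front would make the root cause visible in the log instead of surfacing later as repeated "malformed URL" errors.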

          People

            dwd Dave Dykstra
            dwd Dave Dykstra