Details
Improvement
Status: Closed
High
Resolution: Fixed
None
None
x86_64-slc6-gcc48-opt
Description
Fermilab has been running cvmfs-2.3.2 on its SL6 worker nodes for a little over a month. Today it was discovered that on a few nodes some repositories are returning I/O errors, with log messages reporting a malformed URL. Running cvmfs_talk host info at that time shows zero hosts. I went through /var/log/messages and extracted example messages that I thought were relevant:
Nov 27 16:52:26 fnpc4623 mount.cvmfs: external location for configuration files does not exist: /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs
... that message repeated numerous times ....
Nov 27 16:52:27 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) another process holds lock.mu2e.opensciencegrid.org, waiting.
Nov 27 16:52:40 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) lock lock.mu2e.opensciencegrid.org acquired
Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) GeoAPI request /api/v1.0/geo/@proxy@/,cmsextproxy.cern.ch,cmsextproxy.fnal.gov failed with error 2 [malformed URL]
Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to retrieve geographic order from stratum 1 servers
Nov 27 16:52:41 fnpc4623 cvmfs2: (xenon.opensciencegrid.org) CernVM-FS: unmounted /cvmfs/xenon.opensciencegrid.org (xenon.opensciencegrid.org)
Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to download repository manifest (2 - malformed URL)
Nov 27 16:52:42 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) CernVM-FS: linking /cvmfs/mu2e.opensciencegrid.org to repository mu2e.opensciencegrid.org
Nov 27 16:52:46 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) CernVM-FS: unmounted /cvmfs/mu2e.opensciencegrid.org (mu2e.opensciencegrid.org)
Nov 27 16:55:42 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to download repository manifest (2 - malformed URL)
... repeated every 5 minutes or so for 2 days until ...
Nov 29 10:07:03 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to fetch /artexternals/g4neutron/v4_5/NULL/G4NDL4.5/Capture/CrossSection/38_84_Strontium.z (hash: 6d0b98139a39338ade4fae31c228dc6467c459bd, error 2 [malformed URL])
Nov 29 10:07:03 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to open inode: 305524, CAS key 6d0b98139a39338ade4fae31c228dc6467c459bd, error code 2
... then those are repeated very frequently until cleaned up by cvmfs_config umount...
There was no indication of any activity on config-osg.opensciencegrid.org for quite some time before these messages. The last prior message about it said that it was being linked (that is, mounted), not unmounted.
One of the nodes I looked at was in this condition for 3 weeks.
I don't expect this to be enough information to figure out the root cause, but it is a placeholder. I think the malformed URL on the GeoAPI lookup is a secondary bug. I'm hoping to catch a node in the "external location for configuration files does not exist" condition, which I think will be helpful. I tried to force the condition by killing cvmfs-config but didn't have any luck.
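
To find other affected nodes, something like the following could be run over /var/log/messages. It is only a sketch: it assumes the syslog line layout seen in the excerpts above ("date host cvmfs2: (repository) message"), and the function name and the SAMPLE list are made up for illustration. It counts "malformed URL" messages per repository.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts in this ticket:
#   "Nov 27 16:52:41 fnpc4623 cvmfs2: (repo) message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+cvmfs2:\s+"
    r"\((?P<repo>[^)]+)\)\s+(?P<msg>.*)$"
)


def count_malformed_url_errors(lines):
    """Return a Counter of 'malformed URL' messages keyed by repository name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines from other programs (e.g. mount.cvmfs) do not match and are skipped.
        if match and "malformed URL" in match.group("msg"):
            counts[match.group("repo")] += 1
    return counts


# Hypothetical input: four lines copied from the excerpts above.
SAMPLE = [
    "Nov 27 16:52:26 fnpc4623 mount.cvmfs: external location for configuration "
    "files does not exist: /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs",
    "Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) GeoAPI request "
    "/api/v1.0/geo/@proxy@/,cmsextproxy.cern.ch,cmsextproxy.fnal.gov failed with "
    "error 2 [malformed URL]",
    "Nov 27 16:52:41 fnpc4623 cvmfs2: (xenon.opensciencegrid.org) CernVM-FS: "
    "unmounted /cvmfs/xenon.opensciencegrid.org (xenon.opensciencegrid.org)",
    "Nov 27 16:52:41 fnpc4623 cvmfs2: (mu2e.opensciencegrid.org) failed to "
    "download repository manifest (2 - malformed URL)",
]

if __name__ == "__main__":
    for repo, n in count_malformed_url_errors(SAMPLE).items():
        print(repo, n)
```

On a real node the input would be the open log file instead of SAMPLE, for example count_malformed_url_errors(open("/var/log/messages")). A nonzero count for a repository only flags the node for a closer look; it does not identify the root cause.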