CernVM / CVM-1465

cvmfs_config killall did not clean up


    Details

    • Type: Bug
    • Status: Closed
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: CernVM-FS 2.5
    • Component/s: CVMFS
    • Labels:
      None
    • Platforms:
      x86_64-slc6-gcc48-opt

      Description

      FNAL upgraded many of their worker nodes to cvmfs-2.4.4, and 45 of them hung for several hours in cvmfs_config reload. The admin then ran cvmfs_config killall on those nodes, which made things worse: a number of repositories hung with 'Transport endpoint is not connected', and a mount of the config repository hung. Then they called me in to investigate. On one machine I got the mount to proceed by running rmdir /var/run/cvmfs/cvmfs.pause. I could unmount two of the repositories with umount, but for two others umount reported that the mountpoint was busy. I couldn't run fuser on those mountpoints because they were stale, so I had to use umount -l to unmount them.

      So there are two questions: why did the original reload hang, and why did killall not clean it up? I think the latter is because of one of the repositories that can't be unmounted: killall does not proceed to clean /var/run/cvmfs if an unmount fails. Maybe the fix is just to use umount -l. As for the former, about 10% of the nodes did not get upgraded (I don't know why), so we should be able to catch one tomorrow before they attempt a killall, and maybe I can learn something by examining a machine at that point.
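      The umount -l idea above could look roughly like the following Python sketch. It is hypothetical and is not the actual cvmfs_config killall code: the function names, the injected run and clean_runtime_dir callables, the example mountpoints, and the policy of cleaning the runtime directory (such as /var/run/cvmfs) only after every mountpoint is detached are all assumptions made for illustration. The command runner is passed in so that the logic can be exercised without touching real mounts.

```python
import subprocess
from typing import Callable, Iterable, List

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[List[str]], int]


def system_runner(cmd: List[str]) -> int:
    """Run a real command (e.g. umount) and return its exit status."""
    return subprocess.run(cmd).returncode


def unmount_with_lazy_fallback(mountpoints: Iterable[str], run: Runner) -> List[str]:
    """Try a normal umount on each mountpoint. If it fails (for example
    because the mountpoint is busy or the FUSE endpoint is disconnected),
    retry with 'umount -l'. Return the mountpoints that still could not
    be detached."""
    still_mounted = []
    for mp in mountpoints:
        if run(["umount", mp]) == 0:
            continue
        # Normal unmount failed: detach lazily so cleanup is not blocked.
        if run(["umount", "-l", mp]) != 0:
            still_mounted.append(mp)
    return still_mounted


def killall_cleanup(mountpoints: Iterable[str], run: Runner,
                    clean_runtime_dir: Callable[[], None]) -> bool:
    """Hypothetical cleanup flow: clean the runtime directory only once
    every mountpoint has been detached. Return True if cleanup ran."""
    if unmount_with_lazy_fallback(mountpoints, run):
        return False
    clean_runtime_dir()
    return True
```

      With a fake runner in which a plain umount of /cvmfs/busy.example.org fails but the lazy unmount succeeds, killall_cleanup still reaches the runtime-directory cleanup. Without the fallback, that same busy mountpoint would stop the cleanup, which matches the behavior described in this report.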


              People

              • Assignee: Jakob Blomer (jblomer)
              • Reporter: Dave Dykstra (dwd)
              • Votes: 0
              • Watchers: 2
