CernVM / CVM-1957

GC hangs indefinitely if there is a network failure

Details

    • Type: Bug
    • Status: Closed
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: CernVM-FS 2.7.4
    • Fix Version/s: CernVM-FS 2.8
    • Component/s: CVMFS
    • Labels: None
    • ANY

    Description

      We run GC regularly like this:
      /usr/bin/cvmfs_server gc -a -l -f

      It ran normally until:

      Preserving Revision 4264 (17 Nov 2020 11:03:17 / added @ 17 Nov 2020 11:03:17)
      ├─ c8d2d9b86403219ddd398a75de0a425ad223d01c-shake128 /
      failed to load catalog 98d744d30060a2a6bfe21f6ed2c85c1adc65171a-shake128C (3 - network failure)
      garbage collection failed
      Segmentation fault
      Fail (6)!
      umount: /cvmfs/soft.computecanada.ca: device is busy.
              (In some cases useful info about processes that use
               the device is found by lsof(8) or fuser(1))
      

      I am not sure why the network failure occurred (possibly a momentary blip). Even when a network failure does occur, the CVMFS server should ideally handle it and fail gracefully. Instead, it segfaulted and the GC processes hung indefinitely.
      Many abort processes had also piled up, started by users trying to clean up the state of the repository:

      root      9692     1  0 05:51 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server abort -f soft.computecanada.ca
      root      9835  9692  0 05:51 ?        00:00:00 /bin/mount /var/spool/cvmfs/soft.computecanada.ca/rdonly
      root      9836  9835  0 05:51 ?        00:00:00 cvmfs2 soft.computecanada.ca /var/spool/cvmfs/soft.computecanada.ca/rdonly -o rw,nodev,allow_other,config=/etc/cvmfs/repositories.d/soft.computecanada.ca/client.conf:/var/spool/cvmfs/soft.computecanada.ca/client.local,cvmfs_suid,suid
      root     20371     1  0 Nov29 ?        00:00:00 cvmfs2 restricted.computecanada.ca /var/spool/cvmfs/restricted.computecanada.ca/rdonly -o rw,nodev,allow_other,config=/etc/cvmfs/repositories.d/restricted.computecanada.ca/client.conf:/var/spool/cvmfs/restricted.computecanada.ca/client.local,cvmfs_suid,suid
      root     20376     1  0 Nov29 ?        00:00:00 cvmfs2 restricted.computecanada.ca /var/spool/cvmfs/restricted.computecanada.ca/rdonly -o rw,nodev,allow_other,config=/etc/cvmfs/repositories.d/restricted.computecanada.ca/client.conf:/var/spool/cvmfs/restricted.computecanada.ca/client.local,cvmfs_suid,suid
      root     20967 20962  0 02:00 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server gc -a -l -f
      root     21128 20967  0 02:00 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server gc -a -l -f
      root     21131 21128  0 02:00 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server gc -a -l -f
      

      When I ran abort, it hung after logging this message:

      2020-11-30T10:51:17.588-08:00 local@harpsponge.comp.uvic.ca user.notice cvmfs2: (soft.computecanada.ca) another process holds ./lock_cachedb, waiting.
      

      After I killed the hung GC processes (and the abort processes), the repository recovered.

      Attachments

        1. s0.log (24 kB)
        2. s0 stats.pdf (182 kB)

            People

              Assignee: Jakob Blomer (jblomer)
              Reporter: Ryan Taylor (rptaylor)
              Votes: 0
              Watchers: 2