CernVM / CVM-1957

GC hangs indefinitely if there is a network failure

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: CernVM-FS 2.7.4
    • Fix Version/s: CernVM-FS 2.8
    • Component/s: CVMFS
    • Labels:
      None
    • Platforms:
      ANY

      Description

      We run GC regularly like this:
      /usr/bin/cvmfs_server gc -a -l -f
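Because the failure described below left the GC run hanging indefinitely, a scheduled invocation could be bounded with a time limit. The following is only a minimal sketch, not a fix for the underlying bug: it assumes GNU coreutils `timeout` is installed, and the function name `run_with_limit` and the 6h limit are hypothetical. Killing the run this way may still leave the repository in a state that needs manual cleanup.

```shell
# Run a command under a time limit using GNU coreutils `timeout`.
# `run_with_limit` is a hypothetical helper name, not part of CernVM-FS.
run_with_limit() {
  limit="$1"
  shift
  timeout "$limit" "$@"
  status=$?
  # GNU timeout exits with status 124 when the time limit was reached.
  if [ "$status" -eq 124 ]; then
    echo "command exceeded time limit of $limit: $*" >&2
  fi
  return "$status"
}
```

A scheduled job might then call `run_with_limit 6h /usr/bin/cvmfs_server gc -a -l -f`, with the limit chosen to comfortably exceed a normal GC run.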

      It ran normally until:

      Preserving Revision 4264 (17 Nov 2020 11:03:17 / added @ 17 Nov 2020 11:03:17)
      ├─ c8d2d9b86403219ddd398a75de0a425ad223d01c-shake128 /
      failed to load catalog 98d744d30060a2a6bfe21f6ed2c85c1adc65171a-shake128C (3 - network failure)
      garbage collection failed
      Segmentation fault
      Fail (6)!
      umount: /cvmfs/soft.computecanada.ca: device is busy.
              (In some cases useful info about processes that use
               the device is found by lsof(8) or fuser(1))
      

      I am not sure why there was a network failure; it may have been a momentary blip. Even if a network failure does occur, the CVMFS server should handle it and fail gracefully. Instead, there was a segmentation fault and the GC processes hung indefinitely.
      Many abort processes had also piled up (from users trying to clear up the status of the repo):

      root      9692     1  0 05:51 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server abort -f soft.computecanada.ca
      root      9835  9692  0 05:51 ?        00:00:00 /bin/mount /var/spool/cvmfs/soft.computecanada.ca/rdonly
      root      9836  9835  0 05:51 ?        00:00:00 cvmfs2 soft.computecanada.ca /var/spool/cvmfs/soft.computecanada.ca/rdonly -o rw,nodev,allow_other,config=/etc/cvmfs/repositories.d/soft.computecanada.ca/client.conf:/var/spool/cvmfs/soft.computecanada.ca/client.local,cvmfs_suid,suid
      root     20371     1  0 Nov29 ?        00:00:00 cvmfs2 restricted.computecanada.ca /var/spool/cvmfs/restricted.computecanada.ca/rdonly -o rw,nodev,allow_other,config=/etc/cvmfs/repositories.d/restricted.computecanada.ca/client.conf:/var/spool/cvmfs/restricted.computecanada.ca/client.local,cvmfs_suid,suid
      root     20376     1  0 Nov29 ?        00:00:00 cvmfs2 restricted.computecanada.ca /var/spool/cvmfs/restricted.computecanada.ca/rdonly -o rw,nodev,allow_other,config=/etc/cvmfs/repositories.d/restricted.computecanada.ca/client.conf:/var/spool/cvmfs/restricted.computecanada.ca/client.local,cvmfs_suid,suid
      root     20967 20962  0 02:00 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server gc -a -l -f
      root     21128 20967  0 02:00 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server gc -a -l -f
      root     21131 21128  0 02:00 ?        00:00:00 /bin/sh /usr/bin/cvmfs_server gc -a -l -f
      

      When I ran abort, it hung at this log message:

      2020-11-30T10:51:17.588-08:00 local@harpsponge.comp.uvic.ca user.notice cvmfs2: (soft.computecanada.ca) another process holds ./lock_cachedb, waiting.
      

      After I killed the hung gc processes (and the aborts), the repository recovered.
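The report does not say how the hung processes were identified. One way to pick them out of a listing like the one above might look like the sketch below; the function name `hung_cvmfs_pids` is hypothetical, and it assumes `ps -ef` column order (PID in the second field).

```shell
# Read `ps -ef`-style lines on stdin and print the PID (second column)
# of every `cvmfs_server gc` or `cvmfs_server abort` process.
# `hung_cvmfs_pids` is a hypothetical helper name.
hung_cvmfs_pids() {
  awk '/cvmfs_server (gc|abort)/ { print $2 }'
}
```

For example, `ps -ef | hung_cvmfs_pids` prints the candidate PIDs. They should be reviewed before being passed to `kill`, since the pattern also matches GC or abort runs that are still making progress.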

        Attachments

        1. s0.log
          24 kB
        2. s0 stats.pdf
          182 kB

              People

              Assignee:
              jblomer Jakob Blomer
              Reporter:
              rptaylor Ryan Taylor