Uploaded image for project: 'CernVM'
  1. CernVM
  2. CVM-1967

GC crash on file IO error

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Low
    • Resolution: Fixed
    • Affects Version/s: CernVM-FS 2.7.4
    • Fix Version/s: CernVM-FS 2.8
    • Component/s: CERNVM
    • Labels:
      None
    • Platforms:
      ANY
    • Development:

      Description

      GC failued with a local IO error. Afterwards, the gc process seems to have terminated, but the repo did not automatically recover to a healthy state.

      GC cron encountered an error

      Starting soft-dev.computecanada.ca at Tue Jan 19 02:00:09 PST 2021
      Running Garbage Collection 
        --> marking unreferenced objects [Tue, 19 Jan 2021 10:00:09 GMT]
      Preserving data objects in latest revision
      Preserving Revision 9034 (18 Jan 2021 15:00:23 / added @ 18 Jan 2021 15:00:25)
      ├─ 8f7c4da6f494b17de240df81383896076cce3344-shake128 /
      failed to load catalog c20178fef63c12d89bf856918f2748ac30ecb879C (2 - local I/O failure)
      garbage collection failed
      Segmentation fault
      Fail (6)!
      umount: /cvmfs/soft-dev.computecanada.ca: not mounted
      umount: /cvmfs/soft-dev.computecanada.ca: not mounted
      ERROR from cvmfs_server gc!
      Finished soft-dev.computecanada.ca at Tue Jan 19 02:02:02 PST 2021
      

      system logs:

      2021-01-19T02:00:09.198-08:00 local@cvmfs-server user.notice cvmfs_server: (soft-dev.computecanada.ca) opened transaction
      2021-01-19T02:00:09.257-08:00 local@cvmfs-server user.notice cvmfs_server: (soft-dev.computecanada.ca) started garbage collection
      2021-01-19T02:00:09.261-08:00 local@cvmfs-server authpriv.info runuser: pam_unix(runuser:session): session opened for user libuser by (uid=0)
      2021-01-19T02:00:09.659-08:00 local@cvmfs-server user.notice cvmfs_swissknife: Parameter CVMFS_EXTENDED_GC_STATS was not set in the repository configuration
       file. condemned_bytes were not counted.
       
       
      2021-01-19T02:01:02.195-08:00 local@cvmfs-server user.notice cvmfs_server: (soft-dev.computecanada.ca) aborting transaction
      2021-01-19T02:01:02.403-08:00 local@cvmfs-server user.notice cvmfs_swissknife: Parameter CVMFS_STATISTICS_DB was not set in the repository configuration file. Using default value: /var/spool/cvmfs/soft-dev.computecanada.ca/stats.db
      2021-01-19T02:01:02.449-08:00 local@cvmfs-server kern.info kernel: cvmfs_swissknif[14585]: segfault at 0 ip 000000000047d1a3 sp 00007fff789496e0 error 6 in cvmfs_swissknife[400000+286000]
      2021-01-19T02:01:02.450-08:00 local@cvmfs-server authpriv.info runuser: pam_unix(runuser:session): session closed for user libuser
      2021-01-19T02:01:02.452-08:00 local@cvmfs-server user.notice cvmfs_server: (soft-dev.computecanada.ca) failed to garbage collect
      2021-01-19T02:01:02.477-08:00 local@cvmfs-server user.info wall[15034]: wall: user root broadcasted 2 lines (122 chars)
      2021-01-19T02:01:02.622-08:00 local@cvmfs-server user.warning cvmfs2: (soft-dev.computecanada.ca) looks like cvmfs has been crashed previously
      2021-01-19T02:01:02.622-08:00 local@cvmfs-server user.notice cvmfs2: (soft-dev.computecanada.ca) another process holds ./lock_cachedb, waiting.
      2021-01-19T02:01:06.244-08:00 local@cvmfs-server user.notice cvmfs2: (soft-dev.computecanada.ca) lock ./lock_cachedb acquired
      2021-01-19T02:01:06.255-08:00 local@cvmfs-server user.notice cvmfs2: (soft-dev.computecanada.ca) CernVM-FS: unmounted /var/spool/cvmfs/soft-dev.computecanada.ca/rdonly (soft-dev.computecanada.ca)
      2021-01-19T02:01:06.278-08:00 local@cvmfs-server user.notice cvmfs2: (soft-dev.computecanada.ca) re-building cache database
      2021-01-19T02:01:27.756-08:00 local@cvmfs-server user.notice cvmfs2: (soft-dev.computecanada.ca) removing empty file ./28/d94b83b7229b331fc601362b8e272e02665122-shake128 during automatic cache db rebuild
      

      httpd logs

       
      ::1 - - [19/Jan/2021:02:00:09 -0800] "HEAD /cvmfs/soft-dev.computecanada.ca/.cvmfspublished HTTP/1.1" 200 - "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfspublished HTTP/1.1" 200 666 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "HEAD /cvmfs/soft-dev.computecanada.ca/data/44/7fb861828f7fe7aa3437a3c56877528061b008-shake128X HTTP/1.1" 200 - "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfspublished HTTP/1.1" 200 666 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "HEAD /cvmfs/soft-dev.computecanada.ca/data/44/7fb861828f7fe7aa3437a3c56877528061b008-shake128X HTTP/1.1" 200 - "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "HEAD /cvmfs/soft-dev.computecanada.ca/.cvmfsreflog HTTP/1.1" 200 - "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfspublished HTTP/1.1" 200 666 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "HEAD /cvmfs/soft-dev.computecanada.ca/data/44/7fb861828f7fe7aa3437a3c56877528061b008-shake128X HTTP/1.1" 200 - "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "HEAD /cvmfs/soft-dev.computecanada.ca/.cvmfsreflog HTTP/1.1" 200 - "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfspublished HTTP/1.1" 200 666 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/44/7fb861828f7fe7aa3437a3c56877528061b008-shake128X HTTP/1.1" 200 840 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfswhitelist HTTP/1.1" 200 436 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfsreflog HTTP/1.1" 200 29696 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfspublished HTTP/1.1" 200 666 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/44/7fb861828f7fe7aa3437a3c56877528061b008-shake128X HTTP/1.1" 200 840 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/.cvmfswhitelist HTTP/1.1" 200 436 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/8f/7c4da6f494b17de240df81383896076cce3344-shake128C HTTP/1.1" 200 1341849 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:09 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/2b/4831f14005152cafac2df83759bdc3cd30e28eC HTTP/1.1" 200 2742890 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:10 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/cc/4860fc8c0dc01dc84c6cd440dd297746312d78C HTTP/1.1" 200 2998386 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:10 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/8e/12882f62b9f31e974dd8c9ba6cdaaf17ae8a5fC HTTP/1.1" 200 2668979 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:11 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/d7/161d8670e0c096d20fc9df7535dc7f3b1cd98aC HTTP/1.1" 200 3006050 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:11 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/81/83b2926e5e7700fef455ffbd8d413a03d87325-shake128C HTTP/1.1" 200 117431 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:12 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/67/663c5f46903bc0a2a0e3b747bcdf3e1f96f9a6-shake128C HTTP/1.1" 200 135961 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:00:12 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/cb/d16cb40b7148d23b3494ee4d25e1d8672f0484-shake128C HTTP/1.1" 200 3419954 "-" "cvmfs Fuse 2.7.4"
      # ....
      ::1 - - [19/Jan/2021:02:01:02 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/34/3e9fd588aaf963f86f53afb0b0b35b9a2dd053C HTTP/1.1" 200 2006 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:01:02 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/24/f933f235d0b88c51bb6ff2d11590398dbde61cC HTTP/1.1" 200 2002 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:01:02 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/5c/788a50f1ea76d43e58bbb09cd3928366ec8b38C HTTP/1.1" 200 3116 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:01:02 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/2f/3bf5229b8f69561d79296778ea7b61cc3cb5d8C HTTP/1.1" 200 2014 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:01:02 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/61/3a60203aded8d5483531d27cdfbb753e01a236C HTTP/1.1" 200 2010 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:01:02 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/23/cd8a6a63e0916df45f6c7ca2a3d78bb7ccb9e1C HTTP/1.1" 200 2071 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:01:02 -0800] "GET /cvmfs/soft-dev.computecanada.ca/data/c2/0178fef63c12d89bf856918f2748ac30ecb879C HTTP/1.1" 200 59488 "-" "cvmfs Fuse 2.7.4"
      ::1 - - [19/Jan/2021:02:02:02 -0800] "HEAD /cvmfs/restricted.dev.computecanada.ca/.cvmfspublished HTTP/1.1" 200 - "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.44 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
      

      It appears that the content hash does not match the file name for the catalog in question:

      $ openssl dgst -shake128 0178fef63c12d89bf856918f2748ac30ecb879C 
      SHAKE128(0178fef63c12d89bf856918f2748ac30ecb879C)= 9fc85118bd040bd268d600e6ff888121
      

      However, from our system backups I confirm that the data file has not changed since it was originally written to disk on 2020-08-21 22:41:16 (at least, that is when the backup ran.)

      FWIW a basic sqlite3 integrity check on the decompressed file indicates it is okay:

      $ sqlite3 decomp 
      SQLite version 3.6.20
      Enter ".help" for instructions
      Enter SQL statements terminated with a ";"
      sqlite> pragma integrity_check;
      ok
      

      Also , inspecting a dump of the sqlite3 file, I do not see any obvious corruption. I will attach the compressed object so you can take a look.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              jblomer Jakob Blomer
              Reporter:
              rptaylor Ryan Taylor
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: