  1. CernVM
  2. CVM-1947

CVMFS 2.7.4-1 crashes periodically


Details

    • Bug
    • Status: Closed
    • Low
    • Resolution: Not a Bug
    • None
    • None
    • CVMFS
    • None
    • CentOS 7.8

      # rpm -qa | grep cvmfs
      cvmfs-config-osg-2.4-4.osg35.el7.noarch
      cvmfs-2.7.4-1.osg35.el7.x86_64


    • GNU/Linux
    • ANY

    Description

      We occasionally see CVMFS crash on our EL7 nodes, which are used by the South Pole Telescope project for use on the Open Science Grid. We have many interactive users on these machines, all of whom source software from CVMFS.

      We sometimes, but not always, get a stack trace that looks like this (partially truncated from /var/log/messages):
      Oct 23 15:24:09 scott cvmfs2: (config-osg.opensciencegrid.org) –
      Signal: 6, errno: 2, version: 2.7.4, PID: 3709094
      Executable path: /usr/bin/cvmfs2

      Thread 16 (Thread 0x7f3f07fd0700 (LWP 3709099)):
      #0 0x00007f3f0ca79c3d in poll () from /lib64/libc.so.6
      #1 0x00007f3f0a35985f in poll (__timeout=-1, __nfds=2, __fds=0x7f3f07fcfee8)
      at /usr/include/bits/poll2.h:46
      #2 Watchdog::MainWatchdogListener (data=0x137a900)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/monitor.cc:481
      #3 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 15 (Thread 0x7f3f077cf700 (LWP 3709100)):
      #0 0x00007f3f0d58275d in read () from /lib64/libpthread.so.0
      #1 0x00007f3f0a37e479 in read (__nbytes=1, __buf=0x7f3f077cee8f, __fd=11)
      at /usr/include/bits/unistd.h:44
      #2 ReadPipe (fd=11, buf=buf@entry=0x7f3f077cee8f, nbyte=nbyte@entry=1)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/util/posix.cc:421
      #3 0x00007f3f0a399638 in FuseInvalidator::MainInvalidator (data=0x137a7a0)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/fuse_evict.cc:111
      #4 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #5 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 14 (Thread 0x7f3f06fce700 (LWP 3709101)):
      #0 0x00007f3f0ca79c3d in poll () from /lib64/libc.so.6
      #1 0x00007f3f0a39a41f in poll (__timeout=<optimized out>, __nfds=1,
      __fds=0x7f3f06fcded0) at /usr/include/bits/poll2.h:46
      #2 FuseRemounter::MainRemountTrigger (data=0x137a720)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/fuse_remount.cc:174
      #3 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 13 (Thread 0x7f3f067cd700 (LWP 3709102)):
      #0 0x00007f3f0ca79c3d in poll () from /lib64/libc.so.6
      #1 0x00007f3f0a3446bf in poll (__timeout=60000, __nfds=1,
      __fds=0x7f3f067cced0) at /usr/include/bits/poll2.h:46
      #2 glue::NentryTracker::MainCleaner (data=0x137a230)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/glue_buffer.cc:187
      #3 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 12 (Thread 0x7f3f05fcc700 (LWP 3709103)):
      #0 0x00007f3f0ca79c3d in poll () from /lib64/libc.so.6
      #1 0x00007f3f0a33a9ca in poll (__timeout=<optimized out>,
      __nfds=<optimized out>, __fds=<optimized out>)
      at /usr/include/bits/poll2.h:46
      #2 download::DownloadManager::MainDownload (data=0x12d6f00)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/download.cc:501
      #3 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 11 (Thread 0x7f3f057cb700 (LWP 3709104)):
      #0 0x00007f3f0ca79c3d in poll () from /lib64/libc.so.6
      #1 0x00007f3f0a33a9ca in poll (__timeout=<optimized out>,
      __nfds=<optimized out>, __fds=<optimized out>)
      at /usr/include/bits/poll2.h:46
      #2 download::DownloadManager::MainDownload (data=0x12d8200)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/download.cc:501
      #3 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 10 (Thread 0x7f3f04fca700 (LWP 3709105)):
      #0 0x00007f3f0ca79c3d in poll () from /lib64/libc.so.6
      #1 0x00007f3f0a39eec8 in poll (__timeout=-1, __nfds=2, __fds=0x7f3efc0008c0)
      at /usr/include/bits/poll2.h:46
      #2 quota::MainWatchdogListener (data=0x12af5b0)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/quota_listener.cc:89
      #3 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 9 (Thread 0x7f3ef7fff700 (LWP 3709106)):
      #0 0x00007f3f0ca79c3d in poll () from /lib64/libc.so.6
      #1 0x00007f3f0a39f3e3 in poll (__timeout=-1, __nfds=2, __fds=0x7f3ef00008c0)
      at /usr/include/bits/poll2.h:46
      #2 quota::MainUnpinListener (data=0x137aa20)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/quota_listener.cc:47
      #3 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 8 (Thread 0x7f3ef77fe700 (LWP 3709107)):
      #0 0x00007f3f0d5829dd in accept () from /lib64/libpthread.so.0
      #1 0x00007f3f0a3a11cf in TalkManager::MainResponder (data=0x1378af0)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/talk.cc:167
      #2 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #3 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 7 (Thread 0x7f3ef6ffd700 (LWP 3709108)):
      #0 0x00007f3f0d5829dd in accept () from /lib64/libpthread.so.0
      #1 0x00007f3f0c74d69a in loader::loader_talk::MainTalk (data=<optimized out>)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/loader_talk.cc:59
      #2 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #3 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 6 (Thread 0x7f3ef67fc700 (LWP 3709109)):
      #0 0x00007f3f0ca7b9a3 in select () from /lib64/libc.so.6
      #1 0x00007f3f0a37f7e9 in SafeSleepMs (ms=ms@entry=100)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/util/posix.cc:1519
      #2 0x00007f3f0a35a407 in Watchdog::SendTrace (sig=6,
      siginfo=<optimized out>, context=<optimized out>)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/monitor.cc:312
      #3 <signal handler called>
      #4 0x00007f3f0c9bc387 in raise () from /lib64/libc.so.6
      #5 0x00007f3f0c9bda78 in abort () from /lib64/libc.so.6
      #6 0x00007f3f0c9b51a6 in __assert_fail_base () from /lib64/libc.so.6
      #7 0x00007f3f0c9b5252 in __assert_fail () from /lib64/libc.so.6
      #8 0x00007f3f0a36e344 in PosixQuotaManager::MakeReturnPipe (
      this=this@entry=0x12b4430, pipe=pipe@entry=0x7f3ef67fbaf0)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/quota_posix.cc:1420
      #9 0x00007f3f0a36e7b7 in PosixQuotaManager::GetSharedStatus (this=0x12b4430,
      gauge=gauge@entry=0x7f3ef67fbb78, pinned=pinned@entry=0x7f3ef67fbb80)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/quota_posix.cc:661
      #10 0x00007f3f0a36e85e in PosixQuotaManager::GetSize (this=<optimized out>)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/quota_posix.cc:676
      #11 0x00007f3f0a38defe in cvmfs::cvmfs_statfs (req=0x7f3ee8000e40, ino=256)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/cvmfs.cc:1246
      #12 0x00007f3f0c749c52 in loader::stub_statfs (req=0x7f3ee8000e40, ino=1)
      at /builddir/build/BUILD/cvmfs-2.7.4/cvmfs/loader.cc:286
      #13 0x00007f3f0be4573b in do_statfs () from /lib64/libfuse.so.2
      #14 0x00007f3f0be44b6b in fuse_ll_process_buf () from /lib64/libfuse.so.2
      #15 0x00007f3f0be41401 in fuse_do_work () from /lib64/libfuse.so.2
      #16 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #17 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 5 (Thread 0x7f3ef5ffb700 (LWP 3709110)):
      #0 0x00007f3f0d58275d in read () from /lib64/libpthread.so.0
      #1 0x00007f3f0be40d62 in fuse_kern_chan_receive () from /lib64/libfuse.so.2
      #2 0x00007f3f0be41d59 in fuse_ll_receive_buf () from /lib64/libfuse.so.2
      #3 0x00007f3f0be4137e in fuse_do_work () from /lib64/libfuse.so.2
      #4 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #5 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 4 (Thread 0x7f3ef57fa700 (LWP 3709111)):
      #0 0x00007f3f0d58275d in read () from /lib64/libpthread.so.0
      #1 0x00007f3f0be40d62 in fuse_kern_chan_receive () from /lib64/libfuse.so.2
      #2 0x00007f3f0be41d59 in fuse_ll_receive_buf () from /lib64/libfuse.so.2
      #3 0x00007f3f0be4137e in fuse_do_work () from /lib64/libfuse.so.2
      #4 0x00007f3f0d57bea5 in start_thread () from /lib64/libpthread.so.0
      #5 0x00007f3f0ca8496d in clone () from /lib64/libc.so.6

      Thread 3 (Thread 0x7f3ef4ff9700 (LWP 3709112)):
      #0 0x00007f3f0d58275d in r
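
In a trace like this, the thread that raised the signal is the one containing a `<signal handler called>` frame; here that is Thread 6, where an assertion fails in `PosixQuotaManager::MakeReturnPipe` while serving a `statfs` request. For longer logs, a small helper can pick that thread out automatically. The following is only a rough sketch, not part of CVMFS: the function names `split_threads` and `crashing_thread` are made up for illustration, and the regular expressions assume the gdb-style layout shown above.

```python
import re

# Matches headers such as "Thread 6 (Thread 0x7f3ef67fc700 (LWP 3709109)):"
# even when syslog has glued them to the end of the previous line.
THREAD_RE = re.compile(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# Matches "#4 0x... in raise ()", "#2 ReadPipe (...)" and the handler marker.
FRAME_RE = re.compile(
    r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(<signal handler called>|[^\s(]+)"
)
HANDLER = "<signal handler called>"


def split_threads(trace):
    """Return {thread number: [frame names, innermost first]}."""
    parts = THREAD_RE.split(trace)
    # parts is [preamble, number, lwp, body, number, lwp, body, ...]
    threads = {}
    for i in range(1, len(parts), 3):
        body = parts[i + 2]
        threads[int(parts[i])] = [name for _, name in FRAME_RE.findall(body)]
    return threads


def crashing_thread(trace):
    """Return (thread number, frames below the signal handler) or None."""
    for number, frames in split_threads(trace).items():
        if HANDLER in frames:
            return number, frames[frames.index(HANDLER) + 1:]
    return None


if __name__ == "__main__":
    # Assumes the trace has been saved to a file; the path is hypothetical.
    with open("cvmfs-trace.txt") as handle:
        print(crashing_thread(handle.read()))
```

Because the thread headers are located by pattern rather than by line start, the helper works whether or not the headers have been separated from the preceding frame lines.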


      At Dave Dykstra's recommendation, we increased CVMFS_NFILES to 131072. Since then the crashes seem to have become less frequent, but we are not sure whether that is due to the configuration change or to an unrelated change in usage patterns.
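
One way to tell the two explanations apart might be to watch how close the cvmfs2 process gets to its open-file limit between crashes. The sketch below is only an illustration under assumptions: it relies on the Linux `/proc` layout, the helper names `parse_nofile_limit` and `fd_usage` are made up here, and reading another user's `/proc/<pid>/fd` generally requires root.

```python
import os


def parse_nofile_limit(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max open files <soft> <hard> files
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open descriptors, soft limit) for a process on Linux."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as handle:
        return open_fds, parse_nofile_limit(handle.read())


if __name__ == "__main__":
    # Uses this script's own PID as a stand-in; substitute the cvmfs2 PID.
    print(fd_usage(os.getpid()))
```

Logging this pair periodically for the cvmfs2 PID would show whether descriptor usage was approaching the old limit under normal load.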


      I am attaching a cvmfs_config bugreport, but unfortunately it was gathered after the crash, so I am not sure how helpful it will be.


      Please let me know if there's anything further I can do to help debug crashes. 

      Attachments

        1. cvmfs.config-osg.log.tgz
          592 kB
        2. cvmfs-bugreport.tar.gz
          84 kB
        3. cvmfs-bugreport.tar.gz
          343 kB
        4. cvmfs-bugreport-11-22.tar.gz
          120 kB
        5. cvmfs-bugreport-11-23.tar.gz
          134 kB

          People

            jblomer Jakob Blomer
            lbryant Lincoln Bryant