Details
- Bug
- Status: Closed
- Medium
- Resolution: Not Needed
- CernVM-FS 2.7.1
- None
- CentOS7
- ANY
Description
We have been seeing occasional client issues where repositories become unavailable and cannot be recovered except by killing the associated processes.
For example:
$ ls /cvmfs/atlas.cern.ch
ls: cannot access '/cvmfs/atlas.cern.ch': No such file or directory
$ sudo mount -t cvmfs atlas.cern.ch /mnt/
Repository atlas.cern.ch is already mounted on /cvmfs/atlas.cern.ch
The command 'sudo cvmfs_config umount atlas.cern.ch' appeared to complete successfully but did not actually change anything.
'sudo cvmfs_talk -i atlas.cern.ch revision' worked and showed the revision number.
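The symptom above (the repository is reported as mounted, yet the path cannot be accessed) can be checked mechanically. The following is only a minimal sketch, assuming the kernel mount table still lists the repository in the usual /proc/mounts format; the function names and the sample mount line are hypothetical and not taken from the affected node.

```python
import os

# Made-up /proc/mounts-style line, for illustration only.
SAMPLE_MOUNTS = (
    "cvmfs2 /cvmfs/atlas.cern.ch fuse ro,nosuid,nodev,relatime 0 0\n"
)


def mount_table_entry(mounts_text, mountpoint):
    """Return the first mount-table line whose second field is mountpoint."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            return line
    return None


def classify_mount(mounts_text, mountpoint, exists=os.path.exists):
    """Classify a mountpoint as 'healthy', 'stale' or 'not mounted'.

    'stale' means the mount table lists the path but it cannot be accessed,
    which is the situation described in this report.
    """
    listed = mount_table_entry(mounts_text, mountpoint) is not None
    if not listed:
        return "not mounted"
    return "healthy" if exists(mountpoint) else "stale"
```

On a live node one might call `classify_mount(open("/proc/mounts").read(), "/cvmfs/atlas.cern.ch")`; the `exists` parameter is there only so the check can be exercised without a real mount.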
If I understand correctly (see attached bugreport), process 5411 is the watchdog for this repository, 5407 is the actual process for atlas.cern.ch, and 5407 holds the write end of the pipe that 5411 is reading from.
Process 5407 seemed to be stuck in one of its threads, which keeps repeating the accept system call without finishing:
$ sudo strace -f -p 5407
strace: Process 5407 attached with 13 threads
[pid 5423] read(16, <unfinished ...>
[pid 5422] read(16, <unfinished ...>
[pid 5421] accept(4, <unfinished ...>
[pid 5420] accept(15, <unfinished ...>
[pid 5419] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 5418] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 5417] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 5416] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 5415] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 5414] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 5413] read(12, <unfinished ...>
[pid 5412] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 5407] futex(0x7ffc1a3a2a50, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 5415] <... restart_syscall resumed> ) = 0
[pid 5415] poll([{fd=23, events=POLLIN|POLLPRI}], 1, 60000 <unfinished ...>
[pid 5420] <... accept resumed> {sa_family=AF_LOCAL, NULL}, [2]) = 40
[pid 5420] recvfrom(40, "mountpoint", 512, 0, NULL, NULL) = 10
[pid 5420] sendto(40, "/cvmfs/atlas.cern.ch\n", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 5420] shutdown(40, SHUT_RDWR) = 0
[pid 5420] close(40) = 0
[pid 5420] accept(15, <unfinished ...>
[pid 5415] <... poll resumed> ) = 0 (Timeout)
[pid 5415] poll([{fd=23, events=POLLIN|POLLPRI}], 1, 60000 <unfinished ...>
[pid 5420] <... accept resumed> {sa_family=AF_LOCAL, NULL}, [2]) = 40
[pid 5420] recvfrom(40, "mountpoint", 512, 0, NULL, NULL) = 10
[pid 5420] sendto(40, "/cvmfs/atlas.cern.ch\n", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 5420] shutdown(40, SHUT_RDWR) = 0
[pid 5420] close(40) = 0
[pid 5420] accept(15, <unfinished ...>
[pid 5415] <... poll resumed> ) = 0 (Timeout)
[pid 5415] poll([{fd=23, events=POLLIN|POLLPRI}], 1, 60000 <unfinished ...>
[pid 5420] <... accept resumed> {sa_family=AF_LOCAL, NULL}, [2]) = 40
[pid 5420] recvfrom(40, "mountpoint", 512, 0, NULL, NULL) = 10
[pid 5420] sendto(40, "/cvmfs/atlas.cern.ch\n", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 5420] shutdown(40, SHUT_RDWR) = 0
[pid 5420] close(40) = 0
[pid 5420] accept(15, <unfinished ...>
[pid 5415] <... poll resumed> ) = 0 (Timeout)
[pid 5415] poll([{fd=23, events=POLLIN|POLLPRI}], 1, 60000 <unfinished ...>
[pid 5420] <... accept resumed> {sa_family=AF_LOCAL, NULL}, [2]) = 40
[pid 5420] recvfrom(40, "mountpoint", 512, 0, NULL, NULL) = 10
[pid 5420] sendto(40, "/cvmfs/atlas.cern.ch\n", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 5420] shutdown(40, SHUT_RDWR) = 0
[pid 5420] close(40) = 0
[pid 5420] accept(15, <unfinished ...>
[pid 5415] <... poll resumed> ) = 0 (Timeout)
[pid 5415] poll([{fd=23, events=POLLIN|POLLPRI}], 1, 60000 <unfinished ...>
[pid 5420] <... accept resumed> {sa_family=AF_LOCAL, NULL}, [2]) = 40
[pid 5420] recvfrom(40, "mountpoint", 512, 0, NULL, NULL) = 10
[pid 5420] sendto(40, "/cvmfs/atlas.cern.ch\n", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 5420] shutdown(40, SHUT_RDWR) = 0
[pid 5420] close(40) = 0
[pid 5420] accept(15, <unfinished ...>
[pid 5415] <... poll resumed> ) = 0 (Timeout)
[pid 5415] poll([{fd=23, events=POLLIN|POLLPRI}], 1, 60000^Cstrace: Process 5407 detached
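A trace like this is easier to read once the calls that actually return are tallied per thread. The following is a rough sketch, assuming the strace output has been saved as text in the `[pid N] ...` format shown above; the function name `summarize` and the regular expressions are illustrative only and do not cover every strace output form.

```python
import re
from collections import Counter

PID_LINE = re.compile(r"^\[pid\s+(\d+)\]\s+(.*)$")
CALL_START = re.compile(r"^(\w+)\(")
CALL_RESUMED = re.compile(r"^<\.\.\. (\w+) resumed>")


def summarize(trace):
    """Count completed system calls per (thread id, syscall name).

    A call counts as completed when its line carries a return value
    (" = ..."); lines ending in "<unfinished ...>" are skipped.
    """
    completed = Counter()
    for raw in trace.splitlines():
        match = PID_LINE.match(raw.strip())
        if not match:
            continue
        pid, rest = int(match.group(1)), match.group(2)
        if "<unfinished ...>" in rest:
            continue
        name = CALL_RESUMED.match(rest) or CALL_START.match(rest)
        if name and " = " in rest:
            completed[(pid, name.group(1))] += 1
    return completed
```

Applied to the trace above, such a tally lists the accept calls of thread 5420 on descriptor 15 among the completed calls, each one followed by a recvfrom/sendto/shutdown/close sequence on the accepted connection.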
The associated file descriptor is:
cvmfs2 5407 cvmfs 15u unix 0xffff8ffb38b76c00 0t0 60568 ./cvmfs_io.atlas.cern.ch
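The exchange visible on that socket (a client sends "mountpoint" and receives the mount path followed by a newline) can be reproduced in isolation. The following is a self-contained sketch under stated assumptions: it uses a toy server on a temporary Unix socket rather than the real cvmfs_io.atlas.cern.ch socket, it assumes a stream socket, and the helper names `serve_once`, `query` and `demo` are hypothetical. It only mirrors the calls seen in the trace and says nothing about the rest of the protocol.

```python
import os
import socket
import tempfile
import threading


def serve_once(server, mountpoint):
    """Accept one connection and answer a 'mountpoint' request, as in the trace."""
    conn, _ = server.accept()
    with conn:
        request = conn.recv(512)
        if request == b"mountpoint":
            conn.sendall(mountpoint.encode() + b"\n")
        conn.shutdown(socket.SHUT_RDWR)


def query(sock_path, command):
    """Send one command to a Unix stream socket and read the reply until EOF."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(5)
        client.connect(sock_path)
        client.sendall(command.encode())
        chunks = []
        while True:
            chunk = client.recv(512)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


def demo():
    """Run the toy server and one client query against a temporary socket."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cvmfs_io.demo")  # hypothetical socket name
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.settimeout(5)
        server.bind(path)
        server.listen(1)
        worker = threading.Thread(
            target=serve_once, args=(server, "/cvmfs/atlas.cern.ch")
        )
        worker.start()
        try:
            return query(path, "mountpoint")
        finally:
            worker.join()
            server.close()
```

The timeouts are a design choice of the sketch, so that a client does not block indefinitely if the serving thread never answers. On a real client, cvmfs_talk remains the supported way to talk to this socket.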
Attachments
Issue Links
- relates to CVM-1478 DoS against new CVMFS mounts from userspace (Open)