Details
- Bug
- Resolution: Not a Bug
- High
- CernVM-FS 2.1.19
- None
- x86_64-slc6-gcc48-opt
Description
If I run, for example,
- cvmfs_server add-replica -o root http://cvmfs-stratum0.gridpp.rl.ac.uk:8000/cvmfs/mice.egi.eu /etc/cvmfs/keys/egi.eu.pub
- cvmfs_server snapshot mice.egi.eu
then over the next couple of minutes the number of TIME_WAIT connections shown by netstat -a keeps increasing. When this was run at RAL on a new machine, the count climbed all the way to the maximum of ~28K allowed, and the snapshot failed with the error
failed to download http://cvmfs-stratum0.gridpp.rl.ac.uk/cvmfs/mice.egi.eu/data/5e/90b14382b49d1b245dd093039609ec188ffce1 (7 - host connection problem), abort
sh: line 1: 32451 Aborted cvmfs_swissknife pull -m mice.egi.eu -u http://cvmfs-stratum0.gridpp.rl.ac.uk/cvmfs/mice.egi.eu -r local,/srv/cvmfs/mice.egi.eu/data/txn,/srv/cvmfs/mice.egi.eu -x /var/spool/cvmfs/mice.egi.eu/tmp -k /etc/cvmfs/keys/egi.eu.pub -n 16 -t 10 -a 3
which cvmfs_swissknife_debug revealed was caused by "curl error 7", or CURLE_COULDNT_CONNECT.
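To watch the TIME_WAIT growth described above while the snapshot runs, something like the following may help. This is a minimal sketch: the helper name count_time_wait is made up for illustration, and it assumes the usual Linux netstat -ant column layout, where the connection state is the sixth field.

```shell
# Count connections in TIME_WAIT from netstat-style output read on stdin.
# Assumes the state is the 6th whitespace-separated field (Linux netstat -ant).
count_time_wait() {
    awk '$6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Typical use while the snapshot is running:
#   netstat -ant | count_time_wait
```

Running this every few seconds (for instance under watch) should show whether the count is climbing toward the limit.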
It looks as though cvmfs_swissknife pull is not reusing HTTP connections across requests, which is probably also hurting performance. This should be fixed.
Meanwhile, some workarounds for the error are described in this article:
http://www.fromdual.com/huge-amount-of-time-wait-connections
For example:
echo 15 > /proc/sys/net/ipv4/tcp_fin_timeout
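The ~28K ceiling mentioned above is probably the size of the local ephemeral port range, since every outgoing connection to the same server address and port needs its own local port. The sketch below computes that size. The helper name port_range_size is hypothetical, and the values 32768 and 61000 are an assumption about the default range on kernels of that era; the actual setting on a given machine can be read from /proc/sys/net/ipv4/ip_local_port_range.

```shell
# Number of ports in an inclusive range given as "low high".
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

# On Linux, feed it the live setting:
#   port_range_size $(cat /proc/sys/net/ipv4/ip_local_port_range)
# With the assumed range "32768 61000" the result is 28233.
```

If the result matches the observed maximum, widening ip_local_port_range is another possible stopgap, though it only postpones the exhaustion rather than addressing the missing connection reuse.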