We have a major computing site that uses a preloaded alien cache, with HTTP access disabled in the manner described in CVM-1507. We are finding that nodes do not update to the latest repository revision available in the alien cache in a regular or consistent manner. In some cases, a node still has an old catalog revision even after 3-4 days. The issue is widespread and affects most nodes in the cluster.
Looking at the output of cvmfs_config stat -v, we see, for example:
File Catalog Revision: 1153 (expires in 1 minutes)
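For anyone trying to reproduce this across many nodes, here is a minimal sketch of how the revision line above could be parsed and compared. It assumes the line always has the form quoted above; the function names and the dictionary of per-node outputs are hypothetical, and collecting the output from each node is left out.

```python
import re

# Matches the revision line in the form quoted above, e.g.
# "File Catalog Revision: 1153 (expires in 1 minutes)".
# Assumption: other forms of this line (if any) are simply not matched.
REVISION_RE = re.compile(
    r"File Catalog Revision:\s*(\d+)\s*\(expires in (\d+) minutes?\)"
)


def parse_revision(stat_output):
    """Return (revision, minutes_until_expiry), or None if no line matches."""
    match = REVISION_RE.search(stat_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def stale_nodes(outputs_by_host, latest_revision):
    """Return the sorted hostnames whose revision is older than latest_revision.

    outputs_by_host is a hypothetical dict mapping hostname to the text
    printed by cvmfs_config stat -v on that node. Hosts whose output
    cannot be parsed are reported as stale as well.
    """
    stale = []
    for host, text in outputs_by_host.items():
        parsed = parse_revision(text)
        if parsed is None or parsed[0] < latest_revision:
            stale.append(host)
    return sorted(stale)
```

Running stale_nodes against the revision that was last preloaded into the alien cache would show which nodes are lagging and by how much.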
After the timer reaches 0 minutes, these messages are logged:
We speculate that after the attempt to download the manifest via HTTP fails, no further action is taken, whereas the client should look at the cvmfschecksum files in the alien cache to find the latest available root catalog hash and start using it.
We tried adjusting CVMFS_MAX_TTL but it did not help.
We tried changing CVMFS_HTTP_PROXY to DIRECT and found that this caused the node to load the latest catalog revision almost immediately. However, we then ran into the problem that files in the latest revision were not yet available in the preloaded alien cache, and the client does not have a writable cache in which to store downloaded files.
This is a large MPI cluster, so it is important for the contents of CVMFS to be consistent across the nodes in the cluster, which is one of the reasons we decided on this alien cache configuration. How can we get the nodes to stay up to date with the latest version in the alien cache? Is there a bug or misconfiguration?