[ROOT-6639] TNetXNGFile : Single readv transfer is too large Created: 03/Sep/14  Updated: 09/Oct/14  Resolved: 08/Oct/14

Status: Closed
Project: ROOT
Component/s: I/O
Affects Version/s: 6.00.02
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Christopher Rob Jones Assignee: Lukasz Janyst (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

LCG Installation of ROOT6


Attachments: File 0001-TNetXNGFile-Make-it-possible-to-disable-querying-for.patch     File 0001-TNetXNGFile-Make-sure-to-handle-echoed-responses-to-.patch     File 0001-TNetXNGFile-when-readv_iov_max-is-wrong-assume-that-.patch     File 0002-TNetXNGFile-Workaround-problem-with-dCache-returning.patch     File ROOT5-Error3.sh     File ROOT6-TNetXNGFile-Error3.sh    
Development:

 Description   

Hi,

Attempts to access data using the LHCb analysis application based on ROOT 6.00.02 (and xrootd 3.3.x) for files at certain sites leads to errors like :-

Error in <TNetXNGFile::ReadBuffers>: [ERROR] Server responded with an error: [3000] Single readv transfer is too large

Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:31073119, branch:_Event.m_version, entry:100, badread=1, nerrors=1, basketnumber=2
RootCnvSvc ERROR Error: createObj> Cannot access the object:9C2AD77D-A533-E111-8267-0024E87687E5:/Event
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=2, basketnumber=2
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=3, basketnumber=2
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=4, basketnumber=2
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=5, basketnumber=2
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=6, basketnumber=2
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=7, basketnumber=2
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=8, basketnumber=2
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=9, basketnumber=2
file probably overwritten: stopping reporting error messages
Error in <TBranchElement::GetBasket>: File: root://dcdoor04.pic.es/pnfs/pic.es/data/lhcb/MC/MC11a/ALLSTREAMS.DST/00014445/0000/00014445_00000019_1.allstreams.dst at byte:0, branch:_Event.m_version, entry:100, badread=0, nerrors=10, basketnumber=2

Error seems to happen after exactly 100 events.... No such problem with our previous application, based on ROOT5 + Xrootd 3.2.x.

I have attached two scripts which reproduce the issue. One for ROOT6 (which fails) and another for ROOT5, accessing the same file, which works fine.

cheers Chris



 Comments   
Comment by Lukasz Janyst (Inactive) [ 04/Sep/14 ]

Pic is a dCache site, and dCache does not implement the xrootd protocol in full. Here xrootd client tries to negotiate the maximal size for a readv and fails, it needs to assume certain defaults which seem to be to large. I can make a patch to make these defaults configurable.

Comment by Lukasz Janyst (Inactive) [ 04/Sep/14 ]

My previous comment was a guess without looking at the code, sorry. After actually looking at the code: the vector read parameters negotiation does succeed. The server advertises nonsense:

]==> xrdfs 193.109.172.23:20083 query config readv_iov_max
2147483647

]==> xrdfs 193.109.172.23:20083 query config readv_ior_max
2097152

This means that it can handle single readv chunk that it 2MB max and it can handle 2147483647 (over two billion) chunks per request. Apparently it cannot.

Comment by Christopher Rob Jones [ 05/Sep/14 ]

Hi,

Thanks for your feedback, but I am afraid I do not understand what this implies w.r.t. fixing the issue. If its a server issue, how come it only effects ROOT6 and not ROOT5 clients - The server is identical in both cases ? Can you perhaps explained a little on what the resolution should be, if it is an issue server side that should fixed there... ?

Chris

Comment by Lukasz Janyst (Inactive) [ 05/Sep/14 ]

When you read your data, you send vector read requests to the server. These may get heavy, so they need to be negotiated with the server. XRootD capable services can advertise what what are the parameters of the readv requests they can handle and the client in ROOT6 gets and honors this information to get the best possible performance. If the readv request you issue is too large for the server to handle, the client would transparently for you split it into multiple smaller requests and then reassemble the responses in such a way, that from your point of view the entire large readv request succeeded. The client in ROOT5 would not negotiate at all, it would just assume some values for these parameters that are small enough, so that all the servers can handle them. Now, if the server does not support parameter negotiation, the client in ROOT6 would assume the same small defaults as the client in ROOT5. Here the server tells the client that it can handle something that it cannot handle in reality but the client believes it. The problem is that the server is misconfigured, and a proper way to fix this problem is to contact the admins and tell them to configure the service properly. Alternatively, I can create a hot-fix (or rather a hot-break) for you that would make the client behave like the old one. But this "hot-break" will be for you only and will not go to any of the releases so if the real problem is not fixed you will see it again in the future.

Comment by Christopher Rob Jones [ 05/Sep/14 ]

Thanks for the feedback, I think I understand now.

I also agree that a 'hot-break' is not a good idea. I will pass this message on to our grid/production experts and ask them to discuss with the sites in question the proper fix.

cheers Chris

Comment by Stefan Roiser [ 26/Sep/14 ]

Lukasz,

one more idea on this topic, would it be possible on the client/application side to configure xroot access such that either it starts from a smaller request size or completely turn off this negotiation? Would such flags exist, without a "hot fix/break"?

I presume you have given the answer between the lines already above, I just wanted to explicitly ask the question before moving on into another direction.

cheers
Stefan

Comment by Lukasz Janyst (Inactive) [ 29/Sep/14 ]

Hi Stefan,

yes, we could add the flag, but it makes very little sense because people would likely disable the negotiation for all the jobs regardless of where the file is located. If the server does not wish support negotiation it should just return an error. In this case safe defaults are used. If it returns a non-error value the client will trust it.

Cheers,
Lukasz

Comment by Christopher Rob Jones [ 29/Sep/14 ]

Hi Lukasz,

I think we all agree this is a server problem and should be fixed there. The problem though is one of practicality - It is going to take a while before dcache makes a release with a fix and then all affected sites using that dcache xrootd implementation update their servers. In the meantime we have this problem...

When a server returns 'bad' parameters, from calls you list above, how obvious could it be, based just on those values, that the server is mis-configured ? I am just wondering if the clients couldn't some how detect broken servers, and in these cases ignore the information they are given and default back to the ROOT5 behaviour ? If possible wouldn't this allow the negotiation to occur where it can, but still allow ROOT6 to communicate with these bad servers (with appropriate warnings of course, so we know its still happening and can then follow up) ?

Chris

Comment by Lukasz Janyst (Inactive) [ 30/Sep/14 ]

Hi Chris,

I discussed this with Stefan in person. I will make a patch that will let you set a ROOT flag to just force the defaults, but please make sure you actually get this bug fixed in dCache.

Cheers,
Lukasz

Comment by Axel Naumann [ 01/Oct/14 ]

Reopened on request of Marco Clemencic; it's unclear how to proceed to get a working ROOT for LHCb. I will try to set up a chat with Marco and Lukasz.

Comment by Lukasz Janyst (Inactive) [ 02/Oct/14 ]

We can always meet if you feel like there is a need to chat. I can just send you a patch and then you can discuss with Marco how you want to make it available to LHCb.

Comment by Lukasz Janyst (Inactive) [ 02/Oct/14 ]

As discussed offline. I attach the patches.

Comment by Axel Naumann [ 02/Oct/14 ]

Thanks a lot for your super quick reaction, Lukasz!

I have uploaded them to v6-02-00-patches; master and 5.34 will follow tomorrow.

Marco, could you please confirm that this works for you?

Cheers, Axel

Comment by Stefan Roiser [ 03/Oct/14 ]

Hi all,

thanks for following up on this. FYI for what concerns LHCb to my understanding we would only need this fix in the ROOT6 part of the repo, ROOT5 does not do this negotiation between client and storage therefore dCache works with xroot access in ROOT 5.

cheers
Stefan

Comment by Lukasz Janyst (Inactive) [ 03/Oct/14 ]

Whether ROOT5 picks up the new client or not depends on which version of xrootd it's built against. AFAIK standard LCG builds use xrootd 3.2.7 which won't include the new client in ROOT.

Comment by Axel Naumann [ 03/Oct/14 ]

Hi,

Looks like we'll have to wait for a full LHCb stack based on v6-02-00-patches; Stefan and I don't manage to reproduce the failure outside.

As this gives 0 reaction time if the "magic value detection" does not work (e.g. because LHCb's dCache server uses a different magic value) we have a more explicit suppression of the ReadV size query: system.rootrc's NetXNG.QueryReadVParams can now be suppressed by setting the environment variable $ROOT_XRD_QUERY_READV_PARAMS to 0 - it defaults to 1 (i.e. "query the ReadV params").

So that's your backup option for building the release on Monday.

Cheers, Axel.

Comment by Marco Clemencic [ 06/Oct/14 ]

Hi,

it looks like the automatic detection does not work with

"root://dcdoor03.pic.es:1094/pnfs/pic.es/data/lhcb/LHCb/Collision12/DIMUON.DST/00020393/0001/00020393_00011582_1.dimuon.dst?filetype=raw"

but I could read the file using the environment variable.

I used a ROOT-only reproducer by Stefan:

TFile* f = TFile::Open("root://dcdoor03.pic.es:1094/pnfs/pic.es/data/lhcb/LHCb/Collision12/DIMUON.DST/00020393/0001/00020393_00011582_1.dimuon.dst?filetype=raw");
Long64_t pos[3] = {0,10000000,20000000};
Int_t len[3] = {10000000,10000000,10000000};
char buf[30000000];
f->ReadBuffers(buf, pos, len, 3)

Cheers
Marco

Comment by Marco Clemencic [ 06/Oct/14 ]

BTW, I tried the xrdfs command and this is what I get:

$ xrdfs dcdoor03.pic.es:1094 query config readv_iov_max
readv_iov_max
 

What am I doing wrong?

Cheers
Marco

Comment by Lukasz Janyst (Inactive) [ 06/Oct/14 ]

You're doing everything right, except that you queried a load-balancer, not a data server that actually serves the file. Echoing the query is one the of the valid responses, except that native xrootd servers never do that for this particular query, so it might not have been taken into account. I will see and provide a fix it it's not properly handled.

Comment by Lukasz Janyst (Inactive) [ 06/Oct/14 ]

I attached a patch that handles this case as well. Can you please give it a go?

Comment by Axel Naumann [ 06/Oct/14 ]

It's now in v6-02-00-patches; Marco or Stefan, could you let us know tomorrow whether this fixes it?

Cheers, Axel.

Comment by Marco Clemencic [ 07/Oct/14 ]

Hi,

I've been testing the latest version and I still see the error, so I decided to give it a go with the debugger.

What I discovered is that not only readv_iov_max is wrong, but also readv_ior_max should not be trusted! The default value 2097136 (0x1ffff0) works, but the value reported by the server, 2097152 (0x200000), gives the error when we try to read the file.

I would modify the patch so that if readv_iov_max is bad, we use the default for readv_ior_max too.

Cheers
Marco

Comment by Lukasz Janyst (Inactive) [ 07/Oct/14 ]

Hi Marco,

readv_iov_max denotes number of readv chunks allowed per request, so it kind of made sense to catch a value like 2147483647 (0x7FFFFFFF) and change it to something like 1024, because the value reported by the server was outlandish. This is kind of different, because readv_ior_max is the maximal allowed size of a readv chunk. So, when the server tells me that this limit is 2097152 (0x200000) - two megabytes, it's not unreasonable to trust it, instead of subtracting 16 bytes from it...

Cheers,
Lukasz

Comment by Marco Clemencic [ 07/Oct/14 ]

Of course, I agree with you.
My only claim is that in this case both of them are bogus (checked empirically), so you should not trust what the server answers if readv_iov_max is wrong.

Cheers
Marco

Comment by Lukasz Janyst (Inactive) [ 07/Oct/14 ]

OK, I'll make a patch.

Lukasz

Comment by Christopher Rob Jones [ 07/Oct/14 ]

But, to play devil's advocate....

If you have already deduced from the readv_iov_max that the server is mis-behaving, and that these values should not be trusted, would it not make sense to then distrust any values you get from the server, readv_ior_max included, even if on the face of it they are 'reasonable' numbers ?

Chris

Comment by Lukasz Janyst (Inactive) [ 07/Oct/14 ]

Yes, the patch is attached.

Comment by Axel Naumann [ 07/Oct/14 ]

...and uploaded to v6-02-00-patches. Waiting for confirmation from Marco / Stefan.

Thanks, Lukasz and Marco!

Comment by Stefan Roiser [ 08/Oct/14 ]

Guys,

I think we can claim victory

root [0] str = "root://dcdoor03.pic.es:1094/pnfs/pic.es/data/lhcb/LHCb/Collision12/DIMUON.DST/00020393/0001/00020393_00011582_1.dimuon.dst?filetype=raw"
(const char *) "root://dcdoor03.pic.es:1094/pnfs/pic.es/data/lhcb/LHCb/Collision12/DIMUON.DST/00020393/0001/00020393_00011582_1.dimuon.dst?filet"...
root [1] TFile *f = TFile::Open(str)
(class TFile *) 0x48997d0
root [2] Long64_t pos[3] =

{0,10000000,20000000}

;
root [3] Int_t len[3] =

{10000000,10000000,10000000}

;
root [4] char buf[30000000];
root [5] f->ReadBuffers(buf, pos, len, 3)
(Bool_t) false
root [6]

.... except the false return, but we don't get the read error anymore so the initial problem I'd say is fixed

Marco as a last try could you please do also the dst_dump within the LHCb runtime environment?

Thanks lot to everyone involved!!!!!

Cheers
Stefan

Comment by Marco Clemencic [ 08/Oct/14 ]

Hi Stefan,

I agree it's weird, but false is the correct result in case of success (TFile:ReadBuffers).

I cannot test with dst_dump... I didn't manage to make it work even in the working case.

Cheers
Marco

Comment by Stefan Roiser [ 08/Oct/14 ]

Marco,

yes, just saw the same, so we should be good I'd say

cheers
Stefan

Comment by Axel Naumann [ 08/Oct/14 ]

Hi,

Thanks everyone! Now that the automatic detection works I have reverted my env var / rootrc commit in v6-02-00-patches, i.e. nobody is listening to $ROOT_XRD_QUERY_READV_PARAMS anymore.

I have also applied Lukasz's commits e71da30 efe4de5 999a0cf 85078fd to master and v5-34-00-patches.

Cheers, Axel.

Comment by Lukasz Janyst (Inactive) [ 08/Oct/14 ]

Thanks!

Comment by Marco Clemencic [ 08/Oct/14 ]

Hi Axel,

we are sure that the automatic detection works in one case. We are not yet sure that we do not need the environment variable.

Cheers
Marco

Comment by Axel Naumann [ 08/Oct/14 ]

Hi Marco,

Well - we know of no case that fails But fair enough, I'll add the env var back in. Where do you need it? When / how can you assert that you don't need it after all?

Cheers, Axel.

Comment by Axel Naumann [ 09/Oct/14 ]

As discussed yesterday I have (re-)applied the env-var commit 9e8ffe2 to master, v6-02-00-patches and v5-34-00-patches.

Generated at Sat Sep 21 07:01:01 CEST 2019 using Jira 7.13.1#713001-sha1:5e06076c2d215a6f699b7e5c90ab2fae7ba5a1ce.