[ROOT-5280] PROOF-Lite crashes at the finish of the application
Created: 14/Jun/13  Updated: 15/May/19  Resolved: 03/Jul/13

Status: Closed
Project: ROOT
Component/s: PROOF
Affects Version/s: 5.34/00
Fix Version/s: None

Type: Bug
Priority: High
Reporter: Attila Krasznahorkay
Assignee: Gerardo Ganis
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Tested on lxplus5, but there are reports of the same happening practically everywhere.


 Description   

Hi Gerri,

There's a new issue with PROOF-Lite in v5-34-08. (I'm not sure when it began; we only noticed it in this release, and it still affects v5-34-00-patches.)

At the end of an application the TProofLite destructor tends to crash. In an SFrame job the backtrace looks like this:

#8 0x00002abf45aa6460 in TCollection::Contains (this=0x0, obj=0x1ae14e80)
at include/TCollection.h:84
#9 0x00002abf4ac599a0 in TProof::MarkBad (this=0x1a849d20, wrk=0x1ae60390,
reason=0x2abf4adc0b43 "+++ terminating +++")
at /afs/cern.ch/work/k/krasznaa/ROOT/sources/root-dbg/proof/proof/src/TProof.cxx:4246
#10 0x00002abf4ac5a29b in TProof::TerminateWorker (this=0x1a849d20,
wrk=0x1ae60390)
at /afs/cern.ch/work/k/krasznaa/ROOT/sources/root-dbg/proof/proof/src/TProof.cxx:4322
#11 0x00002abf4ac79771 in TProof::RemoveWorkers (this=0x1a849d20, workerList=
0x0)
at /afs/cern.ch/work/k/krasznaa/ROOT/sources/root-dbg/proof/proof/src/TProof.cxx:1406
#12 0x00002abf4ac95a44 in TProofLite::~TProofLite (this=0x1a849d20,
__in_chrg=<value optimized out>)
at /afs/cern.ch/work/k/krasznaa/ROOT/sources/root-dbg/proof/proof/src/TProofLite.cxx:369
#13 0x00002abf4516aaaa in SProofManager::Cleanup (this=0x1accf380)
at src/SProofManager.cxx:253
#14 0x00002abf45147c21 in SCycleController::ShutDownProof (this=0x7fffb41c3420)
at src/SCycleController.cxx:927
#15 0x00002abf45147eb2 in SCycleController::~SCycleController (
this=0x7fffb41c3420, __in_chrg=<value optimized out>)
at src/SCycleController.cxx:81
#16 0x0000000000402170 in main (argc=2, argv=0x7fffb41c3738)
at app/sframe_main.cxx:56

It seems that some list pointer doesn't get initialized in TProofLite.
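
To make the suspected failure mode concrete, here is a minimal sketch of a null guard around a list lookup. It assumes nothing about the real ROOT code: Worker, WorkerList, Session and DemoNullGuard are hypothetical stand-ins, and this is not the fix that was eventually applied. It only shows how a lookup through a list pointer that was never set (frame #8 has this=0x0) can be avoided.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the ROOT classes.
struct Worker {
    int id;
};

struct WorkerList {
    std::vector<const Worker*> items;

    // Calling this through a null WorkerList* is undefined behaviour,
    // which is the situation shown in frame #8 of the backtrace (this=0x0).
    bool Contains(const Worker* w) const {
        return std::find(items.begin(), items.end(), w) != items.end();
    }
};

struct Session {
    std::unique_ptr<WorkerList> active;  // stays null until workers are registered
    std::vector<const Worker*> bad;

    // Returns true only if the worker was known and has now been marked bad.
    bool MarkBad(const Worker* w) {
        if (w == nullptr || active == nullptr) {
            return false;  // the list was never created, so there is nothing to look up
        }
        if (!active->Contains(w)) {
            return false;
        }
        bad.push_back(w);
        return true;
    }
};

// Usage: marking a worker bad before the list exists is a harmless no-op.
inline void DemoNullGuard() {
    Session session;
    Worker worker{1};
    assert(!session.MarkBad(&worker));   // no list yet, guarded
    session.active.reset(new WorkerList);
    session.active->items.push_back(&worker);
    assert(session.MarkBad(&worker));    // list exists and contains the worker
}
```

Whether the real problem is a missing initialization or an already deleted list cannot be told from the backtrace alone; the guard only covers the null case.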

On lxplus this can be re-created with:

svn co svn://svn.code.sf.net/p/sframe/code/SFrame/trunk SFrame
cd SFrame/
source setup.sh
make
cd user/config
sframe_main FirstCycle_config.xml

Cheers,
Attila



 Comments   
Comment by Attila Krasznahorkay [ 01/Jul/13 ]

Hi Gerri,

The crash is still there with v5-34-09. Any hints on how we could avoid it?

Cheers,
Attila

Comment by Gerardo Ganis [ 01/Jul/13 ]

Hi Attila,

Update.
I had a look over the past few days. I cannot reproduce it with plain ROOT anywhere (I tried my Ubuntu machines and lxplus5, with my own builds and the LCG ones).
The line where it crashes for you corresponds to an access to an internal list, which may indicate some corruption somewhere.
I need to run it inside SFrame, possibly with valgrind.
I will let you know.

Cheers, Gerri

Comment by Attila Krasznahorkay [ 01/Jul/13 ]

Hi Gerri,

I assume that it has something to do with the order in which objects are deleted when the application is shutting down. I guess by the time my SCycleController object is getting deleted, some "ROOT objects" are already gone.
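
As an illustration of the destruction-order idea (a sketch only, with hypothetical stand-in classes Framework and Controller and a hypothetical helper RunShutdown, none of which are ROOT or SFrame code): objects with automatic storage are destroyed in reverse order of construction, so a controller created after the framework it depends on is cleaned up while the framework is still alive. The trouble described above would arise when that ordering is not guaranteed, for example with globals in different translation units.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins, not ROOT or SFrame classes.
struct Framework {
    std::vector<std::string>& log;
    explicit Framework(std::vector<std::string>& l) : log(l) {}
    ~Framework() { log.push_back("framework destroyed"); }
};

struct Controller {
    Framework& fw;  // the controller needs the framework during its own cleanup
    explicit Controller(Framework& f) : fw(f) {}
    ~Controller() { fw.log.push_back("controller destroyed"); }
};

// Builds both objects in an inner scope and returns the order of destruction.
inline std::vector<std::string> RunShutdown() {
    std::vector<std::string> log;
    {
        Framework fw(log);    // constructed first
        Controller ctl(fw);   // constructed second, so destroyed first
    }                         // ctl is destroyed here, then fw
    assert(log.size() == 2);  // both destructors have run by now
    return log;
}
```

Keeping the dependent object in a narrower scope than the thing it depends on is one way to make the shutdown order explicit instead of leaving it to the end of the program.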

I will run some further tests myself. The problem is not limited to SFrame jobs, though: most of our compiled applications are failing in this way at the moment.

Cheers,
Attila

Comment by Attila Krasznahorkay [ 01/Jul/13 ]

Nope, apparently I was wrong. The SCycleController object is already deleted before the application starts cleaning up after itself. This cleanup happens here:

http://sourceforge.net/p/sframe/code/365/tree/SFrame/trunk/core/src/SProofManager.cxx#l249

But even if I comment out all the delete statements, I still get a crash, this time when ROOT tries to clean up after itself at the end of the application.

So I'm less and less sure how PROOF-Lite can work in any compiled application at the moment... :-/

Attila

Comment by Gerardo Ganis [ 03/Jul/13 ]

Hi Attila,

This should be fixed in the master and in v5-34-00-patches.
Please try and let me know.

Cheers, Gerri

Comment by Attila Krasznahorkay [ 03/Jul/13 ]

Hi Gerri,

Yep, I don't see a crash anymore. My test job seems to finish correctly this time.

Thanks for the fix!

Cheers,
Attila

Generated at Sun Sep 22 01:29:55 CEST 2019 using Jira 7.13.1#713001-sha1:5e06076c2d215a6f699b7e5c90ab2fae7ba5a1ce.