[ROOT-6547] tutorials/pyroot/multifit.py segfaults Created: 05/Aug/14  Updated: 06/Sep/14  Resolved: 06/Sep/14

Status: Closed
Project: ROOT
Component/s: PyROOT
Affects Version/s: 5.34/00
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Mattias Ellert Assignee: Philippe Canal
Resolution: Fixed Votes: 0
Labels: None
Environment:

Fedora 20


Attachments: File root-test-crash.patch    
 Description   

Patch attached.



 Comments   
Comment by Wim Lavrijsen [ 05/Aug/14 ]

Hi,

Sorry, I neither see a segfault nor understand why the patch would make any difference.

What segfault do you get?

Cheers,
Wim

Comment by Bertrand Bellenot [ 06/Aug/14 ]

Hi Wim,

Maybe the one we see on Windows: https://cdash.cern.ch/testDetails.php?test=2991065&build=70980

Cheers, Bertrand.

Comment by Mattias Ellert [ 06/Aug/14 ]

If I run multifit.py either as root or as my own user with bytecode files (.pyc/.pyo) present, the execution ends with:

        *** Error in `python': corrupted double-linked list: 0x0000000002069790 ***
        ======= Backtrace: =========
        /lib64/libc.so.6[0x3520875cff]
        /lib64/libc.so.6[0x352087c63c]
        /lib64/libc.so.6[0x352087cee6]
        /usr/lib64/root/libCint.so.5.34(_ZN4Cint8Internal18G__BufferReservoirD1Ev+0x53)[0x7f6073aeae73]
        /lib64/libc.so.6[0x35208394c9]
        /lib64/libc.so.6[0x3520839515]
        /lib64/libc.so.6(__libc_start_main+0xfc)[0x3520821d6c]
        python[0x400721]
        ======= Memory map: ========
        [ ,,, ]
        Aborted (SIGABRT) (core dumped)

If I run as my own user without bytecode files present there is no error.

With the patch applied there is no crash when running as root.
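Since the crash above depends on whether bytecode files are present, it can help to check for them before each run. The following is only a minimal sketch: the helper name find_bytecode_files is hypothetical, and the thread does not say which module's bytecode matters, so the path to inspect (the tutorial itself, or the installed ROOT.py) is an assumption left to the caller.

```python
import glob
import os


def find_bytecode_files(script_path):
    """Return the compiled-bytecode files that belong to script_path."""
    directory, filename = os.path.split(os.path.abspath(script_path))
    stem = os.path.splitext(filename)[0]

    # Python 2 layout: multifit.pyc / multifit.pyo next to multifit.py
    candidates = [
        os.path.join(directory, stem + ".pyc"),
        os.path.join(directory, stem + ".pyo"),
    ]
    found = [path for path in candidates if os.path.isfile(path)]

    # Python 3 layout: __pycache__/multifit.<tag>.pyc
    pattern = os.path.join(
        glob.escape(os.path.join(directory, "__pycache__")),
        glob.escape(stem) + ".*.py[co]",
    )
    found.extend(sorted(glob.glob(pattern)))
    return found
```

Calling find_bytecode_files("multifit.py") before and after a run shows whether the interpreter found or wrote bytecode for that file. An empty list corresponds to the "without bytecode files present" case described above.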

Comment by Wim Lavrijsen [ 26/Aug/14 ]

Hi,

okay, but I still can't reproduce it and it still doesn't make any sense.

The problem with memory corruption is that, yes, you can silence it by pushing the memory layout around a bit (sometimes as little as a print statement helps), but that's not a solution.

If I run valgrind (both with and without the patch), I see lots of cling stuff showing up, but no python thingies.

From the set of nightly tests, two have a stack trace, and both point to TROOT::EndOfProcessCleanups() cleaning up the canvases:
http://cdash.cern.ch/testDetails.php?test=3286510&build=72982
http://cdash.cern.ch/testDetails.php?test=3684631&build=72988

In ROOT.py, I have for application shutdown:
if isCocoa: gROOT.GetListOfCanvases().Delete()

(I see failures on some macs as well; don't know if they have Cocoa enabled though). Making that cleanup always explicit (i.e. earlier in the shutdown) might do the trick.

Could you test that?

Cheers,
Wim
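The suggestion above, making the canvas cleanup explicit and earlier in the shutdown, could be tried along the following lines. This is a sketch and not the code in ROOT.py: the helper names delete_canvases and register_canvas_cleanup are hypothetical, and the gROOT object is passed in as a parameter so the snippet does not depend on a ROOT installation. A later comment in this thread reports that the same Delete() call segfaults on v5.34, so treat this as an experiment rather than a fix.

```python
import atexit


def delete_canvases(groot):
    """Delete every canvas that groot still tracks; return how many there were."""
    canvases = groot.GetListOfCanvases()
    count = canvases.GetSize()
    canvases.Delete()
    return count


def register_canvas_cleanup(groot):
    """Schedule delete_canvases(groot) to run at interpreter exit.

    atexit handlers run while the Python interpreter is still intact,
    i.e. before the C-level exit() handlers seen in the stack traces.
    """
    atexit.register(delete_canvases, groot)
```

With PyROOT available, the test would amount to calling register_canvas_cleanup(ROOT.gROOT) near the top of multifit.py and rerunning the tutorial.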

Comment by Pere Mato Vila [ 27/Aug/14 ]

Hi Wim.
I do not see the line

if isCocoa: gROOT.GetListOfCanvases().Delete()

in the 5.34 branch; it only appears in master. On the other hand, I have seen that if I remove -b from the command line, the tutorial does not crash. So it is certainly related to deleting canvases. What I do not understand is this: -b is supposed to prevent canvases from being created, so why is it deleting them? The following is the traceback of the crash.

  * frame #0: 0x00007fff904d3866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff976da35c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff8d2d4b1a libsystem_c.dylib`abort + 125
    frame #3: 0x00007fff8feaa07f libsystem_malloc.dylib`free + 411
    frame #4: 0x0000000101f29c72 libCore.so`TString::operator=(TString const&) [inlined] TString::UnLink() const + 21 at TString.h:239
    frame #5: 0x0000000101f29c5d libCore.so`TString::operator=(this=0x0000000102367248, rhs=0x00007fff5fbfefc0) + 29 at TString.cxx:286
    frame #6: 0x0000000101f434c3 libCore.so`TSystem::GetLibraries(this=0x0000000100538050, regexp=0x000000010226a208, options=<unavailable>, isRegexp=true) + 1347 at TSystem.cxx:2097
    frame #7: 0x0000000101f41ea6 libCore.so`TSystem::Load(this=0x0000000100538050, module=0x000000010481dd19, entry=0x0000000000000000, system=true) + 70 at TSystem.cxx:1773
    frame #8: 0x0000000101f2625f libCore.so`TROOT::LoadClass(this=0x0000000102351510, (null)=<unavailable>, libname=0x000000010481dd19, check=true) + 431 at TROOT.cxx:1569
    frame #9: 0x0000000101e9c910 libCore.so`TCint::AutoLoad(this=<unavailable>, cls=0x00007fff5fbff349) + 272 at TCint.cxx:1992
    frame #10: 0x0000000101f26436 libCore.so`TROOT::LoadClass(this=0x0000000102351510, requestedname=<unavailable>, silent=false) const + 326 at TROOT.cxx:1497
    frame #11: 0x0000000101ea18e1 libCore.so`TClass::GetClass(name=0x0000000104875779, load=true, silent=false) + 1329 at TClass.cxx:2692
    frame #12: 0x0000000101e938ae libCore.so`TBaseClass::GetClassPointer(this=0x0000000104875760, load=true) + 78 at TBaseClass.cxx:67
    frame #13: 0x0000000101f1bd3e libCore.so`TQObject::CollectClassSignalLists(this=0x000000010443a6a0, list=0x00007fff5fbff570, cls=<unavailable>) + 158 at TQObject.cxx:544
    frame #14: 0x0000000101f1c719 libCore.so`TQObject::Emit(this=0x000000010443a6a0, signal_name=0x000000010655b5e4) + 169 at TQObject.cxx:646
    frame #15: 0x0000000106511392 libGpad.so`~TCanvas [inlined] TCanvas::Destructor(this=<unavailable>) + 114 at TCanvas.cxx:653
    frame #16: 0x0000000106511320 libGpad.so`~TCanvas(this=0x000000010443a630) + 96 at TCanvas.cxx:624
    frame #17: 0x000000010651113f libGpad.so`~TCanvas(this=0x000000010443a630) + 15 at TCanvas.cxx:621
    frame #18: 0x0000000101e78075 libCore.so`TList::Delete(this=0x0000000100543630, option=<unavailable>) + 677 at TList.cxx:459
    frame #19: 0x0000000101f247e2 libCore.so`TROOT::EndOfProcessCleanups(this=0x0000000102351510) + 114 at TROOT.cxx:759
    frame #20: 0x00007fff8d2d5794 libsystem_c.dylib`__cxa_finalize + 164
    frame #21: 0x00007fff8d2d5a4c libsystem_c.dylib`exit + 22

Comment by Wim Lavrijsen [ 30/Aug/14 ]

Pere,

I missed that this was v5.34; was thinking dev. Canvases can still be created with -b, they'd just have no associated window (one use of this is to print to .pdf).

I don't understand that traceback, but given that this is during shutdown, I don't expect the system to be in a consistent state and don't like that AutoLoad at that time at all.

Note however, that if I add the line:

gROOT.GetListOfCanvases().Delete()

in v5.34, I get a segfault right there, so that is not a solution either.

Cheers,
Wim
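The point about batch mode can be checked directly: a canvas created in batch mode still ends up in the list that gets cleaned up at shutdown. The snippet below is a small sketch under assumptions: the function name batch_canvas_count and the canvas name are made up for illustration, and gROOT.SetBatch(True) is used as a stand-in for roughly what the -b flag does. It returns None when PyROOT cannot be imported.

```python
def batch_canvas_count(pdf_path=None):
    """Create one canvas in batch mode and return the number of tracked canvases.

    Returns None if PyROOT is not importable.
    """
    try:
        import ROOT
    except ImportError:
        return None

    ROOT.gROOT.SetBatch(True)  # no windows are opened from here on
    canvas = ROOT.TCanvas("c_batch", "batch canvas")
    if pdf_path is not None:
        canvas.Print(pdf_path)  # e.g. "out.pdf"; works without a window
    return ROOT.gROOT.GetListOfCanvases().GetSize()
```

A result of 1 or more means the canvas exists despite batch mode, which is consistent with TCanvas destructors showing up under TROOT::EndOfProcessCleanups() in the traceback.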

Comment by Axel Naumann [ 30/Aug/14 ]

Indeed, there is no (relevant) python in the backtrace; to me it simply looks like a tear down issue with TClass versus signal/slot. I'd recommend assigning to Philippe...

Comment by Philippe Canal [ 06/Sep/14 ]

Hi,

This problem has been fixed on both the master and the v5-34-00-patch branches (even though the random behavior was not as visible on the master, it was still present) by extending the cleanups done at the end of ROOT.py.

Cheers,
Philippe.

Generated at Sat Sep 21 06:42:38 CEST 2019 using Jira 7.13.1#713001-sha1:5e06076c2d215a6f699b7e5c90ab2fae7ba5a1ce.