ROOT / ROOT-10119

TChain constructor / destructor confusion

    Details

    • Type: Bug
    • Status: Open
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: 6.16/00
    • Fix Version/s: None
    • Component/s: I/O
    • Labels:
      None
    • Environment:

      CentOS 7 and SLC6 with GCC 6.2

      Description

      Hi,

      I'm running into a very strange issue with the ATLAS analysis software that I can't make heads or tails of.

      After a long delay, we switched our analysis releases to LCG_95 / ROOT 6.16/00 last night. We now see two of our tests failing, apparently for the same reason. One of the failing examples is this:

      http://atlas-computing.web.cern.ch/atlas-computing/links/distDirectory/gitwww/21_2WebArea/nicos_web_area212_AthAnalysis_x86_64slc6gcc62opt64BS6G62AthenaOpt/NICOS_TestLog_2019-05-14T0322/Control_xAODRootAccess___xAODRootAccessConf__xAODRootAccessTest__m.html

      I rebuilt all the code in debug mode locally to try to find how the ROOT code crashes exactly. I get this trace:

      [bash][pcadp02]:AnalysisBase > ./Control/xAODRootAccess/test-bin/ut_xaodrootaccess_transtree_test.exe 
      ut_xaodrootaccess_tran... INFO    Environment initialised for data access
      xAOD::MakeTransientTree   INFO    Created transient tree "CollectionTree" in ROOT's common memory
      TCanvas::MakeDefCanvas    INFO     created default TCanvas with name c1
      xAOD::MakeTransientTree   INFO    Created transient tree "CollectionTree" in ROOT's common memory
      xAOD::MakeTransientMet... INFO    Created transient metadata tree "MetaData" in ROOT's common memory
      xAOD::MakeTransientMet... INFO    Created transient metadata tree "MetaData" in ROOT's common memory
      xAOD::MakeTransientTree   INFO    Created transient tree "CollectionTree" in ROOT's common memory
      xAOD::MakeTransientMet... INFO    Created transient metadata tree "MetaData" in ROOT's common memory
      TUnixSystem::DispatchS... ERROR   segmentation violation
       
       
       
      ===========================================================
      There was a crash (kSigSegmentationViolation).
      This is the entire stack trace of all threads:
      ===========================================================
      #0  0x00007f75b32afa3c in waitpid () from /lib64/libc.so.6
      #1  0x00007f75b322dde2 in do_system () from /lib64/libc.so.6
      #2  0x00007f75b6474d38 in TUnixSystem::Exec (this=0x20a7630, shellcmd=0x1bbee9a0 "/home/krasznaa/projects/lcg95/build/install/AnalysisBaseExternals/21.2.75/InstallArea/x86_64-centos7-gcc62-dbg/etc/gdb-backtrace.sh 16942 1>&2") at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/core/unix/src/TUnixSystem.cxx:2119
      #3  0x00007f75b64755a3 in TUnixSystem::StackTrace (this=0x20a7630) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/core/unix/src/TUnixSystem.cxx:2413
      #4  0x00007f75b6478e89 in TUnixSystem::DispatchSignals (this=0x20a7630, sig=kSigSegmentationViolation) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/core/unix/src/TUnixSystem.cxx:3644
      #5  0x00007f75b6471090 in SigHandler (sig=kSigSegmentationViolation) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/core/unix/src/TUnixSystem.cxx:408
      #6  0x00007f75b6478ddc in sighandler (sig=11) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/core/unix/src/TUnixSystem.cxx:3621
      #7  <signal handler called>
      #8  adjust_pointer<void> (offset=<error reading variable: Cannot access memory at address 0x763c726f74636566>, base=0x1416d600) at /afs/cern.ch/cms/CAF/CMSCOMM/COMM_ECAL/dkonst/GCC/build/contrib/gcc-6.2.0/src/gcc/6.2.0/libstdc++-v3/libsupc++/tinfo.h:68
      #9  __cxxabiv1::__dynamic_cast (src_ptr=0x1416d600, src_type=0x605d88 <typeinfo for TObject>, dst_type=0x7f75b67dc078 <typeinfo for TNotifyLinkBase>, src2dst=0) at /afs/cern.ch/cms/CAF/CMSCOMM/COMM_ECAL/dkonst/GCC/build/contrib/gcc-6.2.0/src/gcc/6.2.0/libstdc++-v3/libsupc++/dyncast.cc:55
      #10 0x00007f75b5d4bd5e in TTree::~TTree (this=0x7fff2daa68e0, __in_chrg=<optimized out>) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/tree/tree/src/TTree.cxx:885
      #11 0x00007f75b5cff9ab in TChain::~TChain (this=0x7fff2daa68e0, __in_chrg=<optimized out>) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/tree/tree/src/TChain.cxx:176
      #12 0x0000000000403a59 in main () at /home/krasznaa/projects/lcg95/athena/Control/xAODRootAccess/test/ut_xaodrootaccess_transtree_test.cxx:63
      ===========================================================
       
       
      The lines below might hint at the cause of the crash.
      You may get help by asking at the ROOT forum http://root.cern.ch/forum
      Only if you are really convinced it is a bug in ROOT then please submit a
      report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
      from above as an attachment in addition to anything else
      that might help us fixing this issue.
      ===========================================================
      #8  adjust_pointer<void> (offset=<error reading variable: Cannot access memory at address 0x763c726f74636566>, base=0x1416d600) at /afs/cern.ch/cms/CAF/CMSCOMM/COMM_ECAL/dkonst/GCC/build/contrib/gcc-6.2.0/src/gcc/6.2.0/libstdc++-v3/libsupc++/tinfo.h:68
      #9  __cxxabiv1::__dynamic_cast (src_ptr=0x1416d600, src_type=0x605d88 <typeinfo for TObject>, dst_type=0x7f75b67dc078 <typeinfo for TNotifyLinkBase>, src2dst=0) at /afs/cern.ch/cms/CAF/CMSCOMM/COMM_ECAL/dkonst/GCC/build/contrib/gcc-6.2.0/src/gcc/6.2.0/libstdc++-v3/libsupc++/dyncast.cc:55
      #10 0x00007f75b5d4bd5e in TTree::~TTree (this=0x7fff2daa68e0, __in_chrg=<optimized out>) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/tree/tree/src/TTree.cxx:885
      #11 0x00007f75b5cff9ab in TChain::~TChain (this=0x7fff2daa68e0, __in_chrg=<optimized out>) at /home/krasznaa/projects/lcg95/build/build/AnalysisBaseExternals/src/ROOT/tree/tree/src/TChain.cxx:176
      #12 0x0000000000403a59 in main () at /home/krasznaa/projects/lcg95/athena/Control/xAODRootAccess/test/ut_xaodrootaccess_transtree_test.cxx:63
      ===========================================================
       
       
      xAOD::TFileAccessTracer   INFO    Sending file access statistics to http://rucio-lb-prod.cern.ch:18762/traces/
      [bash][pcadp02]:AnalysisBase >
      

      Which just doesn't make any sense to me. The line in our code that triggers the crash is apparently this one:

      https://gitlab.cern.ch/atlas/athena/blob/21.2/Control/xAODRootAccess/test/ut_xaodrootaccess_transtree_test.cxx#L63

      But why would the TChain destructor be called instead of the constructor? Along with the ROOT update I also introduced some smaller CMake updates into our code, but nothing that I could imagine explaining this weird behaviour. o.O All the compilation/linking commands for this code can be found here if anyone's interested:

      http://atlas-computing.web.cern.ch/atlas-computing/links/distDirectory/gitwww/GITWebArea/nightlies/21.2/2019-05-14T0332/AnalysisBase/x86_64-slc6-gcc62-opt/AnalysisBase/Control.xAODRootAccess.log.html

      I also ran the code through Valgrind, thinking that maybe some upstream memory corruption is responsible for confusing the stack trace. But the first thing that Valgrind reports is the same...

      xAOD::MakeTransientTree   INFO    Created transient tree "CollectionTree" in ROOT's common memory
      TCanvas::MakeDefCanvas    INFO     created default TCanvas with name c1
      xAOD::MakeTransientTree   INFO    Created transient tree "CollectionTree" in ROOT's common memory
      xAOD::MakeTransientMet... INFO    Created transient metadata tree "MetaData" in ROOT's common memory
      xAOD::MakeTransientMet... INFO    Created transient metadata tree "MetaData" in ROOT's common memory
      xAOD::MakeTransientTree   INFO    Created transient tree "CollectionTree" in ROOT's common memory
      xAOD::MakeTransientMet... INFO    Created transient metadata tree "MetaData" in ROOT's common memory
      ==10029== Invalid read of size 8
      ==10029==    at 0x6E9FC3D: __dynamic_cast (dyncast.cc:50)
      ==10029==    by 0x5823D5D: TTree::~TTree() (TTree.cxx:885)
      ==10029==    by 0x57D79AA: TChain::~TChain() (TChain.cxx:176)
      ==10029==    by 0x403A58: main (ut_xaodrootaccess_transtree_test.cxx:63)
      ==10029==  Address 0x2f7ab730 is 0 bytes inside an unallocated block of size 48 in arena "client"
      ==10029== 
      ==10029== Invalid read of size 8
      ==10029==    at 0x6E9FC40: adjust_pointer<void> (tinfo.h:68)
      ==10029==    by 0x6E9FC40: __dynamic_cast (dyncast.cc:55)
      ==10029==    by 0x5823D5D: TTree::~TTree() (TTree.cxx:885)
      ==10029==    by 0x57D79AA: TChain::~TChain() (TChain.cxx:176)
      ==10029==    by 0x403A58: main (ut_xaodrootaccess_transtree_test.cxx:63)
      ==10029==  Address 0x6b63617254465337 is not stack'd, malloc'd or (recently) free'd
      ==10029== 
      TUnixSystem::DispatchS... ERROR   segmentation violation
      

      I considered posting this to the ROOT forum, but then thought that it would be better handled on JIRA instead. If anybody has any ideas of what could be going wrong, please let us know.

      Note that all our other unit and integration tests ran fine last night. So it's not a "full breakdown" of ROOT that we see. Just some very specific tests are breaking down...

      Cheers,
      Attila

            People

            • Assignee:
              pcanal Philippe Canal
            • Reporter:
              akraszna Attila Krasznahorkay