ROOT-9762

Custom read rule fails to execute in certain situations

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.14/04
    • Fix Version/s: 6.16/00, 6.14/08
    • Component/s: I/O
    • Labels: None
    • Environment: x86_64-slc6-gcc62-opt, but it probably doesn't matter.

      Description

      Hi,

      This will be a tricky one.

      Danilo Piparo, this is what I mentioned this morning. I have finally managed to distil the issue down to a level that I thought I should be able to share with you guys.

      The ATLAS Muon Combined Performance group started complaining recently that some of their jobs are no longer working since we switched the analysis releases to ROOT 6.14/04. They observed a crash pointing back deep into the ROOT I/O code with those jobs.

      I've put a demonstrator of this under:

      /afs/cern.ch/work/k/krasznaa/public/ROOT-ioError
      

      (That data file is not public, so please handle this issue with care...)

      When I run that executable, it crashes with:

      ./demonstrator /home/krasznaa/data/data17/AOD/data17_13TeV.00338349.physics_Main.merge.AOD.f877_m1885/data17_13TeV.00338349.physics_Main.merge.AOD.f877_m1885._lb0447._0004.1 
      xAOD::Init                INFO    Environment initialised for data access
      TUnixSystem::DispatchS... ERROR   segmentation violation
       
       
       
      ===========================================================
      There was a crash.
      This is the entire stack trace of all threads:
      ===========================================================
      #0  0x0000003279cac89e in waitpid () from /lib64/libc.so.6
      #1  0x0000003279c3e4e9 in do_system () from /lib64/libc.so.6
      #2  0x00007f9063795e1d in TUnixSystem::StackTrace() () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libCore.so
      #3  0x00007f9063798584 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libCore.so
      #4  <signal handler called>
      #5  0x00007f9062711971 in ROOT::read_ElementLinkBase_0(char*, TVirtualObject*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAthLinksDict.so
      #6  0x00007f90630d4d22 in int TStreamerInfo::ReadBufferArtificial<char**>(TBuffer&, char** const&, TStreamerElement*, int, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #7  0x00007f90631beded in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #8  0x00007f9063044bd4 in TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #9  0x00007f9062f5f13c in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #10 0x00007f90621fc998 in TBranchElement::GetEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libTree.so
      #11 0x00007f90621fc626 in TBranchElement::GetEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libTree.so
      #12 0x00007f9062e6ed11 in xAOD::TObjectManager::getEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libxAODRootAccess.so
      #13 0x00007f9062e57793 in xAOD::TEvent::setAuxStore(xAOD::TObjectManager&, bool) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libxAODRootAccess.so
      #14 0x00007f9062e59555 in xAOD::TEvent::getInputObject(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::type_info const&, bool, bool) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libxAODRootAccess.so
      #15 0x0000000000402505 in xAOD::TReturnCode xAOD::TEvent::retrieve<DataVector<xAOD::TrackParticle_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >(DataVector<xAOD::TrackParticle_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/src/Control/xAODRootAccess/xAODRootAccess/TEvent.icc:65
      #16 0x00000000004025b4 in xAOD::TReturnCode read<DataVector<xAOD::TrackParticle_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >(xAOD::TEvent&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) () at /home/krasznaa/projects/elRead/demonstrator/demonstrator.cxx:100
      #17 0x0000000000401d87 in main () at /home/krasznaa/projects/elRead/demonstrator/demonstrator.cxx:76
      ===========================================================
       
       
      The lines below might hint at the cause of the crash.
      You may get help by asking at the ROOT forum http://root.cern.ch/forum
      Only if you are really convinced it is a bug in ROOT then please submit a
      report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
      from above as an attachment in addition to anything else
      that might help us fixing this issue.
      ===========================================================
      #5  0x00007f9062711971 in ROOT::read_ElementLinkBase_0(char*, TVirtualObject*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libAthLinksDict.so
      #6  0x00007f90630d4d22 in int TStreamerInfo::ReadBufferArtificial<char**>(TBuffer&, char** const&, TStreamerElement*, int, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #7  0x00007f90631beded in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #8  0x00007f9063044bd4 in TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #9  0x00007f9062f5f13c in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libRIO.so
      #10 0x00007f90621fc998 in TBranchElement::GetEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libTree.so
      #11 0x00007f90621fc626 in TBranchElement::GetEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libTree.so
      #12 0x00007f9062e6ed11 in xAOD::TObjectManager::getEntry(long long, int) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libxAODRootAccess.so
      #13 0x00007f9062e57793 in xAOD::TEvent::setAuxStore(xAOD::TObjectManager&, bool) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libxAODRootAccess.so
      #14 0x00007f9062e59555 in xAOD::TEvent::getInputObject(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::type_info const&, bool, bool) () from /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/lib/libxAODRootAccess.so
      #15 0x0000000000402505 in xAOD::TReturnCode xAOD::TEvent::retrieve<DataVector<xAOD::TrackParticle_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >(DataVector<xAOD::TrackParticle_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at /cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBase/21.2.50/InstallArea/x86_64-slc6-gcc62-opt/src/Control/xAODRootAccess/xAODRootAccess/TEvent.icc:65
      #16 0x00000000004025b4 in xAOD::TReturnCode read<DataVector<xAOD::TrackParticle_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> > >(xAOD::TEvent&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) () at /home/krasznaa/projects/elRead/demonstrator/demonstrator.cxx:100
      #17 0x0000000000401d87 in main () at /home/krasznaa/projects/elRead/demonstrator/demonstrator.cxx:76
      ===========================================================
       
       
      xAOD::TFileAccessTracer   INFO    Sending file access statistics to http://rucio-lb-prod.cern.ch:18762/traces/
      

      The innermost frame of the crash (#5, ROOT::read_ElementLinkBase_0) comes from a custom read rule that we define here:

      https://gitlab.cern.ch/atlas/athena/blob/21.2/Control/AthLinksSA/AthLinks/selection.xml#L15
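
      For readers not familiar with read rules: the fragment below is only a generic sketch of what such a rule looks like in a genreflex selection file. It is not the actual ATLAS rule. The class name MyLink and the member m_cache are hypothetical, chosen purely for illustration. From a rule like this, the dictionary appears to generate a function of the form read_<ClassName>_<index>(char*, TVirtualObject*), which is consistent with frame #5 in the stack trace above.

      ```xml
      <!-- Hypothetical sketch: "MyLink" and "m_cache" are illustrative names,
           not taken from the ATLAS selection.xml linked above. -->
      <lcgdict>
        <class name="MyLink" />
        <!-- Intended to run whenever a MyLink object with on-file class
             version 1 or later is read back. -->
        <read sourceClass="MyLink" version="[1-]"
              targetClass="MyLink" source="" target="m_cache">
          <![CDATA[
            m_cache = 0; // reset the transient member after reading
          ]]>
        </read>
      </lcgdict>
      ```

      The code inside the CDATA section is compiled into the dictionary library, so a crash inside it shows up as a frame in that library (libAthLinksDict.so in the trace above) rather than in libRIO.so.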

      Now, as the example code says, if you re-order the reading of the objects in it, the problem goes away. The error also doesn't show up with an older analysis release that was still using ROOT 6.12/06.

      I'll finish this here and add more info in further comments. (I don't want to risk the description becoming too long for JIRA...)

      Cheers,
      Attila

            People

            • Assignee: Philippe Canal (pcanal)
            • Reporter: Attila Krasznahorkay (akraszna)