ROOT-8544

Regression in ROOT 6.08.04: MPI program hangs when loading TFile plugins

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.08/04
    • Fix Version/s: 6.10/00, 6.08/06
    • Component/s: Cling
    • Labels:
      None
    • Environment:

      FZ-Juelich Computing Cluster
      Scientific Linux release 6.8 (Carbon)
      x86_64
      glibc 2.12

      Description

      Hello,

      I've run into a regression that appeared between ROOT 6.08.02 and 6.08.04. I'm working on a large computing cluster in Juelich, Germany, where we use Intel MPI together with the Intel Compiler version 16.0.2 and libstdc++ from GCC 4.9.3 for parallel job processing.

      This worked very well with ROOT 6.08.02. I recently updated to 6.08.04, and now my MPI jobs hang in TFile::Open. The backtrace looks like this (sorry, no line numbers):

      #0  0x00002addd5b0548a in mmap64 () from /lib64/libc.so.6
      #1  0x00002addde771e75 in llvm::sys::Memory::allocateMappedMemory(unsigned long, llvm::sys::MemoryBlock const*, unsigned int, std::error_code&) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #2  0x00002addddafd5a8 in llvm::SectionMemoryManager::allocateCodeSection(unsigned long, unsigned int, unsigned int, llvm::StringRef) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #3  0x00002adddbf65223 in cling::Azog::allocateCodeSection(unsigned long, unsigned int, unsigned int, llvm::StringRef) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #4  0x00002addddbbb1b5 in llvm::RuntimeDyldImpl::emitSection(llvm::object::ObjectFile const&, llvm::object::SectionRef const&, bool) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #5  0x00002addddbba8a8 in llvm::RuntimeDyldImpl::findOrEmitSection(llvm::object::ObjectFile const&, llvm::object::SectionRef const&, bool, std::map<llvm::object::SectionRef, unsigned int, std::less<llvm::object::SectionRef>, std::allocator<std::pair<llvm::object::SectionRef const, unsigned int> > >&) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #6  0x00002addddbb9b44 in llvm::RuntimeDyldImpl::loadObjectImpl(llvm::object::ObjectFile const&) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #7  0x00002addddbd4e0b in llvm::RuntimeDyldELF::loadObject(llvm::object::ObjectFile const&) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #8  0x00002addddbb725d in llvm::RuntimeDyld::loadObject(llvm::object::ObjectFile const&) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #9  0x00002adddbf694ce in llvm::orc::ObjectLinkingLayer<cling::IncrementalJIT::NotifyObjectLoadedT>::ConcreteLinkedObjectSet<std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > >, std::unique_ptr<cling::Azog, std::default_delete<cling::Azog> >, std::unique_ptr<llvm::orc::LambdaResolver<cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}, cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}>, std::default_delete<{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}> >, std::_List_iterator<std::unique_ptr<llvm::orc::ObjectLinkingLayerBase::LinkedObjectSet, std::default_delete<std::_List_iterator> > > llvm::orc::ObjectLinkingLayer<cling::IncrementalJIT::NotifyObjectLoadedT>::addObjectSet<std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > >, std::unique_ptr<cling::Azog, std::default_delete<cling::Azog> >, llvm::orc::LambdaResolver<cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}, cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> 
>&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}> >(std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > >, std::unique_ptr<cling::Azog, std::default_delete<cling::Azog> >, llvm::orc::LambdaResolver<cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}, cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}>)::{lambda(std::default_delete<std::_List_iterator>, llvm::RuntimeDyld&, std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > > const&, std::function<void ()()>)#1}>::finalize() () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #10 0x00002adddbf6a05d in std::_Function_handler<unsigned long ()(), llvm::orc::ObjectLinkingLayer<cling::IncrementalJIT::NotifyObjectLoadedT>::ConcreteLinkedObjectSet<std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > >, std::unique_ptr<cling::Azog, std::default_delete<cling::Azog> >, std::unique_ptr<llvm::orc::LambdaResolver<cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}, cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}>, std::default_delete<{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}> >, std::_List_iterator<std::unique_ptr<llvm::orc::ObjectLinkingLayerBase::LinkedObjectSet, std::default_delete<std::_List_iterator> > > llvm::orc::ObjectLinkingLayer<cling::IncrementalJIT::NotifyObjectLoadedT>::addObjectSet<std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > >, std::unique_ptr<cling::Azog, std::default_delete<cling::Azog> >, llvm::orc::LambdaResolver<cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}, cling::IncrementalJIT::addModules(std::vector<llvm::Module*, 
std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}> >(std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > >, std::unique_ptr<cling::Azog, std::default_delete<cling::Azog> >, llvm::orc::LambdaResolver<cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}, cling::IncrementalJIT::addModules(std::vector<llvm::Module*, std::allocator<llvm::Module*> >&&)::{lambda(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#2}>)::{lambda(std::default_delete<std::_List_iterator>, llvm::RuntimeDyld&, std::vector<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > >, std::allocator<std::unique_ptr<llvm::object::OwningBinary<llvm::object::ObjectFile>, std::default_delete<llvm::object::OwningBinary<llvm::object::ObjectFile> > > > > const&, std::function<void ()()>)#1}>::getSymbolMaterializer(std::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #11 0x00002adddbf6a63d in std::_Function_handler<unsigned long ()(), llvm::orc::LazyEmittingLayer<llvm::orc::IRCompileLayer<cling::IncrementalJIT::RemovableObjectLinkingLayer> >::EmissionDeferredSet::find(llvm::StringRef, bool, llvm::orc::IRCompileLayer<cling::IncrementalJIT::RemovableObjectLinkingLayer>&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #12 0x00002adddbeb24f2 in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #13 0x00002adddbeb1775 in cling::Interpreter::EvaluateInternal(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #14 0x00002adddc004ebb in cling::MetaSema::actOnxCommand(llvm::StringRef, llvm::StringRef, cling::Value*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #15 0x00002adddc018385 in cling::MetaParser::isXCommand(cling::MetaSema::ActionResult&, cling::Value*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #16 0x00002adddc017ba6 in cling::MetaParser::isCommand(cling::MetaSema::ActionResult&, cling::Value*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #17 0x00002adddc0179ed in cling::MetaParser::isMetaCommand(cling::MetaSema::ActionResult&, cling::Value*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #18 0x00002adddc003704 in cling::MetaProcessor::process(char const*, cling::Interpreter::CompilationResult&, cling::Value*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #19 0x00002adddbe23939 in _INTERNAL_67__homea_vsk10_vsk1011_src_root_root_6_08_04_core_meta_src_TCling_cxx_eaa8abb9::HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #20 0x00002adddbe273ec in TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #21 0x00002adddbe264ff in TCling::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #22 0x00002addce81de1e in TApplication::ExecuteFile(char const*, int*, bool) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCore.so
      #23 0x00002adddbe26121 in TCling::ExecuteMacro(char const*, TInterpreter::EErrorCode*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCling.so
      #24 0x00002addce6c76b5 in TROOT::Macro(char const*, int*, bool) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCore.so
      #25 0x00002addce7f45fb in TPluginManager::LoadHandlerMacros(char const*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCore.so
      #26 0x00002addce7f40d6 in TPluginManager::LoadHandlersFromPluginDirs(char const*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCore.so
      #27 0x00002addce7f398b in TPluginManager::FindHandler(char const*, char const*) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libCore.so
      #28 0x00002addcf0e3352 in TFile::Open(char const*, char const*, char const*, int, int) () from /homea/vsk10/vsk1011/public/sw/root/6.08.04/lib/libRIO.so
      #29 0x00002addc1b20723 in IO::TreeWriter::Initialize(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /homea/vsk10/vsk1011/ams/ACsoft/root6_08_04/lib/libacsoft.so.7.5
      #30 0x00000000004088e3 in main ()

      The backtrace is the same in all MPI processes. I strongly suspect that this is related to the change made to fix ROOT-8523, or to the other commit to interpreter/llvm/src/lib/ExecutionEngine/SectionMemoryManager.cpp that followed it.

      My program is a compiled program, not a ROOT macro. It is worth noting that the problem only happens when the program is launched in parallel via MPI. If I start just one instance of the program, it doesn't happen. We typically launch 5*56 = 280 instances distributed over 5 physically separate compute nodes, which communicate with each other via MPI. As soon as more than one node is used, the program hangs.
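
      Since the top frame of the backtrace is mmap64, a small stand-alone probe along the following lines might help tell whether anonymous mmap itself stalls on the affected nodes, independent of ROOT. This is only a sketch under assumptions: the function name probe_anonymous_mmap is made up for illustration, and the probe does not reproduce what llvm::sys::Memory::allocateMappedMemory does internally (for example, it passes no address hint).

```cpp
// Hypothetical diagnostic helper; not part of ROOT, cling, or LLVM.
#include <sys/mman.h>

#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Map `bytes` of anonymous read/write memory, touch every byte, and unmap it.
// Returns the elapsed wall-clock time in microseconds, or -1 if mmap failed.
long long probe_anonymous_mmap(std::size_t bytes) {
    assert(bytes > 0);  // mmap rejects zero-length mappings

    const auto start = std::chrono::steady_clock::now();

    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    // Touch the pages so the kernel actually has to back them.
    std::memset(p, 0xAB, bytes);
    munmap(p, bytes);

    const auto stop = std::chrono::steady_clock::now();
    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
            .count());
}
```

      Calling probe_anonymous_mmap with a few sizes (for instance 4096 and 1 << 20) from each MPI rank, once on a single node and once across several nodes, would show whether plain mmap calls return promptly in both setups. If they do, the hang is more likely specific to how the JIT memory manager requests its mappings.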

      Sorry I can't be more specific. Let me know if you need more details.

            People

            • Assignee:
              pcanal Philippe Canal
              Reporter:
              bbeische Bastian Beischer
