Uploaded image for project: 'ROOT'
  1. ROOT
  2. ROOT-7901

Double-free in TROOT::EndOfProcessCleanups redux



    • Bug
    • Status: Closed (View Workflow)
    • High
    • Resolution: Fixed
    • 6.04/12
    • 6.06/02
    • Cling
    • None
    • lxplus atlas build


      hi -

      Unfortunately, it looks like ROOT-7775 hasn't been completely fixed as of
      root 6.04.12. In current atlas devval, using root 6.04.12, i can get
      the same crash with:

      $ asetup devval,rel_0,dbg
      $ RecExCommon_links.sh
      $ MALLOC_CHECK_=3 athena.py --stdcmalloc CaloRecEx/CaloRecEx_topOptions.py  2>&1|tee log


      Py:Athena            INFO leaving with code 0: "successful run"
      *** glibc detected *** /afs/cern.ch/sw/lcg/releases/LCG_81b/Python/2.7.9.p1/x86_
      64-slc6-gcc49-dbg/bin/python: free(): invalid pointer: 0x0000000019f19250 ***
      ======= Backtrace: =========

      Whether or not the crash occurs can be sensitive to the environment.
      I see this crash interactively, but it does not happen when the same
      job is run as part of the automated tests in the release build.
      I see the same crash in a couple other jobs in my local test tree,
      but they do not reproduce in devval.

      If my analysis of this is correct (see below), it will be difficult to
      produce a small example that reproduces this. I haven't spent much time
      doing this.

      I instrumented IncrementalExecutor.cpp again, and verified that we are
      indeed calling multiple cleanups with the same argument.

      Here is part of the trace. Here, we're processing constructors for
      `cling-module-45'. This registers one with an argument of 0x7f0253c66278.

       init cling-module-45 _GLOBAL__sub_I_cling_module_45 0x3912620
      4   addexit 0x7f0253c66028 0x3912620
      5   addexit 0x7f0253c66040 0x3912620
      6   addexit 0x7f0253c66058 0x3912620
      7   addexit 0x7f0253c66070 0x3912620
      8   addexit 0x7f0253c66088 0x3912620
      9   addexit 0x7f0253c660a0 0x3912620
      10   addexit 0x7f0253c660b8 0x3912620
      11   addexit 0x7f0253c660d0 0x3912620
      12   addexit 0x7f0253c660e8 0x3912620
      13   addexit 0x7f0253c66100 0x3912620
      14   addexit 0x7f0253c66118 0x3912620
      15   addexit 0x7f0253c66130 0x3912620
      16   addexit 0x7f0253c66148 0x3912620
      17   addexit 0x7f0253c66160 0x3912620
      18   addexit 0x7f0253c66178 0x3912620
      19   addexit 0x7f0253c66198 0x3912620
      20   addexit 0x7f0253c661b8 0x3912620
      21   addexit 0x7f0253c661d8 0x3912620
      22   addexit 0x7f0253c661f8 0x3912620
      23   addexit 0x7f0253c66218 0x3912620
      24   addexit 0x7f0253c66238 0x3912620
      25   addexit 0x7f0253c66258 0x3912620
      26   addexit 0x7f0253c66278 0x3912620

      Later, we add another destrcutor for the same address from

      209 init cling-module-452 __cxx_global_var_initcling_module_4529 0x185c0f80
      210   addexit 0x7f0253c66278 0x185c0f80

      Then we run destructors for the same address when cleaning up both modules:

      540 xact 0x185c6c30 mod cling-module-452
      541 dtor 0x7f0253c66278
      542 dtor 0x7f0253c66238
      957 xact 0x3893520 mod cling-module-45
      958 dtor 0x7f0253c66378
      959 dtor 0x7f0253c66358
      960 dtor 0x7f0253c66338
      961 dtor 0x7f0253c66318
      962 dtor 0x7f0253c662f8
      963 dtor 0x7f0253c662d8
      964 dtor 0x7f0253c662b8
      965 dtor 0x7f0253c66298
      966 dtor 0x7f0253c66278

      Maybe Philippe already sees where i'm going with this...

      His comments in the patch to fix ROOT-7775 pointed out that the problem
      was that the names of the __cxx_global_var_init* functions were not unique
      across modules. As a fix, he added the module name, so the function name
      is formed from "__cxx_global_var_init" + module-name + serial-number.
      However — this is not sufficient to make the names unique, because
      the module-name ends with digits. So for example, serial number
      29 in cling-module-45 would have the same name
      as serial number 9 in cling-module-452!

      I tried making the following change on top of Philippe's fix:

      diff --git a/interpreter/llvm/src/tools/clang/lib/CodeGen/CGDeclCXX.cpp b/interpreter/llvm/src/tools/clang/lib/CodeGen/CGDeclCXX.cpp
      index 893b918..8752772 100644
      --- a/interpreter/llvm/src/tools/clang/lib/CodeGen/CGDeclCXX.cpp
      +++ b/interpreter/llvm/src/tools/clang/lib/CodeGen/CGDeclCXX.cpp
      @@ -294,7 +294,7 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D,
         // Create a variable initialization function.
         llvm::Function *Fn =
      -                                         llvm::Twine(FnName)+moduleName.str(),
      +                                         llvm::Twine(FnName)+moduleName.str()+"_",
         auto *ISA = D->getAttr<InitSegAttr>();

      and the crashes i was seeing went away.




            pcanal Philippe Canal
            ssnyder Scott Snyder
            0 Vote for this issue
            4 Start watching this issue