ROOT-6698

Crash in PROOF lite


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.34/00
    • Fix Version/s: 6.02.00
    • Component/s: PROOF
    • Labels:
      None
    • Environment:

      SLC6
      ROOT 5.34.20, 5.34.21 and GIT branch v5-34-00-patches

      Description

      Hi,

      Since switching to ROOT 5.34.20, and also with the more recent version 5.34.21 and this morning's state of the Git branch v5-34-00-patches, I have been observing crashes in PROOF-Lite. I set up the processing like this:

      TChain a("USR314")
      TProof::Open("lite://")
      a.SetProof()
      a.Add("/nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.*.root")
      a.Process("KinematicPlotsEta.C+")

      The output on the console is

      Info in <TProofLite::SetQueryRunning>: starting query: 1
      Info in <TProofQueryResult::SetRunning>: nwrks: 4       
      Looking up for exact location of files: OK (493 files)                 
      Info in <TPacketizerAdaptive::TPacketizerAdaptive>: Setting max number of workers per node to 4
      10:10:45 17043 Wrk-0.1 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W33_slot4_69611.root
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W33_slot4_69611.root - skipping
      10:10:45 17043 Wrk-0.1 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70053.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70053.root - skipping
      10:10:45 17047 Wrk-0.3 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70054.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70054.root - skipping
      10:10:45 17043 Wrk-0.1 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70070.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70070.root - skipping
      10:10:45 17041 Wrk-0.0 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70225.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W35_slot4_70225.root - skipping
      10:10:45 17047 Wrk-0.3 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70525.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70525.root - skipping
      10:10:45 17045 Wrk-0.2 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70526.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70526.root - skipping
      10:10:45 17043 Wrk-0.1 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70726.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70726.root - skipping
      10:10:46 17045 Wrk-0.2 | Error in <TDSet::GetEntries>: cannot find tree "USR314" in /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70939.root   
      Error in <TPacketizerAdaptive::ValidateFiles>: cannot get entries for file: /nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W37_slot4_70939.root - skipping
      Info in <TPacketizerAdaptive::InitStats>: fraction of remote files 0.000000                                                                                         
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      entries: -1 (-1)                                                                                                                                                    
      Info in <TProofLite::MarkBad>:                                                                                                                                      
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.1) as bad                                     
       +++ Reason: received kPROOF_FATAL                                                                                                                                  
       
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.1) as bad
       +++ Reason: received kPROOF_FATAL                                                                                             
       
       +++ Most likely your code crashed
       +++ Please check the session logs for error messages either using
       +++ the 'Show logs' button or executing                          
       +++                                                              
       +++ root [] TProof::Mgr("optiplex09.e18.physik.tu-muenchen.de")->GetSessionLogs()->Display("*")
       
       
      entries: 66 (66)
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 28 (28)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 61 (61)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 46 (46)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 20 (20)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 36 (36)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      Info in <TPacketizerAdaptive::InitStats>: fraction of remote files 0.000000
      0.2: caught exception triggered by signal '1' while processing dset:'TDSet:USR314', file:'/nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W33_slot4_69616.root' - check logs for possible stacktrace - last event: 7                                                                                                                                                                                                         
      0.0: caught exception triggered by signal '1' while processing dset:'TDSet:USR314', file:'/nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W33_slot4_69623.root' - check logs for possible stacktrace - last event: 9                                                                                                                                                                                                         
      Info in <TProofLite::MarkBad>:                                                                                                                                                                                     
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.0) as bad                                                                                    
       +++ Reason: undefined message in TProof::CollectInputFrom(...)                                                                                                                                                    
       
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.0) as bad
       +++ Reason: undefined message in TProof::CollectInputFrom(...)                                                                
       
       +++ Most likely your code crashed
       +++ Please check the session logs for error messages either using
       +++ the 'Show logs' button or executing                          
       +++                                                              
       +++ root [] TProof::Mgr("optiplex09.e18.physik.tu-muenchen.de")->GetSessionLogs()->Display("*")
       
       
      entries: 28 (28)
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 10 (10)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 14 (14)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      Info in <TPacketizerAdaptive::InitStats>: fraction of remote files 0.000000
      0.3: caught exception triggered by signal '1' while processing dset:'TDSet:USR314', file:'/nfs/mds/user/suhl/analysis/slot4/filtered_eta/hist.2008_W33_slot4_69643.root' - check logs for possible stacktrace - last event: 29                                                                                                                                                                                                        
      Info in <TProofLite::MarkBad>:                                                                                                                                                                                     
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.3) as bad                                                                                    
       +++ Reason: undefined message in TProof::CollectInputFrom(...)                                                                                                                                                    
       
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.3) as bad
       +++ Reason: undefined message in TProof::CollectInputFrom(...)                                                                
       
       +++ Most likely your code crashed
       +++ Please check the session logs for error messages either using
       +++ the 'Show logs' button or executing                          
       +++                                                              
       +++ root [] TProof::Mgr("optiplex09.e18.physik.tu-muenchen.de")->GetSessionLogs()->Display("*")
       
       
      entries: 85 (85)
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 30 (30)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 103 (103)                                                         
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      Info in <TPacketizerAdaptive::InitStats>: fraction of remote files 0.000000
      Info in <TProofLite::MarkBad>:                                             
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.2) as bad
       +++ Reason: undefined message in TProof::CollectInputFrom(...)                                                                
       
       +++ Message from master at optiplex09.e18.physik.tu-muenchen.de : marking optiplex09.e18.physik.tu-muenchen.de:-1 (0.2) as bad
       +++ Reason: undefined message in TProof::CollectInputFrom(...)                                                                
       
       +++ Most likely your code crashed
       +++ Please check the session logs for error messages either using
       +++ the 'Show logs' button or executing                          
       +++                                                              
       +++ root [] TProof::Mgr("optiplex09.e18.physik.tu-muenchen.de")->GetSessionLogs()->Display("*")
       
       
      entries: 91 (91)
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 5 (5)                                                             
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 33 (33)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 8 (8)                                                             
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      entries: 85 (85)                                                           
      Error in <TPacketizerAdaptive::SplitPerHost>: Error removing a missing file
      Info in <TPacketizerAdaptive::InitStats>: fraction of remote files 0.000000
      Lite-0: all output objects have been merged

      The PROOF session log is attached.

      The same script works on the same data without problems in 5.34.19. The crash occurs only when I process trees obtained from real data; across the roughly 500 trees, the number of entries ranges from 0 to 350 (about half of the trees have fewer than 100 entries). When I process Monte Carlo data, where each tree contains 50001 events, no crash occurs. The crash also occurs with a single worker.

      For one process, the attached log file contains the error message "Fatal: IsWriting() violated at line 3346 of `/tmp/suhl/root/io/io/src/TBufferFile.cxx'"; however, when I run the commands above multiple times, this message is not always present. TFile::SetCacheRead always appears in the stack trace of at least one process.

      Do you have an idea what might be wrong? Thanks, Sebastian

            People

            • Assignee: Gerardo Ganis (ganis)
            • Reporter: Sebastian Uhl (suhl)