Project: ROOT / Issue: ROOT-9332

TFileMerger/hadd: using a restricted source list in the recursive call?

    Details

      Description

      Dear ROOT gurus,

      I have often had to merge hundreds of ROOT files, each containing hundreds of objects (mostly trees and histograms) spread over different directories, some shared with other files and some present in only one or a few files. In these cases hadd takes ages (several hours) to produce an output of only a few hundred MB.
      I believe this is related to running

      https://github.com/root-project/root/blob/bf8582e586191431069f1e09e62a823709868366/io/io/src/TFileMerger.cxx#L445
      when thousands of objects are involved.

      There may be ways to improve the situation by changing that line, but I'd like to make another suggestion that would very likely help in my case: could you please consider adding an option that restricts the recursive call at

      https://github.com/root-project/root/blob/bf8582e586191431069f1e09e62a823709868366/io/io/src/TFileMerger.cxx#L540
      to the part of 'sourcelist' (i.e. the files) that actually contains the parent directory?
      I don't mean for this to be the default behaviour (that would need benchmarking), but in some cases it would certainly help, since every loop in the subsequent recursive calls would run over fewer files.
      This would not change much the time spent looking up a 'key' in 'allNames', but I think it would already save a good amount of CPU time.

      What do you think?

      Thank you in advance for your answer,
      Olivier.

            People

            • Assignee:
              Philippe Canal (pcanal)
              Reporter:
              Olivier Arnaez (arnaez)
            • Votes:
              0
              Watchers:
              2

                Time Tracking

                Estimated: 5h (original estimate)
                Remaining: 5h
                Logged: Not specified