ROOT-9002 (sub-task of ROOT-8855: Upgrade of TDataFrame for 6.12/00 release)

Feature requests for TBufferMerger: output thread does too much work


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 6.10/04
    • Fix Version/s: 6.12/00
    • Component/s: I/O
    • Labels: None

    Description

      Report from Dan Riley:

      I have a couple more immediate questions about the TBufferMerger from my scaling studies. The first has to do with this commit:

      https://github.com/root-project/root/commit/a3b9d864a0017479a694f0d6dddb926f4f79f80b

      Prior to that commit, the TBufferMerger::WriteOutputFile() central loop had a fairly limited lock on gROOTMutex:

               {
                  TDirectory::TContext ctxt;
                  TMemFile *tmp;
                  {
                     R__LOCKGUARD2(gROOTMutex);
                     tmp = new TMemFile(fName.c_str(), buffer->Buffer() + buffer->Length(), length, "READ");
                  }
                  buffer->SetBufferOffset(buffer->Length() + length);
                  merger.AddAdoptFile(tmp);
                  merger.PartialMerge();
                  merger.Reset();
               }

      Since that commit, and still in the current version, the gROOTMutex lock has expanded to cover the entire partial merge:

               {
                  R__LOCKGUARD(gROOTMutex);
                  memfile.reset(new TMemFile(fName.c_str(), buffer->Buffer() + buffer->Length(), length, "read"));
                  buffer->SetBufferOffset(buffer->Length() + length);
                  merger.AddFile(memfile.get(), false);
                  merger.PartialMerge();
               }

      The expanded scope of the gROOTMutex lock is a serious limit on scaling the TBufferMerger for our purposes (partly for reasons related to my next question). Is that scope expansion necessary? Is there any prospect of reducing it back to the original, limited scope of just creating the TMemFile?
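      To make the scaling concern concrete, here is a minimal, self-contained C++ sketch of the two lock scopes. It does not use ROOT: `gGlobalMutex` is a hypothetical stand-in for gROOTMutex, and `OpenInMemory()` and `Merge()` are hypothetical placeholders for the TMemFile construction and `merger.PartialMerge()`. The sketch only counts how much of the merge work runs while the global lock is held; it is an illustration under those assumptions, not the TBufferMerger implementation.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for ROOT's global gROOTMutex (not the ROOT object).
std::mutex gGlobalMutex;

// True while the current thread holds gGlobalMutex via GlobalLock,
// so the sketch can measure how much merge work happens inside the lock.
thread_local bool tHoldsGlobal = false;

// RAII guard: locks gGlobalMutex and records that this thread holds it.
// The destructor body clears the flag before fLock unlocks the mutex.
class GlobalLock {
public:
   GlobalLock() : fLock(gGlobalMutex) { tHoldsGlobal = true; }
   ~GlobalLock() { tHoldsGlobal = false; }
private:
   std::lock_guard<std::mutex> fLock;
};

struct Result {
   long checksum;       // dummy output of the "merge"
   long stepsUnderLock; // merge iterations executed while holding the lock
};

// Hypothetical placeholder for constructing the TMemFile:
// the only step assumed to need the global lock.
int OpenInMemory(int payload) { return payload * 2 + 1; }

// Hypothetical placeholder for merger.PartialMerge(): the expensive part.
Result Merge(int file, int steps) {
   assert(steps >= 0);
   Result r{0, 0};
   for (int i = 0; i < steps; ++i) {
      r.checksum += (file ^ i) & 7;
      if (tHoldsGlobal) ++r.stepsUnderLock;
   }
   return r;
}

// Pre-commit shape: only the file creation is serialized.
Result ProcessNarrow(int payload, int steps) {
   int file;
   {
      GlobalLock lock;
      file = OpenInMemory(payload);
   }
   return Merge(file, steps); // runs with the global lock released
}

// Post-commit shape: the lock covers file creation and the whole merge.
Result ProcessWide(int payload, int steps) {
   GlobalLock lock;
   int file = OpenInMemory(payload);
   return Merge(file, steps); // runs before `lock` is destroyed
}
```

      Called with the same input, both functions return the same checksum, but ProcessNarrow reports 0 steps under the lock while ProcessWide reports every step. In the wide version, any other thread that needs the same global mutex must wait for the entire merge to finish.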

      The second question is, unfortunately, far more amorphous. With the prototype I'm working with, I see the AOD output writer thread (but not the MINIAOD one) consuming far more CPU time than I would have expected, and performance traces suggest most of that time is spent in the zlib deflate() routine (typical stack trace below). This is something of a mystery, as the TBufferMergerFiles being merged are LZMA-compressed, and (I think) compression should be fully applied before they are passed to the TBufferMerger. Do you know of any internal ROOT file structure that might account for the zlib compression?

      Attachments

        1. screenshot.png
          89 kB
          Guilherme Amadio
        2. tbm-output-zlib.png
          53 kB
          Guilherme Amadio


          People

            Assignee: Guilherme Amadio
            Reporter: Guilherme Amadio
            Votes: 0
            Watchers: 1