ROOT / ROOT-10657

[DF] Speed up jitting



    • Type: Improvement
    • Status: Closed
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.22/00
    • Component/s: RDataFrame


      The reproducer below, by Stefan, takes 28 seconds to run on my workstation, 21 of which are spent in RLoopManager::Jit. This is a use case with multiple separate computation graphs and therefore multiple separate calls to the interpreter: the first RDF takes 2 seconds to just-in-time compile what it needs for its event loop, and the others about 1 second each.

      As suggested during a PPP meeting, most of the jitted expressions contain the same logic, possibly applied to different columns, so it might be beneficial to jit a function that implements that logic once and reuse it as many times as needed. This avoids multiple separate template instantiations of helper types that are ultimately equivalent and that, as clang's -ftime-trace shows, are expensive to instantiate.
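To illustrate the "jit once, reuse many times" idea, here is a minimal, ROOT-independent sketch. It is not the implementation adopted in ROOT. `FakeInterpreter`, `Normalize`, `HelperCache` and `CountCompilations` are hypothetical names introduced only for this example. The sketch assumes expressions can be keyed by replacing column names with positional placeholders, so that expressions differing only in the columns they use share one compiled helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the interpreter: it only counts compile requests.
struct FakeInterpreter {
    std::size_t nCompilations = 0;
    void Compile(const std::string &code)
    {
        assert(!code.empty());
        ++nCompilations;
    }
};

// Replace every identifier that names a column with "var<k>", where k is the
// order of first appearance, so "photon_pt" and "mcWeight" both become "var0".
std::string Normalize(const std::string &expr, const std::vector<std::string> &columns)
{
    std::string out;
    std::vector<std::string> seen; // columns in order of first appearance
    std::size_t i = 0;
    while (i < expr.size()) {
        const unsigned char c = expr[i];
        if (std::isalpha(c) || c == '_') {
            // Consume a whole identifier: [A-Za-z_][A-Za-z0-9_]*
            std::size_t j = i;
            while (j < expr.size() &&
                   (std::isalnum(static_cast<unsigned char>(expr[j])) || expr[j] == '_'))
                ++j;
            const std::string token = expr.substr(i, j - i);
            if (std::find(columns.begin(), columns.end(), token) != columns.end()) {
                std::size_t idx = std::find(seen.begin(), seen.end(), token) - seen.begin();
                if (idx == seen.size())
                    seen.push_back(token);
                out += "var" + std::to_string(idx);
            } else {
                out += token; // not a column: keep as is (function name, literal suffix, ...)
            }
            i = j;
        } else {
            out += expr[i];
            ++i;
        }
    }
    return out;
}

// Compiles one helper per distinct normalized expression and reuses it afterwards.
class HelperCache {
    FakeInterpreter &fInterp;
    std::map<std::string, std::string> fHelpers; // normalized body -> helper name
public:
    explicit HelperCache(FakeInterpreter &interp) : fInterp(interp) {}

    std::string GetHelper(const std::string &expr, const std::vector<std::string> &columns)
    {
        const std::string key = Normalize(expr, columns);
        const auto it = fHelpers.find(key);
        if (it != fHelpers.end())
            return it->second; // same logic already compiled: reuse it
        const std::string name = "helper" + std::to_string(fHelpers.size());
        // A real implementation would emit a function definition here.
        fInterp.Compile("// " + name + " implements: " + key);
        fHelpers.emplace(key, name);
        return name;
    }
};

// Mimics the shape of the reproducer: nGraphs graphs, one trivial Define per column.
std::size_t CountCompilations(int nGraphs, const std::vector<std::string> &columns)
{
    FakeInterpreter interp;
    HelperCache cache(interp);
    for (int g = 0; g < nGraphs; ++g)
        for (const auto &col : columns)
            cache.GetHelper(col, columns);
    return interp.nCompilations;
}
```

With 20 graphs and 8 single-column expressions, as in the reproducer, every expression normalizes to the same key, so this toy cache issues a single compile request instead of 160. How much real jitting time such a scheme saves depends on details this sketch does not model, such as column types.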

      #include "ROOT/RDataFrame.hxx"
      #include <vector>
      #include <string>
      #include <sstream>
      int main() {
          // Create N dataframes
          // File: root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/GamGam/MC/mc_343981.ggH125_gamgam.GamGam.root
          const int num_df = 20;
          std::vector<ROOT::RDF::RNode> df;
          for (auto i = 0; i < num_df; i++) {
              df.emplace_back(ROOT::RDataFrame("mini", "mc_343981.ggH125_gamgam.GamGam.root").Range(1));
          }
          // Add columns to the jitting via dummy defines
          const std::vector<std::string> columns = {"scaleFactor_PHOTON", "scaleFactor_PhotonTRIGGER", "scaleFactor_PILEUP", "mcWeight", "photon_pt", "photon_eta", "photon_phi", "photon_E"};
          const int num_cols = 8;
          for (auto i = 0; i < num_df; i++) {
              for (auto j = 0; j < num_cols; j++) {
                  // Each dummy column x<j> is defined by a string expression, so it must be jitted
                  std::stringstream ss;
                  ss << "x" << j;
                  df[i] = df[i].Define(ss.str(), columns[j]);
              }
          }
          // Make histogram of a single column (still, all column definitions are jitted)
          for (auto i = 0; i < num_df; i++) {
              // (loop body not preserved in this copy of the issue)
          }
      }




            • Assignee:
              eguiraud Enrico Guiraud
            • Votes:
              0
            • Watchers:
              1


              • Created: