Type: Improvement
Status: Closed
Priority: High
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.22/00
Component/s: RDataFrame
Labels: None
The reproducer below, by Stefan, takes 28 seconds to run on my workstation, 21 of which are spent in RLoopManager::Jit. This use case has multiple separate computation graphs and therefore makes multiple separate calls to the interpreter. The first RDF takes 2 seconds to just-in-time compile what it needs for its event loop, and each of the other 19 takes about 1 second (2 s + 19 × ~1 s ≈ 21 s).
As suggested during a PPP meeting: since most of the jitted expressions contain the same logic, possibly applied to different columns, it might be beneficial to jit a function that implements that logic once and reuse it as many times as needed. This would avoid multiple separate template instantiations of helper types that are, in the end, completely equivalent (and which clang's -ftime-trace shows to be expensive to instantiate).
```cpp
#include "ROOT/RDataFrame.hxx"

#include <vector>
#include <string>
#include <sstream>

int main() {
   // Create N dataframes
   // File: root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/GamGam/MC/mc_343981.ggH125_gamgam.GamGam.root
   const int num_df = 20;
   std::vector<ROOT::RDF::RNode> df;
   for (auto i = 0; i < num_df; i++) {
      df.emplace_back(ROOT::RDataFrame("mini", "mc_343981.ggH125_gamgam.GamGam.root").Range(1));
   }

   // Add columns to the jitting via dummy defines
   const std::vector<std::string> columns = {"scaleFactor_PHOTON", "scaleFactor_PhotonTRIGGER", "scaleFactor_PILEUP", "mcWeight", "photon_pt", "photon_eta", "photon_phi", "photon_E"};
   const int num_cols = 8;
   for (auto i = 0; i < num_df; i++) {
      for (auto j = 0; j < num_cols; j++) {
         std::stringstream ss;
         ss << "x" << j;
         df[i] = df[i].Define(ss.str(), columns[j]);
      }
   }

   // Make histogram of a single column (still, all column definitions are jitted)
   for (auto i = 0; i < num_df; i++) {
      df[i].Histo1D("x0").GetValue();
   }
}
```