The compile-time cost of RHist instantiations grows exponentially with histogram dimensionality, with the number of supported RAxis types as the base, which makes higher-dimensional histograms very slow to compile.
To be more precise, with N the dimension of the histogram and A the number of RAxis types supported by ROOT, instantiating the constructor of RHist has a compile-time complexity of O(A^N). That's currently O(3^N), and will grow to O(4^N) once labeled axis support is added. Supporting more axis types in RAxisConfig will similarly increase the base of this exponential.
This happens because RHistImpl is templated by the set of axis types (specified as a variadic sequence of template parameters) while RHist and its constructor aren't. Therefore, the RHist constructor must go through a type erasure layer called RHistImplGen, which internally instantiates RHistImpl for every possible combination of axis types in order to ultimately dispatch to the right version of RHistImpl via switch statements.
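To make the blow-up concrete, here is a minimal standalone sketch of this kind of dispatch. It is not the actual ROOT code: `AxisKind`, the three axis structs, `HistImpl`, `HistImplGen`, `MakeHist`, `Signature()` and `Leaves()` are all hypothetical stand-ins I made up for illustration. The point is only that a runtime switch per axis, nested N levels deep, forces the compiler to instantiate one leaf per combination of axis types.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical runtime tag, playing the role of the axis kind stored in an axis config.
enum class AxisKind { Equidistant, Growable, Irregular };

// Hypothetical stand-ins for the concrete axis types.
struct EquidistantAxis { static constexpr char Tag() { return 'E'; } };
struct GrowableAxis    { static constexpr char Tag() { return 'G'; } };
struct IrregularAxis   { static constexpr char Tag() { return 'I'; } };

// Stand-in for the implementation class, templated on the full list of axis types.
template <class... Axes>
struct HistImpl {
  // Returns one character per axis type, e.g. "IEG".
  static std::string Signature() { return std::string{Axes::Tag()...}; }
};

// Stand-in for the type erasure layer: peels off one runtime axis kind per
// recursion level and appends the matching static type to Chosen...
template <std::size_t Remaining, class... Chosen>
struct HistImplGen {
  static std::string Make(const AxisKind* kinds) {
    switch (*kinds) {
      case AxisKind::Equidistant:
        return HistImplGen<Remaining - 1, Chosen..., EquidistantAxis>::Make(kinds + 1);
      case AxisKind::Growable:
        return HistImplGen<Remaining - 1, Chosen..., GrowableAxis>::Make(kinds + 1);
      case AxisKind::Irregular:
        return HistImplGen<Remaining - 1, Chosen..., IrregularAxis>::Make(kinds + 1);
    }
    return {};  // unreachable for valid enum values
  }
  // Number of HistImpl specializations reachable from this node: 3 per level.
  static constexpr std::size_t Leaves() {
    return HistImplGen<Remaining - 1, Chosen..., EquidistantAxis>::Leaves() +
           HistImplGen<Remaining - 1, Chosen..., GrowableAxis>::Leaves() +
           HistImplGen<Remaining - 1, Chosen..., IrregularAxis>::Leaves();
  }
};

// Recursion end: all axes have a static type, so a concrete HistImpl is instantiated.
template <class... Chosen>
struct HistImplGen<0, Chosen...> {
  static std::string Make(const AxisKind*) { return HistImpl<Chosen...>::Signature(); }
  static constexpr std::size_t Leaves() { return 1; }
};

// Non-templated-on-axis-type entry point, analogous to a constructor taking axis configs.
template <std::size_t N>
std::string MakeHist(const std::array<AxisKind, N>& kinds) {
  return HistImplGen<N>::Make(kinds.data());
}
```

In this sketch, a single call to `MakeHist` with three axes only ever runs one leaf, but `HistImplGen<3>::Leaves()` evaluates to 27, because every branch of every switch has to be compiled.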
Here are ways we could improve this:
- Reduce the subset of RHistImpl which is templated by the axis configuration to the absolute minimum needed for optimal run-time performance (i.e. Fill + dependencies), move everything else to type-erased axis access (going through RAxisView or RAxisBase).
- Give up on the RAxisConfig mechanism and give RHist a constructor which is templated by concrete axis types, avoiding instantiation of every possible RHistImpl.
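As a rough illustration of the second option, here is a sketch of a constructor templated by concrete axis types. Again, this is not ROOT's API: `Hist`, `HistImplBase`, `HistImpl`, the axis structs and their `nBins` member are hypothetical names, and the code assumes C++17 for the fold expression. The idea is that each call site instantiates exactly one implementation, instead of all A^N of them.

```cpp
#include <memory>
#include <tuple>

// Hypothetical concrete axis types, reduced to a bin count for brevity.
struct EquidistantAxis { int nBins; };
struct IrregularAxis   { int nBins; };

// Type-erased interface for everything that does not need static axis types.
struct HistImplBase {
  virtual ~HistImplBase() = default;
  virtual int NBins() const = 0;
};

// Implementation templated on the axis types; only the combination actually
// requested by a call site gets instantiated.
template <class... Axes>
class HistImpl final : public HistImplBase {
public:
  explicit HistImpl(const Axes&... axes)
      : fAxes(axes...), fNBins((1 * ... * axes.nBins)) {}  // C++17 fold expression

  int NBins() const override { return fNBins; }

private:
  std::tuple<Axes...> fAxes;  // axes kept with their static types
  int fNBins;                 // product of the per-axis bin counts
};

// Histogram front end whose constructor is templated by concrete axis types,
// so no runtime switch over axis kinds is needed.
class Hist {
public:
  template <class... Axes>
  explicit Hist(const Axes&... axes)
      : fImpl(std::make_unique<HistImpl<Axes...>>(axes...)) {}

  int NBins() const { return fImpl->NBins(); }

private:
  std::unique_ptr<HistImplBase> fImpl;
};
```

With this shape, `Hist h(EquidistantAxis{10}, IrregularAxis{4});` instantiates only `HistImpl<EquidistantAxis, IrregularAxis>`. The price is that axis types must be known at compile time at the construction site, which is exactly what the RAxisConfig mechanism currently avoids.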
Overall, I guess this is another motivation for the currently envisioned refactoring of moving RAxis from a partially variant-based design to a pure inheritance-based design.
By the way, I used clang 9's new -ftime-trace profiling feature to make sense of this, and strongly recommend it to anyone facing compile-time bloat issues. It's super nice.