Details
- Type: Sub-task
- Status: Closed
- Priority: High
- Resolution: Fixed
Description
There are a number of trivial operations that users often want to perform on dataframes that are surprisingly hard to get right, for example adding several `Define`s in a loop or conditionally adding a `Filter` depending on a runtime boolean (both use cases are challenging in C++ but trivial in Python).
Difficulties boil down to the fact that different dataframe nodes have different types (because their types incorporate e.g. the type of the callable passed to a `Filter` and the type of their parent node in the computation graph).
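To make the problem concrete, here is a minimal toy model in plain C++. It is not ROOT's actual implementation: `ToySource`, `ToyFilter` and `MakeFilter` are hypothetical names, and the sketch only assumes that each node's type embeds the type of its callable and of its parent, as described above.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical toy model, NOT ROOT's implementation.
struct ToySource {
   bool Accept(int) const { return true; }
};

// The node type depends on both the callable type F and the parent type.
template <typename F, typename Parent>
struct ToyFilter {
   F fPredicate;
   Parent fParent;
   bool Accept(int x) const { return fParent.Accept(x) && fPredicate(x); }
};

template <typename F, typename Parent>
ToyFilter<F, Parent> MakeFilter(F f, Parent p)
{
   return {std::move(f), std::move(p)};
}

inline bool ToyTypesDiffer()
{
   ToySource src;
   auto positive = MakeFilter([](int x) { return x > 0; }, src);
   auto even = MakeFilter([](int x) { return x % 2 == 0; }, positive);

   // Every lambda has its own closure type, and `even` also embeds the type of `positive`,
   // so all three nodes have unrelated types.
   static_assert(!std::is_same<decltype(src), decltype(positive)>::value, "distinct node types");
   static_assert(!std::is_same<decltype(positive), decltype(even)>::value, "distinct node types");

   // Consequently `positive = even;` or `cond ? src : positive` would not compile.
   assert(even.Accept(4));
   return even.Accept(4) && !even.Accept(3) && !even.Accept(-2);
}
```

In this model a loop of the form `node = MakeFilter(..., node)` cannot be written, because the left-hand side and the right-hand side never have the same type.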
I propose to add a common base class `ROOT::Detail::RDF::RNodeBase` to all nodes of the graph (except leaves, a.k.a. results, which have a completely different interface), so that users can, for example:
- take any dataframe node by reference in non-template functions as `RNode&`
- `emplace_back` dataframe nodes in ~`std::vector<RNode>`~ `vector<RInterface<RNode>>`
- have non-const pointers to dataframe nodes
and so on.
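The idea can be sketched with a toy model in plain C++. This is an illustration under stated assumptions, not the RDataFrame code: `ToyNodeBase`, `ToySourceNode`, `ToyFilterNode`, `ToyNode`, `MaybeAddCut`, `AddCuts` and `BuildVariations` are hypothetical names, with `ToyNode` standing in for a handle to "any node" once all node types share a virtual base.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT ROOT's implementation.
struct ToyNodeBase {
   virtual ~ToyNodeBase() = default;
   virtual bool Accept(int x) const = 0;
};

struct ToySourceNode final : ToyNodeBase {
   bool Accept(int) const override { return true; }
};

// The callable type is still a template parameter, but it is hidden behind the base class.
template <typename F>
struct ToyFilterNode final : ToyNodeBase {
   ToyFilterNode(F f, std::shared_ptr<const ToyNodeBase> parent)
      : fPredicate(std::move(f)), fParent(std::move(parent))
   {
      assert(fParent != nullptr);
   }
   bool Accept(int x) const override { return fParent->Accept(x) && fPredicate(x); }
   F fPredicate;
   std::shared_ptr<const ToyNodeBase> fParent;
};

// One concrete handle type for every node of the graph.
class ToyNode {
public:
   ToyNode() : fNode(std::make_shared<ToySourceNode>()) {}
   template <typename F>
   ToyNode Filter(F f) const
   {
      return ToyNode(std::make_shared<ToyFilterNode<F>>(std::move(f), fNode));
   }
   bool Accept(int x) const { return fNode->Accept(x); } // one virtual call per node in the chain
private:
   explicit ToyNode(std::shared_ptr<const ToyNodeBase> node) : fNode(std::move(node)) {}
   std::shared_ptr<const ToyNodeBase> fNode;
};

// Non-template function that conditionally extends the graph.
inline ToyNode MaybeAddCut(ToyNode node, bool addCut)
{
   return addCut ? node.Filter([](int x) { return x > 10; }) : node;
}

// Adding several nodes in a loop: both sides of the assignment are ToyNode.
inline ToyNode AddCuts(ToyNode node, const std::vector<int> &thresholds)
{
   for (int t : thresholds)
      node = node.Filter([t](int x) { return x > t; });
   return node;
}

// Nodes with different histories stored in one container.
inline std::vector<ToyNode> BuildVariations(ToyNode root)
{
   std::vector<ToyNode> nodes;
   nodes.emplace_back(root);
   nodes.emplace_back(MaybeAddCut(root, true));
   nodes.emplace_back(AddCuts(root, {0, 5}));
   return nodes;
}
```

The price in this sketch is a virtual call per node for every value checked, which is the same trade-off discussed at the end of this description.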
For example, conditionally adding a `Range` to a dataframe now looks like this:
    auto maybe_ranged = [&df, mustAddRange]() -> ROOT::RDF::RNode {
       return mustAddRange ? df.Range(1) : df;
    }();
while before this change one would have to add fake `Filter("true")` filters to normalize the return type of the lambda, involving the interpreter for no reason.
Internal `RDataFrame` code is also simplified by the introduction of this common base class.
The only downside I can think of is that if this mechanism is abused, users might end up with extra, unnecessary virtual calls in their event loop. On the other hand, this mechanism should only be used in situations that previously required either complex template magic or dirty and slow tricks.