Details
-
Sub-task
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
master
-
None
-
None
Description
Currently, for each event loop, TDataFrame invokes TInterpreter::Calc for each jitted Filter, each jitted Define, and once for all actions.
For some analyses, however, especially in PyROOT, where all filters and defines are jitted, this means invoking the interpreter so many times that runtimes become prohibitive.
A reworking of TDF internals is required to invoke the interpreter once, lazily, right before running the event loop.
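To illustrate the "invoke the interpreter once, lazily" idea, here is a minimal sketch, not the actual TDF implementation. `DeferredJitQueue`, `Book`, `Flush` and `CountInterpreterCalls` are hypothetical names, and the callback passed to the constructor merely stands in for a call such as TInterpreter::Calc.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: collects the code of jitted nodes and hands all of it
// to the interpreter in a single call, right before the event loop.
class DeferredJitQueue {
public:
   // `interpret` is an assumption: it stands in for the real interpreter call.
   explicit DeferredJitQueue(std::function<void(const std::string &)> interpret)
      : fInterpret(std::move(interpret))
   {
      assert(fInterpret && "an interpreter callback is required");
   }

   // Called when a jitted Filter, Define or action is booked: only records the code.
   void Book(const std::string &code) { fPending += code + "\n"; }

   // Called right before the event loop: one interpreter call for everything booked.
   void Flush()
   {
      if (fPending.empty())
         return; // nothing to jit, so the interpreter is not invoked at all
      fInterpret(fPending);
      fPending.clear();
   }

   bool HasPending() const { return !fPending.empty(); }

private:
   std::function<void(const std::string &)> fInterpret;
   std::string fPending;
};

// Books three snippets and returns how many times the "interpreter" was invoked.
inline int CountInterpreterCalls()
{
   int nCalls = 0;
   DeferredJitQueue queue([&nCalls](const std::string &) { ++nCalls; });
   queue.Book("/* jitted Filter */");
   queue.Book("/* jitted Define */");
   queue.Book("/* jitted action */");
   queue.Flush(); // right before the event loop
   queue.Flush(); // nothing pending: no further call
   return nCalls;
}
```

With this shape, the number of interpreter invocations per event loop no longer grows with the number of jitted nodes.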
Plan of action:
1. separate JitTransformation into JitDefine and JitFilter
2. change jitting of defines so that it also defines an alias for the type of the output column
3. change jitting of filters, defines and actions so that the jitted lambdas use those aliases for the types of input defined columns: these are known even if the lambda that defines the column has not been jitted yet
4. change jitted Filter to return a TInterface<TJittedFilter>, where TJittedFilter is a wrapper around TFilter that acts like TFilter but allows plugging in the real TFilter at a later time
5. make jitting of filters lazy, taking advantage of TJittedFilter. It can be done together with the jitting of actions
6. make jitting of defines lazy – it can be done together with filters and actions
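Steps 3 and 4 of the plan could look roughly like the sketch below. It is an illustration under stated assumptions, not the real classes: `FilterBase`, `ConcreteFilter`, `JittedFilterSketch`, `MakeConcrete` and `MakeFilterLambda` are hypothetical names, the interface is far smaller than that of a real TFilter, and the `<column>_type` alias naming scheme is invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Minimal filter interface (hypothetical).
struct FilterBase {
   virtual ~FilterBase() = default;
   virtual bool CheckFilters(long long entry) = 0;
};

// Stand-in for the concrete, fully typed filter that only exists after jitting.
template <typename F>
class ConcreteFilter final : public FilterBase {
public:
   explicit ConcreteFilter(F f) : fCallable(std::move(f)) {}
   bool CheckFilters(long long entry) override { return fCallable(entry); }

private:
   F fCallable;
};

template <typename F>
std::unique_ptr<FilterBase> MakeConcrete(F f)
{
   return std::unique_ptr<FilterBase>(new ConcreteFilter<F>(std::move(f)));
}

// Step 4 (sketch): a wrapper that can be placed in the computation graph
// immediately and forwards to the concrete filter once it has been plugged in.
class JittedFilterSketch final : public FilterBase {
public:
   void SetFilter(std::unique_ptr<FilterBase> f) { fConcrete = std::move(f); }
   bool IsReady() const { return fConcrete != nullptr; }

   bool CheckFilters(long long entry) override
   {
      assert(fConcrete && "concrete filter must be plugged in before the event loop");
      return fConcrete->CheckFilters(entry);
   }

private:
   std::unique_ptr<FilterBase> fConcrete; // empty until the lazy jitting has run
};

// Step 3 (sketch): build lambda source whose parameter type is an alias name,
// so the real type of the defined column need not be known when booking.
// The "<column>_type" naming is an assumption made for this example.
inline std::string MakeFilterLambda(const std::string &column, const std::string &expr)
{
   return "[](" + column + "_type &" + column + ") { return " + expr + "; }";
}
```

Downstream nodes hold the wrapper from the moment the Filter is booked; the lazy jitting step later calls `SetFilter`, so nothing upstream or downstream has to be rewired.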