  1. ROOT
  2. ROOT-8855 Upgrade of TDataFrame for 6.12/00 release
  3. ROOT-8857

Generalise the source of data for the TDataFrame



    • Sub-task
    • Status: Closed
    • Critical
    • Resolution: Completed


      It is becoming increasingly important for ROOT to be able to read data formats other than its own. Examples include xAOD, CSV, JSON, SQL, Parquet, pandas, XLS and Arrow (which has good C++ support).
      TDataFrame is a natural place for this to happen. Some requirements for the implementation:

      • It must be general, i.e. open to new source types
      • It must guarantee good behaviour in MT mode
      • It must leave the TDF interface as clean as it is now, if not cleaner

      The main challenge for the implementation, besides formulating the functionality in terms of interfaces, is the in-memory representation of the data. This can be prototyped and investigated without diving into one or two specific data formats, by thinking instead about a general solution.

      A possible path towards the concrete implementation of the source could be:

      • Define a callback mechanism. If a column name is requested by an action or a transformation, then before signalling that it is not known, either as a branch or as a temporary column, a callback can be invoked to create it.
      • This callback describes how to create the value. We can decide in principle to provide a few of those and duly document how a user can design her own.
      • The callback is invoked transparently via a Define. For example, suppose a Filter takes column "a" as input, column "a" does not exist, and a callback is defined. The system then attempts to create column "a" and, if successful, transparently replaces the call Filter(myFilter, {"a"}) with Define("a", mycallback("a")).Filter(myFilter, {"a"}). The returned type is the same, so the entire procedure is transparent at the surface.

      • The callback can be a class, of which we may provide a skeleton following the CRTP (no virtual calls, 1 level of inheritance only). This class may have two methods:
        • GetNEvents()
        • GetColumnValue(...)
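
      The path outlined above could be sketched roughly as follows. This is only a minimal illustration in plain C++, not the TDataFrame implementation: the names DataSourceBase, InMemorySource, ColumnMap and ResolveColumn are hypothetical, column values are assumed to be double for simplicity, and the Impl-suffixed methods are just one possible CRTP convention. ResolveColumn mimics the lookup order described in the list (known columns first, then the callback) without using any TDataFrame API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: column name -> one value per event (double only, by assumption).
using ColumnMap = std::map<std::string, std::vector<double>>;

// Hypothetical CRTP skeleton: calls are forwarded to the derived class at
// compile time, so there are no virtual calls and one level of inheritance.
template <typename Derived>
class DataSourceBase {
public:
   std::size_t GetNEvents() const { return Self().GetNEventsImpl(); }

   // Writes the value of `column` at `entry` into `value`.
   // Returns false if the source cannot provide it.
   bool GetColumnValue(const std::string &column, std::size_t entry, double &value) const
   {
      return Self().GetColumnValueImpl(column, entry, value);
   }

private:
   const Derived &Self() const { return static_cast<const Derived &>(*this); }
};

// Toy source that keeps its columns in memory (stand-in for csv, sql, ...).
class InMemorySource : public DataSourceBase<InMemorySource> {
public:
   explicit InMemorySource(ColumnMap columns) : fColumns(std::move(columns))
   {
      // All columns are expected to have the same number of events.
      for (const auto &col : fColumns)
         assert(col.second.size() == fColumns.begin()->second.size());
   }

   std::size_t GetNEventsImpl() const
   {
      return fColumns.empty() ? 0 : fColumns.begin()->second.size();
   }

   bool GetColumnValueImpl(const std::string &column, std::size_t entry, double &value) const
   {
      const auto it = fColumns.find(column);
      if (it == fColumns.end() || entry >= it->second.size())
         return false;
      value = it->second[entry];
      return true;
   }

private:
   ColumnMap fColumns;
};

// Lookup with fallback: try the already known columns first; only if the
// name is unknown there, ask the data source to create the value.
template <typename Source>
bool ResolveColumn(const ColumnMap &known, const DataSourceBase<Source> &source,
                   const std::string &name, std::size_t entry, double &value)
{
   const auto it = known.find(name);
   if (it != known.end() && entry < it->second.size()) {
      value = it->second[entry];
      return true;
   }
   return source.GetColumnValue(name, entry, value);
}
```

      In this sketch an unknown column is reported (ResolveColumn returns false) only after both the known columns and the source have been tried, which matches the order proposed above. How typed columns and MT-safe access would be handled is left open here.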





              eguiraud Enrico Guiraud
              dpiparo Danilo Piparo