ROOT / ROOT-9783 RDataFrame-related tickets / ROOT-9225

Implement "Explode" transformation



    • Type: Sub-task
    • Status: Reopened
    • Priority: Medium
    • Resolution: Unresolved
    • Component: RDataFrame


      Currently, with Cache and Snapshot, one can obtain from a TDF a second TDF which has at most the same number of rows (in fact the same rows, apart from columns added with Define).
      Sometimes, especially to avoid nested loops, it can be useful to "explode" N rows into M rows, for example by "unpacking" the individual elements of collections stored in a column.

      This can be achieved lazily with building blocks that are already available: Take, TResultProxy and TDataSource.

      The idea is to:

      • Implement a method "Explode"
      • which internally "takes" one or more columns that hold collections (provided that every such column yields the same cumulative number of elements),
      • initialises a TLazyDS (temporary name!) that lazily presents the "taken" content as TDS columns,
      • and initialises a TDF from the TLazyDS source
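The steps above can be illustrated without ROOT. The following is only a minimal sketch in plain C++, under stated assumptions: `TakenColumn`, `ExplodedColumn` and `SameExplodedSize` are hypothetical names invented for this illustration (they are not ROOT classes), and a `std::shared_ptr` to a vector of collections stands in for what Take would return. It shows how N rows of collections can be presented as M flat rows without copying, using cumulative offsets, and how the "same cumulative number of elements" condition could be checked. The upstream event loop is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the result of Take: one collection per input row.
template <typename T>
using TakenColumn = std::shared_ptr<const std::vector<std::vector<T>>>;

// Hypothetical flat view over a taken column (the "TLazyDS" idea, not a ROOT class).
// The elements are never copied; only cumulative offsets are computed, on first use.
template <typename T>
class ExplodedColumn {
public:
   explicit ExplodedColumn(TakenColumn<T> taken) : fTaken(std::move(taken))
   {
      assert(fTaken && "a taken column is required");
   }

   // Number of exploded rows M, i.e. the cumulative number of elements.
   std::size_t Size()
   {
      BuildOffsets();
      return fOffsets.back();
   }

   // Element presented as exploded row i.
   const T &At(std::size_t i)
   {
      BuildOffsets();
      if (i >= fOffsets.back())
         throw std::out_of_range("exploded row out of range");
      // The first offset strictly greater than i ends the input row that owns i.
      const auto it = std::upper_bound(fOffsets.begin(), fOffsets.end(), i);
      const auto row = static_cast<std::size_t>(std::distance(fOffsets.begin(), it)) - 1;
      return (*fTaken)[row][i - fOffsets[row]];
   }

private:
   void BuildOffsets()
   {
      if (!fOffsets.empty())
         return; // already built
      fOffsets.push_back(0);
      for (const auto &coll : *fTaken)
         fOffsets.push_back(fOffsets.back() + coll.size());
   }

   TakenColumn<T> fTaken;
   std::vector<std::size_t> fOffsets; // fOffsets[r] = index of first element of input row r
};

// Columns can be exploded together only if they yield the same number of rows.
template <typename A, typename B>
bool SameExplodedSize(ExplodedColumn<A> &a, ExplodedColumn<B> &b)
{
   return a.Size() == b.Size();
}
```

With three input rows holding 2, 0 and 3 elements, such a view would expose 5 exploded rows; empty collections simply contribute no rows.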

      One would then be able to do something like:

      TDataFrame d("t","myfile.root");
      auto jets = d.Explode("jet"); // this is a tdf with all jets
      auto jetPt = jets.Define("Pt","jet.Pt()").Histo1D("Pt");
      auto tracks = jets.Define("tracks","jet.GetTrack()").Explode("tracks"); // and here we get all tracks..
      auto tracksPt = tracks.Define("Pt","tracks.Pt()").Histo1D("Pt");
      tracksPt->Draw(); // this actually triggers everything: the event loops on d, on jets and on tracks run in "waves" and lazily!
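The "waves" behaviour described in the last comment can be mimicked in plain C++. This is only a sketch under assumptions, not the proposed implementation: `Lazy` and this free function `Explode` are hypothetical helpers written for the illustration, integers stand in for jets and tracks, and a cached `std::function` stands in for a booked result. Nothing is computed when the stages are chained; requesting the last result pulls each upstream stage once, in order.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical helper: a value computed on first request and cached afterwards.
template <typename T>
class Lazy {
public:
   explicit Lazy(std::function<T()> compute) : fCompute(std::move(compute))
   {
      assert(fCompute && "a computation is required");
   }

   // Runs the computation the first time only.
   const T &Get()
   {
      if (!fValue)
         fValue.reset(new T(fCompute()));
      return *fValue;
   }

   bool IsReady() const { return static_cast<bool>(fValue); }

private:
   std::function<T()> fCompute;
   std::unique_ptr<T> fValue;
};

using Rows = std::vector<std::vector<int>>; // one collection per input row

// Books the flattening of `upstream`; nothing runs until Get() is called on the result.
std::shared_ptr<Lazy<std::vector<int>>> Explode(std::shared_ptr<Lazy<Rows>> upstream)
{
   return std::make_shared<Lazy<std::vector<int>>>([upstream] {
      std::vector<int> flat;
      for (const auto &row : upstream->Get()) // triggers the upstream "wave"
         flat.insert(flat.end(), row.begin(), row.end());
      return flat;
   });
}
```

A chain of two such stages (events to jets, then jets to tracks) stays idle until `Get()` is called on the final stage, at which point the upstream stages run first and each runs only once.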


              eguiraud Enrico Guiraud
              dpiparo Danilo Piparo