Uploaded image for project: 'ROOT'
  1. ROOT
  2. ROOT-8855 Upgrade of TDataFrame for 6.12/00 release
  3. ROOT-8879

Consistent, unified selection of column names in TDF



    • Sub-task
    • Status: Closed (View Workflow)
    • High
    • Resolution: Fixed
    • None
    • None
    • None
    • None


      There are currently a few different code paths that we follow to select the names of the columns that each transformation/action will work on. In some cases, when the user does not specify a list of names, we fall back to the full default set. In other cases we incrementally pop from the full default set until we have enough branches. This inconsistency is a consequence of the organic growth of the codebase. A common design and general refactoring is required.

      Streamlining this logic would also greatly help with ROOT-8873 (detection of invalid branch names) and ROOT-8857 (generalised data sources).

      The methods we have

      • `TLoopManager::GetDefaultBranches`
        • return list of branch names specified at TDataFrame construction time
      • `GetUsedBranchNames`
        • match c++ expression (as string) against supplied names of branches,
        • return branches that are used in the expression
      • `GetDefaultBranchNames(nExpectedBranches, actionNameForErr)`
        • return the first nExpectedBranches from TLoopManager::GetDefaultBranches, or throw an exception
      • `template<T1,T2,T3,T4> GetBranchNames(branchNames, actionNameForErr)`
        • evaluate neededBranches: 1 + number of non-void T2,T3,T4
        • evaluate providedBranches: number of non-empty branchNames if provided == needed -> return branchNames else return GetDefaultBranchNames(needed, actionNameForErr)
      • `PickBranchNames(nArgs, bl, defBl)`
        • binary decision between using the default set of branchNames or the provided set of branchNames bl (nArgs is a bit like "neededBranches" in GetBranchNames)

      Refactoring proposal

      • rename GetUsedBranchNames -> FindUsedBranchNames
      • rename TLoopManager::GetDefaultBranches -> TLoopManager::GetDefaultBranchNames (add functionality for retrieving only first N branches)
      • delete GetDefaultBranchNames, TLoopManager::GetDefaultBranchNames is enough
      • delete PickBranchNames, use TLoopManager::GetDefaultBranchNames instead
      • I don't think we need GetBranchNames<T1,T2,T3,T4>


        Issue Links



              eguiraud Enrico Guiraud
              eguiraud Enrico Guiraud
              0 Vote for this issue
              0 Start watching this issue


                Actual End: