ROOT-10074

Use BulkIO to speed up TTree.AsMatrix

    Details

    • Type: New Feature
    • Status: Open
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: 6.18/00
    • Fix Version/s: None
    • Component/s: I/O, PyROOT
    • Labels:
      None
    • Environment:

      Python with NumPy

      Description

      Currently, the Pythonization `TTree.AsMatrix` converts (a subset of) a TTree's branches into a two-dimensional NumPy array by iterating over the entries one at a time.

      BulkIO is being introduced in version 6.17/01, and a natural first use for it is filling the arrays returned by `TTree.AsMatrix`. With the interface unchanged, the current `GetEntry`-based implementation can be replaced with one based on BulkIO's `GetEntriesSerialized`. Users of `AsMatrix` would not have to change any code, but it would run much faster. (It should be considerably faster than uproot, especially for a large number of baskets.)

      Because each output NumPy array holds data from more than one basket, the implementation cannot be zero-copy. However, a `memcpy` followed by an in-place byteswap should be much faster than calling `GetEntry` for every entry. Rough expectation for "much faster": a factor of 10 to 40 on uncompressed data, and slightly less for LZ4.
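To illustrate the copy-then-byteswap idea, here is a minimal NumPy-only sketch. It does not call ROOT or BulkIO. The function name `baskets_to_array` and the representation of each basket as a plain `bytes` payload of big-endian values are assumptions made for illustration, standing in for whatever `GetEntriesSerialized` hands back.

```python
import numpy as np

def baskets_to_array(basket_buffers, dtype=np.float32):
    """Copy big-endian basket payloads into one native-endian array.

    `basket_buffers` is a hypothetical list of bytes objects, one per
    basket, each holding a whole number of fixed-size values.
    """
    native = np.dtype(dtype)
    total = sum(len(buf) for buf in basket_buffers) // native.itemsize
    out = np.empty(total, dtype=native)   # one allocation for all baskets
    raw = out.view(np.uint8)              # byte-level view of the same memory

    offset = 0
    for buf in basket_buffers:
        n = len(buf)
        # Plain byte copy (the memcpy step); no per-entry work.
        raw[offset:offset + n] = np.frombuffer(buf, dtype=np.uint8)
        offset += n

    # Serialized data is big-endian; swap once, in place, on little-endian hosts.
    if np.little_endian:
        out.byteswap(inplace=True)
    return out

# Two fake "baskets" of big-endian float32 values.
baskets = [
    np.arange(0, 3, dtype=">f4").tobytes(),
    np.arange(3, 7, dtype=">f4").tobytes(),
]
column = baskets_to_array(baskets)
```

The cost is one copy per basket plus one vectorized byteswap per column, instead of one `GetEntry` call per entry. A full `AsMatrix` replacement would still have to arrange the per-branch columns into the two-dimensional output.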

            People

            • Assignee: Philippe Canal (pcanal)
            • Reporter: Jim Pivarski (pivarski)

                Time Tracking

                • Original Estimate: 3 days
                • Remaining Estimate: 3 days
                • Time Spent: Not Specified