Compiling functions on vectors with Numba

First, install and import Vector and Numba.

[1]:
import vector
import numba as nb
import numpy as np

Numba is a just-in-time (JIT) compiler for a mathematically relevant subset of NumPy and Python. It allows you to write fast code without leaving the Python environment. The drawback of Numba is that it can only compile code blocks involving objects and functions that it recognizes.
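
As a minimal sketch of what Numba compiles (plain NumPy, no Vector involved; the function name is illustrative, and the imports from cell [1] are assumed):

@nb.njit
def sum_of_squares(values):
    # values is a 1D float64 NumPy array; this loop is compiled to machine code
    total = 0.0
    for v in values:
        total += v**2
    return total

sum_of_squares(np.arange(10.0))  # the first call triggers compilation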

The Vector library includes extensions to inform Numba about vector objects, NumPy arrays of vectors, and Awkward Arrays of vectors. At the time of writing, the implementation of vector NumPy arrays is incomplete (see issue #43).

Consider the following function:

[2]:
def compute_mass(v1, v2):
    return (v1 + v2).mass
[3]:
%timeit compute_mass(vector.obj(px=1, py=2, pz=3, E=4), vector.obj(px=-1, py=-2, pz=-3, E=4))
18.3 μs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
[4]:
@nb.njit
def compute_mass(v1, v2):
    return (v1 + v2).mass
[5]:
%timeit -n 1 compute_mass(vector.obj(px=1, py=2, pz=3, E=4), vector.obj(px=-1, py=-2, pz=-3, E=4))
The slowest run took 27658.64 times longer than the fastest. This could mean that an intermediate result is being cached.
93 ms ± 227 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

When the two MomentumObject4D objects are passed as arguments, Numba recognizes them and replaces the Python objects with low-level structs. When it compiles the function, it recognizes + as the 4D add function and .mass as the tau component of the result.
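
The same lowering applies to other coordinate properties. As a sketch (assuming the momentum aliases such as pt, eta, and phi are registered with Numba in the same way that mass is):

@nb.njit
def compute_kinematics(v1, v2):
    # each property access compiles to the corresponding coordinate
    # computation on the low-level structs, just like .mass above
    return v1.pt, v2.eta, (v1 + v2).phi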

Although this demonstrates that Numba can manipulate vector objects, there is no performance advantage (and a likely disadvantage) to compiling a calculation on just a few vectors. The @nb.njit result above shows this: the first run, which includes the just-in-time compilation as well as the execution itself, takes much longer than the non-JIT version.

Once the function has been JIT-compiled, subsequent runs are comparable to the non-JIT version, but there is still no performance advantage on just a few vectors.
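
To keep the one-time compilation cost out of a measurement, one common approach is to call the compiled function once with representative arguments before timing it (a sketch of the warm-up pattern that cell [5] performed implicitly):

# warm-up: triggers compilation for this combination of argument types
compute_mass(vector.obj(px=1, py=2, pz=3, E=4), vector.obj(px=-1, py=-2, pz=-3, E=4))
# subsequent calls (and %timeit measurements) run only the compiled code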

[6]:
%timeit compute_mass(vector.obj(px=1, py=2, pz=3, E=4), vector.obj(px=-1, py=-2, pz=-3, E=4))
24.1 μs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

However, the real advantage comes when many vectors are involved, stored in arrays.

[7]:
# This is still not a large number. You want millions.
array = vector.Array(
    [
        [
            dict(
                {x: np.random.normal(0, 1) for x in ("px", "py", "pz")},
                E=np.random.normal(10, 1),
            )
            for inner in range(np.random.poisson(1.5))
        ]
        for outer in range(50)
    ]
)
array
[7]:
[[{x: -1.47, y: -0.254, z: 1.44, t: 11}, ..., {x: -0.96, y: -0.5, z: ..., ...}],
 [{x: 0.46, y: -1.97, z: -0.815, t: 9.54}, ..., {x: 0.399, y: 1.36, ...}],
 [],
 [{x: -0.484, y: 1.29, z: -0.875, t: 9.92}, {x: 1.05, y: -1.39, ...}],
 [{x: -0.441, y: -0.38, z: -1.99, t: 10.2}],
 [{x: -0.974, y: -0.324, z: 0.0935, t: 8.76}, {x: 0.989, y: 0.577, ...}],
 [{x: -2.34, y: 0.919, z: 2.36, t: 10.6}],
 [{x: 0.109, y: 0.589, z: -0.555, t: 11.3}, ..., {x: 0.681, y: -0.827, ...}],
 [{x: 0.716, y: 0.122, z: 0.21, t: 10}, ..., {x: -1.41, y: -1.53, z: ..., ...}],
 [],
 ...,
 [{x: 1.51, y: -1.67, z: 0.00979, t: 10.5}, ..., {x: -0.347, y: -0.968, ...}],
 [{x: 0.765, y: -0.107, z: 0.0998, t: 10.2}],
 [{x: 1.96, y: -0.0284, z: -0.204, t: 8.2}, {x: 0.411, y: -1.67, ...}],
 [{x: 0.398, y: 0.242, z: -0.609, t: 8.86}],
 [{x: -0.429, y: 1.23, z: 1.44, t: 10.1}],
 [{x: -0.52, y: -0.614, z: 0.586, t: 10.6}, ..., {x: -0.732, y: 0.394, ...}],
 [{x: -0.21, y: -0.225, z: 0.503, t: 11.6}],
 [],
 [{x: -1.63, y: 0.324, z: 0.701, t: 9.05}, {x: -0.000179, y: -0.59, ...}]]
---------------------------------------------------------------------------------------------
backend: cpu
nbytes: 3.0 kB
type: 50 * var * Momentum4D[
    x: float64,
    y: float64,
    z: float64,
    t: float64
]
[8]:
def compute_masses(array):
    out = np.empty(len(array), np.float64)
    for i, event in enumerate(array):
        total = vector.obj(px=0.0, py=0.0, pz=0.0, E=0.0)
        for vec in event:
            total = total + vec
        out[i] = total.mass
    return out
[9]:
%timeit compute_masses(array)
98.5 ms ± 305 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
[10]:
@nb.njit
def compute_masses(array):
    out = np.empty(len(array), np.float64)
    for i, event in enumerate(array):
        total = vector.obj(px=0.0, py=0.0, pz=0.0, E=0.0)
        for vec in event:
            total = total + vec
        out[i] = total.mass
    return out
[11]:
%timeit -n 1 compute_masses(array)
The slowest run took 2514.01 times longer than the fastest. This could mean that an intermediate result is being cached.
69.1 ms ± 169 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This time, because the function operates on a large array, the compiled runs are considerably faster than the non-JIT version: roughly 500× here (194 μs versus 98.5 ms, as shown below).

[12]:
%timeit compute_masses(array)
194 μs ± 2.57 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
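
As the comment in cell [7] suggests, the gap widens as the array grows, because the one-time compilation cost is amortized over more elements. A sketch of scaling the same construction up (the size 1_000_000 is illustrative; building the array with a Python comprehension like this is itself slow, but compute_masses on the result runs entirely in the compiled loop):

big_array = vector.Array(
    [
        [
            dict(
                {x: np.random.normal(0, 1) for x in ("px", "py", "pz")},
                E=np.random.normal(10, 1),
            )
            for inner in range(np.random.poisson(1.5))
        ]
        for outer in range(1_000_000)  # millions of events, as suggested above
    ]
)
compute_masses(big_array)  # no recompilation: the function was already compiled for this type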