Awkward Arrays of vectors#

First, install and import Vector and Awkward Array.

[1]:
import vector
import awkward as ak

Making an Awkward Array of vectors#

Awkward Arrays are arrays with more complex data structures than NumPy allows, such as variable-length lists, nested records, missing and even heterogeneous data (different data types in the same array).

Vectors can be included among those data structures. In this context, vectors are Awkward “records,” objects with named fields, that can be nested inside of other structures. The vector properties and methods are implemented through Awkward Array’s behavior mechanism. Unlike vector objects and NumPy subclasses, the vectors can’t be ordinary Python classes because they might be nested within other data structures, such as variable-length lists, and these lists are implemented in a columnar way that isn’t open to Python’s introspection.

Let’s start with an example. Below, we create an Awkward Array using its ak.Array constructor, but include with_name and behavior arguments:

[2]:
arr = ak.Array(
    [
        [{"x": 1.1, "y": 2.2}, {"x": 3.3, "y": 4.4}],
        [],
        [{"x": 5.5, "y": 6.6}],
    ],
    with_name="Vector2D",
    behavior=vector.backends.awkward.behavior,
)
arr
[2]:
[[{x: 1.1, y: 2.2}, {x: 3.3, y: 4.4}],
 [],
 [{x: 5.5, y: 6.6}]]
--------------------------------------
type: 3 * var * Vector2D[
    x: float64,
    y: float64
]

The above array contains 3 lists, the first has length 2, the second has length 0, and the third has length 1. The lists contain records with field names "x" and "y", and the record type is named "Vector2D". In addition, this array has behavior from vector.backends.awkward.behavior, which is a large dict containing classes and functions to implement vector operations.

For instance, we can compute rho and phi coordinates in the same way as with the NumPy subclasses, an array at a time:

[3]:
arr.rho
[3]:
[[2.46, 5.5],
 [],
 [8.59]]
-----------------------
type: 3 * var * float64
[4]:
arr.phi
[4]:
[[1.11, 0.927],
 [],
 [0.876]]
-----------------------
type: 3 * var * float64

As with NumPy, performing operations an array at a time is usually much faster than writing Python for loops. What Awkward Array provides on top of that is the ability to do these operations through variable-length lists and other structures.

An Awkward Array needs all of the following for its records to be interpreted as vectors:

  1. the record name, which can be assigned using ak.with_name or as a constructor argument, must be one of "Vector2D", "Momentum2D", "Vector3D", "Momentum3D", "Vector4D", and "Momentum4D"

  2. the field names must be recognized coordinate names, following the same conventions as vector objects

  3. the array must have vector.backends.awkward.behavior as its behavior.

When Awkward Arrays are saved in files, such as with ak.to_parquet, they retain their record names and field names, so conditions 1 and 2 above are persistent. They don’t preserve condition 3, the behaviors, since these are Python classes and functions.

To make sure that Vector behaviors are always available, you can call vector.register_awkward at the beginning of every script, like this:

import awkward as ak
import vector
vector.register_awkward()

This function copies Vector’s behaviors into Awkward’s global ak.behavior so that any array with the right record and field names (such as one read from a file) automatically have Vector behaviors.

Vector also has a vector.Array constructor, which works like ak.Array but sets with_name automatically, as well as vector.zip, which works like ak.zip and sets with_name automatically. However, these functions still require you to set field names appropriately and if you need to do something complex, it’s easier to use Awkward Array’s own functions and assign the record name after the array is built, using ak.with_name.

Using an Awkward array of vectors#

First, let’s make some arrays to use in examples:

[5]:
import numpy as np
import awkward as ak
import vector

vector.register_awkward()
[6]:
def array_of_momentum3d(num_vectors):
    return ak.zip(
        {
            "px": np.random.normal(0, 1, num_vectors),
            "py": np.random.normal(0, 1, num_vectors),
            "pz": np.random.normal(0, 1, num_vectors),
        },
        with_name="Momentum3D",
    )


def array_of_lists_of_momentum3d(mean_num_per_list, num_lists):
    num_per_list = np.random.poisson(mean_num_per_list, num_lists)
    return ak.unflatten(
        array_of_momentum3d(np.sum(num_per_list)),
        num_per_list,
    )


a = array_of_momentum3d(10)
b = array_of_lists_of_momentum3d(1.5, 10)
[7]:
a
[7]:
[{px: 0.585, py: -1.23, pz: 0.198},
 {px: 0.0287, py: 1.03, pz: 1.12},
 {px: 0.348, py: 2.5, pz: 3.49},
 {px: 0.27, py: 1.08, pz: 0.149},
 {px: 0.715, py: 0.228, pz: -0.251},
 {px: 0.0405, py: 0.79, pz: 1.39},
 {px: -1.24, py: -0.574, pz: -0.312},
 {px: 0.0892, py: -0.038, pz: -0.428},
 {px: 1.1, py: -0.237, pz: 1.81},
 {px: -0.849, py: -0.613, pz: -1.24}]
--------------------------------------
type: 10 * Momentum3D[
    px: float64,
    py: float64,
    pz: float64
]
[8]:
b
[8]:
[[],
 [{px: -1.05, py: -2.07, pz: -0.611}, {px: 0.588, py: -0.826, ...}],
 [{px: 0.167, py: -0.77, pz: -1.14}, {px: 0.649, py: -0.446, ...}],
 [],
 [{px: 0.559, py: -0.517, pz: -1.25}],
 [{px: 1.23, py: 0.333, pz: 2.27}],
 [{px: 1.07, py: 1.52, pz: 1.41}, ..., {px: -0.263, py: -0.755, pz: 1.01}],
 [{px: -0.0202, py: -0.264, pz: -0.905}],
 [{px: -0.363, py: -0.737, pz: -0.0312}, ..., {px: 0.477, py: -0.00146, ...}],
 [{px: -0.987, py: -0.703, pz: -0.414}]]
------------------------------------------------------------------------------
type: 10 * var * Momentum3D[
    px: float64,
    py: float64,
    pz: float64
]

Awkward Array uses array-at-a-time functions like NumPy, so if we want to compute dot products of each vector in a with every vector of each list in b, we’d say:

[9]:
a.dot(b)
[9]:
[[],
 [-2.83, -2.44],
 [-5.85, 0.928],
 [],
 [0.596],
 [3.46],
 [-2.64, 0.215, 1.37, 0.445],
 [0.395],
 [-0.281, -2.01, 1.25],
 [1.78]]
-----------------------------
type: 10 * var * float64

Note that a and b have different numbers of vectors, but the same array lengths. The operation above broadcasts array a into b, like the following code:

[10]:
for i in range(len(a)):
    print("[", end="")

    for j in range(len(b[i])):
        out = a[i].dot(b[i, j])

        print(out, end=" ")

    print("]")
[]
[-2.833228962677397 -2.440420426614325 ]
[-5.8533879315268695 0.9278622045448855 ]
[]
[0.5955225435577661 ]
[3.4581243277777727 ]
[-2.6409970055321192 0.21548192577104403 1.3710478554473153 0.4453391894898468 ]
[0.39535925961287455 ]
[-0.28085928194857784 -2.007933327389994 1.247929350704072 ]
[1.7841371709739775 ]

Like NumPy, the array-at-a-time expression is more concise and faster:

[11]:
a = array_of_momentum3d(10000)
b = array_of_lists_of_momentum3d(1.5, 10000)
[12]:
%%timeit -n1 -r1

out = np.zeros(10000)

for i in range(len(a)):
    for j in range(len(b[i])):
        out[i] += a[i].dot(b[i, j])
5.6 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
[13]:
%%timeit

out = np.sum(a.dot(b), axis=1)
1.36 ms ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

(Note the units.)

Just as with NumPy, all of the coordinate transformations and vector operations are implemented for Awkward Arrays of vectors.

Some troubleshooting hints#

Make sure that the Vector behaviors are actually installed and applied to your data. In the data type, the record type should appear as "Vector2D", "Momentum2D", "Vector3D", "Momentum3D", "Vector4D", or "Momentum4D", rather than the generic curly brackets { and }, and if you extract one record from the array, can you perform a vector operation on it?

Make sure that your arrays broadcast the way that you want them to. If the vector behaviors are clouding the picture, make simpler arrays with numbers in place of records. Can you add them with +? (Addition uses the same broadcasting rules as all other operations.)

If your code runs but doesn’t give the results you expect, try slicing the arrays to just the first two items with arr[:2]. Step through the calculation on just two elements, observing the results of each operation. Are they what you expect?

Advanced: subclassing Awkward-Vector behaviors#

It is possible to write subclasses for Awkward-Vector behaviors as mixins to extend the vector functionalities. For instance, the MomentumAwkward classes can be extended in the following way:

[14]:
behavior = vector.backends.awkward.behavior


@ak.mixin_class(behavior)
class TwoVector(vector.backends.awkward.MomentumAwkward2D):
    pass


@ak.mixin_class(behavior)
class ThreeVector(vector.backends.awkward.MomentumAwkward3D):
    pass


# required for transforming vectors
# the class names must always end with "Array"
TwoVectorArray.ProjectionClass2D = TwoVectorArray  # noqa: F821
TwoVectorArray.ProjectionClass3D = ThreeVectorArray  # noqa: F821
TwoVectorArray.MomentumClass = TwoVectorArray  # noqa: F821
# populate properties to the level of a record
TwoVectorRecord.ProjectionClass2D = TwoVectorRecord  # noqa: F821
TwoVectorRecord.ProjectionClass3D = ThreeVectorRecord  # noqa: F821
TwoVectorRecord.MomentumClass = TwoVectorRecord  # noqa: F821

ThreeVectorArray.ProjectionClass2D = TwoVectorArray  # noqa: F821
ThreeVectorArray.ProjectionClass3D = ThreeVectorArray  # noqa: F821
ThreeVectorArray.MomentumClass = ThreeVectorArray  # noqa: F821
ThreeVectorRecord.ProjectionClass2D = TwoVectorRecord  # noqa: F821
ThreeVectorRecord.ProjectionClass3D = ThreeVectorRecord  # noqa: F821
ThreeVectorRecord.MomentumClass = ThreeVectorRecord  # noqa: F821
[15]:
vec = ak.zip(
    {
        "pt": [[1, 2], [], [3], [4]],
        "phi": [[1.2, 1.4], [], [1.6], [3.4]],
    },
    with_name="TwoVector",
    behavior=behavior,
)
vec
[15]:
[[{pt: 1, phi: 1.2}, {pt: 2, phi: 1.4}],
 [],
 [{pt: 3, phi: 1.6}],
 [{pt: 4, phi: 3.4}]]
----------------------------------------
type: 4 * var * TwoVector[
    pt: int64,
    phi: float64
]

The binary operators are not automatically registered by Awkward, but Vector methods can be used to perform operations on subclassed vectors.

[16]:
vec.add(vec)
[16]:
[[{rho: 2, phi: 1.2}, {rho: 4, phi: 1.4}],
 [],
 [{rho: 6, phi: 1.6}],
 [{rho: 8, phi: -2.88}]]
------------------------------------------
type: 4 * var * TwoVector[
    rho: float64,
    phi: float64
]

Similarly, other vector methods can be used by the new methods internally.

[17]:
import numbers
[18]:
@ak.mixin_class(behavior)
class LorentzVector(vector.backends.awkward.MomentumAwkward4D):
    @ak.mixin_class_method(np.divide, {numbers.Number})
    def divide(self, factor):
        return self.scale(1 / factor)


# required for transforming vectors
# the class names must always end with "Array"
LorentzVectorArray.ProjectionClass2D = TwoVectorArray  # noqa: F821
LorentzVectorArray.ProjectionClass3D = ThreeVectorArray  # noqa: F821
LorentzVectorArray.ProjectionClass4D = LorentzVectorArray  # noqa: F821
LorentzVectorArray.MomentumClass = LorentzVectorArray  # noqa: F821
# populate properties to the level of a record
LorentzVectorRecord.ProjectionClass2D = TwoVectorRecord  # noqa: F821
LorentzVectorRecord.ProjectionClass3D = ThreeVectorRecord  # noqa: F821
LorentzVectorRecord.ProjectionClass4D = LorentzVectorRecord  # noqa: F821
LorentzVectorRecord.MomentumClass = LorentzVectorRecord  # noqa: F821
[19]:
vec = ak.zip(
    {
        "pt": [[1, 2], [], [3], [4]],
        "eta": [[1.2, 1.4], [], [1.6], [3.4]],
        "phi": [[0.3, 0.4], [], [0.5], [0.6]],
        "energy": [[50, 51], [], [52], [60]],
    },
    with_name="LorentzVector",
    behavior=behavior,
)
vec
[19]:
[[{pt: 1, eta: 1.2, phi: 0.3, energy: 50}, {pt: 2, eta: 1.4, ...}],
 [],
 [{pt: 3, eta: 1.6, phi: 0.5, energy: 52}],
 [{pt: 4, eta: 3.4, phi: 0.6, energy: 60}]]
-------------------------------------------------------------------
type: 4 * var * LorentzVector[
    pt: int64,
    eta: float64,
    phi: float64,
    energy: int64
]
[20]:
vec / 2
[20]:
[[{rho: 0.5, phi: 0.3, eta: 1.2, t: 25}, {rho: 1, phi: 0.4, ...}],
 [],
 [{rho: 1.5, phi: 0.5, eta: 1.6, t: 26}],
 [{rho: 2, phi: 0.6, eta: 3.4, t: 30}]]
------------------------------------------------------------------
type: 4 * var * LorentzVector[
    rho: float64,
    phi: float64,
    eta: float64,
    t: float64
]
[21]:
vec.like(vector.obj(x=1, y=2))
[21]:
[[{rho: 1, phi: 0.3}, {rho: 2, phi: 0.4}],
 [],
 [{rho: 3, phi: 0.5}],
 [{rho: 4, phi: 0.6}]]
------------------------------------------
type: 4 * var * TwoVector[
    rho: int64,
    phi: float64
]
[22]:
vec.like(vector.obj(x=1, y=2, z=3))
[22]:
[[{rho: 1, phi: 0.3, eta: 1.2}, {rho: 2, phi: 0.4, eta: 1.4}],
 [],
 [{rho: 3, phi: 0.5, eta: 1.6}],
 [{rho: 4, phi: 0.6, eta: 3.4}]]
--------------------------------------------------------------
type: 4 * var * ThreeVector[
    rho: int64,
    phi: float64,
    eta: float64
]

It is also possible to manually add binary operations in vector’s behavior dict to enable binary operations.

[23]:
_binary_dispatch_cls = {
    "TwoVector": TwoVector,
    "ThreeVector": ThreeVector,
    "LorentzVector": LorentzVector,
}
_rank = [TwoVector, ThreeVector, LorentzVector]

for lhs, lhs_to in _binary_dispatch_cls.items():
    for rhs, rhs_to in _binary_dispatch_cls.items():
        out_to = min(lhs_to, rhs_to, key=_rank.index)
        behavior[(np.add, lhs, rhs)] = out_to.add
        behavior[(np.subtract, lhs, rhs)] = out_to.subtract
[24]:
vec + vec
[24]:
[[{rho: 2, phi: 0.3, eta: 1.2, t: 100}, {rho: 4, phi: 0.4, eta: 1.4, ...}],
 [],
 [{rho: 6, phi: 0.5, eta: 1.6, t: 104}],
 [{rho: 8, phi: 0.6, eta: 3.4, t: 120}]]
---------------------------------------------------------------------------
type: 4 * var * LorentzVector[
    rho: float64,
    phi: float64,
    eta: float64,
    t: int64
]
[25]:
vec.to_2D() + vec.to_2D()
[25]:
[[{rho: 2, phi: 0.3}, {rho: 4, phi: 0.4}],
 [],
 [{rho: 6, phi: 0.5}],
 [{rho: 8, phi: 0.6}]]
------------------------------------------
type: 4 * var * TwoVector[
    rho: float64,
    phi: float64
]

Finally, instead of manually registering the superclass ufuncs, one can use the utility copy_behaviors function to copy behavior items for a new subclass -

[26]:
behavior.update(ak._util.copy_behaviors("Vector2D", "TwoVector", behavior))
behavior.update(ak._util.copy_behaviors("Vector3D", "ThreeVector", behavior))
behavior.update(ak._util.copy_behaviors("Momentum4D", "LorentzVector", behavior))
[27]:
vec + vec
[27]:
[[{rho: 2, phi: 0.3, eta: 1.2, t: 100}, {rho: 4, phi: 0.4, eta: 1.4, ...}],
 [],
 [{rho: 6, phi: 0.5, eta: 1.6, t: 104}],
 [{rho: 8, phi: 0.6, eta: 3.4, t: 120}]]
---------------------------------------------------------------------------
type: 4 * var * Momentum4D[
    rho: float64,
    phi: float64,
    eta: float64,
    t: int64
]
[28]:
vec.to_2D() + vec.to_2D()
[28]:
[[{rho: 2, phi: 0.3}, {rho: 4, phi: 0.4}],
 [],
 [{rho: 6, phi: 0.5}],
 [{rho: 8, phi: 0.6}]]
------------------------------------------
type: 4 * var * Vector2D[
    rho: float64,
    phi: float64
]