arviz_base.xarray_sel_iter

arviz_base.xarray_sel_iter(data, var_names=None, combined=None, skip_dims=None, dim_to_idx=None, reverse_selections=False)

Convert xarray data to an iterator over variable names and selections.

Iterates over each var_name and all of its dimensions, returning the variable names and the selections needed to obtain the desired data subsets from data. The iterable returned defines an exhaustive collection of subsets. Both the input object and the selections can have any dimensionality, and the selections from different elements in the iterable can differ in dimensionality.

When looping within a dimension, this can be done over the dimension itself or via unique items of explicit indexes for that dimension.

Parameters:
data : xarray.Dataset or xarray.DataArray

Posterior data in an xarray object.

var_names : iterator of hashable, optional

Should be a subset of data.data_vars. Defaults to all of them.

combined : bool, optional

Whether to combine chains or leave them separate. By default (None), this is ignored and the chain dimension is looped over or skipped based on skip_dims. If set to True/False then skip_dims is modified in order to ensure combining chains or not.

skip_dims : set, optional

Dimensions to not iterate over. Defaults to rcParam data.sample_dims.

dim_to_idx : mapping {hashable: hashable}, optional

Mapping from dimensions to indexes to loop over these dimensions using unique items in the provided index.

reverse_selections : bool

Whether to reverse selections before iterating.
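The interaction between combined and skip_dims described above can be sketched in plain Python. This is only a guess at the logic, not the library's code: the helper name resolve_skip_dims, the sample_dims default of ("chain", "draw") and the assumption that the chain dimension is literally named "chain" are all hypothetical.

```python
def resolve_skip_dims(skip_dims=None, combined=None, sample_dims=("chain", "draw")):
    """Hypothetical helper: work out which dimensions are not iterated over.

    sample_dims stands in for the rcParam data.sample_dims; its default
    value here is an assumption made for illustration.
    """
    # Fall back to the sample dimensions when skip_dims is not given.
    skip = set(sample_dims) if skip_dims is None else set(skip_dims)
    if combined is True:
        # Combining chains: never loop over the chain dimension.
        skip.add("chain")
    elif combined is False:
        # Keeping chains separate: always loop over the chain dimension.
        skip.discard("chain")
    # combined=None leaves skip_dims untouched.
    return skip


default_skip = resolve_skip_dims()
separate_chains = resolve_skip_dims(combined=False)
combined_chains = resolve_skip_dims(skip_dims={"obs_dim"}, combined=True)
```

Under these assumptions, combined=None leaves the set as given, while True or False only adds or removes the chain dimension.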

Yields:
var_name : str

Variable name to which selection and iselection correspond.

selection : dict of {hashable: any}

Keys are coordinate names and values are scalar coordinate values. To get the values of the variable at these coordinates, do data[var_name].sel(selection) for Dataset or data.sel(selection) for DataArray.

iselection : dict of {hashable: any}

Keys are dimension names and values are positional indexes (might not be scalars). To get the values of the variable at these positions, do data[var_name].isel(iselection) for Dataset or data.isel(iselection) for DataArray.

See also

xarray_var_iter

Return a similar iterator whose elements also include the selected subset as a DataArray.

Examples

Let’s create a 3d DataArray with dimensions “chain”, “draw” and “obs_dim”.

import xarray as xr
import numpy as np
from arviz_base import xarray_sel_iter
xr.set_options(display_expand_data=False, display_expand_indexes=True)

data = xr.DataArray(
    np.random.default_rng(2).normal(size=(2,3,7)),
    dims=["chain", "draw", "obs_dim"],
    coords={"chain": [1, 2]},
    name="sample"
)
data
<xarray.DataArray 'sample' (chain: 2, draw: 3, obs_dim: 7)> Size: 336B
0.1891 -0.5227 -0.4131 -2.441 1.8 1.144 ... 0.8685 -1.13 -0.4219 0.2429 1.801
Coordinates:
  * chain    (chain) int64 16B 1 2
Dimensions without coordinates: draw, obs_dim

By default, xarray_sel_iter will return an iterable with the subsets that are generated from looping over all dimensions not in rcParams["data.sample_dims"] (the default value for skip_dims). Here, it will be an iterable of length 7, selecting each position in the “obs_dim” dimension:

list(xarray_sel_iter(data))
[('sample', {'obs_dim': 0}, {'obs_dim': 0}),
 ('sample', {'obs_dim': 1}, {'obs_dim': 1}),
 ('sample', {'obs_dim': 2}, {'obs_dim': 2}),
 ('sample', {'obs_dim': 3}, {'obs_dim': 3}),
 ('sample', {'obs_dim': 4}, {'obs_dim': 4}),
 ('sample', {'obs_dim': 5}, {'obs_dim': 5}),
 ('sample', {'obs_dim': 6}, {'obs_dim': 6})]

Here we are using a DataArray, so the first position in each tuple, var_name, is always the same and corresponds to its name.

If we want to iterate over each _sample_ (pair of “chain”, “draw” values) we can use skip_dims:

list(xarray_sel_iter(data, skip_dims={"obs_dim"}))
[('sample', {'chain': 1, 'draw': 0}, {'chain': 0, 'draw': 0}),
 ('sample', {'chain': 1, 'draw': 1}, {'chain': 0, 'draw': 1}),
 ('sample', {'chain': 1, 'draw': 2}, {'chain': 0, 'draw': 2}),
 ('sample', {'chain': 2, 'draw': 0}, {'chain': 1, 'draw': 0}),
 ('sample', {'chain': 2, 'draw': 1}, {'chain': 1, 'draw': 1}),
 ('sample', {'chain': 2, 'draw': 2}, {'chain': 1, 'draw': 2})]

Now there are 6 elements, 3 values for “draw” times 2 values for “chain”. Note also how the two returned selections now differ. The _coordinate_ values for “chain” are 1, 2 whereas their corresponding _positions_ are 0, 1.
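The relationship between the two dicts can be reproduced with a small pure-Python sketch. It is a conceptual illustration only, not how arviz_base implements the function: the helper sketch_sel_iter and its sizes/coords arguments are hypothetical stand-ins for the information xarray stores on the DataArray.

```python
from itertools import product


def sketch_sel_iter(sizes, coords, skip_dims):
    """Hypothetical sketch: yield (selection, iselection) pairs.

    sizes maps dimension name -> length; coords maps dimension name ->
    list of coordinate values. Dimensions without coordinates fall back
    to their positions, as "draw" does in the example above.
    """
    loop_dims = [dim for dim in sizes if dim not in skip_dims]
    ranges = [range(sizes[dim]) for dim in loop_dims]
    # The Cartesian product visits every combination of positions,
    # with the first dimension varying slowest.
    for positions in product(*ranges):
        isel = dict(zip(loop_dims, positions))
        sel = {
            dim: coords[dim][pos] if dim in coords else pos
            for dim, pos in isel.items()
        }
        yield sel, isel


sizes = {"chain": 2, "draw": 3, "obs_dim": 7}
coords = {"chain": [1, 2]}
pairs = list(sketch_sel_iter(sizes, coords, skip_dims={"obs_dim"}))
```

With these inputs, pairs has 6 entries, and the coordinate-based and position-based dicts differ only in the "chain" values.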

To go further in the examples and show the usage of dim_to_idx, we need to add some explicit indexes to the DataArray. We do that by adding new coordinates and then setting them as indexes.

data = data.assign_coords(
    {"obs_id": ("obs_dim", np.arange(7)), "label_id": ("obs_dim", list("babacbc"))}
).set_xindex("obs_id").set_xindex("label_id")
data
<xarray.DataArray 'sample' (chain: 2, draw: 3, obs_dim: 7)> Size: 336B
0.1891 -0.5227 -0.4131 -2.441 1.8 1.144 ... 0.8685 -1.13 -0.4219 0.2429 1.801
Coordinates:
  * chain     (chain) int64 16B 1 2
  * obs_id    (obs_dim) int64 56B 0 1 2 3 4 5 6
  * label_id  (obs_dim) <U1 28B 'b' 'a' 'b' 'a' 'c' 'b' 'c'
Dimensions without coordinates: draw, obs_dim

Note that both the “Coordinates” and the “Indexes” sections of the output have been updated. We can now loop over the “obs_dim” dimension by “itself” (like we did in the first example) or using either of these two new indexes. If we use “label_id”, the returned iterator will have length 3, as there are only 3 unique values in “label_id”, a, b, c.

list(xarray_sel_iter(data, dim_to_idx={"obs_dim": "label_id"}))
[('sample', {'label_id': 'b'}, {'obs_dim': array([0, 2, 5])}),
 ('sample', {'label_id': 'a'}, {'obs_dim': array([1, 3])}),
 ('sample', {'label_id': 'c'}, {'obs_dim': array([4, 6])})]

Note that the order of the coordinate values is preserved. Moreover, the selection and iselection dicts now differ not only in their values but also in their keys. “label_id” is a coordinate and an index, but not a dimension, so it cannot be used for positional indexing.
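The grouping of positions by unique label, in order of first appearance, can be sketched without xarray. This is an illustrative stand-in rather than the library's implementation, and the helper name group_positions is hypothetical.

```python
def group_positions(labels):
    """Hypothetical helper: map each unique label to its positions.

    Python dicts keep insertion order, so labels appear in the order
    they are first seen instead of being sorted.
    """
    groups = {}
    for position, label in enumerate(labels):
        groups.setdefault(label, []).append(position)
    return groups


groups = group_positions(list("babacbc"))
for label, positions in groups.items():
    # Mirrors the two dicts yielded above: one keyed by the index name,
    # the other keyed by the dimension name.
    print({"label_id": label}, {"obs_dim": positions})
```

The keys come out as b, a, c, matching the order of the tuples returned by xarray_sel_iter in the example above.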