arviz_base.xarray_sel_iter
- arviz_base.xarray_sel_iter(data, var_names=None, combined=None, skip_dims=None, dim_to_idx=None, reverse_selections=False)
Convert xarray data to an iterator over variable names and selections.
Iterates over each var_name and all of its dimensions, returning the variable names and the selections needed to obtain the corresponding data subsets from data. The iterable returned defines an exhaustive collection of subsets. Both the input object and the selections can have any dimensionality, and the selections from different elements of the iterable can differ in dimensionality.
When looping over a dimension, this can be done over the dimension itself or over the unique items of an explicit index for that dimension.
- Parameters:
- data : xarray.Dataset or xarray.DataArray
  Posterior data in an xarray object.
- var_names : iterator of hashable, optional
  Should be a subset of data.data_vars. Defaults to all of them.
- combined : bool, optional
  Whether to combine chains or leave them separate. By default (None), this is ignored and the chain dimension is looped over or skipped based on skip_dims. If set to True/False, skip_dims is modified to ensure that chains are combined or kept separate, respectively.
- skip_dims : set, optional
  Dimensions to not iterate over. Defaults to the rcParam data.sample_dims.
- dim_to_idx : mapping of {hashable: hashable}, optional
  Mapping from dimensions to indexes, used to loop over these dimensions using the unique items in the provided index.
- reverse_selections : bool
  Whether to reverse selections before iterating.
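The interaction between combined and skip_dims described above can be pictured with a small plain-Python sketch. This is an illustration under assumptions, not the library's code: resolve_skip_dims is a hypothetical helper, the chain dimension is assumed to be named "chain", and the default sample dimensions are assumed to be ("chain", "draw").

```python
def resolve_skip_dims(combined=None, skip_dims=None, sample_dims=("chain", "draw")):
    """Hypothetical helper: return the set of dimensions that are NOT iterated over."""
    # Assumption: when skip_dims is not given, the sample dimensions are skipped.
    resolved = set(sample_dims) if skip_dims is None else set(skip_dims)
    if combined is True:
        # Combining chains: "chain" must never be looped over.
        resolved.add("chain")
    elif combined is False:
        # Keeping chains separate: "chain" must be looped over.
        resolved.discard("chain")
    # combined=None leaves skip_dims untouched.
    return resolved


print(sorted(resolve_skip_dims()))
print(sorted(resolve_skip_dims(combined=False)))
print(sorted(resolve_skip_dims(combined=True, skip_dims={"obs_dim"})))
```

With combined=None the result depends only on skip_dims, which matches the documented default behavior.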
- Yields:
- var_name : str
  Variable name to which selection and iselection correspond.
- selection : dict of {hashable: any}
  Keys are coordinate names and values are scalar coordinate values. To get the values of the variable at these coordinates, use data[var_name].sel(selection) for a Dataset or data.sel(selection) for a DataArray.
- iselection : dict of {hashable: any}
  Keys are dimension names and values are positional indexes (might not be scalars). To get the values of the variable at these positions, use data[var_name].isel(iselection) for a Dataset or data.isel(iselection) for a DataArray.
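To make the difference between selection (coordinate values) and iselection (positions) concrete, here is a minimal plain-Python sketch of an exhaustive loop over all non-skipped dimensions. It is not how arviz_base implements the function: sketch_sel_iter and dim_coords are hypothetical names, and a dimension without coordinates is modeled by using its positions as labels.

```python
from itertools import product


def sketch_sel_iter(var_name, dim_coords, skip_dims):
    """Yield (var_name, selection, iselection) for every combination of looped dims.

    dim_coords maps each dimension name to the list of its coordinate values.
    """
    loop_dims = [dim for dim in dim_coords if dim not in skip_dims]
    positions = [range(len(dim_coords[dim])) for dim in loop_dims]
    for idxs in product(*positions):
        # Positional indexes, keyed by dimension name.
        iselection = dict(zip(loop_dims, idxs))
        # Coordinate values found at those positions.
        selection = {dim: dim_coords[dim][i] for dim, i in iselection.items()}
        yield var_name, selection, iselection


# "chain" has explicit labels 1 and 2; "draw" and "obs_dim" use positions as labels.
dim_coords = {"chain": [1, 2], "draw": [0, 1, 2], "obs_dim": list(range(7))}
pairs = list(sketch_sel_iter("sample", dim_coords, skip_dims={"obs_dim"}))
for item in pairs:
    print(item)
```

Every element pairs a label-based dict with a position-based dict, so the two only coincide when coordinate values happen to equal their positions.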
See also
xarray_var_iter
Return a similar iterator whose elements also include the selected subset as a DataArray.
Examples
Let’s create a 3d DataArray with dimensions “chain”, “draw” and “obs_dim”.

    import xarray as xr
    import numpy as np
    from arviz_base import xarray_sel_iter

    xr.set_options(display_expand_data=False, display_expand_indexes=True)
    data = xr.DataArray(
        np.random.default_rng(2).normal(size=(2, 3, 7)),
        dims=["chain", "draw", "obs_dim"],
        coords={"chain": [1, 2]},
        name="sample",
    )
    data

    <xarray.DataArray 'sample' (chain: 2, draw: 3, obs_dim: 7)> Size: 336B
    0.1891 -0.5227 -0.4131 -2.441 1.8 1.144 ... 0.8685 -1.13 -0.4219 0.2429 1.801
    Coordinates:
      * chain    (chain) int64 16B 1 2
    Dimensions without coordinates: draw, obs_dim
By default, xarray_sel_iter will return an iterable with the subsets that are generated from looping over all dimensions not in rcParams["data.sample_dims"] (the default value for skip_dims). Here, it will be an iterable of length 7, selecting each position in the “obs_dim” dimension:

    list(xarray_sel_iter(data))

    [('sample', {'obs_dim': 0}, {'obs_dim': 0}),
     ('sample', {'obs_dim': 1}, {'obs_dim': 1}),
     ('sample', {'obs_dim': 2}, {'obs_dim': 2}),
     ('sample', {'obs_dim': 3}, {'obs_dim': 3}),
     ('sample', {'obs_dim': 4}, {'obs_dim': 4}),
     ('sample', {'obs_dim': 5}, {'obs_dim': 5}),
     ('sample', {'obs_dim': 6}, {'obs_dim': 6})]
Here we are using a DataArray, so the first position in each tuple, var_name, is always the same and corresponds to its name.

If we want to iterate over each _sample_ (pair of “chain”, “draw” values) we can use skip_dims:

    list(xarray_sel_iter(data, skip_dims={"obs_dim"}))

    [('sample', {'chain': 1, 'draw': 0}, {'chain': 0, 'draw': 0}),
     ('sample', {'chain': 1, 'draw': 1}, {'chain': 0, 'draw': 1}),
     ('sample', {'chain': 1, 'draw': 2}, {'chain': 0, 'draw': 2}),
     ('sample', {'chain': 2, 'draw': 0}, {'chain': 1, 'draw': 0}),
     ('sample', {'chain': 2, 'draw': 1}, {'chain': 1, 'draw': 1}),
     ('sample', {'chain': 2, 'draw': 2}, {'chain': 1, 'draw': 2})]
Now there are 6 elements, 3 values for “draw” times 2 values for “chain”. Note also how the two returned selections now differ. The _coordinate_ values for “chain” are 1, 2 whereas their corresponding _positions_ are 0, 1.

To go further in the examples and show the usage of dim_to_idx, we need to add some explicit indexes to the DataArray. We do that by adding new coordinates and then setting them as indexes.

    data = data.assign_coords(
        {"obs_id": ("obs_dim", np.arange(7)), "label_id": ("obs_dim", list("babacbc"))}
    ).set_xindex("obs_id").set_xindex("label_id")
    data

    <xarray.DataArray 'sample' (chain: 2, draw: 3, obs_dim: 7)> Size: 336B
    0.1891 -0.5227 -0.4131 -2.441 1.8 1.144 ... 0.8685 -1.13 -0.4219 0.2429 1.801
    Coordinates:
      * chain     (chain) int64 16B 1 2
      * obs_id    (obs_dim) int64 56B 0 1 2 3 4 5 6
      * label_id  (obs_dim) <U1 28B 'b' 'a' 'b' 'a' 'c' 'b' 'c'
    Dimensions without coordinates: draw, obs_dim
Note that both the “Coordinates” and the “Indexes” sections of the output have been updated. We can now loop over the “obs_dim” dimension by “itself” (like we did in the first example) or using either of these two new indexes. If we use “label_id”, the returned iterator will have length 3, as there are only 3 unique values in “label_id”: a, b, c.

    list(xarray_sel_iter(data, dim_to_idx={"obs_dim": "label_id"}))

    [('sample', {'label_id': 'b'}, {'obs_dim': array([0, 2, 5])}),
     ('sample', {'label_id': 'a'}, {'obs_dim': array([1, 3])}),
     ('sample', {'label_id': 'c'}, {'obs_dim': array([4, 6])})]
Note that the order of the coordinate values is preserved: they appear in order of first occurrence (b, a, c), not sorted. Moreover, the two selections now differ not only in their values but also in their keys. “label_id” is a coordinate and an index, but not a dimension, so it cannot be used for positional indexing.
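The grouping behind dim_to_idx can be illustrated without xarray. The following is only a conceptual sketch, not the library's implementation: group_positions_by_label is a hypothetical helper that maps each unique label, in order of first appearance, to the positions along the dimension where it occurs.

```python
def group_positions_by_label(labels):
    """Map each unique label to the list of positions where it occurs."""
    groups = {}
    for position, label in enumerate(labels):
        # dicts keep insertion order, so labels stay in order of first appearance.
        groups.setdefault(label, []).append(position)
    return groups


# Same labels as the "label_id" coordinate in the example above.
groups = group_positions_by_label(list("babacbc"))
for label, positions in groups.items():
    # The label plays the role of the selection, the positions of the iselection.
    print({"label_id": label}, {"obs_dim": positions})
```

Each label corresponds to several positions, which is why the iselection values in this case are arrays rather than scalars.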