Converting emcee objects to DataTree#

DataTree is the data format ArviZ relies on.

This page covers multiple ways to generate a DataTree from emcee objects.

See also

  • Conversion from Python, numpy or pandas objects

  • xarray_for_arviz for an overview of DataTree and its role within ArviZ.

  • schema describes the structure of DataTree objects and the assumptions made by ArviZ to ease your exploratory analysis of Bayesian models.

We will start by importing the required packages and defining the model: the famous eight schools model.

import arviz_base as az
import numpy as np
import emcee
J = 8
y_obs = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])
def log_prior_8school(theta):
    mu, tau, eta = theta[0], theta[1], theta[2:]
    # Half-cauchy prior, hwhm=25
    if tau < 0:
        return -np.inf
    prior_tau = -np.log(tau**2 + 25**2)
    prior_mu = -((mu / 10) ** 2)  # normal prior, loc=0, scale=10
    prior_eta = -np.sum(eta**2)  # normal prior, loc=0, scale=1
    return prior_mu + prior_tau + prior_eta


def log_likelihood_8school(theta, y, s):
    mu, tau, eta = theta[0], theta[1], theta[2:]
    return -(((mu + tau * eta - y) / s) ** 2)


def lnprob_8school(theta, y, s):
    prior = log_prior_8school(theta)
    like_vect = log_likelihood_8school(theta, y, s)
    like = np.sum(like_vect)
    return like + prior
nwalkers = 40  # called chains in ArviZ
ndim = J + 2
draws = 1500
pos = np.random.normal(size=(nwalkers, ndim))
pos[:, 1] = np.absolute(pos[:, 1])
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob_8school, args=(y_obs, sigma))
sampler.run_mcmc(pos, draws);
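As a quick sanity check of the model functions above, the log prior can be evaluated in isolation at a couple of points. This standalone sketch restates the prior from the cell above so it can run on its own; note how a negative tau is rejected with -inf, which is what keeps the sampler inside the support of the half-Cauchy:

```python
import numpy as np


def log_prior_8school(theta):
    # Same prior as above: half-Cauchy(25) on tau, N(0, 10) on mu, N(0, 1) on eta
    mu, tau, eta = theta[0], theta[1], theta[2:]
    if tau < 0:
        return -np.inf
    prior_tau = -np.log(tau**2 + 25**2)
    prior_mu = -((mu / 10) ** 2)
    prior_eta = -np.sum(eta**2)
    return prior_mu + prior_tau + prior_eta


theta = np.zeros(10)                  # mu = tau = 0, eta = 0
print(log_prior_8school(theta))       # -log(625), only the tau term contributes

theta_neg = theta.copy()
theta_neg[1] = -1.0                   # negative tau is outside the support
print(log_prior_8school(theta_neg))   # -inf: the walker proposal is rejected
```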

Manually set variable names#

This first example shows how to convert while manually setting only the variable names, leaving everything else to ArviZ defaults.

# define variable names; they cannot be inferred from emcee
var_names = ["mu", "tau"] + ["eta{}".format(i) for i in range(J)]
idata1 = az.from_emcee(sampler, var_names=var_names)
idata1

ArviZ has stored the posterior variables with the provided names as expected, and it has also included other useful information in the DataTree. The log probability of each sample is stored in the sample_stats group under the name lp, and all the arguments passed to the sampler as args have been saved in the observed_data group.

It can also be useful to apply a burn-in cut to the MCMC samples (see DataTree.sel for more details):

#idata1.sel(draw=slice(100, None))
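The label-based slicing used here can be sketched on a plain xarray.Dataset with a draw coordinate. This is a minimal standalone illustration (toy data, not the sampler output): slice(100, None) selects by coordinate label, dropping the first 100 draws of every chain:

```python
import numpy as np
import xarray as xr

# Toy "posterior" with 4 chains and 1500 draws
posterior = xr.Dataset(
    {"mu": (("chain", "draw"), np.random.normal(size=(4, 1500)))},
    coords={"chain": np.arange(4), "draw": np.arange(1500)},
)

# Drop the first 100 draws as burn-in; selection is by label, not position
trimmed = posterior.sel(draw=slice(100, None))
print(trimmed.sizes["draw"])       # 1400
print(int(trimmed["draw"][0]))     # 100, the first draw kept
```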

From a DataTree, ArviZ's native data structure, the posterior plot of a few variables can be done in one line:

#az.plot_posterior(idata1, var_names=["mu", "tau", "eta4"])

Structuring the posterior as multidimensional variables#

This way of calling from_emcee stores each eta as a separate variable, named eta#. However, they are in fact different dimensions of the same variable, as can be seen in the code of the likelihood and prior functions, where theta is unpacked as:

mu, tau, eta = theta[0], theta[1], theta[2:]

ArviZ has support for multidimensional variables, and there is a way to tell it how to split the variables like it was done in the likelihood and prior functions:

idata2 = az.from_emcee(sampler, slices=[0, 1, slice(2, None)])
idata2
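What the slices argument does can be sketched with plain numpy indexing: each entry of slices is applied to the flat parameter vector to carve out one variable. This is a standalone illustration of the indexing, not part of the converter itself:

```python
import numpy as np

theta = np.arange(10.0)               # flat parameter vector, ndim = J + 2
slices = [0, 1, slice(2, None)]       # same slices passed to from_emcee
mu, tau, eta = (theta[s] for s in slices)

print(mu, tau)       # 0.0 1.0  (scalars)
print(eta.shape)     # (8,)     (one entry per school)
```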

After checking the default variable names, the trace of one dimension of eta can be plotted using ArviZ syntax:

#az.plot_trace(idata2, var_names=["var_2"], coords={"var_2_dim_0": 4});

blobs: unlock sample stats, posterior predictive and miscellanea#

Emcee does not store per-draw sample stats; however, it has a feature called blobs that allows storing arbitrary variables on a per-draw basis. It can be used to store sample_stats or even posterior_predictive data.

You can modify the probability function to use this blobs functionality and store the pointwise log likelihood, then rerun the sampler using the new function:

def lnprob_8school_blobs(theta, y, s):
    prior = log_prior_8school(theta)
    like_vect = log_likelihood_8school(theta, y, s)
    like = np.sum(like_vect)
    return like + prior, like_vect


sampler_blobs = emcee.EnsembleSampler(
    nwalkers,
    ndim,
    lnprob_8school_blobs,
    args=(y_obs, sigma),
)
sampler_blobs.run_mcmc(pos, draws);

You can now use the blob_names argument to indicate how to store this blob-defined variable. As no group is specified, it will go to sample_stats. Note that blob_names is used in addition to the arguments covered in the previous examples; we also introduce the coords and dims arguments to show the power and flexibility of the converter. For more on coords and dims see page_in_construction.

dims = {"eta": ["school"], "log_likelihood": ["school"]}
idata3 = az.from_emcee(
    sampler_blobs,
    var_names=["mu", "tau", "eta"],
    slices=[0, 1, slice(2, None)],
    blob_names=["log_likelihood"],
    dims=dims,
    coords={"school": range(8)},
)
idata3
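To illustrate what the dims and coords arguments produce, here is a hedged xarray sketch (toy data, independent of the converter): each variable listed in dims gets a named school dimension, carrying the coordinate values given in coords:

```python
import numpy as np
import xarray as xr

# Shape (chain, draw, school), as the converter reshapes emcee samples
samples = np.random.normal(size=(40, 1500, 8))

eta = xr.DataArray(
    samples,
    dims=("chain", "draw", "school"),
    coords={"school": range(8)},
    name="eta",
)
print(eta.dims)              # ('chain', 'draw', 'school')
print(eta["school"].values)  # [0 1 2 3 4 5 6 7]
```

With a named dimension in place, later selections such as coords={"school": 4} can address a single school by label instead of by position.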

Multi-group blobs#

You might even have more complicated blobs, each corresponding to a different group of the DataTree. Moreover, you can store the variables passed to the EnsembleSampler via the args argument in the observed_data or constant_data groups. This is shown in the example below:

sampler_blobs.blobs[0, 1]
array([-3.32784122e+00, -5.76423166e-01, -4.26615268e-02, -3.53483214e-01,
       -2.50328863e-02, -2.75246231e-03, -3.12989363e+00, -4.08660379e-01])
def lnprob_8school_blobs(theta, y, sigma):
    mu, tau, eta = theta[0], theta[1], theta[2:]
    prior = log_prior_8school(theta)
    like_vect = log_likelihood_8school(theta, y, sigma)
    like = np.sum(like_vect)
    # store pointwise log likelihood, useful for model comparison with az.loo or az.waic
    # and posterior predictive samples as blobs
    return like + prior, (like_vect, np.random.normal((mu + tau * eta), sigma))


sampler_blobs = emcee.EnsembleSampler(
    nwalkers,
    ndim,
    lnprob_8school_blobs,
    args=(y_obs, sigma),
)
sampler_blobs.run_mcmc(pos, draws);

dims = {"eta": ["school"], "log_likelihood": ["school"], "y": ["school"]}
idata4 = az.from_emcee(
    sampler_blobs,
    var_names=["mu", "tau", "eta"],
    slices=[0, 1, slice(2, None)],
    arg_names=["y", "sigma"],
    arg_groups=["observed_data", "constant_data"],
    blob_names=["log_likelihood", "y"],
    blob_groups=["log_likelihood", "posterior_predictive"],
    dims=dims,
    coords={"school": range(8)},
)
idata4

This last version, which contains both observed data and posterior predictive samples, can be used to plot posterior predictive checks:

#az.plot_ppc(idata4, var_names=["y"], alpha=0.3, num_pp_samples=200);
%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Fri Jun 16 2023

Python implementation: CPython
Python version       : 3.10.11
IPython version      : 8.14.0

arviz_base: 0.1
numpy     : 1.24.3
emcee     : 3.1.4

Watermark: 2.3.1