User Guide#

Overview#

This user guide walks through using curepy to perform retrievals, step by step. It covers instantiating a RetrievalInput, setting up its components (measurements, priors, and ancillary parameters), and running a retrieval with a chosen retrieval method. It then explains how to access and interpret the results. A full example of a retrieval can be found in the curepy example notebook, which is available here and in the examples directory of the CoMet website.

Instantiating a Retrieval Input#

Every retrieval method requires a RetrievalInput object to run a retrieval:

from curepy.retrieval_methods.retrieval_input import RetrievalInput
inp = RetrievalInput()

This object can be instantiated using a combination of containers or by using the RetrievalInput().build_retrieval_inputs() function. Individual containers can also be built within the RetrievalInput object using the individual 'build' functions:

inp.build_measurement()
inp.build_prior()
inp.build_ancillary()
inp.build_measurement_function()

These functions build the Measurement, Prior, AncillaryParameter, and MeasurementFunction objects, respectively.

Every retrieval method requires a Measurement and MeasurementFunction input to be set in the RetrievalInput. Prior and AncillaryParameter objects are optional.

Measurement#

The Measurement object stores the measurements, \(y\), and any related uncertainty and correlation information:

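import xarray as xr

# Load the measured values y from a NetCDF file
data = xr.open_dataset("my_data.nc")['measurements'].values
inp.build_measurement(y = data,
                      u_y_total = data*0.05,  # 5% total uncertainty
                      corr_y = 'syst'         # systematic error correlation
                      )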

The measurement uncertainties can be set using u_y_total and corr_y, where corr_y is either a string indicating random ('rand') or systematic ('syst') correlation, or a custom correlation matrix. Alternatively, if the uncertainties are separated into random and systematic components, they can be set using u_y_rand and u_y_syst.
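As an illustration of what the uncertainty and correlation inputs describe, the sketch below combines 1-sigma uncertainties with a correlation form into a covariance matrix using plain NumPy. The helper build_covariance is hypothetical and not part of curepy. It assumes that 'rand' corresponds to an identity correlation matrix and 'syst' to a matrix of ones, which is the usual convention but is not confirmed here for curepy's internals.

```python
import numpy as np

def build_covariance(u, corr):
    """Hypothetical helper: combine 1-sigma uncertainties and a correlation
    form (string or matrix) into a covariance matrix."""
    u = np.asarray(u, dtype=float).ravel()
    n = u.size
    if isinstance(corr, str):
        if corr == "rand":
            corr_matrix = np.eye(n)        # assumed: independent errors
        elif corr == "syst":
            corr_matrix = np.ones((n, n))  # assumed: fully correlated errors
        else:
            raise ValueError(f"unknown correlation form: {corr!r}")
    else:
        corr_matrix = np.asarray(corr, dtype=float)  # custom correlation matrix
    # cov_ij = u_i * corr_ij * u_j
    return np.outer(u, u) * corr_matrix

y = np.array([10.0, 20.0, 40.0])
u_y = y * 0.05  # 5% uncertainty, as in the example above
cov_rand = build_covariance(u_y, "rand")
cov_syst = build_covariance(u_y, "syst")
```

With separate random and systematic components, the total covariance would, under the same assumptions, be the sum of the two matrices.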

MeasurementFunction#

The MeasurementFunction object stores the measurement function, \(f()\), and an initial guess for the values of \(x\). There is also an optional Boolean input, multiple_guess_measurements. If False, the initial guess is a single valid input to the measurement function. If True, the initial guess is made up of multiple valid inputs to the measurement function, joined along the first dimension. By default, this is set to False:

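import numpy as np

# Linear measurement function: x is the state, b1 and b2 are ancillary parameters
def meas_func(x, b1, b2):
    return b1*x + b2

inp.build_measurement_function(measurement_func = meas_func,
                               initial_guess = [np.array(5)],
                               )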

Note

If the measurement function contains multiple \(x\) to retrieve, the initial guess should be a list where each entry is a guess of one state variable, \(x_{i}\). Each \(x_{i}\) can be an array or a float/int; the only requirement is that its shape matches the input expected by the measurement function.
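A minimal sketch of an initial guess for two state variables is given below. The function meas_func_two_states and its values are made up for illustration. The sketch assumes the state variables are passed as the leading positional arguments, followed by the ancillary parameters, mirroring the meas_func(x, b1, b2) signature above; how curepy calls the function internally is not shown here.

```python
import numpy as np

def meas_func_two_states(x1, x2, b1):
    # x1: array of shape (3,), x2: scalar offset, b1: ancillary gain
    return b1 * x1 + x2

# One entry per state variable, each with the shape the function expects
initial_guess = [np.array([1.0, 2.0, 3.0]), 0.5]
b = [2.0]

# Checking that the guess is a valid input before building the container
y_model = meas_func_two_states(*initial_guess, *b)
```

Evaluating the measurement function on the initial guess like this is a cheap way to catch shape mismatches before running a retrieval.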

Prior#

The Prior object stores information used to define the prior distribution. The inputs are prior_shape, a List of the shapes of each prior; prior_params, a List of Dictionaries holding each prior's parameters; and an optional prior_correlation, a correlation matrix describing the correlation between the prior distributions. The lengths of prior_shape and prior_params, and the side length of prior_correlation, must be equal to the number of components in \(\underline{x}\):

inp.build_prior(prior_shape = ['uniform'],
                prior_params = [{'minimum': -10, 'maximum': 10}]
                )

Note

If prior_correlation is not defined, the priors are treated as uncorrelated by default (the correlation matrix is set to the identity matrix).

The table of valid prior shapes and associated parameters can be found below.

Shape      Parameters
uniform    minimum, maximum
normal     mu (mean), sigma (standard deviation)

Note

If no Prior object is set within the RetrievalInput, all prior distributions are set to be uniform with minimum \(-\infty\) and maximum \(\infty\).
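To make the prior inputs concrete, the sketch below evaluates an unnormalised log-prior from prior_shape and prior_params for uncorrelated priors. The function log_prior is a hypothetical illustration, not curepy's implementation. It ignores prior_correlation and drops normalisation constants.

```python
import numpy as np

def log_prior(x, prior_shape, prior_params):
    """Hypothetical helper: unnormalised log-prior for uncorrelated priors."""
    total = 0.0
    for xi, shape, params in zip(x, prior_shape, prior_params):
        if shape == "uniform":
            # Zero probability outside the bounds, constant density inside
            if not (params["minimum"] <= xi <= params["maximum"]):
                return -np.inf
        elif shape == "normal":
            # Gaussian log-density without the normalisation constant
            total += -0.5 * ((xi - params["mu"]) / params["sigma"]) ** 2
        else:
            raise ValueError(f"unknown prior shape: {shape!r}")
    return total

prior_shape = ["uniform", "normal"]
prior_params = [{"minimum": -10, "maximum": 10}, {"mu": 0.0, "sigma": 1.0}]

inside = log_prior([0.0, 1.0], prior_shape, prior_params)
outside = log_prior([11.0, 1.0], prior_shape, prior_params)
```

A uniform prior with infinite bounds never returns negative infinity and adds nothing to the total, which matches the default behaviour described in the note above.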

AncillaryParameter#

The AncillaryParameter object stores the ancillary parameters, \(b\), of the measurement function, with associated uncertainty and correlation information. b, u_b, and corr_b should all be input as Lists of length equal to the number of ancillary inputs to the measurement function, and corr_between_b should be a square matrix with side length equal to that number. If the MCMC retrieval method is being used, the kwargs b_MC_steps and b_samples can also be set: b_MC_steps is an integer defining the number of MC samples of the ancillary parameters to draw, and b_samples is an optional array of samples that can be given instead of drawing an MC sample:

inp.build_ancillary(b = [0.5, 10],
                    u_b = [0.001, 1],
                    corr_between_b = np.eye(2),
                    b_MC_steps = 10)
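The sketch below shows one way Monte Carlo samples of scalar ancillary parameters could be drawn from the inputs above, using NumPy only. It assumes Gaussian-distributed ancillary parameters, which is not stated in this guide, and the resulting array layout (one row per MC step) is also an assumption rather than the format curepy expects for b_samples.

```python
import numpy as np

b = np.array([0.5, 10.0])
u_b = np.array([0.001, 1.0])
corr_between_b = np.eye(2)
b_MC_steps = 10

# Covariance between the ancillary parameters: cov_ij = u_i * corr_ij * u_j
cov_b = np.outer(u_b, u_b) * corr_between_b

# Assumed Gaussian sampling; one row per MC step, one column per parameter
rng = np.random.default_rng(seed=0)
b_samples = rng.multivariate_normal(b, cov_b, size=b_MC_steps)
```

Each row of the sample array is one plausible set of ancillary parameters, so evaluating the measurement function once per row propagates the ancillary uncertainty into the modelled measurements.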

Instantiating a Retrieval Method#

Retrieval method objects can be instantiated directly:

from curepy.retrieval_methods.optimal_estimation import OE
ret = OE()

or by using the RetrievalFactory:

from curepy.retrieval_methods.retrieval_method_factory import RetrievalFactory
ret = RetrievalFactory().make_retrieval_object('oe')

make_retrieval_object can also take any retrieval method-specific args and kwargs that could be given to the object directly:

from curepy.retrieval_methods.retrieval_method_factory import RetrievalFactory
ret = RetrievalFactory().make_retrieval_object('mcmc', nwalkers = 100, steps = 1000, burn_in = 100)

The table of valid retrieval methods and associated parameters can be found below.

Method   Parameters
OE       Jx - pre-calculated Jacobian for the measurement function w.r.t. x
MCMC     nwalkers - number of walkers (or chains to run)
         steps - number of steps to take in chain
         burn_in - number of samples to discard at the start of the chain
         progress - bool, show progress bar
         parallel_cores - int, number of cores to use
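For the Jx parameter, a Jacobian can be approximated numerically when no analytical form is available. The forward-difference helper below, numerical_jacobian, is a hypothetical sketch and not part of curepy. It assumes a one-dimensional state vector, and the shape curepy expects for Jx (here, number of measurements by number of state elements) is an assumption.

```python
import numpy as np

def numerical_jacobian(func, x, args=(), rel_step=1e-6):
    """Hypothetical helper: forward-difference Jacobian of func w.r.t. a 1-D x."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(func(x, *args))
    jac = np.empty((y0.size, x.size))
    for j in range(x.size):
        # Step scaled to the magnitude of x[j], with a floor of rel_step
        step = rel_step * max(abs(x[j]), 1.0)
        x_pert = x.copy()
        x_pert[j] += step
        jac[:, j] = (np.atleast_1d(func(x_pert, *args)) - y0) / step
    return jac

def meas_func(x, b1, b2):
    return b1 * x + b2

# For this linear function the Jacobian is b1 times the identity matrix
Jx = numerical_jacobian(meas_func, np.array([5.0, 6.0]), args=(0.5, 10.0))
```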

Running a Retrieval#

Every retrieval is run using the run_retrieval method. Its interface is identical for all retrieval methods, and the only required input is a RetrievalInput object:

results = ret.run_retrieval(inp)

The table of optional parameters for each retrieval method can be found below.

Method   Optional run_retrieval Parameters
OE       return_corr - bool, return correlation matrix between x values
         reshape_results - bool, reshape x values and uncertainties to shape of initial guess
MCMC     return_corr - bool, return correlation matrix between x values
         reshape_results - bool, reshape x values and uncertainties to shape of initial guess
         return_samples - bool, return samples used to approximate posterior distribution
         return_b_samples - bool, return MC samples of ancillary parameters

The output of run_retrieval is a RetrievalResult object. This stores the retrieved values of x with associated uncertainties, and any other requested information such as correlation and samples.
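To illustrate how values, uncertainties, and a correlation matrix relate to posterior samples, the sketch below summarises a synthetic chain with NumPy. The chain is random data standing in for MCMC output, and the (steps, nwalkers, ndim) layout is an assumption. Using the mean and standard deviation as summary statistics is also an assumption; curepy may summarise its samples differently.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
steps, nwalkers, ndim, burn_in = 1000, 100, 2, 100

# Synthetic stand-in for an MCMC chain with assumed shape (steps, nwalkers, ndim)
chain = rng.normal(loc=[5.0, -1.0], scale=[0.2, 0.5], size=(steps, nwalkers, ndim))

kept = chain[burn_in:]         # discard the burn-in steps
flat = kept.reshape(-1, ndim)  # merge all walkers into one sample set

values = flat.mean(axis=0)                     # point estimate per state variable
uncertainties = flat.std(axis=0, ddof=1)       # 1-sigma spread of the samples
correlation = np.corrcoef(flat, rowvar=False)  # correlation between x values
```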

Retrieval results#

Output data can be accessed from the RetrievalResult object using the following accessors:

x = results.values
u_x = results.uncertainties
corr_x = results.correlation
samples = results.samples
b_samples = results.b_samples

The RetrievalResult object also provides some helper functions: build_obsarray() builds an obsarray.ObsArray object containing the retrieved values and uncertainties of x, and get_chisq() returns the chi-squared value of the retrieved values.
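As a reference point for interpreting a chi-squared value, the sketch below computes the standard generalised chi-squared between measurements and modelled measurements. The function chi_squared is a hypothetical helper with made-up numbers. Whether get_chisq() uses this exact definition, for example whether it also includes a prior term or divides by the degrees of freedom, is not confirmed here.

```python
import numpy as np

def chi_squared(y, y_model, cov_y):
    """Hypothetical helper: chi2 = r^T S_y^-1 r, with residual r = y - f(x)."""
    resid = np.asarray(y, dtype=float) - np.asarray(y_model, dtype=float)
    # Solving the linear system avoids forming the explicit inverse of cov_y
    return float(resid @ np.linalg.solve(cov_y, resid))

y = np.array([12.4, 12.6, 12.5])       # measurements
y_model = np.full(3, 12.5)             # f(x) evaluated at the retrieved x
cov_y = np.diag(np.full(3, 0.1 ** 2))  # independent 0.1 uncertainties

chisq = chi_squared(y, y_model, cov_y)
```

Here two of the three residuals are exactly one standard uncertainty and the third is zero, so the chi-squared value is approximately 2.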