User Guide#
Overview#
This user guide provides a step-by-step walkthrough of how to use curepy to perform retrievals. It covers the process of instantiating a RetrievalInput, setting up the necessary components such as measurements, priors, and ancillary parameters, and then running a retrieval using a chosen retrieval method. Finally, it explains how to access and interpret the results of the retrieval.
A full example of a retrieval can be found in the curepy example notebook, which is available here and in the examples directory of the CoMet website.
Instantiating a Retrieval Input#
Every retrieval method requires a RetrievalInput object to run a retrieval:
from curepy.retrieval_methods.retrieval_input import RetrievalInput
inp = RetrievalInput()
This object can be instantiated using a combination of containers or by using the RetrievalInput().build_retrieval_inputs() method.
Individual containers can also be built within the RetrievalInput object using the individual ‘build’ functions:
inp.build_measurement()
inp.build_prior()
inp.build_ancillary()
inp.build_measurement_function()
These functions build the Measurement, Prior, AncillaryParameter, and MeasurementFunction objects, respectively.
Every retrieval method requires a Measurement and MeasurementFunction input to be set in the RetrievalInput.
Prior and AncillaryParameter objects are optional.
Measurement#
The Measurement object stores the measurements, \(y\), and any related uncertainty and correlation information:
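import xarray as xr

# Load the measured values y from a NetCDF file
data = xr.open_dataset("my_data.nc")['measurements'].values
inp.build_measurement(y = data,
                      u_y_total = data*0.05,  # 5% relative uncertainty
                      corr_y = 'syst'         # systematic correlation
                      )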
The measurement uncertainties can be set using u_y_total and corr_y, where corr_y is either a string indicating random (‘rand’) or systematic (‘syst’) correlation, or a custom correlation matrix.
Alternatively, if the uncertainties are separated into random and systematic components, they can be set using u_y_rand and u_y_syst.
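To make the role of corr_y concrete, the sketch below builds a covariance matrix from a vector of uncertainties for both string options. It uses plain NumPy rather than curepy. The helper name covariance_from_uncertainty is hypothetical, and reading ‘rand’ as an identity correlation matrix and ‘syst’ as a matrix of ones is an assumption based on the usual metrological definitions, not a description of curepy's internals.

```python
import numpy as np

def covariance_from_uncertainty(u, corr):
    """Hypothetical helper: combine uncertainties u with a correlation structure."""
    u = np.asarray(u, dtype=float).ravel()
    n = u.size
    if isinstance(corr, str):
        if corr == "rand":
            corr_matrix = np.eye(n)        # assumed: no correlation between points
        elif corr == "syst":
            corr_matrix = np.ones((n, n))  # assumed: full correlation between points
        else:
            raise ValueError(f"unknown correlation form: {corr}")
    else:
        corr_matrix = np.asarray(corr, dtype=float)  # custom correlation matrix
    # Covariance element (i, j) = u_i * u_j * corr_ij
    return np.outer(u, u) * corr_matrix

y = np.array([10.0, 20.0, 40.0])
u_y = 0.05 * y
cov_rand = covariance_from_uncertainty(u_y, "rand")
cov_syst = covariance_from_uncertainty(u_y, "syst")

# If random and systematic components are given separately and are independent,
# their covariance matrices add.
cov_total = cov_rand + cov_syst
```

In this sketch the ‘rand’ case gives a diagonal covariance matrix, while the ‘syst’ case fills every off-diagonal element.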
MeasurementFunction#
The MeasurementFunction object stores the measurement function, \(f()\), and an initial guess for the values of \(x\).
There is also an optional Boolean input, multiple_guess_measurements. If False, the initial guess is a single valid input to the measurement function;
if True, the initial guess is made up of multiple valid inputs to the measurement function joined along the first dimension. By default, this is set to False:
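import numpy as np

def meas_func(x, b1, b2):
    # Linear measurement function: x is retrieved, b1 and b2 are ancillary parameters
    return b1*x + b2

inp.build_measurement_function(measurement_func = meas_func,
                               initial_guess = [np.array(5)],
                               )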
Note
If the measurement function contains multiple \(x\) to retrieve, the initial guess should be a list where each entry is a guess of the state variable, \(x_{i}\). These \(x_{i}\) can be arrays or floats/ints, the only requirement being the shape must be the same as the expected input to the measurement function.
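As an illustration of the note above, a measurement function with two state variables might look like the following. The function meas_func_two_states and all values are hypothetical, and the argument order (state variables first, then ancillary parameters) is assumed from the single-variable example. The point is only that each entry of the initial guess list has the shape the function expects for that argument.

```python
import numpy as np

def meas_func_two_states(x1, x2, b1, b2):
    # x1: scalar state variable, x2: length-3 state vector (illustrative choice)
    return b1 * x1 * np.ones(3) + b2 * x2

# One list entry per state variable, each with the shape the function expects
initial_guess = [np.array(5.0), np.zeros(3)]

# Sanity check: the initial guess must be a valid input to the function
y0 = meas_func_two_states(*initial_guess, 0.5, 10)
```

Evaluating the function once on the initial guess, as in the last line, is a cheap way to catch shape mismatches before running a retrieval.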
Prior#
The Prior object stores information used to define the prior distribution. The inputs are prior_shape, a List of the shapes of each prior; prior_params,
a List of Dictionaries of each prior’s parameters; and an optional prior_correlation, a correlation matrix describing the correlation between the prior distributions.
The lengths of prior_shape and prior_params, and the side length of prior_correlation, must equal the number of components in \(\underline{x}\):
inp.build_prior(prior_shape = ['uniform'],
                prior_params = [{'minimum': -10, 'maximum': 10}]
                )
Note
If prior_correlation is not defined, it defaults to the identity matrix, i.e. the prior distributions are treated as uncorrelated (random).
The table of valid prior shapes and associated parameters can be found below.
| Shape | Parameters |
|---|---|
| uniform | |
| normal | |
Note
If no Prior object is set within the RetrievalInput, all prior distributions are set to be uniform with minimum \(-\infty\) and maximum \(\infty\).
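The following sketch shows one way the prior_shape and prior_params lists could be turned into a log-prior, assuming uncorrelated priors. It is not curepy's implementation. The key names ‘minimum’ and ‘maximum’ are assumed from the uniform example, and ‘mean’ and ‘std’ are hypothetical names for the normal parameters.

```python
import numpy as np

def log_prior(x, shape, params):
    """Unnormalised log-density of a single prior component (illustrative only)."""
    if shape == "uniform":
        inside = params["minimum"] <= x <= params["maximum"]
        return 0.0 if inside else -np.inf
    if shape == "normal":
        # 'mean' and 'std' are hypothetical key names
        z = (x - params["mean"]) / params["std"]
        return -0.5 * z**2
    raise ValueError(f"unknown prior shape: {shape}")

# One shape and one parameter dictionary per component of x
prior_shape = ["uniform", "normal"]
prior_params = [{"minimum": -10, "maximum": 10}, {"mean": 0.0, "std": 2.0}]

x = [3.0, 1.0]
# With uncorrelated priors, the log-densities of the components add
total = sum(log_prior(xi, s, p) for xi, s, p in zip(x, prior_shape, prior_params))

# A uniform prior with infinite bounds never penalises any value
unbounded = log_prior(1e9, "uniform", {"minimum": -np.inf, "maximum": np.inf})
```

The unbounded case mirrors the default described in the note above: with no Prior object set, the prior places no constraint on the retrieved values.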
AncillaryParameter#
The AncillaryParameter object stores the ancillary parameters, \(b\), of the measurement function, with associated uncertainty and correlation information.
b, u_b, and corr_b should all be input as Lists of length equal to the number of ancillary inputs to
the measurement function, and corr_between_b should be a square matrix with side length equal to the number of ancillary inputs to
the measurement function. If the MCMC retrieval method is being used, the kwargs b_MC_steps and b_samples can also be set.
b_MC_steps is an integer defining the number of MC samples of the ancillary parameters to be drawn, and b_samples is an optional array
of samples that can be given instead of drawing an MC sample:
inp.build_ancillary(b = [0.5, 10],
                    u_b = [0.001, 1],
                    corr_between_b = np.eye(2),
                    b_MC_steps = 10)
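To show what drawing MC samples of the ancillary parameters can mean in practice, the sketch below draws b_MC_steps samples with plain NumPy. It assumes the ancillary parameters follow a multivariate normal distribution; the guide does not state how curepy draws its samples, so this is an illustration, not the library's method.

```python
import numpy as np

b = np.array([0.5, 10.0])
u_b = np.array([0.001, 1.0])
corr_between_b = np.eye(2)
b_MC_steps = 10

# Covariance element (i, j) = u_i * u_j * corr_ij
cov_b = np.outer(u_b, u_b) * corr_between_b

# Assumed Gaussian error model; the seed is fixed so the draw is reproducible
rng = np.random.default_rng(seed=0)
b_draws = rng.multivariate_normal(mean=b, cov=cov_b, size=b_MC_steps)
```

Each row of b_draws is one sampled set of ancillary parameters, so the array has shape (b_MC_steps, number of ancillary parameters).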
Instantiating a Retrieval Method#
Retrieval method objects can be instantiated directly:
from curepy.retrieval_methods.optimal_estimation import OE
ret = OE()
or by using the RetrievalFactory:
from curepy.retrieval_methods.retrieval_method_factory import RetrievalFactory
ret = RetrievalFactory().make_retrieval_object('oe')
make_retrieval_object can also take any retrieval method-specific args and kwargs that could be given to
the object directly:
from curepy.retrieval_methods.retrieval_method_factory import RetrievalFactory
ret = RetrievalFactory().make_retrieval_object('mcmc', nwalkers = 100, steps = 1000, burn_in = 100)
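The way a factory forwards method-specific arguments can be sketched generically as below. Every name here (SketchOE, SketchMCMC, SketchFactory) and every default value is hypothetical. The sketch only illustrates the pattern of looking up a class by name and passing args and kwargs through to it; it is not the RetrievalFactory source.

```python
class SketchOE:
    """Stand-in for an optimal estimation method (hypothetical)."""
    def __init__(self, **options):
        self.options = options

class SketchMCMC:
    """Stand-in for an MCMC method (hypothetical defaults)."""
    def __init__(self, nwalkers=50, steps=500, burn_in=50):
        self.nwalkers = nwalkers
        self.steps = steps
        self.burn_in = burn_in

class SketchFactory:
    # Map from method name to the class that implements it
    _registry = {"oe": SketchOE, "mcmc": SketchMCMC}

    def make_retrieval_object(self, name, *args, **kwargs):
        try:
            cls = self._registry[name.lower()]
        except KeyError:
            raise ValueError(f"unknown retrieval method: {name}") from None
        # Forward everything else unchanged to the method's constructor
        return cls(*args, **kwargs)

ret_sketch = SketchFactory().make_retrieval_object(
    "mcmc", nwalkers=100, steps=1000, burn_in=100
)
```

With this pattern, building an object through the factory is equivalent to calling the method's constructor directly with the same arguments.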
The table of valid retrieval methods and associated parameters can be found below.
| Method | Parameters |
|---|---|
| OE | |
| MCMC | |
Running a Retrieval#
Every retrieval is run using the run_retrieval method. Its interface is identical for all
retrieval methods, and the only required input is a RetrievalInput object:
results = ret.run_retrieval(inp)
The table of optional parameters for each retrieval method can be found below.
| Method | Optional parameters |
|---|---|
| OE | |
| MCMC | |
The output of run_retrieval is a RetrievalResult object, which stores the retrieved values of x with associated uncertainties, and any other requested information
such as correlation and samples.
Retrieval Results#
Output data can be accessed from the RetrievalResult object using the following accessors:
x = results.values
u_x = results.uncertainties
corr_x = results.correlation
samples = results.samples
b_samples = results.b_samples
The RetrievalResult object also provides some helper functions, such as build_obsarray(), which builds an obsarray.ObsArray object containing the retrieved values and uncertainties of x,
and get_chisq(), which returns the chi-squared value of the retrieved values.
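For orientation, one common definition of the chi-squared statistic is \(\chi^2 = r^T S_y^{-1} r\), where \(r = y - f(x)\) is the residual and \(S_y\) is the measurement covariance matrix. The sketch below evaluates it with NumPy for the linear measurement function used earlier. Whether get_chisq() uses exactly this form is an assumption, and the function chi_squared and all data values are hypothetical.

```python
import numpy as np

def chi_squared(y, y_model, cov_y):
    """Chi-squared of the residuals y - y_model for measurement covariance cov_y."""
    r = np.asarray(y, dtype=float) - np.asarray(y_model, dtype=float)
    # Solve cov_y @ z = r rather than inverting cov_y explicitly
    return float(r @ np.linalg.solve(cov_y, r))

def meas_func(x, b1, b2):
    return b1 * x + b2

y = np.array([12.4, 12.6, 12.5])         # illustrative measurements
cov_y = np.diag(np.full(3, 0.1**2))      # uncorrelated, u_y = 0.1
x_retrieved = 5.0                        # illustrative retrieved value

y_model = meas_func(x_retrieved, 0.5, 10) * np.ones(3)
chisq = chi_squared(y, y_model, cov_y)
```

Here the residuals are -0.1, 0.1, and 0 against an uncertainty of 0.1, so the statistic comes out at approximately 2.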