CAPAM PI Webinar on data weighting
CAPAM PI Mark Maunder gave a seminar titled "Integrated analysis: the worst thing that happened to fisheries stock assessment" in Seattle on January 22, 2015, as part of the NOAA-NWFSC Monster Seminar JAM series. Links to a PDF of the presentation and a recording of the seminar, as well as the abstract, are provided below:
Abstract:
Integrated analysis: the worst thing that happened to fisheries stock assessment
Mark Maunder and Kevin Piner
Contemporary fisheries stock assessment models often use multiple diverse data sets to extract as much information as possible about all model processes. This has led to the mindset that integrated models can compensate for a lack of good data (e.g. surveys and catch-at-age). However, models are, by definition, simplifications of reality, and model misspecification can cause degradation of results when additional data sets are included. The process, observation, and sampling components of the model must all be approximately correct to minimize biased results. Unfortunately, even the basic processes that we assume we understand well (e.g. growth and selectivity) are misspecified in most, if not all, stock assessments. These misspecified processes, in combination with composition data, result in biased estimates of absolute abundance and abundance trends, which are often evident as "data conflicts". This is further compounded by over-weighting of composition data in many assessments through misuse of data weighting approaches. The law of conflicting data states that since data are true, conflicting data imply model misspecification; however, this must be interpreted in the context of random sampling error, and down-weighting or dropping conflicting data is not necessarily appropriate because it may not resolve the model misspecification. Data sets could be analyzed outside the integrated model and the resulting parameter estimates for population processes and their uncertainty used in the integrated model (e.g. as a prior), but these analyses typically involve more assumptions, implicit or explicit, that are potentially misspecified, leading to biased results. Model misspecification and process variation can be accounted for in the variance parameters of the likelihoods (observation error), but it is unclear when this is appropriate. The appropriate method to deal with data conflicts depends on whether they are caused by random sampling error, observation model misspecification, or system dynamics model misspecification. Diagnostic approaches are urgently needed to test goodness of fit and identify model misspecification. We recommend external estimation of the sampling error variance used in likelihood functions, inclusion of process variation in the integrated model, and internal estimation of the process error variance. The required statistical framework is computationally intensive, but practical approaches are being developed.
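To illustrate how data weighting enters a composition likelihood, the short Python sketch below shows how the assumed sample size scales the contribution of composition data to the objective function; taking a large input sample size at face value is what over-weights these data relative to other components. The function name and the simplified multinomial form are illustrative assumptions for this post, not material from the talk or from any particular assessment package.

    import numpy as np

    def composition_loglik(obs_props, pred_props, input_n, weight=1.0):
        """Simplified multinomial log-likelihood for one year of composition data.

        `weight * input_n` acts as the effective sample size; constant terms of
        the multinomial are dropped, as is common in assessment software.
        """
        eff_n = weight * input_n  # down-weighting shrinks the assumed sample size
        return eff_n * np.sum(obs_props * np.log(pred_props))

    obs = np.array([0.10, 0.30, 0.40, 0.20])   # observed proportions at age
    pred = np.array([0.15, 0.25, 0.35, 0.25])  # model-predicted proportions
    # The same misfit contributes ten times more to the objective function when
    # the input sample size is taken at face value than when it is down-weighted.
    print(composition_loglik(obs, pred, input_n=500, weight=1.0))
    print(composition_loglik(obs, pred, input_n=500, weight=0.1))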