Elsevier

Medical Image Analysis

Volume 17, Issue 2, February 2013, Pages 194-208

Non-local statistical label fusion for multi-atlas segmentation

https://doi.org/10.1016/j.media.2012.10.002

Abstract

Multi-atlas segmentation provides a general purpose, fully-automated approach for transferring spatial information from an existing dataset (“atlases”) to a previously unseen context (“target”) through image registration. The method to resolve voxelwise label conflicts between the registered atlases (“label fusion”) has a substantial impact on segmentation quality. Ideally, statistical fusion algorithms (e.g., STAPLE) would result in accurate segmentations as they provide a framework to elegantly integrate models of rater performance. The accuracy of statistical fusion hinges upon accurately modeling the underlying process of how raters err. Despite success on human raters, current approaches inaccurately model multi-atlas behavior as they fail to seamlessly incorporate exogenous intensity information into the estimation process. As a result, locally weighted voting algorithms represent the de facto standard fusion approach in clinical applications. Moreover, regardless of the approach, fusion algorithms are generally dependent upon large atlas sets and highly accurate registration as they implicitly assume that the registered atlases form a collectively unbiased representation of the target. Herein, we propose a novel statistical fusion algorithm, Non-Local STAPLE (NLS). NLS reformulates the STAPLE framework from a non-local means perspective in order to learn what label an atlas would have observed, given perfect correspondence. Through this reformulation, NLS (1) seamlessly integrates intensity into the estimation process, (2) provides a theoretically consistent model of multi-atlas observation error, and (3) largely diminishes the need for large atlas sets and very high-quality registrations. We assess the sensitivity and optimality of the approach and demonstrate significant improvement in two empirical multi-atlas experiments.

Highlights

► We propose a novel statistical fusion algorithm, Non-Local STAPLE (NLS).
► NLS seamlessly integrates intensity into the estimation process.
► NLS provides a theoretically consistent model of multi-atlas observation error.
► NLS largely diminishes the need for large atlas sets and high-quality registration.
► We demonstrate superior performance over the state-of-the-art approaches.

Introduction

Segmentation of anatomical structures on medical images is essential for scientific inquiry into the complex relationships between biological structure and function, as well as for clinical diagnosis, treatment, and assessment. The long-held “gold standard” for highly robust segmentation has been expert manual delineation (Crespo-Facorro et al., 1999, Tsang et al., 2008). Yet, manual delineation is extremely resource-intensive and plagued by inter- and intra-rater variability (e.g., 10–20% by volume (Ashton et al., 2003, Joe et al., 1999)). Alternatively, fully-automated algorithms often produce robust and accurate estimates for specific classes of problems (e.g., brain-tissue classification (Cocosco et al., 2003, Van Leemput et al., 1999, Wells et al., 1996), optic nerve segmentation (Noble and Dawant, 2011)). Unfortunately, the success of automated techniques often depends on the application, modality, and image quality (Fischl et al., 2002, Heckemann et al., 2006, Rohlfing et al., 2004a, Yeo et al., 2008).

Atlas-based segmentation methods form a middle ground between fully-manual and fully-automatic segmentation approaches (Collins et al., 1995, Gee et al., 1993). In atlas-based models, spatial information is transferred from an existing dataset (labeled atlas) to a previously unseen context (target) through deformable registration. Proposed extensions summarize multiple atlases in a common coordinate system by constructing (1) unbiased average atlases (Guimond et al., 2000, Joshi et al., 2004) and (2) target-specific atlases (Commowick et al., 2009, Ericsson et al., 2008). Yet, the accuracy of single-atlas methods is limited by bias concerns and a lack of correspondence to the target (Ashburner and Friston, 2005, Han and Fischl, 2007). Thus, an alternative strategy that uses multiple atlases independently (i.e., multi-atlas segmentation) has become the de facto standard baseline for atlas techniques. In multi-atlas segmentation (Heckemann et al., 2006, Rohlfing et al., 2004b), multiple atlases are separately registered to the target and the voxelwise label conflicts between the registered atlases are resolved using label fusion.

Perhaps surprisingly, a majority vote, the simplest fusion strategy, has been shown to result in highly robust segmentations (Aljabar et al., 2009, Heckemann et al., 2006, Rohlfing et al., 2004a, Rohlfing and Maurer, 2007). More recently, weighted voting strategies that use global (Artaechevarria et al., 2009, Chen et al., 2012), local (Isgum et al., 2009, Sabuncu et al., 2010, Wang et al., 2011), semi-local (Sabuncu et al., 2010, Wang et al., 2012), and non-local (Coupé et al., 2011) intensity similarity metrics have demonstrated consistent improvement in segmentation accuracy. Particularly for neurological applications, highly local weights have provided the most consistent results in segmentation quality (Artaechevarria et al., 2009, Sabuncu et al., 2010).
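To make the two voting families concrete, the following is a minimal sketch rather than any of the cited implementations. It assumes the atlases are already registered and flattened to equal-length lists, and it uses a Gaussian weight on the voxelwise intensity difference as one common choice of local similarity; the function names and the `sigma` parameter are hypothetical.

```python
import math
from collections import defaultdict


def majority_vote(atlas_labels):
    """Fuse labels voxel by voxel; every registered atlas casts one equal vote."""
    fused = []
    for votes in zip(*atlas_labels):
        counts = defaultdict(int)
        for label in votes:
            counts[label] += 1
        # Iterating over sorted labels makes ties resolve to the smallest label.
        fused.append(max(sorted(counts), key=counts.get))
    return fused


def locally_weighted_vote(atlas_labels, atlas_intensities, target, sigma=10.0):
    """Weight each vote by how closely the atlas intensity matches the target voxel.

    The Gaussian weight and sigma are illustrative assumptions.
    """
    fused = []
    for i, target_value in enumerate(target):
        scores = defaultdict(float)
        for labels, intensities in zip(atlas_labels, atlas_intensities):
            diff = intensities[i] - target_value
            scores[labels[i]] += math.exp(-(diff * diff) / (2.0 * sigma * sigma))
        fused.append(max(sorted(scores), key=scores.get))
    return fused


# Toy data: three registered atlases, four voxels.
labels = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
intensities = [[100, 200, 200, 100], [100, 200, 120, 100], [180, 200, 120, 100]]
target = [100, 200, 200, 100]

print(majority_vote(labels))                               # [0, 1, 0, 0]
print(locally_weighted_vote(labels, intensities, target))  # [0, 1, 1, 0]
```

In the toy data, the third voxel flips from 0 to 1 under weighted voting because the two atlases voting 0 have intensities far from the target there, which is the behavior the intensity-based weights are meant to capture.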

In contrast to ad hoc voting, statistical fusion strategies (e.g., Simultaneous Truth and Performance Level Estimation, STAPLE (Warfield et al., 2004)) directly integrate a stochastic model of rater behavior into the estimation process. Despite elegant theory and success on human raters, applications to the multi-atlas context have proven problematic (Asman and Landman, 2011a, Sabuncu et al., 2010, Wang et al., 2011, Wang et al., 2012). In response, a myriad of advancements to the STAPLE framework have been proposed to account for (1) spatially varying task difficulty (Asman and Landman, 2011b, Rohlfing et al., 2004b), (2) spatially varying rater performance (Asman and Landman, 2011a, Asman and Landman, 2012a, Commowick et al., 2012, Weisenfeld and Warfield, 2011), and (3) instabilities in the rater performance level parameters (Commowick and Warfield, 2010, Landman et al., 2011b). Yet, these advanced techniques remain inherently models of human observation error as they fail to directly incorporate the image intensity differences between the atlases and the target. Moreover, initial attempts to incorporate intensity into the STAPLE framework have relied upon ad hoc extensions that simply ignore voxels based upon a priori similarity measures (Cardoso et al., 2011, Weisenfeld and Warfield, 2011).

Regardless of the approach, label fusion models have consistently made an implicit assumption that the use of multiple atlases results in a voxelwise, collectively unbiased representation of the target. This assumption is manifested through the fact that nearly all fusion algorithms determine the optimal label using only directly corresponding intensity and label information. Ergo, multi-atlas methods generally depend on highly accurate registration and on large numbers of atlases. Several problems therefore remain in multi-atlas segmentation: (1) fusion depends on large-scale, high-quality registrations, (2) voting-based algorithms lack the theoretical underpinning of statistical fusion observation models, and (3) statistical fusion algorithms fail to incorporate intensity information. Thus, previous approaches have failed to accurately model the stochastic process of registered atlas observation error.

Meanwhile, a relatively new framework in the field of image analysis, non-local means, has gained momentum in terms of quantifying complex image characteristics (e.g., noise structure, spatially varying correspondence). In non-local means, images are deconstructed into a collection of small volumetric patches and the similarity or correspondence between these patches is quantified to learn the underlying image structure (Buades et al., 2005). The non-local means framework has emerged in the context of image de-noising (Buades et al., 2005, Coupé et al., 2006, Kervrann et al., 2007, Liu et al., 2008, Manjón et al., 2008, Van De Ville and Kocher, 2009). However, more recent work has demonstrated the applicability of non-local means to new applications such as synthesizing image contrast (Roy et al., 2010a), in-painting (Sun and Tappen, 2011), and image segmentation (Coupé et al., 2011, Roy et al., 2010b).
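The patch-comparison idea can be illustrated with a small sketch of non-local means weighting in the style of Buades et al. (2005). It is simplified to a 1-D signal, and the border clamping, parameter names, and default values are assumptions made for illustration only.

```python
import math


def extract_patch(signal, center, radius):
    """Return the patch around `center`, clamping indices at the borders."""
    last = len(signal) - 1
    return [signal[min(max(center + k, 0), last)] for k in range(-radius, radius + 1)]


def nonlocal_weights(signal, center, patch_radius=1, search_radius=3, h=10.0):
    """Normalized patch-similarity weights over a search window around `center`."""
    reference = extract_patch(signal, center, patch_radius)
    lo = max(center - search_radius, 0)
    hi = min(center + search_radius, len(signal) - 1)
    raw = {}
    for j in range(lo, hi + 1):
        candidate = extract_patch(signal, j, patch_radius)
        dist2 = sum((a - b) ** 2 for a, b in zip(reference, candidate))
        # Similar patches receive weights near 1, dissimilar patches near 0.
        raw[j] = math.exp(-dist2 / (h * h))
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


def nonlocal_means_value(signal, center, **kwargs):
    """Denoised value at `center`: a weighted average over the search window."""
    weights = nonlocal_weights(signal, center, **kwargs)
    return sum(w * signal[j] for j, w in weights.items())


signal = [10, 12, 11, 50, 52, 51, 10, 11]
weights = nonlocal_weights(signal, center=1)
print(max(weights, key=weights.get))  # the center patch matches itself best
```

The smoothing parameter `h` controls how quickly the weight decays with patch distance; a small `h` keeps only near-identical patches, while a large `h` approaches a plain window average.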

Herein, we propose a novel statistical fusion algorithm (Non-Local STAPLE – NLS) that reformulates the STAPLE framework from a non-local means perspective. NLS models the registered atlases as collections of volumetric patches containing both intensity and label information and uses the non-local criteria (Buades et al., 2005, Coupé et al., 2011) to resolve imperfect correspondence. Through this reformulation, we seamlessly integrate exogenous intensity information into the estimation process to provide a theoretically consistent model of multi-atlas observation error. NLS provides a model in which we learn which label each atlas would have observed given perfect correspondence with the target. This presentation is an extension and generalization of a recently published conference paper (Asman and Landman, 2012b). Herein, we provide additional examples, derivations and insights that were not part of the original conference publication.
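One way to picture "the label an atlas would have observed given perfect correspondence" is sketched below. This is an illustrative toy under stated assumptions, not the NLS derivation: it works on 1-D signals, uses a Gaussian patch-similarity weight with a hypothetical `sigma`, and simply accumulates the weights of nearby atlas patches by label.

```python
import math


def patch(values, center, radius):
    """Patch around `center`, clamping indices at the borders."""
    last = len(values) - 1
    return [values[min(max(center + k, 0), last)] for k in range(-radius, radius + 1)]


def corrected_label_distribution(target, atlas_intensity, atlas_labels, voxel,
                                 patch_radius=1, search_radius=2, sigma=10.0):
    """Soft estimate of the label this atlas 'would have observed' at `voxel`.

    Atlas patches inside the search window are compared with the target patch;
    each contributes its center label with a Gaussian similarity weight
    (an assumed weighting, chosen for illustration).
    """
    target_patch = patch(target, voxel, patch_radius)
    lo = max(voxel - search_radius, 0)
    hi = min(voxel + search_radius, len(target) - 1)
    scores = {}
    for k in range(lo, hi + 1):
        atlas_patch = patch(atlas_intensity, k, patch_radius)
        dist2 = sum((a - b) ** 2 for a, b in zip(target_patch, atlas_patch))
        weight = math.exp(-dist2 / (2.0 * sigma * sigma))
        scores[atlas_labels[k]] = scores.get(atlas_labels[k], 0.0) + weight
    total = sum(scores.values())
    if total == 0.0:
        # Every weight underflowed: fall back to the directly corresponding label.
        return {atlas_labels[voxel]: 1.0}
    return {label: s / total for label, s in scores.items()}


# The atlas boundary is shifted by one voxel relative to the target.
target = [0, 0, 0, 100, 100, 100, 100]
atlas_intensity = [0, 0, 0, 0, 100, 100, 100]
atlas_labels = [0, 0, 0, 0, 1, 1, 1]

dist = corrected_label_distribution(target, atlas_intensity, atlas_labels, voxel=3)
print(atlas_labels[3], max(dist, key=dist.get))  # 0 1
```

At voxel 3 the directly corresponding atlas label is 0, but the best-matching atlas patch sits one voxel to the right and carries label 1, so the patch search compensates for the one-voxel registration error in this toy case.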

In this manuscript, we begin by deriving the theoretical basis and the parameters for initialization and convergence governing NLS. Next, we demonstrate significant improvement over the state-of-the-art fusion algorithms on two distinct datasets: (1) computed tomography (CT) images for thyroid segmentation and (2) structural magnetic resonance (MR) images for whole-brain segmentation. For whole-brain segmentation, we demonstrate that NLS dramatically lessens the need for large-scale and highly accurate non-rigid registration. Lastly, we provide insight into the sensitivity of NLS to the various model parameters, assess the optimality of the algorithm, and provide a comparison to a direct application of non-local voting.

Theory

The following presentation provides the theoretical model governing NLS in the commonly used Expectation–Maximization (EM) framework (Dempster et al., 1977). For clarity and consistency, the notation closely follows the presentation of the original STAPLE algorithm (Warfield et al., 2004).
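Because the theory section is only excerpted here, the sketch below shows the EM structure of the classic binary STAPLE algorithm (Warfield et al., 2004) that NLS builds upon, not NLS itself. The fixed global prior, the initial performance values of 0.9, the iteration count, and the `eps` clamp are all assumptions chosen to keep the example short and numerically safe.

```python
def staple_binary(decisions, prior=0.5, iterations=25, eps=1e-6):
    """Binary STAPLE-style EM on decisions[j][i] in {0, 1} (rater j, voxel i).

    Returns (w, sensitivity, specificity), where w[i] estimates P(T_i = 1).
    """
    n_raters = len(decisions)
    n_voxels = len(decisions[0])
    sensitivity = [0.9] * n_raters  # P(D = 1 | T = 1), assumed starting value
    specificity = [0.9] * n_raters  # P(D = 0 | T = 0), assumed starting value
    w = [prior] * n_voxels

    def clamp(x):
        # Keep parameters away from exactly 0 or 1 to avoid degenerate products.
        return min(max(x, eps), 1.0 - eps)

    for _ in range(iterations):
        # E-step: posterior probability that the true label is 1 at each voxel.
        for i in range(n_voxels):
            a, b = prior, 1.0 - prior
            for j in range(n_raters):
                if decisions[j][i] == 1:
                    a *= sensitivity[j]
                    b *= 1.0 - specificity[j]
                else:
                    a *= 1.0 - sensitivity[j]
                    b *= specificity[j]
            w[i] = a / (a + b)
        # M-step: re-estimate each rater's performance from the soft truth.
        fg = max(sum(w), eps)
        bg = max(n_voxels - sum(w), eps)
        for j in range(n_raters):
            hit = sum(w[i] for i in range(n_voxels) if decisions[j][i] == 1)
            rej = sum(1.0 - w[i] for i in range(n_voxels) if decisions[j][i] == 0)
            sensitivity[j] = clamp(hit / fg)
            specificity[j] = clamp(rej / bg)
    return w, sensitivity, specificity


# Two consistent raters and one rater that errs at two voxels.
decisions = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
]
w, sens, spec = staple_binary(decisions)
print([1 if p > 0.5 else 0 for p in w])  # [1, 1, 1, 0, 0, 0]
```

The alternation between estimating the hidden true labels (E-step) and the rater performance parameters (M-step) is the part that carries over to the multi-atlas setting; how the observation model changes under NLS is described in the full text.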

Methods and results

An implementation of the Non-Local STAPLE algorithm is available as part of the Java Image Science Toolkit (JIST, www.nitrc.org/projects/jist).

Discussion

Non-Local STAPLE represents the first statistical fusion algorithm that seamlessly incorporates intensity into the estimation process and creates a cohesive theoretical model specifically targeting registered atlas observation behavior. Additionally, NLS largely overcomes several of the current obstacles that plague multi-atlas segmentation including the need for high-quality non-rigid registration and large numbers of atlases. These goals are accomplished through the reformulation of the

Conclusions

We have derived and investigated Non-Local STAPLE, a new statistical fusion algorithm for multi-atlas segmentation. Through a reformulation from a non-local means perspective, NLS represents the first statistical fusion algorithm that (1) creates a cohesive theoretical model specifically targeting registered atlas observation behavior, and (2) seamlessly incorporates intensity into the core of the STAPLE estimation framework. As a result, NLS largely overcomes the need for high-quality

Acknowledgements

This research was supported by NIH Grants 1R21NS064534 (Prince/Landman), 2R01EB006136 (Dawant), 1R03EB012461 (Landman) and R01EB006193 (Dawant). This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University, Nashville, TN. The authors are grateful to Dr. Benoit Dawant for the labeled thyroid dataset and Dr. Andrew Worth (NeuroMorphometrics, Inc.) for the exquisitely labeled whole-brain dataset.

References (63)

  • S. Joshi et al. Unbiased diffeomorphic atlas construction for computational anatomy. Neuroimage (2004)
  • J.M.P. Lotjonen et al. Fast and robust multi-atlas segmentation of brain magnetic resonance images. Neuroimage (2010)
  • J.V. Manjón et al. MRI denoising using non-local means. Medical Image Analysis (2008)
  • J.H. Noble et al. An atlas-navigated optimal medial axis and deformable model algorithm (NOMAD) for the segmentation of the optic nerves and chiasm in MR and CT images. Medical Image Analysis (2011)
  • T. Rohlfing et al. Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains. Neuroimage (2004)
  • M. Sdika. Combining atlas based segmentation and intensity classification with nearest neighbor transform and accuracy weighted vote. Medical Image Analysis (2010)
  • X. Artaechevarria et al. Combination strategies in multi-atlas image segmentation: application to brain MR data. IEEE Transactions on Medical Imaging (2009)
  • E.A. Ashton et al. Accuracy and reproducibility of manual and semiautomated quantification of MS lesions by MRI. Journal of Magnetic Resonance Imaging (2003)
  • A. Asman et al. Characterizing spatially varying performance to improve multi-atlas multi-label segmentation. Information Processing in Medical Imaging (IPMI) (2011)
  • A. Asman et al. Robust statistical label fusion through consensus level, labeler accuracy and truth estimation (COLLATE). IEEE Transactions on Medical Imaging (2011)
  • A.J. Asman et al. Formulating spatially varying performance in the statistical fusion framework. IEEE Transactions on Medical Imaging (2012)
  • A.J. Asman et al. Non-local STAPLE: an intensity-driven multi-atlas rater model. Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2012)
  • A.J. Asman et al. Simultaneous Segmentation and Statistical Label Fusion (2012)
  • R. Bellman. Dynamic programming and Lagrange multipliers. Proceedings of the National Academy of Sciences of the United States of America (1956)
  • A. Buades et al. A non-local algorithm for image denoising. Computer Vision and Pattern Recognition (CVPR), IEEE (2005)
  • Cardoso, M.J., Leung, K., Modat, M., Barnes, J., Ourselin, S., 2011. Locally Ranked STAPLE for Template based...
  • A. Chen et al. Evaluation of multiple-atlas-based strategies for segmentation of the thyroid gland in head and neck CT images for IMRT. Physics in Medicine and Biology (2012)
  • D.L. Collins et al. Automatic 3-D model-based neuroanatomical segmentation. Human Brain Mapping (1995)
  • Commowick, O., Warfield, S., 2010. Incorporating priors on expert performance parameters for segmentation validation...
  • Commowick, O., Warfield, S., Malandain, G., 2009. Using Frankenstein’s creature paradigm to build a patient specific...
  • O. Commowick et al. Estimating a reference standard segmentation with spatially varying performance parameters: local MAP STAPLE. IEEE Transactions on Medical Imaging (2012)