Computer Methods and Programs in Biomedicine 180 (2019) 105014
Automatic delineation of ribs and clavicles in chest radiographs using fully convolutional DenseNets

Yunbi Liu, Xiao Zhang, Guangwei Cai, Yingyin Chen, Zhaoqiang Yun, Qianjin Feng, Wei Yang∗

School of Biomedical Engineering, Southern Medical University, 1023-1063 Shatai South Road, Baiyun District, 510515, Guangzhou, China


Article history: Received 17 May 2019; Revised 4 August 2019; Accepted 4 August 2019

Keywords: Chest radiograph; Rib and clavicle delineation; Fully convolutional DenseNet

Abstract

Background and Objective: In chest radiographs (CXRs), bones and soft tissues overlap one another, which makes CXRs difficult for radiologists to read and interpret. Delineating the ribs and clavicles helps suppress them from chest radiographs so that their effect on chest radiography analysis can be reduced. However, delineating ribs and clavicles automatically is difficult for methods that do not use deep learning models; moreover, few such methods can delineate the anterior ribs effectively, owing to their faint edges in posterior-anterior (PA) CXRs.

Methods: In this work, we present an effective deep learning method for automatically delineating posterior ribs, anterior ribs and clavicles, using a fully convolutional DenseNet (FC-DenseNet) as a pixel classifier. We employ a pixel-weighted loss function to mitigate the uncertainty inherent in manual delineation and obtain robust predictions.

Results: We conducted a comparative analysis with two other fully convolutional networks for edge detection and with the state-of-the-art method that does not use deep learning models. The proposed method significantly outperforms these methods in quantitative evaluation metrics and visual perception. On the test dataset, its average recall, precision and F-measure are 0.773 ± 0.030, 0.861 ± 0.043 and 0.814 ± 0.023, respectively, and its mean boundary distance (MBD) is 0.855 ± 0.642 pixels. The proposed method also performs well on the JSRT and NIH Chest X-ray datasets, indicating its generalizability across multiple databases. In addition, a preliminary result of suppressing the bone components of CXRs was produced using our delineation system.

Conclusions: The proposed method can automatically delineate ribs and clavicles in CXRs and produce accurate edge maps.

1. Introduction

Chest radiography is the most common screening procedure in daily clinical routine owing to its easy accessibility, low cost and low radiation dose. Radiologists can perform a general examination of these images, which may reveal unsuspected pathological alterations such as pulmonary nodules. Nevertheless, the superimposition of multiple anatomical structures, such as ribs and clavicles, makes it difficult for radiologists to read and interpret CXRs and may hide important findings within the lung areas. A study showed that suppressing bony structures in CXRs could improve detection performance in the diagnosis of pulmonary nodules [1]. In recent years, many bone-suppression studies have achieved impressive results [2-7].



Corresponding author. E-mail address: [email protected] (W. Yang).


Locating the rib and clavicle borders can provide effective edge priors for distinguishing the ribs and clavicles from other components in CXRs, making bone suppression more effective and practical and hence supporting accurate diagnosis in chest radiography analysis. Moreover, rib and clavicle delineation can be used as a post-processing step to improve the quality of the soft-tissue images estimated by bone suppression methods. It is therefore important to delineate ribs and clavicles in CXRs accurately.

A PA CXR contains posterior ribs, anterior ribs, and clavicles; the posterior ribs are more distinct than the anterior ribs because they have higher contrast with surrounding structures. Accurate automatic delineation of ribs and clavicles is challenging for several reasons. First, the appearance of ribs and clavicles varies across patients in shape, spacing, width and length, which may lead to inaccurate delineation in template-matching detection methods. Second, CXRs contain other anatomical structures with strong edges, such as lung field borders, vertebrae and rib cages.


These structures may mislead edge detectors into detecting irrelevant edges. Typical edge detectors, such as the Canny detector, mainly extract local cues of intensity and color gradients [8] and cannot easily distinguish specific bone edges from other, irrelevant edges.

Over the past decades, many researchers have investigated the segmentation or delineation of ribs and clavicles [9-14]. Early work mainly focused on edge detection and on fitting quadratic curves to delineate the ribs and clavicles. Wechsler [15] proposed a rib detection method combining various image-processing techniques, including filtering, edge detection, and Hough transforms. Toriwaki et al. [16] designed a set of rib detection algorithms that represented the rib borders as quadratic curves. Brace et al. [17] and Li and Fong [18] located rib edge points and linked them together, for example by fitting quadratic curves through them. These methods achieved impressive results but required a huge amount of computation. Later methods focused mainly on constructing parabola or ellipse models of the ribs or the rib cage. Vogelsang et al. [13] proposed the sinking-lead algorithm, a matched-template technique; to describe a rib entirely, they used four parabolas, two for the left and right borders of the ventral ribs and two for the lower and upper borders of the dorsal ribs. Van Ginneken et al. [19,20] fitted the global rib cage directly to a radiograph instead of detecting rib borders locally, modeling each posterior rib by two parallel parabolas. Moreira et al. [21] used two directional filters to obtain edge points expected to belong to the lower rib borders, described the rib borders as parabolas, and determined the upper borders by measuring the distance between the upper and lower curves. Ogul et al. [22] fitted a parabola to all rib seeds obtained through log-Gabor filtering and extended the center curve with a problem-specific region-growing technique to delineate the entire rib, which need not follow a general parabolic model of the rib cage. In these methods, however, selecting model parameters and parabolas is difficult because rib shape varies among people. In addition, most of them can detect only the borders of the posterior ribs or the clavicles, while the low-contrast borders of the anterior ribs are ignored. Furthermore, they usually need to combine multiple image-processing steps, which is time-consuming and computationally expensive. Thus, automatic delineation of ribs and clavicles remains challenging.

To our knowledge, few CNN-based methods have been proposed to delineate ribs and clavicles in CXRs. In 2013, Cernazanu-Glavan and Holban [10] proposed a segmentation method for X-ray images using a CNN, briefly mentioning that the CNN was used to extract an accurate contour of the bones. Deep learning methods have been successfully applied to many tasks, such as image classification, image segmentation, and edge detection. Long et al. [23] showed that fully convolutional networks (FCNs) can efficiently make dense predictions for per-pixel tasks such as semantic segmentation. Inspired by their work, Xie and Tu [24] proposed an efficient and accurate edge detector, HED, which is composed of the first five convolution stages of the VGG16 network with a side-output layer connected to each stage. Other well-known architectures, including dilated convolutional networks, DenseNet, and ResNet, can also be used for edge detection. For example, Chen et al. [25] proposed a lightweight context aggregation network, CAN, composed of multiple dilated convolution layers whose dilation factors enlarge the receptive field layer by layer; its few parameters and large receptive fields are helpful when data are limited, as in most medical image analysis tasks. Huang et al. [26] introduced DenseNet, which connects each layer to every other layer in a feed-forward fashion.

The dense connectivity of DenseNet strengthens feature propagation and encourages feature reuse, which benefits the extraction of rich image features. Building on DenseNet, Jegou et al. [27] proposed the FC-DenseNet, an architecture composed of a downsampling path, an upsampling path, and skip connections. The skip connections combine information at multiple scales, so the network can learn both high- and low-level features. The capability of extracting rich, discriminative features is important for identifying rib and clavicle edges in CXRs. Inspired by these methods, we adopted an FC-DenseNet for predicting rib and clavicle delineations in CXRs.

This work aims to automatically delineate ribs and clavicles in standard PA CXRs using an FC-DenseNet [27]. To construct a training dataset, one author manually delineated 82 CXRs randomly selected from a dual-energy subtraction (DES) dataset. We then trained the FC-DenseNet model using the paired CXRs and manual delineations. In the application stage, a preprocessed CXR is fed into the trained model, which produces the estimated delineation. Fig. 1 illustrates the delineation results of the proposed method in different anatomical regions; regions corresponding to the clavicles, posterior ribs and anterior ribs are zoomed in Fig. 1.

2. Methods

We propose a method for delineating ribs and clavicles in a given CXR using an FC-DenseNet. Section 2.1 describes how the ground-truth delineations are generated and how CXRs are preprocessed before being fed into the network. Section 2.2 briefly reviews the architecture of the FC-DenseNet and its use for delineating ribs and clavicles. Section 2.3 presents the loss functions for training the FC-DenseNet model. Section 2.4 introduces the evaluation metrics.

2.1. Generating ground-truth delineations and preprocessing CXRs

2.1.1. Generating ground-truth delineations

We developed an in-house tool for manually delineating ribs and clavicles using MATLAB R2016a. Each manual delineation consists of the borders of the posterior ribs, anterior ribs and clavicles, which are about 3 pixels wide after dilation. Fig. 2 shows an example of a manual delineation, with the different edge types colored differently for better viewing. We focused mainly on the ribs and clavicles within the lung area: the first and the last two pairs of ribs are usually located outside the lung area and have low contrast with nearby soft tissues, which makes their borders difficult to delineate manually and accurately. We attempted to delineate all the ribs and clavicles, but some ribs outside the lung area are missing in the manual delineations. In total, 82 manual delineations were generated as the ground truth.

2.1.2. Preprocessing CXRs

Different CXRs exhibit overall intensity and contrast shifts owing to differences in acquisition conditions and patient variability, and this intensity inconsistency may affect the prediction performance of the network. A contrast-normalization preprocessing step is therefore necessary. As in our previous work [29], we adopted a guided image filter [28] to enhance the structural details and normalize the contrast of the chest radiographs. For a given chest radiograph I, the image smoothed by a guided image filter with a large radius is used as the base layer I_0, and the detail layer is


Fig. 1. Illustration of delineation results of different anatomical regions by the proposed method. Detected borders are outlined in white. Red rectangles are zoomed areas of clavicle, posterior ribs and anterior ribs respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Example of manual delineation of the ribs and clavicles. The color coding is red: clavicles, green: posterior ribs, and blue: anterior ribs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

I_d = I − I_0. The normalized image is then computed as

$$I \leftarrow \frac{I_d - \mu_d}{\sigma_d}, \tag{1}$$

where μ_d and σ_d are the intensity mean and standard deviation of I_d, respectively. The original spatial resolution of the chest radiographs is very large, with sizes ranging from 2011 × 2011 to 2048 × 2048 pixels; delineating at the original resolution is very demanding in terms of network structure and memory capacity. We therefore downscaled the normalized I to 512 × 512. Fig. 3 shows an example of a CXR before and after preprocessing: the structural details of the enhanced CXR appear clearer, and the spatial consistency of contrast is improved (a minimal preprocessing sketch is given after Fig. 3).

2.2. FC-DenseNet for delineating ribs and clavicles

A typical U-Net architecture consists of a downsampling path, an upsampling path, and skip connections [30]. Skip connections are introduced between feature maps of the same size in the downsampling and upsampling paths, combining high- and low-level features.

Fig. 3. Illustration of preprocessing CXR using a guided filter. The left is the original CXR and the right is the enhanced CXR after preprocessing.
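For illustration, the following is a minimal Python sketch of the preprocessing in Section 2.1.2. It assumes opencv-contrib-python for the guided filter; the radius and regularization values are illustrative choices of ours, not the settings used in this work.

```python
import cv2  # guidedFilter lives in the opencv-contrib "ximgproc" module
import numpy as np

def preprocess_cxr(cxr, out_size=512, radius=60, eps=1e-2):
    """Normalize a CXR as in Eq. (1) and downscale it to out_size x out_size."""
    img = cxr.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # map to [0, 1]
    # Base layer I_0: guided filter with the image as its own guide.
    base = cv2.ximgproc.guidedFilter(img, img, radius, eps)
    detail = img - base                                        # I_d = I - I_0
    detail = (detail - detail.mean()) / (detail.std() + 1e-8)  # Eq. (1)
    return cv2.resize(detail, (out_size, out_size), interpolation=cv2.INTER_AREA)
```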

The FC-DenseNet extends the U-Net by replacing most conventional convolutional layers with dense blocks [27]. In DenseNets [26], each dense block iteratively concatenates the previous feature maps, favoring feature reuse and deep supervision. Let x_l be the output of the l-th layer; it is defined as x_l = H([x_{l−1}, x_{l−2}, …, x_0]), where [·] denotes the concatenation operation and H is a sequence of operations: batch normalization (BN), ReLU, convolution, and dropout. This connectivity pattern maximizes the use of features extracted in previous layers and strengthens the supervision of each convolutional layer.

Fig. 4 shows the architecture of the FC-DenseNet for delineating ribs and clavicles. It comprises 9 dense blocks, 4 transition-down (TD) layers and 4 transition-up (TU) layers. Each dense block is composed of 4 convolution layers with a growth rate of 12: each convolution layer creates 12 feature maps that are concatenated with the preceding feature maps, and the feature maps of all 4 layers are concatenated as the output of the block, so each block outputs 4 × 12 feature maps (see the dense-block sketch at the end of this section). The FC-DenseNet has few trainable parameters and can be trained from scratch rather than by loading a pre-trained model.


Fig. 4. Architecture of the FC-DenseNet for delineating ribs and clavicles. Each cuboid represents the current feature map layer, generated from the preceding layer. The spatial resolution of each layer is printed above and the channel number is printed underneath. The top right corner presents comments on each colored rectangle, the concrete structure of each dense block, and the operator. Each colored rectangle corresponds to the cuboid with the same color. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The FC-DenseNet aggregates contextual information effectively through its skip connections, and the dense connections within blocks strengthen feature propagation and encourage feature reuse. These advantages make it well suited to dense prediction tasks such as delineating ribs and clavicles in CXRs.

In our experiments, the preprocessed input CXR is fed into the FC-DenseNet to predict the delineation of ribs and clavicles, with the corresponding manual delineation regarded as the ground truth. Let us denote the input CXR by I, the ground truth by B and the predicted delineation by B∗. Our goal is that B∗ = F(I) be as close as possible to B, where F is the mapping from I to B learned by the FC-DenseNet model. B∗, the final output of the network produced by the last convolution layer followed by a sigmoid, is a probability map giving the probability that each pixel lies on a rib or clavicle border; its values are continuous in [0, 1], whereas B is binary, with 0/1 denoting non-edge/edge pixels. A proper threshold must therefore be selected to binarize B∗ into the final predicted delineation.

In a deep convolutional network, a proper loss function is important because it measures the difference between the network output and the ground truth and drives the iterative gradient updates of all trainable parameters. The loss functions used in the proposed method are introduced next.
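To make the block structure concrete, here is a minimal TensorFlow/Keras sketch of one dense block as described above (4 layers, growth rate 12); the kernel size and dropout rate are assumptions of ours rather than settings reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12, dropout_rate=0.2):
    """Each layer applies H = BN -> ReLU -> Conv -> Dropout to the
    concatenation of all previous outputs; the block returns the
    4 x 12 feature maps produced inside it."""
    new_features = []
    for _ in range(num_layers):
        inputs = layers.Concatenate()([x] + new_features) if new_features else x
        h = layers.BatchNormalization()(inputs)
        h = layers.Activation("relu")(h)
        h = layers.Conv2D(growth_rate, 3, padding="same")(h)
        h = layers.Dropout(dropout_rate)(h)
        new_features.append(h)
    return layers.Concatenate()(new_features)
```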

2.3. Loss function

We treat the delineation of ribs and clavicles as a binary prediction task. The cross-entropy loss (CE) is the most common loss function for binary prediction and is formulated as

$$\mathrm{CE} = \begin{cases} -\log(p) & \text{if } y = 1, \\ -\log(1 - p) & \text{otherwise}. \end{cases} \tag{2}$$

In the above, y ∈ {0, 1} specifies the ground-truth class and p ∈ (0, 1) is the predicted probability that a pixel lies on a rib or clavicle border. However, whether the pixels around the manually delineated borders are also edge pixels remains uncertain, owing to the inherent ambiguity of edge pixels in CXRs and the inaccuracy of manual delineation. Regarding these pixels as negative samples may cause ambiguity during training and lead to inaccurate predictions during testing. Throughout this paper, we use the term "uncertain" for pixels located around the manually delineated borders. We therefore consider a pixel-weighted loss function that sets the loss weights of the uncertain pixels to 0 and of all other pixels to 1. Each dilated manual delineation E_dilated is produced by dilating the manual delineation E with a 3 × 3 disk structuring element, and the weight mask is computed as W = 1 − (E_dilated − E). W assigns the loss weight of each pixel: we compute the loss of every pixel and multiply it by w(p), the corresponding value of W. The pixel-weighted loss function for training the model is thus defined as

$$l(p) = w(p) \times \mathrm{CE}(p). \tag{3}$$
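As an illustration, the following is a minimal NumPy sketch of Eqs. (2) and (3); the function and variable names are ours, and a 3 × 3 square structuring element stands in for the 3 × 3 disk described above.

```python
import numpy as np
from scipy import ndimage

def pixel_weighted_ce(prob, edges):
    """prob: predicted edge probabilities; edges: binary manual delineation E."""
    e = edges.astype(np.float32)
    dilated = ndimage.binary_dilation(e, structure=np.ones((3, 3))).astype(np.float32)
    weights = 1.0 - (dilated - e)        # W = 1 - (E_dilated - E)
    p = np.clip(prob, 1e-7, 1.0 - 1e-7)
    ce = -(e * np.log(p) + (1.0 - e) * np.log(1.0 - p))  # Eq. (2), per pixel
    return float(np.mean(weights * ce))                  # Eq. (3)
```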

The loss function ignores the contribution of these uncertain points by setting their loss weights to 0 during training.

Class imbalance also exists in binary classification tasks. Based on the cost-sensitive loss function of Hwang and Liu [31], Xie and Tu [24] proposed a class-balanced CE loss that introduces a class weight factor β into the CE:

$$l(p) = -\beta \sum_{j \in Y_+} \log(p_j) - (1 - \beta) \sum_{j \in Y_-} \log(1 - p_j), \tag{4}$$


where Y_+ and Y_− denote the edge and non-edge ground-truth label sets, respectively. In Eq. (4), β = |Y_−|/|Y| and 1 − β = |Y_+|/|Y| weight the positive (edge) and negative (non-edge) training data, respectively. Beyond the positive/negative class imbalance, an imbalance between hard and easy examples also exists in binary classification tasks: easy examples far outnumber hard ones, and most easy examples belong to the negative class, so the many small losses contributed by easy examples can overwhelm the rare class. Lin et al. [32] therefore proposed the focal loss (FL) to tackle the hard/easy imbalance:

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \tag{5}$$

where p_t is defined as

$$p_t = \begin{cases} p & \text{if } y = 1, \\ 1 - p & \text{otherwise}, \end{cases} \tag{6}$$

and the balancing coefficient α_t is defined analogously to p_t.
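For reference, a minimal NumPy sketch of Eqs. (5) and (6) follows; γ = 2 and α = 0.25 are the defaults suggested by Lin et al. [32], not values tuned in this work.

```python
import numpy as np

def focal_loss(prob, target, gamma=2.0, alpha=0.25):
    """prob: predicted probabilities in (0, 1); target: binary ground truth."""
    p = np.clip(prob, 1e-7, 1.0 - 1e-7)
    t = target.astype(np.float32)
    p_t = np.where(t == 1, p, 1.0 - p)              # Eq. (6)
    alpha_t = np.where(t == 1, alpha, 1.0 - alpha)  # defined analogously
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))  # Eq. (5)
```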

We also considered the pixel-weighted class-balanced CE and the pixel-weighted FL, which likewise ignore the losses of all uncertain pixels during training for more robust prediction.

2.4. Evaluation metrics

The delineation of ribs and clavicles in CXRs is a dense binary prediction task in which the pixels on the rib and clavicle borders are the positive class and the rest are the negative class. We use four widely used metrics to evaluate our work: the mean boundary distance (MBD), recall, precision and F-measure. MBD is the average distance between the estimated boundary and the ground-truth boundary; here, it is the average distance (in pixels) between all detected edge pixels in the estimated delineation S and the reference edge pixels of the manual delineation T. Let s_i and t_j be points on S and T, respectively. The minimum distance of a point s_i on S to T is computed as

$$d(s_i, T) = \min_j \| s_i - t_j \|. \tag{7}$$

For MBD computation, the minimum distance from each point on S to T is calculated, and vice versa; these minimum distances are then averaged:

$$\mathrm{MBD}(S, T) = \frac{1}{2} \left( \frac{\sum_i d(s_i, T)}{|\{s_i\}|} + \frac{\sum_j d(t_j, S)}{|\{t_j\}|} \right). \tag{8}$$

Let TP (true positives) and FP (false positives) denote the numbers of pixels correctly and incorrectly labeled as rib or clavicle border pixels, respectively. Recall is the proportion of the true edge pixels that are detected:

$$\mathrm{recall} = \frac{TP}{\text{number of positives}}. \tag{9}$$

Precision is the proportion of pixels classified as edge that are correctly labeled:

$$\mathrm{precision} = \frac{TP}{TP + FP}. \tag{10}$$

An inverse relationship exists between precision and recall: one can often be increased at the cost of reducing the other. The F-measure, the harmonic mean of precision and recall, strikes a balance between the two:

$$F = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{11}$$

A minimal sketch of these metrics is given below.
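The following sketch implements the four metrics under the stated definitions (the names are ours; for brevity it omits the exclusion of uncertain pixels described next and assumes both edge maps are non-empty boolean arrays).

```python
import numpy as np
from scipy import ndimage

def mbd(pred, gt):
    """Eq. (8): symmetric mean of nearest-edge distances, in pixels."""
    dist_to_gt = ndimage.distance_transform_edt(~gt)      # distance to nearest T pixel
    dist_to_pred = ndimage.distance_transform_edt(~pred)  # distance to nearest S pixel
    return 0.5 * (dist_to_gt[pred].mean() + dist_to_pred[gt].mean())

def recall_precision_f(pred, gt):
    """Eqs. (9)-(11) with pixel-exact matching of edge labels."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    recall = tp / gt.sum()                              # Eq. (9)
    precision = tp / (tp + fp)                          # Eq. (10)
    f = 2 * precision * recall / (precision + recall)   # Eq. (11)
    return recall, precision, f
```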


For the sake of fairness, the uncertain pixels that are ignored during training are also excluded in the test phase; they are not involved in the calculation of the evaluation metrics.

3. Results

3.1. Experimental dataset and settings

The experimental dataset consists of 504 posterior-anterior dual-energy subtraction (DES) chest radiographs acquired with a DES system (Revolution XR/d, GE) at Nanfang Hospital, Guangzhou, China. The DES dataset contains conventional CXRs and their soft-tissue and bone images; only the conventional CXRs were used in this work. The images, with sizes ranging from 2011 × 2011 to 2048 × 2048 pixels, were stored in DICOM format with a 14-bit depth. All images were anonymized before the experiment. Manually delineating one CXR with our in-house MATLAB R2016a tool takes about 1 h, which is fairly time- and labor-consuming; we randomly selected 82 CXRs from the DES database and manually delineated the borders of their ribs and clavicles. Of the 82 paired CXRs and manual delineations, 73 pairs were randomly selected as the training set and the remaining 9 pairs formed the test set; given the small number of pairs, we did not hold out a separate validation set. All analyses were carried out in accordance with the relevant guidelines and regulations, the requirement to obtain informed consent was waived, and the study was approved by the local institutional review board.

Given the data-driven nature of CNNs, an effective way to improve the prediction performance of our FC-DenseNet model is to augment the training samples. We augmented the training dataset on the fly with random flips, rotations, translations and zooms (sketched below), imitating the cases that occur in practice: hearts can appear on the right, but heads never appear at the bottom, so we performed only horizontal flips and no vertical flips. Because thoracic size varies across people (children have smaller thoracic cages than adults, and heavier people have larger ones than thinner people), we zoomed the images with a scale of 0.2. We also applied random rotations (in the range [−5, 5] degrees) and translations (with a scale of 0.1) so that the trained FC-DenseNet model is not sensitive to rotation and translation.

The loss function in Eq. (3) was minimized with the Adam optimizer at a learning rate of 10^-4 for 600 epochs; the batch size was set to 2 owing to memory limitations. The filter weights of each layer were initialized with MSRA initialization [33], and the biases of all convolution layers were set to 0 except for the last one, whose bias was set to −log((1 − 0.01)/0.01). Our method was implemented in TensorFlow. On a workstation equipped with 2 TITAN X GPUs, training a single FC-DenseNet model took approximately 6 h, depending on the architecture.

3.2. Loss functions and thresholds for delineating ribs and clavicles

We examined the effect of different loss functions on the delineation performance: the CE, the class-balanced CE [31], the FL [32] and their pixel-weighted variants.
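Before turning to the results, here is a minimal TensorFlow/Keras sketch of the on-the-fly augmentation and optimizer settings of Section 3.1, as referenced above (the helper names are ours; the preprocessing layers assume TensorFlow 2.6 or later, and masks may need re-binarizing after interpolation).

```python
import math
import tensorflow as tf

# Horizontal flip only, rotation in [-5, 5] degrees (5/360 of a full turn),
# translation scale 0.1 and zoom scale 0.2, as described in Section 3.1.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(5.0 / 360.0, fill_mode="constant"),
    tf.keras.layers.RandomTranslation(0.1, 0.1, fill_mode="constant"),
    tf.keras.layers.RandomZoom(0.2, fill_mode="constant"),
])

def augment_pair(cxr, edges):
    """Stack the CXR and its delineation on the channel axis so both receive
    exactly the same random transform, then split them again."""
    pair = tf.concat([cxr, edges], axis=-1)              # (H, W, 2)
    pair = augment(pair[tf.newaxis], training=True)[0]
    return pair[..., :1], pair[..., 1:]

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # as in the text
# Bias of the last conv layer, corresponding to an initial edge prior pi = 0.01:
last_conv_bias = -math.log((1.0 - 0.01) / 0.01)
```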
Fig. 5 shows the effect of the different loss functions on the precision and recall curves. The highest F-measure was achieved with the pixel-weighted CE, which improved the F-measure from 0.682 (with the original CE) to 0.814. The pixel-weighted FL performed similarly to the pixel-weighted CE.
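A minimal sketch of the threshold sweep used to trace curves like those in Figs. 5 and 6 follows, reusing recall_precision_f from the Section 2.4 sketch (the threshold grid is an assumption of ours).

```python
import numpy as np

def sweep_thresholds(prob_maps, gt_maps, thresholds=np.arange(0.05, 1.0, 0.05)):
    """Binarize each probability map at each threshold T and average the
    resulting recall, precision and F-measure over the test images."""
    rows = []
    for t in thresholds:
        scores = [recall_precision_f(p >= t, g.astype(bool))
                  for p, g in zip(prob_maps, gt_maps)]
        recall, precision, f = np.mean(scores, axis=0)
        rows.append((t, recall, precision, f))
    return rows
```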


Fig. 5. Precision and recall curves of the trained FC-DenseNet model with different loss functions.

Accordingly, the pixel-weighted CE was used as the loss function in our method. In general, as the threshold increases, precision increases and recall decreases; Fig. 6 shows this trend in the evaluation metrics of the FC-DenseNet model trained with the pixel-weighted CE. In Fig. 6, the F-measure reaches its peak value of 0.817 at T = 0.40; accordingly, the average recall, precision and F-measure of our method are 0.798, 0.839 and 0.817, respectively. Fig. 7 illustrates two examples of rib and clavicle delineations predicted by the proposed method, in which the posterior ribs, anterior ribs and clavicles are all delineated automatically.

3.3. Comparison with other FCNs for delineating ribs and clavicles

We trained two other FCNs for delineating ribs and clavicles: a popular edge-detection network, HED [24], and a context aggregation network, CAN [25].

Fig. 6. Evaluation metrics of the trained FC-DenseNet model using the pixelweighted CE with different thresholds.

HED is trained with multiple side-output losses and produces multiple predictions; we selected its fused output as the final prediction, fine-tuning the network after loading the weights of the pre-trained VGG16 model. CAN is a lightweight network composed of dilated convolutional layers; given its few trainable parameters and strong ability to aggregate contextual information, we regard it as an appropriate network for delineating ribs and clavicles in CXRs.

Fig. 8 shows rib and clavicle delineations predicted by the HED, CAN and FC-DenseNet models. HED detected more edge pixels but misclassified many non-edge pixels as positive, and some isolated or discontinuous points appeared in its edge maps, increasing the MBD and decreasing the detection precision. CAN detected fewer edge pixels, which lowered the detection recall. By contrast, the FC-DenseNet achieved higher performance, with an F-measure of 0.814 ± 0.023 and an overall MBD of 0.855 ± 0.642 pixels, and produced visually more appealing delineations. In addition, the average detection recalls of the clavicles, anterior ribs and posterior ribs with the FC-DenseNet were 0.712, 0.761 and 0.786, respectively; the clavicle recall is lowest because the overlap of the clavicles and ribs makes the clavicle borders harder to detect. The predictions of all three models appeared thicker than the ground truth, because the pixel-weighted loss function ignored the losses of the suspicious points around the ground-truth edges rather than classifying them as non-edge points during training. The quantitative evaluation results of the three network models are summarized in Table 1.

3.4. Comparison with traditional methods

Ogul et al. [22] reported unsupervised rib delineation by an integrative approach and achieved reasonably good performance. They delineated ribs in five consecutive steps: 1) finding the outermost rib starts, 2) selecting the best template, 3) estimating rib thickness, 4) parabola fitting, and 5) region growing. However, their method applies only to the posterior ribs; the anterior rib and clavicle borders cannot be detected at the same time. The Canny detector [8] is well known for edge detection but mainly exploits local cues such as gradients. We compared the proposed method with the Canny detector and Ogul et al.'s method for delineating ribs and clavicles in CXRs. Fig. 9 shows two examples of rib and clavicle delineations produced by these methods and the proposed method. In Fig. 9, Ogul et al.'s method detects only the third to the tenth posterior ribs in the CXR; since the other rib edges lie outside the lung field, we ignore them and focus on the rib edges detected by both Ogul et al.'s method and ours. The rib edges predicted by Ogul et al.'s method deviate from the real edges, especially for the last pair of ribs, and the Canny detector detects many irrelevant edges besides those of the ribs and clavicles. By contrast, the proposed method achieves better visual performance than both.

3.5. Cross-dataset generalization

To investigate the generalization of the proposed method, the trained FC-DenseNet model was tested on two public datasets, the JSRT dataset and the NIH ChestX-ray14 dataset. Fig. 10 illustrates delineation examples on both datasets produced by the proposed method; most edges of the ribs and clavicles in the CXRs from the two datasets were detected.


Fig. 7. Examples of rib and clavicle delineations predicted by the proposed method. The left column shows the input CXRs, the middle column the ground truth, and the right column the delineations predicted by the proposed method.

Fig. 8. Comparison of the experimental results among different FCN models. Shown from left to right are the input CXRs, the ground truth, and the corresponding predictions of FC-DenseNet, HED, and CAN.

Table 1. Performance of HED, CAN and FC-DenseNet.

Network              Precision        Recall           F-measure        MBD (pixel)
HED                  0.692 ± 0.030    0.711 ± 0.026    0.701 ± 0.018    2.339 ± 0.932
CAN                  0.704 ± 0.036    0.639 ± 0.026    0.669 ± 0.012    1.281 ± 0.545
FC-DenseNet (ours)   0.861 ± 0.043    0.773 ± 0.030    0.814 ± 0.023    0.855 ± 0.642


Fig. 9. Examples of delineation results by the traditional methods without deep learning models and the proposed method.

Fig. 10. Examples of delineation results on the JSRT dataset and NIH Chest X-ray14 dataset by the proposed method.

However, the JSRT chest X-rays were digitized from scanned films, and the marginal edges of the rib cages show high intensity but low contrast with surrounding structures; consequently, the proposed method cannot delineate the marginal edges of the rib cages very well, as shown in Fig. 10. The scanned-film radiographs in the JSRT database, the NIH ChestX-ray14 images, and the CXRs we collected for training exhibit different levels of contrast and intensity owing to differences in acquisition conditions and image storage formats. That the proposed method provides reasonable delineations on these different types of CXRs without retraining the FC-DenseNet model from scratch demonstrates its generalizability.

3.6. Bone suppression using the estimated bone delineations

To validate the usefulness of our delineation system for bone suppression, i.e., for estimating soft-tissue images from CXRs, we suppressed the bone components using the rib and clavicle delineations produced by our system, based on gradient-field transformation [34]. First, the preprocessed CXRs were fed into the trained FC-DenseNet model to delineate the ribs and clavicles. Second, the bone components of these CXRs were suppressed using the estimated delineations with the gradient-field transformation method, which uses cross-projection tensors derived from local edge structures in one image to suppress edges in a second image. Here, the delineations containing the edge information of the CXRs were used to suppress edges in those same CXRs.


Fig. 11. Illustration of bone suppression by using the predicted delineation generated from our delineating system. Two patches in the red boxes from the CXR and the bone suppressed image at the same location are selected and zoomed for better view. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 11 shows an example of bone suppression using the rib and clavicle delineations. Comparing the zoomed patches from the bone-suppressed image and the original CXR shows that the bone components in the original CXR were suppressed effectively. This is a preliminary result obtained in a simple way; in our previous work, we proposed cascaded convolutional networks that suppress the bone components of CXRs in the gradient or wavelet domain, with appealing results [29,35].

4. Discussion

In this work, we propose a method to automatically delineate ribs and clavicles in CXRs based on an FC-DenseNet. In principle, modern edge-detection networks such as HED and CAN can also be trained to detect the particular boundaries of ribs and clavicles; compared with these networks, the FC-DenseNet achieved higher performance and visually more appealing results in our experiments. The use of the FC-DenseNet also makes our method completely automatic, in contrast to previous methods without deep learning models, which usually delineate ribs or clavicles in multiple steps, whereas the proposed method delineates ribs and clavicles simultaneously when they are placed in the same class. We elaborated each part of our scheme, from generating the manual delineations and preprocessing to model selection and the loss function, to achieve better delineation performance.

Nevertheless, several factors limit the performance of the proposed method. On the one hand, some errors occurred when manually generating the ground-truth delineations: 1) the ribs covering the heart region cannot be seen clearly because of motion artifacts caused by breathing, so we connected the rib boundaries in these regions intuitively; and 2) the rib heads and the bottom two pairs of rib contours outside the lung area may be missing from the manual delineations. To some extent, these human errors may affect the quantitative evaluation metrics. On the other hand, our model was trained with only a handful of samples because manual delineations are difficult to produce; this data deficiency greatly limits the learning ability of the network and affects

the final performance. Using more training samples would therefore further improve the prediction performance of the proposed method. In addition, the proposed method does not recognize the costal heads well in the predicted results. One possible reason is that the features of the costal-head edges resemble the vertebral bodies more than the other parts of a rib, so the network tends to classify the costal-head edges as background. However, detecting the costal heads outside the lung area is not necessary for suppressing the bone components within the lung area.

In future work, the proposed method will be further improved to be more practical. We plan to first segment the lung areas of each CXR and then delineate the ribs and clavicles within them. In addition, some residual edges exist in the soft-tissue images predicted in our previous bone-suppression work; inspired by Fan et al.'s work [36], we plan to use the edge information estimated by our delineation system to design an edge-guided CNN that better suppresses bone components in CXRs. Moreover, a faint remnant of the edges exists in the DES soft-tissue images used as the ground truth for training end-to-end CNN models that map CXRs to soft-tissue images. We therefore plan to use the estimated edge information to remove these residual edges without information loss, so that the DES soft-tissue images serve better as ground truth for training models that estimate "virtual" soft-tissue images.

5. Conclusion

In this study, we propose a method for automatically delineating posterior ribs, anterior ribs, and clavicles in CXRs. Combining a classic FC-DenseNet with a well-designed pixel-weighted CE loss function, the proposed method automatically delineates ribs and clavicles in CXRs from multiple public databases and produces accurate binary edge maps. Its usefulness for bone suppression of CXRs was also validated.

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China (nos. 81771916 and 61471187) and the Guangdong Provincial Key Laboratory of Medical Image Processing (no. 2014B030301042).


Declaration of competing interest

The authors declare that there is no conflict of interest regarding the publication of this paper, and no commercial or associative interest that represents a conflict of interest in connection with the submitted work.

Supplementary materials

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.cmpb.2019.105014.

References

[1] F. Li, R. Engelmann, L.L. Pesce, K. Doi, C.E. Metz, H. MacMahon, Small lung cancers: improved detection by use of bone suppression imaging: comparison with dual-energy subtraction chest radiography, Radiology 261 (2011) 937-949, doi:10.1148/radiol.11110192.
[2] L. Hogeweg, C.I. Sánchez, B. van Ginneken, Suppression of translucent elongated structures: applications in chest radiography, IEEE Trans. Med. Imaging 32 (2013) 2099-2113, doi:10.1109/tmi.2013.2274212.
[3] J.-S. Lee, J.-W. Wang, H.-H. Wu, M.-Z. Yuan, A nonparametric-based rib suppression method for chest radiographs, Comput. Math. Appl. 64 (2012) 1390-1399, doi:10.1016/j.camwa.2012.03.084.
[4] T. Rasheed, B. Ahmed, M.A. Khan, M. Bettayeb, S. Lee, T.-S. Kim, Rib suppression in frontal chest radiographs: a blind source separation approach, in: 2007 9th International Symposium on Signal Processing and Its Applications, IEEE, 2007, pp. 1-4, doi:10.1109/isspa.2007.4555516.
[5] G. Simkó, G. Orbán, P. Máday, G. Horváth, Elimination of clavicle shadows to help automatic lung nodule detection on chest radiographs, in: 4th European Conference of the International Federation for Medical and Biological Engineering, Springer, 2009, pp. 488-491, doi:10.1007/978-3-540-89208-3_116.
[6] A. Zarshenas, J. Liu, P. Forti, K. Suzuki, Mixture of deep-learning experts for separation of bones from soft tissue in chest radiographs, in: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2018, pp. 1321-1326, doi:10.1109/SMC.2018.00231.
[7] A. Zarshenas, J. Liu, P. Forti, K. Suzuki, Separation of bones from soft tissue in chest radiographs: anatomy-specific orientation-frequency-specific deep neural network convolution, Med. Phys. 46 (2019) 2232-2242, doi:10.1002/mp.13468.
[8] J. Canny, A computational approach to edge detection, in: Readings in Computer Vision, Morgan Kaufmann, 1987, pp. 184-203, doi:10.1016/B978-0-08-051581-6.50024-6.
[9] L. Hogeweg, C.I. Sánchez, P.A. de Jong, P. Maduskar, B. van Ginneken, Clavicle segmentation in chest radiographs, Med. Image Anal. 16 (2012) 1490-1502, doi:10.1016/j.media.2012.06.009.
[10] C. Cernazanu-Glavan, S. Holban, Segmentation of bone structure in X-ray images using convolutional neural network, Adv. Electr. Comput. Eng. 13 (2013) 87-94, doi:10.4316/aece.2013.01015.
[11] X. Li, S. Luo, Q. Hu, An automatic rib segmentation method on X-ray radiographs, in: International Conference on Multimedia Modeling, 2015, pp. 128-139, doi:10.1007/978-3-319-14445-0_12.
[12] S. Candemir, S. Jaeger, S. Antani, U. Bagci, L.R. Folio, Z. Xu, G. Thoma, Atlas-based rib-bone detection in chest X-rays, Comput. Med. Imaging Graph. 51 (2016) 32-39, doi:10.1016/j.compmedimag.2016.04.002.
[13] F. Vogelsang, F. Weiler, J. Dahmen, M.W. Kilbinger, B.B. Wein, R.W. Guenther, Detection and compensation of rib structures in chest radiographs for diagnostic assistance, in: Medical Imaging 1998: Image Processing, Int. Soc. Opt. Photonics, 1998, pp. 774-786, doi:10.1117/12.310957.

[14] B. van Ginneken, B.M. ter Haar Romeny, Automatic delineation of ribs in frontal chest radiographs, in: Medical Imaging 2000: Image Processing, Int. Soc. Opt. Photonics, 2000, pp. 825-837, doi:10.1117/12.387746.
[15] H. Wechsler, Automatic Detection of Rib Contours in Chest Radiographs, 1975, doi:10.1007/978-3-0348-5767-3.
[16] J.I. Toriwaki, Y. Suenaga, T. Negoro, T. Fukumura, Pattern recognition of chest X-ray images, Comput. Graph. Image Process. 2 (1973) 252-271.
[17] C.M. Brace, J. Kulick, T. Challis, Automatic Rib Detection in Chest Radiographs, Queen's University, 1977.
[18] C. Li, C. Fong, Extraction of dorsal rib contours in chest radiographs by parallel-type processing, Proc. Int. Comput. Symp. (1978) 705-712.
[19] B. van Ginneken, B.M. ter Haar Romeny, Automatic delineation of ribs in frontal chest radiographs, Proc. SPIE 3979 (2000), doi:10.1117/12.387746.
[20] B. van Ginneken, Fifty years of computer analysis in chest imaging: rule-based, machine learning, deep learning, Radiol. Phys. Technol. 10 (2017) 23-32, doi:10.1007/s12194-017-0394-5.
[21] R. Moreira, A.M. Mendonça, A. Campilho, Detection of rib borders on X-ray chest radiographs, in: International Conference Image Analysis and Recognition, Springer, 2004, pp. 108-115, doi:10.1007/978-3-540-30126-4_14.
[22] B.B. Ogul, E. Sümer, H. Ogul, Unsupervised rib delineation in chest radiographs by an integrative approach, in: VISAPP (1), 2015, pp. 260-265, doi:10.5220/0005361602600265.
[23] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3431-3440, doi:10.1109/cvpr.2015.7298965.
[24] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1395-1403, doi:10.1109/ICCV.2015.164.
[25] Q. Chen, J. Xu, V. Koltun, Fast image processing with fully-convolutional networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2516-2525, doi:10.1109/iccv.2017.273.
[26] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4700-4708, doi:10.1109/CVPR.2017.243.
[27] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers Tiramisu: fully convolutional DenseNets for semantic segmentation, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 11-19, doi:10.1109/cvprw.2017.156.
[28] K. He, J. Sun, X. Tang, Guided image filtering, in: European Conference on Computer Vision, Springer, 2010, pp. 1-14, doi:10.1007/978-3-642-15549-9_1.
[29] W. Yang, Y. Liu, L. Lin, Z. Yun, Z. Lu, Q. Feng, W. Chen, Lung field segmentation in chest radiographs from boundary maps by a structured edge detector, IEEE J. Biomed. Health Inf. 22 (3) (2017) 842-851, doi:10.1109/JBHI.2017.2687939.
[30] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
[31] J.-J. Hwang, T.-L. Liu, Contour detection using cost-sensitive convolutional neural networks, arXiv:1412.6857 (2014). http://arxiv.org/abs/1412.6857.
[32] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980-2988, doi:10.1109/iccv.2017.324.
[33] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026-1034, doi:10.1109/iccv.2015.123.
[34] A. Agrawal, R. Raskar, R. Chellappa, Edge suppression by gradient field transformation using cross-projection tensors, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2006, pp. 2301-2308, doi:10.1109/cvpr.2006.106.
[35] Y. Chen, X. Gou, X. Feng, Y. Liu, G. Qin, Q. Feng, W. Yang, W. Chen, Bone suppression of chest radiographs with cascaded convolutional networks in wavelet domain, IEEE Access (2019), doi:10.1109/access.2018.2890300.
[36] Q. Fan, J. Yang, G. Hua, B. Chen, D. Wipf, A generic deep architecture for single image reflection removal and image smoothing, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3238-3247, doi:10.1109/iccv.2017.351.