In order to reduce the potential radiation risk, low-dose CT has gained increased attention in medical imaging community. Currently, patients go through multiple x-ray CT scans during image-guided radiation therapy, which elevates the potential risk for tissue damage and radiation-induced cancer [1, 2]. However, simply lowering the radiation dose will significantly degrade the image quality. Therefore, there is increasing demand for fast image reconstruction algorithms that can produce higher quality images in clinically relevant time. In this chapter, we propose a new noise reduction method for low-dose CT via deep learning without accessing original projection data. A deep convolutional neural network is used to map low-dose CT images towards its corresponding normal-dose counterparts using recently proposed residual learning method [3]. Qualitative results demonstrate a great potential of the proposed method on artifact reduction and structure preservation. In terms of the quantitative metrics, the proposed method has showed a substantial improvement on PSNR, RMSE and SSIM than the competing state-of-art methods like Block Matching 3D (BM3D) [4] and Weighted Nuclear Norm Minimization (WNNM) [5]. Furthermore, the speed of our method is significantly faster than the iterative and linear reconstruction methods discussed in previous chapters.
7.1 Theory
7.1.1 Deep Neural Networks for x-ray Image Denoising
Most clinical x-ray CT scanners currently being used employ some version of analytical reconstruction algorithms like FBP or FDK. However, in low dose x-ray CT, the linear reconstruction algorithms introduce severe artifacts typically due to beam hardening, photon starvation, scatter and other causes which reduces the diagnostic reliability. Therefore, high quality diagnostically relevant low dose x-ray CT reconstruction is a topic of major research effort. In previous chapters, we have observed that model based image reconstruction problems perform reliably well but they are still computationally expensive even with the introduction of multiple GPUs in parallel. As a result, we have explored the possibility of leveraging tremendous potential of artificial intelligence especially deep convolutional neural networks to perform x-ray CT image denoising.
The concept of first feed forward supervised deep multilayer perceptron was introduced by Alexey Ivakhnenko in 1965 [6]. Other researcher subsequently used deep learning in computer vision, speech recognition problems, however their application and adoption was somewhat limited by the astronomically high computational cost. In 2009, Nvidia was involved in what was called the “big bang” of deep learning, as deep-learning neural networks were trained with Nvidia graphics processing units (GPUs). GPUs speed up training algorithms by orders of magnitude, reducing running times from weeks to days.
In May 2016, IEEE Transactions on Medical Imaging published a special issue on ‘‘Deep Learning in Medical Imaging’’ [7] containing 18 special issue articles outlined the tremendous potential of deep learning based algorithms in the medical imaging domain. Over the year several researchers have tried to harness the sophisticated pattern recognition power of deep networks and apply that to low dose CT denoising field [8-11]. Deep Convolutional Neural Network (CNN) can easily learn high dimensional features through a hierarchical framework. The main advantage of this approach is the low computational burden along with seamless integration with the post processing workflow from CT scanner reconstruction without ever accessing the sinogram itself.
In this work, we treat the learning problem as a discriminative one i.e. separating the noise from noisy image by feed forward CNN instead of learning over a discriminative model with predefined image prior. We use deep architecture to extract high level image patterns and characteristics [12], batch normalization [3, 13], and residual learning [13, 14] to speed up our learning rate. We have also parallelized our algorithm and implemented it on NVIDIA TITAN X GPUs to reduce computational time. The main advantage of our design is the use of residual learning to learn and extract the pattern of noise itself instead of learning complex organ structures typically present in x-ray CT images.
7.1.2 Residual Learning and Batch Normalization
The main motivation for the use of deep residual learning proposed by Kaiming et. al [14] stem from the increased difficulty in training deeper networks. They reformulated their learning problem as a residual function with reference to the layer inputs, instead of learning the unreferenced function. With growing evidence in favor of residual mapping being easier to learn rather than original unreferenced mapping, residual networks can learn residual mapping in a few stacked layers thereby increasing training accuracy with increasing network depth. Leveraging this residual network strategy, we can form deep CNN which can easily learn complex noise pattern present in x-ray CT measurements arising from various factors like conebeam artifact, detector edge response, beam hardening and scatter. In our approach, we use a single residual unit to predict the residual image similar to the methods used by Kai Zhang et al. [13].
One of the major problems in training deep networks is the fact that the distribution of the network’s input changes during training which slows down learning rate and requires careful initialization of parameters. As a result, the distribution of internal non- linearity inputs changes during training which is known as internal covariance shift [3]. Batch Normalization (BN) is therefore used to reduce internal covariance shift by introducing a normalization step and performing the performing the normalization for each mini batch of our training CNN model. Batch normalization has shown to increase leaning rate, quantitative accuracy and reduce overall dependence to accurate initialization of parameters [3]. Batch normalization only adds two extra parameters per activation layer and they can be easily updated with back-propagation.
We have proposed that addition of both batch normalization and residual learning can enhance the Deep CNN performance on learning complex x-ray CT noise pattern and at the same time result in fast, robust and stable training regimen. In the subsequent chapters, we have discussed the details of our training network and the performance of our network on simulated low-dose x-ray CT noise.
7.1.2 Proposed Network Model
In this section we have discussed the rationale behind our proposed network architecture and training parameters. Following the improved results from using very small
3×3
convolutions filters for deep network architecture [15], we adopt this architecture instead of pooling layers. Therefore, the size of our receptive field is
2D+1×2D+1
for a network of depth
D
. Higher receptive depth field is advantageous in capturing high level x-ray CT image details and texture information. For our general image denoising task we set a receptive field size of
41×41
with corresponding network depth of
20
.
The input to our Deep CNN is a noisy low dose x-ray CT reconstructed image denoted by
μLD(x)
, where
x
denotes the voxel indices. We can represent our noisy observation as follows
μLDx=μHDx+β(x) | (7.1) |
where
μHDx
is the equivalent high dose (clean) image and
β(x)
is the added measurement noise. The noise model is described in the following chapter but for our current analysis we can assume it as an additive noise model. Our deep CNN residual learning is trained on the residual mapping
β(x)
. We have used averaged mean squared error as our error estimate for training purposes
ℇΘ=12N∑x=1NRμLDx;Θ-μLDx-μHDx2 | (7.2) |
where
Θ
denote all the training parameters,
R.
is the residual mapping function,
ℇ.
is the error function, and
N
is the total number of voxels.
Fig ? The architecture of our proposed deep CNN
For a given depth
D
, we have three different layers shown in different colors in fig ?. The first layer is called Conv+ReLU which stands for a combination of convolutional (Conv) and Rectified Linear Unit (ReLU) layers. Each of these layers consists of a standard ReLU (
max0,.
) function and 64 filters of size
3×3
used to generate 64 feature maps. Conv+BN+ReLU are the next
D-2
layers consisting of 64 filters of size
3×3
used to generate 64 feature maps, batch normalization, and ReLU. The last Conv layer consists of a filter of size
3×3
to reconstruct the residual image output.
For our optimization problem we use mini batch stochastic gradient descent (SGD) method known as ADAM [16]. The main advantage of using Adam’s SGD algorithm is that the hyperparameters have intuitive interpretations and they require minimal tuning. Adam optimization with batch normalization and residual learning paradigm have shown to produce faster convergence and better denoising performance for gaussian noise compared to other state-of-the-art denoising networks [13].
7.2 Experiments
7.2.1 CT Noise Model
The accurate noise model for this study was developed by Dr. Bruce R. Whiting with the support of the NIH grant “Measuring the Impact of Noise on CT Readers”, 5-R01-EB019135-03. The overall noise consists stochastic acquisition noise [17] (both quantum and electronic) since these kinds of noise are directly related to radiation exposure. The basic acquisition noise model in sinogram domain can be treated as a random point process due to little temporal and spatial correlation between measurements [18, 19]. However, in x-ray CT image domain, the noise model is non-local and correlated over many pixels, which makes the standard denoising algorithms like BM3D and WNNM quite ineffective [20].
Fig. ? Simulation flowchart
The amount of synthetic noise
β
added to the high dose image is computed by equating the Noise Equivalent Quanta (NEQ) of the target low dose scan image (reduced by a predetermined factor
ρ
) to the NEQ of the high dose with some added noise
β
.
q2q+β(g,d,ρ)+βs=qρ2qρ+βs, | (7.3) |
where
q
is the flux,
βs
is the system noise,
g
is the gantry index, and
d
is detector index. The magnitude of
β(g,d,ρ)
can be reformulated as done previously in [21],
βg,d,ρ=pd×Q0×Tg,d×1ρ-1+βs×1ρ2-1 | (7.4) |
where
p
is the bowtie profile,
Q0
is flux, and
T
represents tube current modulation. In the data flow described in fig. ?, synthetic noise is injected to create a simulated image. In order to create an ensemble, the random noise generation step is repeated for every image slice.
7.2.2 Training and Testing Data
The data used in this study was collected as a part of the NIH grant “Measuring the Impact of Noise on CT Readers”, 5-R01-EB019135-03, Bruce R. Whiting P.I. We have collected x-ray CT dicom images 60 regular cases and 60 appendicitis cases. Each of these 3D x-ray image volumes on average consists of 400 slices. However, we have only used 20 regular cases and 20 appendicitis cases with a total of ~16000 image slices. The noise level introduced in the image was varied using the parameter
ρ
in equation (7.4). The choice of the parameter
ρ
was selected from the noise observer study with a small random fluctuation of
±1
. We use a patch size of
40×40
for 16000 images of size
512×512
for training.
For testing our deep CNN denoising performance, we use 3 new appendicitis cases out the remaining unused 20 cases. We initialize the weights by the method in [22] and use Adam’s SGD with weight decay of
0.0001
, a momentum of
0.9
and a mini-batch size of
128
. We train
50
epochs for our deep CNN models. The learning rate was decayed exponentially from
1e-1
to
1e-4
for the 50 epochs. We use the MatConvNet package [23] to train the proposed deep CNN models. All the experiments are carried out in the Matlab (R2017b) environment running on a PC with
8
-core Intel
i7-5960X
(
3.0
GHz,
1333
MHz front-side bus),
64
GB RAM (
1.2
GHz) and a Nvidia Titan X GPU. It takes about one and a half day to run our algorithm for 50 epochs on the specified dataset.
7.2.3 Compared Methods
We compare the proposed deep CNN method with two state-of-the-art, non-local similarity-based methods denoising methods: BM3D [4] and WNNM [5]. In BM3D, the image denoising is based on nonlocal image modeling, principal component analysis abs local shape-adaptive anisotropic estimation. The nonlocal image modeling is exploited by grouping similar image patches in 3-D groups. WNNM algorithm on the other hand iteratively finds an analytical fixed-point solution of the data fidelity term constructed over the noisy image and approximate low noise solution. Experimental results clearly show that the proposed WNNM algorithm outperforms BM3D in terms of both quantitative measure and visual perception quality. The implementation codes are downloaded from the authors’ websites and the default parameter settings are used in our experiments.
In order to compare the performance of our Deep CNN based denoising technique with other existing methods, we use Peak Signal to Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE) as image quality metrics. Given a noisy image
K
of size
M×N
, and it’s denoised estimate
I
, the RMSE is defined as
RMSE=1MN∑i=1M∑j=1NKi,j-I(i,j)2. | (7.5) |
If we define the maximum intensity of the denoised image as
MAXI
, then PSNR can (in dB) is defined as:
PSNR=10∙log10MAXI21MN∑i=1M∑j=1NKi,j-I(i,j)2. | (7.6) | |
PSNR=20∙log10MAXI-10∙log101MN∑i=1M∑j=1NKi,j-I(i,j)2. | (7.7) |
The difference between SSIM and other techniques mentioned previously such as RMSE or PSNR is that these approaches estimate absolute errors; on the other hand, SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. Structural information is the idea that the pixels have strong inter-dependencies especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is a significant activity or “texture” in the image. The SSIM index is calculated on various windows of an image. The measure between two windows
x
and
y
of common size
N×N
is:
SSIM(x,y)=2μxμy+c12σxy+c2μx2+μy2+c1σx2+σy2+c2. | (7.8) |
where
μx
and
μy
are the means of all the pixels in the window
x
and
y
respectively,
σx2
and
σy2
are the variance in the windows
x
and
y
respectively, and
σxy
is the variance between
x
and
y
.
c1=k1L2
and
c2=k2L2
are used to stabilize the division with weak denominator.
L
is the dynamic range of the pixel-values (typically this is
2#bits per pixel-1
). The default values of
k1
and
k1
are
0.01
and
0.03
respectively.
(a) | (b) |
(c) | (d) |
(e)
Fig. ? (a) Clinical abdominal image collected from Siemens Somatom Definition AS scanner. Voxel size =0.576×0.576×1.0 mm. Scan parameters: 180 mAs, pitch 0.75, 19×0.6 mm collimation. Abdominal display window is -160HU to 240 HU. (b) Low dose noisy image. (c) Denoised image BM3D algorithm. (d) Denoised image with WNNM algorithm. (e) Denoised image with our proposed Deep CNN based method.
(a) | (b) |
(c) | (d) |
(e)
Fig. ? (a) Clinical abdominal image collected from Siemens Somatom Definition AS scanner. Voxel size =0.576×0.576×1.0 mm. Scan parameters: 180 mAs, pitch 0.75, 19×0.6 mm collimation. Abdominal display window is -160HU to 240 HU. (b) Low dose noisy image. (c) Denoised image BM3D algorithm. (d) Denoised image with WNNM algorithm. (e) Denoised image with our proposed Deep CNN based method.
(a) | (b) |
(c) | (d) |
(e)
Fig. ? (a) Clinical abdominal image collected from Siemens Somatom Definition AS scanner. Voxel size =0.576×0.576×1.0 mm. Scan parameters: 180 mAs, pitch 0.75, 19×0.6 mm collimation. Abdominal display window is -160HU to 240 HU. (b) Low dose noisy image. (c) Denoised image BM3D algorithm. (d) Denoised image with WNNM algorithm. (e) Denoised image with our proposed Deep CNN based method.
Figure No. ? | PSNR (dB) | SSIM | RMSE |
BM3D | 25.2158 | 0.9121 | 13.9878 |
WNNM | 25.6885 | 0.909 | 13.2470 |
Deep CNN | 27.1579 | 0.9225 | 11.1853 |
Figure No. ? | PSNR (dB) | SSIM | RMSE |
BM3D | 25.6644 | 0.9452 | 13.2837 |
WNNM | 25.9677 | 0.9398 | 12.8279 |
Deep CNN | 27.5299 | 0.9514 | 10.7163 |
Figure No. ? | PSNR (dB) | SSIM | RMSE |
BM3D | 26.3476 | 0.9562 | 12.2789 |
WNNM | 26.39 | 0.9529 | 12.2131 |
Deep CNN | 27.9087 | 0.9614 | 10.2591 |
Table I. The PSNR(dB), SSIM, and RMSE values for the 3 image slices shown in the figures previously.
7.4 Conclusion
Extensive experimental results have demonstrated that the proposed method produces superior image denoising performance both quantitatively and qualitatively. On average, our denoising method outperforms BM3D method by almost
2.2dB
and WNNM method by
1.7dB
. The SSIM and RMSE metric also shows better performance with our denoising method. In addition to visual quality, another important aspect for an image restoration method is the testing speed. We use the Nvidia cuDNN-v5 deep learning library to accelerate the GPU computation of the proposed Deep CNN. We have ignored data transfer between CPU and GPU from our model execution time. With a single TITAN X GPU, we can run our Deep CNN based denoising algorithm on an
512×512
image with an average time of
53ms
whereas BM3D takes on average
2.85s
and WNNM takes on average
773s
on a CPU. With GPU acceleration, BM3D may run slightly faster than our Deep CNN implementation, however, the image quality enhancement is significantly better with our method.
For the deep CNN based denoising algorithm, we assume images are 2D even though they are reconstructed from a spiral scan. As we have discussed before, the spiral scanning introduces its own artifacts. However, we haven’t incorporated that into the noise model for our analysis. To the best of my knowledge, 3D multislice x-ray CT images haven’t been denoised with neural networks by other research groups. Further work is needed to model 3D nature of the noise statistics and incorporate that into our residual denoising method.
References:
1. Rehani, M.M., et al., Managing patient dose in computed tomography. Ann ICRP, 2000. 30(4): p. 7-45.
2. Kan, M.W., et al., Radiation dose from cone beam computed tomography for image-guided radiation therapy. International Journal of Radiation Oncology* Biology* Physics, 2008. 70(1): p. 272-279.
3. Ioffe, S. and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. in International conference on machine learning. 2015.
4. Dabov, K., et al. BM3D image denoising with shape-adaptive principal component analysis. in SPARS’09-Signal Processing with Adaptive Sparse Structured Representations. 2009.
5. Gu, S., et al. Weighted nuclear norm minimization with application to image denoising. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
6. Ivakhnenko, A.G.e. and V.G. Lapa, Cybernetic predicting devices. 1966, PURDUE UNIV LAFAYETTE IND SCHOOL OF ELECTRICAL ENGINEERING.
7. Greenspan, H., B. van Ginneken, and R.M. Summers, Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging, 2016. 35(5): p. 1153-1159.
8. Kang, E., J. Min, and J.C. Ye, A deep convolutional neural network using directional wavelets for low‐dose X‐ray CT reconstruction. Medical physics, 2017. 44(10).
9. Chen, H., et al., Low-dose CT via convolutional neural network. Biomedical optics express, 2017. 8(2): p. 679-694.
10. Chen, H., et al., Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE transactions on medical imaging, 2017. 36(12): p. 2524-2535.
11. Wang, G., A perspective on deep imaging. IEEE Access, 2016. 4: p. 8914-8924.
12. Krizhevsky, A., I. Sutskever, and G.E. Hinton. Imagenet classification with deep convolutional neural networks. in Advances in neural information processing systems. 2012.
13. Zhang, K., et al., Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 2017. 26(7): p. 3142-3155.
14. He, K., et al. Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
15. Simonyan, K. and A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
16. Kingma, D.P. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
17. Whiting, B.R., et al., Properties of preprocessed sinogram data in x‐ray computed tomography. Medical physics, 2006. 33(9): p. 3290-3303.
18. Wang, J., et al., Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose X-ray computed tomography. IEEE transactions on medical imaging, 2006. 25(10): p. 1272-1283.
19. Hsieh, J., O. Gurmen, and K.F. King, Investigation of a solid-state detector for advanced computed tomography. IEEE transactions on medical imaging, 2000. 19(9): p. 930-940.
20. Britten, A., et al., The addition of computer simulated noise to investigate radiation dose and image quality in images with spatial correlation of statistical noise: an example application to X-ray CT of the brain. The British journal of radiology, 2004. 77(916): p. 323-328.
21. Massoumzadeh, P., et al., Validation of CT dose‐reduction simulation. Medical physics, 2009. 36(1): p. 174-189.
22. He, K., et al. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. in Proceedings of the IEEE international conference on computer vision. 2015.
23. Vedaldi, A. and K. Lenc. Matconvnet: Convolutional neural networks for matlab. in Proceedings of the 23rd ACM international conference on Multimedia. 2015. ACM.