Machine Learning-Based Analysis of Bladder Cytology Images
Bladder cancer is the ninth most common cancer for men and women combined; in 2002, for example, approximately 35700 cases of bladder cancer occurred. It is the eighth most frequent cancer in men and the sixteenth most frequent in women (Parkin, 2008). Most patients are diagnosed with superficial, non-muscle-invasive tumours. More than two-thirds of these tumours recur after surgical removal, and around ten to thirty per cent of them progress to muscle-invasive tumours (Saeb-Parsy et al., 2012). For this reason, bladder cancer patients usually enter a surveillance programme to reduce mortality and morbidity (Saeb-Parsy et al., 2012). Because of this lifelong surveillance and the possibility of lifelong treatment, the cost of bladder cancer per patient from diagnosis to death is the highest among all cancers, at around $96000 to $187000 (2001 values) per patient (Botteman et al., 2003). In 2001, medical care for bladder cancer cost the US around $3.7 billion (Botteman et al., 2003).
Besides the financial burden on both patients and the country’s health care system, bladder cancer patients undergo lifelong cystoscopy, a procedure in which an endoscope is inserted into the bladder to check whether the cancer has returned. It is not only costly but also uncomfortable for patients, and it may cause urinary tract infection (Almallah et al., 2000). In this project, we aim to identify from a urine sample whether a patient’s bladder cancer has returned, so that patients can forgo an invasive procedure such as cystoscopy. To reduce the number of cystoscopies, we hope to develop a test with high sensitivity. Patients who test positive for bladder cancer in the urine cytology test still need to undergo cystoscopy to evaluate the size and location of the tumours. Because the test is designed to have high sensitivity, patients who are reported cancer-free could forgo the cystoscopy.
The goal of this project is to implement a patch-based convolutional neural network algorithm that predicts bladder cancer from bladder cytology images. False-negative results are extremely dangerous, since cancer patients would then not receive proper treatment. The algorithm therefore aims for very high sensitivity at the expense of specificity in an attempt to reduce cystoscopies. Only patients classified as having cancer need to go to a hospital for cystoscopy.
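To make the trade-off between sensitivity and specificity concrete, the following minimal sketch computes both quantities from slide-level labels. The function name and the toy labels are illustrative assumptions; they are not results from this project.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # cancer-free, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # cancer-free, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy example: 3 cancer slides and 5 cancer-free slides.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case no cancer slide is missed, at the price of two false alarms among the five cancer-free slides, which is the kind of trade-off the project is willing to accept.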
The main challenges presented in this project are as follows:
· Training directly on gigabytes of WSI is computationally infeasible
· Extremely imbalanced data
· Finding suitable patch-based convolutional neural network architectures
· Designing an algorithm that combines patch-based CNN predictions to classify a WSI with high sensitivity
In this section, we present the relevant background needed to understand the motivation of our work.
Bladder cancer is a heterogeneous disease. Around seventy per cent of patients have superficial tumours, which tend to recur but are normally not life-threatening, and around thirty per cent of patients have life-threatening muscle-invasive bladder cancer (Maffezzini and Campodonico, 2010). Fifty to seventy per cent of superficial tumours (stage Ta, T1 or tumours in situ [Tis]) will recur, and ten to twenty per cent of superficial tumours will progress to become muscle-invasive (Maffezzini and Campodonico, 2010). Non-muscle-invasive bladder cancer consists of three groups. The first group contains a small number of patients who have low-risk tumours with a low chance of recurrence and progression. The second and largest group consists of patients who frequently develop non-muscle-invasive tumours that rarely progress. The third group consists of patients with relatively aggressive non-muscle-invasive tumours that can develop into muscle-invasive cancer even with maximum treatment. The major goal in treating patients with non-muscle-invasive bladder cancer is to reduce the high recurrence rate of tumours and prevent them from progressing to muscle-invasive disease.
The most common symptom of bladder cancer is painless gross hematuria, the presence of blood cells in the urine, which causes visible red or brown discolouration of the urine. However, hematuria can also be caused by other diseases, so using it alone to detect bladder cancer would result in a high false-positive rate. Cystoscopy is the gold standard in the diagnosis and follow-up of non-muscle-invasive bladder cancer (van der Heijden and Witjes, 2009). It is a procedure in which an endoscope is inserted into a patient’s bladder to allow doctors to determine whether bladder cancer is present. Cytology is also widely applied in the diagnosis of bladder cancer, but it has low sensitivity. Other tests are also used; for example, BTA stat and NMP22 give an instant result but have low accuracy (van der Heijden and Witjes, 2009). Urine markers could be used to replace some cystoscopy and cytology examinations because of their high sensitivity (van der Heijden and Witjes, 2009).
Since non-muscle-invasive tumours have a high recurrence rate after transurethral resection and may progress to muscle-invasive tumours, it is crucial to set up a follow-up programme for patients. After transurethral resection of the tumours, patients should have cystoscopy and urine cytology every three months for two years, then every six months for two years, and yearly after that (Maffezzini and Campodonico, 2010). In addition, increasing age has been found to have some prognostic value in non-muscle-invasive bladder cancer (p = 0.08) (van der Heijden and Witjes, 2009), which suggests that elderly patients are at greater risk. Regular cystoscopy is not only costly for the healthcare provider but also uncomfortable for patients, and it may cause infection, especially in elderly people. On top of this, travelling to the hospital may be troublesome for some elderly patients.
Because of the financial burden, discomfort and side effects of cystoscopy in the follow-up of non-muscle-invasive bladder cancer, there is an increasing need for more reliable urinary biomarkers to identify the occurrence of tumours. In this project, immunofluorescence experiments are used to label proteins of interest in patients’ samples.
After the urine samples were collected, the cells were fixed, purified, mounted onto a microscope slide and labelled with biomarkers. Two antibodies were used: an antibody against Pan-Cytokeratin, which labels urothelial cells, and an antibody against minichromosome maintenance protein-2 (MCM-2), which labels a protein involved in cell proliferation, since cancer cells divide faster than healthy cells. Urothelial cells line the bladder and are where bladder cancer originates. Once labelled, the sample is digitized and saved as a CZI image file for later analysis.
Manual analysis and reporting of cytology images by doctors is arduous, time-consuming and subjective. It depends heavily on the pathologist’s experience and may lead to variation between experts. For these reasons, an increasing number of researchers are trying to use machine learning techniques to analyze whole slide images (WSI).
Convolutional neural networks (CNNs) are widely applied in image processing and computer vision, in part because their design is inspired by the human visual system. In recent years, the performance of some convolutional neural networks on ImageNet has come close to human performance. AlphaGo, which is built on convolutional neural networks, has beaten human players. CNN architectures have shown their ability to solve various challenging tasks, especially those involving medical images. They have been used in breast cancer classification from histopathological images (Alom et al., no date), digital mammographic tumor classification (Huynh, Li and Giger, 2016), pulmonary tuberculosis classification (Lakhani and Sundaram, 2017), volumetric medical image segmentation (Milletari, Navab and Ahmadi, 2016) and synthetic CT generation (Han, 2017).
An image classification model trained from scratch can take a long time to train and needs a large amount of labelled data to reach a satisfactory result. Even when the dataset is large, training a deep CNN from scratch is computation-intensive and time-consuming and requires large memory resources. It also requires constant adjustment of the hyperparameters and of the CNN structure, as well as expert knowledge, to make sure the model is neither underfitting nor overfitting (Tajbakhsh et al., 2016). A fine-tuned CNN can perform much better than a model trained from scratch if the amount of labelled data is limited. Even when the dataset is large, fine-tuning has proved useful because the pre-trained network captures generic features learned from many kinds of images (Chen et al., 2017). The fine-tuning strategy is therefore a good way to transfer a pre-trained CNN with general image recognition ability to a specific domain (Tajbakhsh et al., 2016; Reyes, Caicedo and Camargo, 2015).
Thanks to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), we can apply transfer learning to pre-trained CNNs, which serve as excellent baselines and feature extractors. AlexNet outperformed other CNNs and won the challenge in 2012; it contains five convolutional layers and three fully connected layers (Krizhevsky, Sutskever and Hinton, no date). Its use of the ReLU activation function instead of the sigmoid helps the model train faster, and its dropout layers help prevent overfitting. VGGNet is another model that performed well in the ImageNet competition. It replaced convolutional layers with large kernels (11*11 and 5*5) with stacks of very small (3*3) convolutional filters. A stack of small filters covers the same receptive field as a single large filter while using fewer parameters and more non-linearities, which improves model performance significantly. With the increasing complexity of neural networks, training a model became computation-intensive and time-consuming. GoogLeNet’s architecture is carefully designed to use computational resources efficiently by stacking inception modules, which apply several types of feature extractor in parallel and let the network learn which to rely on (Szegedy et al., no date). In theory, the training error should decrease as the number of layers increases; in practice, beyond a certain depth, adding layers increases the training error. The state-of-the-art model ResNet uses residual blocks so that the training error keeps decreasing as layers are added, even in networks with more than 100 layers (He, no date).
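The effect of replacing one large kernel with a stack of 3*3 kernels can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions, equal numbers of input and output channels and no bias terms; the channel width of 64 and the function names are hypothetical choices made only for illustration.

```python
def conv_weights(kernel, channels, layers=1):
    """Weights in `layers` stacked kernel*kernel convolutions (no bias),
    each mapping `channels` input channels to `channels` output channels."""
    return layers * kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


channels = 64  # hypothetical channel width
for big_kernel in (5, 11):
    n_small = (big_kernel - 1) // 2  # number of 3*3 layers with the same field
    assert stacked_receptive_field(3, n_small) == big_kernel
    single = conv_weights(big_kernel, channels)
    stacked = conv_weights(3, channels, layers=n_small)
    print(f"{big_kernel}*{big_kernel}: {single} weights, "
          f"{n_small} stacked 3*3 layers: {stacked} weights")
```

Under these assumptions, two 3*3 layers cover a 5*5 field with 18C² instead of 25C² weights, and five 3*3 layers cover an 11*11 field with 45C² instead of 121C² weights, where C is the channel width.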
Even though CNNs are state of the art in image classification, training directly on gigabyte-sized whole slide images is considered computationally infeasible with current hardware. Researchers have therefore extracted patches from WSI, trained models on those patches and predicted the WSI label from the patch-level model. Using an expectation-maximization based method to automatically identify the patches used for training, subtypes of cancers could be classified with accuracy similar to that of pathologists (Hou et al., no date).
Researchers at the University of Texas at Arlington and Tencent AI Lab developed a whole-slide histopathological image survival analysis framework (WSISA) using a patch-based CNN (Zhu et al., 2017). Instead of extracting patches from regions of interest, they randomly sampled patches (512*512) from each patient’s WSI, clustered them on phenotypes represented by smaller images (50*50), selected the top 50 clusters based on principal component analysis (PCA), generated weighted features and integrated the patches to predict the patient’s survival (Zhu et al., 2017). Compared with methods using patches extracted from regions of interest, WSISA performed better, with average best accuracy ranging from 0.638 to 0.703 on three different medical image datasets (Zhu et al., 2017).
The learning rate is one of the most important hyperparameters to tune during model training. Normally, the learning rate is decreased monotonically with a step-wise or exponential function, so that the model learns quickly at the beginning and more slowly later on, in order not to overshoot a local minimum. Adam is another optimization algorithm that can be used in model training; however, many current state-of-the-art models are trained using simple SGD with momentum instead of Adam (Loshchilov and Hutter, 2017). Adam is suspected of finding a good local minimum on the training data but generalizing less well than simple SGD with momentum. A cyclical learning rate, in which the learning rate varies cyclically between reasonable upper and lower bounds, can increase the accuracy of the model quickly by giving it a chance to escape poor local minima (Smith and Ave, 2017). However, SGD with momentum benefits more from a cyclical learning rate than Adam does. The cyclical learning rate is also cheap to compute, which is beneficial when training with limited resources. Following a similar idea of varying the learning rate between a maximum and a minimum to improve performance and accelerate training, stochastic gradient descent with warm restarts is another efficient optimization scheme (Loshchilov and Hutter, 2017).
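The two schedules discussed above can be written as short functions of the iteration number. The sketch below assumes the triangular form of the cyclical learning rate and a warm-restart schedule with a fixed period and cosine decay; the bounds, step size and period are illustrative values, and the function names are hypothetical.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate: rises linearly from base_lr to
    max_lr over step_size iterations, then falls back over the next step_size."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


def cosine_restart_lr(iteration, min_lr, max_lr, period):
    """Cosine decay from max_lr towards min_lr, restarting every `period`
    iterations (fixed period, i.e. no period multiplier)."""
    t_cur = iteration % period
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t_cur / period))


# Illustrative bounds and cycle lengths (assumptions, not tuned values).
base_lr, max_lr = 0.0001667, 0.001
for it in (0, 50, 100, 150, 200):
    print(it, round(triangular_lr(it, base_lr, max_lr, step_size=100), 7),
          round(cosine_restart_lr(it, base_lr, max_lr, period=100), 7))
```

Both functions need only a few arithmetic operations per iteration, which illustrates why these schedules add almost no cost to training.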
The original ethical application was approved, and the author was added as a researcher in the ethical amendment application form. The School of Medicine Ethics Committee is delegated to act on behalf of the University Teaching and Research Ethics Committee (UTREC) and approved the ethical amendment application. Please contact the author and supervisor by email with any questions regarding ethics.
The 1.4-terabyte dataset used in this project contains 1042 whole slide images (WSI), of which 502 are labelled. The images in our dataset are greyscale. Among the labelled images, 254 WSI have better image quality, with Hoechst staining the DNA and nuclei and Pan-Cytokeratin used as a marker for urothelial cells (this dataset will be referred to as dataset A). Of these 254 labelled WSI, 36 are labelled as cancer and 218 as cancer-free. Of the remaining 248 images (dataset B), 33 WSI are labelled as cancer and 215 as cancer-free. The datasets are split into seventy per cent training data, fifteen per cent validation data and fifteen per cent testing data.
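A 70/15/15 split of this kind can be sketched as follows. The sketch assumes that the split is made at slide level and stratified by label so that each subset keeps a similar cancer ratio; the text does not state either detail, and the slide identifiers and function name are hypothetical.

```python
import random


def stratified_split(ids_by_label, train_frac=0.70, val_frac=0.15, seed=0):
    """Split slide ids into train/val/test separately for each label."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, ids in ids_by_label.items():
        ids = list(ids)
        rng.shuffle(ids)
        n_train = int(train_frac * len(ids))
        n_val = int(val_frac * len(ids))
        splits["train"] += [(i, label) for i in ids[:n_train]]
        splits["val"] += [(i, label) for i in ids[n_train:n_train + n_val]]
        splits["test"] += [(i, label) for i in ids[n_train + n_val:]]  # remainder
    return splits


# Hypothetical slide ids using the class counts of dataset A (36 / 218).
slides = {
    "cancer": [f"A_cancer_{k:03d}" for k in range(36)],
    "cancer_free": [f"A_free_{k:03d}" for k in range(218)],
}
splits = stratified_split(slides)
print({name: len(items) for name, items in splits.items()})
```

Splitting by slide rather than by patch keeps all patches of one patient sample in the same subset, which avoids leaking near-identical patches between training and testing.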
Because of limited computational power, it is impossible to train on an entire whole slide image (WSI). Patches of size 224*224 were therefore extracted from each WSI after converting the greyscale image to an RGB image. Because accessing information in a CZI file and processing and extracting patches from a gigabyte-sized WSI are extremely time-consuming and computation-intensive, the patches are saved as PNG files for later training.
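The patch extraction step can be illustrated with a small sketch. It assumes non-overlapping tiles, discards partial tiles at the image border and converts greyscale to RGB by copying the single channel three times; none of these details is specified in the text, and the function names are hypothetical. Reading the CZI file and writing PNG files are omitted.

```python
def patch_origins(width, height, patch=224, stride=224):
    """Top-left (x, y) corners of every full patch that fits in the image."""
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    return [(x, y) for y in ys for x in xs]


def crop(image, x, y, patch=224):
    """Crop a patch from a 2-D greyscale image stored as a list of rows."""
    return [row[x:x + patch] for row in image[y:y + patch]]


def grey_to_rgb(grey_patch):
    """Replicate each grey value into three identical channels."""
    return [[(v, v, v) for v in row] for row in grey_patch]


# Hypothetical 1000 x 500 greyscale image filled with a constant value.
image = [[128] * 1000 for _ in range(500)]
origins = patch_origins(width=1000, height=500)
first_patch = grey_to_rgb(crop(image, *origins[0]))
print(len(origins), len(first_patch), len(first_patch[0]), first_patch[0][0])
```

Replicating the grey channel lets a network pre-trained on three-channel ImageNet images accept the patches without changing its first layer.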
Another challenge we faced is imbalanced data: out of 502 labelled WSI, only 69 are labelled as cancer. In addition, training on more than 500 gigabytes of data makes the training process extremely slow. For these reasons, five per cent of the patches extracted from cancer-free WSI and one-third of the patches extracted from cancer WSI are used in the training process.
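The class-dependent subsampling described above can be sketched as follows. The sketch assumes uniform random sampling without replacement within each class; the patch file names, patch counts and function name are hypothetical.

```python
import random


def subsample(patch_paths, fraction, seed=0):
    """Keep a random `fraction` of the patches, sampled without replacement."""
    rng = random.Random(seed)
    k = round(fraction * len(patch_paths))
    return rng.sample(patch_paths, k)


# Hypothetical patch lists for the two classes.
free_patches = [f"free_{k:05d}.png" for k in range(1000)]
cancer_patches = [f"cancer_{k:05d}.png" for k in range(300)]

train_free = subsample(free_patches, 0.05)        # five per cent
train_cancer = subsample(cancer_patches, 1 / 3)   # one third
print(len(train_free), len(train_cancer))
```

Sampling the minority class at a higher rate reduces both the imbalance between classes and the total volume of data to train on.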
To prevent overfitting, train a discriminative convolutional neural network and increase validation accuracy, various data augmentation strategies can be used in model training, for example flipping, Gaussian noise, jittering, scaling, powers, Gaussian blur, rotation and shearing. In this project, random horizontal flips, vertical flips and 90-degree rotations were applied to the training and validation data because they are easy to implement and perform well in increasing training and validation accuracy (Hussain et al., 2017).
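The three augmentations used here can be sketched on an image stored as a list of rows. The sketch assumes that each flip is applied with probability 0.5 and that the rotation is a random multiple of 90 degrees; these probabilities and the function names are assumptions, not details taken from the project.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img, rng):
    """Apply random flips and a random multiple of 90-degree rotations."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    return img


patch = [[1, 2], [3, 4]]
print(hflip(patch), vflip(patch), rot90(patch))
print(augment(patch, random.Random(0)))
```

These transformations only rearrange pixels, so they preserve the appearance of the cells while showing the network each patch in up to eight orientations.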
Previous chapters described different convolutional neural network architectures and their advantages and disadvantages. A pretrained ResNet18 is used in this project because of its excellent performance and low complexity. The network is fine-tuned instead of being trained entirely from scratch. Since the learning rate is an important hyperparameter in the training process, we use two learning-rate schedules to train two separate models for each dataset (dataset A and dataset B).
The first model is trained with a conventional step schedule in which the learning rate decays by a factor of 0.1 every 10 epochs. The second model is trained using a cyclical learning rate to improve accuracy and speed up learning (Smith and Ave, 2017). To find the maximum and minimum of the learning rate, a learning rate range test is performed: the network is trained for a number of steps while the learning rate is increased at each step, and the loss and learning rate are recorded. The plot in Figure 1 shows that as the learning rate increases the loss decreases at first, and after the learning rate reaches around 0.01 the loss increases dramatically. Following the guidance in Jeremy Howard’s fast.ai course, the upper and lower bounds of the learning rate are set to 0.001 and 0.0001667 in this example (Jeremy Howard, no date). In total, four models are trained to make patch-level predictions.
Figure 1: Example of loss versus learning rate (log10 scale)
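The step schedule and the learning rates swept in a range test can be sketched as follows. The initial learning rate of 0.001, the sweep limits and the number of steps are assumptions chosen for illustration, and the function names are hypothetical; the training loop that records the loss at each learning rate is omitted.

```python
def step_decay_lr(epoch, initial_lr=0.001, gamma=0.1, step_size=10):
    """Learning rate multiplied by `gamma` once every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)


def range_test_lrs(start_lr=1e-7, end_lr=10.0, num_steps=100):
    """Exponentially spaced learning rates for a range test, so that the
    points are evenly spaced on a log10 axis."""
    ratio = (end_lr / start_lr) ** (1 / (num_steps - 1))
    return [start_lr * ratio ** i for i in range(num_steps)]


for epoch in (0, 9, 10, 20):
    print(epoch, step_decay_lr(epoch))

lrs = range_test_lrs()
print(lrs[0], lrs[-1])
```

In a range test, one mini-batch is trained at each of these learning rates and the recorded loss is plotted against the learning rate, which gives a curve of the kind shown in Figure 1.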
After the patch-based models are trained, we need to make predictions for each WSI. Instead of using majority voting, which may cause a high false-negative rate, we want to ensure very high sensitivity at the expense of specificity, since it is life-threatening to send cancer patients home without treatment.
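One possible way to favour sensitivity when combining patch predictions is to flag a slide as soon as a small fraction of its patches looks suspicious. The sketch below contrasts such a rule with majority voting; the rule itself, both thresholds and the function names are assumptions made for illustration and are not necessarily the rule used in this project.

```python
def majority_vote(patch_probs, patch_threshold=0.5):
    """Flag the slide only if more than half of its patches are positive."""
    positives = sum(p >= patch_threshold for p in patch_probs)
    return positives > len(patch_probs) / 2


def sensitive_vote(patch_probs, patch_threshold=0.5, min_positive_fraction=0.01):
    """Flag the slide if at least a small fraction of patches is positive."""
    if not patch_probs:
        raise ValueError("a slide needs at least one patch prediction")
    positives = sum(p >= patch_threshold for p in patch_probs)
    return positives / len(patch_probs) >= min_positive_fraction


# Hypothetical slide: 5 suspicious patches among 100.
slide_probs = [0.9] * 5 + [0.1] * 95
print(majority_vote(slide_probs), sensitive_vote(slide_probs))
```

Because cancer cells may occupy only a few patches of a slide, majority voting would clear this hypothetical slide while the low-fraction rule flags it; lowering either threshold raises sensitivity further at the cost of more false positives.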
· Almallah, Y. Z. et al. (2000) ‘Urinary tract infection and patient satisfaction after flexible cystoscopy and urodynamic evaluation’, Urology, 56(1), pp. 37–39. doi: 10.1016/S0090-4295(00)00555-0.
· Alom, Z. et al. (no date) Breast Cancer Classification from Histopathological Images with Inception Recurrent Residual Convolutional Neural Network. Available at: https://arxiv.org/pdf/1811.04241.pdf (Accessed: 1 August 2019).
· Botteman, M. F. et al. (2003) ‘The Health Economics of Bladder Cancer: A Comprehensive Review of the Published Literature’, PharmacoEconomics, 21(18), pp. 1315–1330. doi: 10.1007/BF03262330.
· Chen, J. et al. (2017) ‘An ensemble of convolutional neural networks for image classification based on LSTM’, Proceedings – 2017 International Conference on Green Informatics, ICGI 2017, 21(1), pp. 217–222. doi: 10.1109/ICGI.2017.36.
· Han, X. (2017) ‘MR-based synthetic CT generation using a deep convolutional neural network method’, Medical Physics. doi: 10.1002/mp.12155.
· He, K. (no date) ‘Deep Residual Learning for Image Recognition’.
· van der Heijden, A. G. and Witjes, J. A. (2009) ‘Recurrence, Progression, and Follow-Up in Non-Muscle-Invasive Bladder Cancer’, European Urology, Supplements, 8(7), pp. 556–562. doi: 10.1016/j.eursup.2009.06.010.
· Hou, L. et al. (no date) Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification. Available at: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf (Accessed: 12 August 2019).
· Hussain, Z. et al. (2017) ‘Differential Data Augmentation Techniques for Medical Imaging Classification Tasks.’, AMIA … Annual Symposium proceedings. AMIA Symposium. American Medical Informatics Association, 2017, pp. 979–984. Available at: http://www.ncbi.nlm.nih.gov/pubmed/29854165 (Accessed: 12 August 2019).
· Huynh, B. Q., Li, H. and Giger, M. L. (2016) ‘Digital mammographic tumor classification using transfer learning from deep convolutional neural networks’, Journal of Medical Imaging, 3(3), p. 034501. doi: 10.1117/1.jmi.3.3.034501.
· Jeremy Howard (no date) Lesson 3: Data blocks; Multi-label classification; Segmentation. Available at: https://course.fast.ai/videos/?lesson=3 (Accessed: 12 August 2019).
· Krizhevsky, A., Sutskever, I. and Hinton, G. E. (no date) ImageNet Classification with Deep Convolutional Neural Networks. Available at: http://code.google.com/p/cuda-convnet/ (Accessed: 11 August 2019).
· Lakhani, P. and Sundaram, B. (2017) ‘Deep Learning at Chest Radiography: Automated Classification of Pulmonary Tuberculosis by Using Convolutional Neural Networks’, Radiology, 284(2), pp. 574–582. doi: 10.1148/radiol.2017162326.
· Loshchilov, I. and Hutter, F. (2017) ‘SGDR: Stochastic Gradient Descent with Warm Restarts’, pp. 1–16.
· Maffezzini, M. and Campodonico, F. (2010) ‘Bladder cancer’, ESMO Handbook of Cancer in the Senior Patient. Elsevier Ltd, 374(9685), pp. 121–127. doi: 10.3109/9781841847481.
· Milletari, F., Navab, N. and Ahmadi, S. A. (2016) ‘V-Net: Fully convolutional neural networks for volumetric medical image segmentation’, Proceedings – 2016 4th International Conference on 3D Vision, 3DV 2016. IEEE, pp. 565–571. doi: 10.1109/3DV.2016.79.
· Parkin, D. M. (2008) ‘The global burden of urinary bladder cancer’, Scandinavian Journal of Urology and Nephrology, 42(sup218), pp. 12–20. doi: 10.1080/03008880802285032.
· Reyes, A. K., Caicedo, J. C. and Camargo, J. E. (2015) ‘Fine-tuning deep convolutional networks for plant recognition’, CEUR Workshop Proceedings, 1391.
· Saeb-Parsy, K. et al. (2012) ‘Diagnosis of bladder cancer by immunocytochemical detection of minichromosome maintenance protein-2 in cells retrieved from urine’, British Journal of Cancer. Nature Publishing Group, 107(8), pp. 1384–1391. doi: 10.1038/bjc.2012.381.
· Smith, L. N. and Ave, O. (2017) ‘Cyclical Learning Rates for Training Neural Networks’, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, (April 2015), pp. 464–472. doi: 10.1109/WACV.2017.58.
· Szegedy, C. et al. (no date) ‘Going deeper with convolutions’, pp. 1–12.
· Tajbakhsh, N. et al. (2016) ‘Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?’, IEEE Transactions on Medical Imaging. IEEE, 35(5), pp. 1299–1312. doi: 10.1109/TMI.2016.2535302.
· Zhu, X. et al. (2017) ‘WSISA: Making Survival Prediction from Whole Slide Histopathological Images’, 2017 IEEE Conference on Computer Vision and Pattern Recognition. doi: 10.1109/CVPR.2017.725.