AI-Powered Medical Image Analysis for Cancer Detection

Jun 2021 | Research Papers


Current diagnostic technologies often present findings as 2D radiological or microscopic images. These views are susceptible to multiple errors, ranging from improper staining to images blurred by patient movement during a radiological test. With image analysis and machine intelligence, however, medical experts can be assisted in diagnosis and planning. The aim is to develop software that uses deep learning to automatically detect patterns in medical images with high accuracy. This will assist radiologists by providing concordance for disease diagnosis using x-ray scans.

Current diagnostic breast imaging technologies are often presented in 2D [4]. This limited view can result in risks such as tumor dosage overestimation [5]. With the use of 3D image rendering, segmentation, and classification, medical experts can be assisted in diagnosis and planning [6].

The study aims to develop new imaging software that will assist medical practitioners by providing concordance for lung, breast, and thyroid cancer diagnosis using radiological and microscopic images. The software is intended to provide a low-cost, highly accurate, and readily available cancer detection method. Historical CT, MRI, ultrasound, and microscopic images are needed as the training set for the software’s self-learning algorithm. There is no current risk to the health and privacy of the patient population, as the information gathered will contain no Personally Identifiable Information.

Related Literature

Developing an automated cancer detection system is very challenging because lesion areas are defined only through intensity changes relative to surrounding tissues, as seen on a radiological image. Some of the factors that have to be taken into account in the system are as follows:

  • different imaging systems with varied image resolutions,
  • different settings for each patient (varied intensity values for each imaging),
  • obfuscation of lesion by partial volumes or artifacts,
  • varied tumor structures (size, extension, localization), and
  • mass effect: a growing tumor displaces normal tissue, which limits the reliability of spatial prior knowledge.

Because of the characteristic speckle noise in these images, image preprocessing, such as filtering, is necessary for accurate automatic detection of breast cancer. Contrast and texture information from the different regions allows precise isolation of the lesions. Finally, automatic analysis of the diagnostic criteria using shape and histogram analysis techniques provides a detailed characterization of the cancer nodules.
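As a rough illustration of the filtering step, the sketch below applies a median filter, one common choice for suppressing speckle-like noise. The source only says "filtering", so the choice of a median filter, the function name median_filter, and the toy 5x5 patch are all assumptions made for illustration, not the project's actual preprocessing.

```python
from statistics import median


def median_filter(image, radius=1):
    """Smooth a 2D grayscale image (a list of rows) with a square median window.

    The window is (2*radius + 1) pixels wide. Border pixels use only the
    neighbours that fall inside the image.
    """
    height, width = len(image), len(image[0])
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered


# Hypothetical toy patch: uniform tissue (intensity 10) with one bright speckle.
patch = [[10] * 5 for _ in range(5)]
patch[2][2] = 255

smoothed = median_filter(patch)
print(smoothed[2][2])  # the isolated speckle is replaced by the local median
```

A median window removes isolated outliers while preserving edges better than a plain average, which matters when lesion boundaries are defined by intensity changes.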

Medical specialists learn to identify lesions through experience, by examining hundreds of scans. However, with these factors affecting the quality of the scans being examined, isolating lesions through visual inspection may not be enough. To improve the images read by specialists, the objective is to develop a computer-aided diagnosis (CAD) system specifically designed to detect and isolate possible lesions in radiological images.

This work is a joint venture between DiGenomix and The Medical City that aims to develop imaging software that uses deep learning to automatically detect patterns in medical images with high accuracy. The software could assist radiologists by detecting abnormal artifacts (nodules, masses, and other hazy infiltrates) associated with breast cancer in mammogram images.


  • The project aims to develop imaging software that will assist medical practitioners by providing concordance for breast cancer diagnosis using CT, MRI, and ultrasound images. After the software is launched, early cancer diagnosis is expected to become cheaper, faster, and more readily available. Reducing overdiagnosis and unnecessary treatment caused by diagnostic errors is expected to lower the overall cost of early cancer detection.

The software aims to provide a cancer detection system that is low-cost, highly accurate, and readily available. Cases of overdiagnosis and unnecessary treatment are expected to decrease once the software is made available to the health sector. Patients in far-flung areas, where medical services are inaccessible or unaffordable, will benefit greatly once the software makes remote early diagnosis possible.

With the use of CAD, images can be enhanced and detection can be made more accurate and more precise. Image filtering reduces noise to prepare the scan for the subsequent steps: feature extraction, segmentation, and classification. These techniques support the speed and quality of cancer pre-diagnosis. Current clinical routine relies on an expert's qualitative evaluation (for example, of a hyperintense signal in CT or MRI images) or on more definitive tests such as fine needle aspiration biopsy (FNAB). Supporting the current assessment method with a system that is highly accurate and reproducible in detecting lesions would contribute significantly to improving diagnosis and treatment planning for patients.
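The segmentation and feature extraction steps named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a fixed intensity threshold stands in for segmentation, and area, mean intensity, and bounding box stand in for the extracted features. The function names, the threshold value, and the toy scan are hypothetical, and the classification step is deliberately left out because the source does not specify a classifier.

```python
def segment(image, threshold):
    """Return a binary mask: 1 where a pixel is brighter than the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def extract_features(image, mask):
    """Compute simple region descriptors over the masked pixels."""
    coords = [(y, x) for y, row in enumerate(mask) for x, m in enumerate(row) if m]
    if not coords:
        return {"area": 0, "mean_intensity": 0.0, "bbox": None}
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    values = [image[y][x] for y, x in coords]
    return {
        "area": len(coords),                          # number of lesion pixels
        "mean_intensity": sum(values) / len(values),  # average brightness
        "bbox": (min(ys), min(xs), max(ys), max(xs)), # top, left, bottom, right
    }


# Hypothetical 4x4 scan with a bright 2x2 region in the middle.
scan = [
    [10, 12, 11, 10],
    [11, 90, 95, 12],
    [10, 92, 88, 11],
    [12, 10, 11, 10],
]

mask = segment(scan, threshold=50)  # threshold chosen by hand for this toy data
features = extract_features(scan, mask)
print(features)
```

In a real system the feature dictionary would be passed on to a trained classifier, and the threshold would be derived from the data rather than fixed by hand.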

System Setup Options

There are two options for setting up the image analysis software:

Hybrid Option (Cloud and On-Prem)

The software will be served as a website and will perform all computation on the server upon upload. The expected latency of the result depends on the number of samples sent for analysis. This option requires a stable internet connection during system updates.

On-Prem Option

If the internet connection is a problem, the software will be served on-prem: all computation will be performed locally on the CPU, and the storage will be installed locally as well.

The methodology is divided into three phases: data collection, anonymization of patient medical files, and the AI-Powered Radiology (AIR) system for cancer detection. Figure 4 shows the framework of the implementation.

Data collection for Phase 1 will be implemented by creating software that allows staff to collect medical information from different systems and sources to build a database. This software will connect to and collect information from the following:

  • pathology report,
  • patient information system, and
  • radiological images.

The gold standard in this study is the pathology report, for both the malignant and non-malignant groups. Therefore, only images with a corresponding pathology report will be collected.

For the MALIGNANT GROUP, the first step will be to list all patients diagnosed with early-stage breast, lung, or thyroid cancer at The Medical City (TMC) from March 2017 to December 2018. From this list, pathology reports will be obtained, and the diagnostic images taken just prior to biopsy or surgery will be collected and passed on to Phase 2.

For the BENIGN GROUP, a list will be generated of control individuals, matched on age, gender, and relevant demographic characteristics, who also had a biopsy of the breast, lung, or thyroid at TMC within the same year but with non-malignant results. Their pathology reports and the images taken prior to biopsy or surgery will also be obtained and passed on to Phase 2.

The information obtained (patient information, pathology reports, and radiological images) will then be compiled into a database in preparation for Phase 2. The hospital’s patient information system contains patient information such as name, age, and gender (i.e., demographics), and the pathology report contains the readings and diagnosis made by the radiologist or doctor, along with the associated radiological images. Depending on the type of cancer, the radiological images will be collected from computers connected to the assigned imaging machines. The limitations and scope of the dataset are discussed further in the following sections.
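The selection of matched controls for the benign group could look like the sketch below. The source does not state the matching rule, so everything here is an assumption: greedy one-to-one matching, exact agreement on gender, biopsy site, and biopsy year, and a hypothetical age tolerance of five years. The record fields and IDs are made up for illustration.

```python
def match_controls(cases, candidates, age_tolerance=5):
    """Greedily pair each malignant case with the closest-aged unused control.

    A candidate qualifies only if gender, biopsy site, and biopsy year match
    exactly and the age gap is within age_tolerance years (assumed rule).
    """
    used = set()
    pairs = []
    for case in cases:
        best = None  # (age gap, candidate index)
        for idx, cand in enumerate(candidates):
            if idx in used:
                continue
            if (cand["gender"], cand["site"], cand["biopsy_year"]) != (
                case["gender"], case["site"], case["biopsy_year"]
            ):
                continue
            gap = abs(cand["age"] - case["age"])
            if gap <= age_tolerance and (best is None or gap < best[0]):
                best = (gap, idx)
        if best is not None:
            used.add(best[1])
            pairs.append((case["id"], candidates[best[1]]["id"]))
    return pairs


# Hypothetical records; IDs and values are illustrative only.
cases = [
    {"id": "M1", "age": 52, "gender": "F", "site": "breast", "biopsy_year": 2017},
    {"id": "M2", "age": 64, "gender": "M", "site": "lung", "biopsy_year": 2018},
]
candidates = [
    {"id": "B1", "age": 70, "gender": "F", "site": "breast", "biopsy_year": 2017},
    {"id": "B2", "age": 50, "gender": "F", "site": "breast", "biopsy_year": 2017},
    {"id": "B3", "age": 66, "gender": "M", "site": "lung", "biopsy_year": 2018},
]

pairs = match_controls(cases, candidates)
print(pairs)
```

Greedy matching is simple but order-dependent; a study protocol may call for a different rule, such as optimal matching or a wider set of demographic variables.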

Phase 2 is data anonymization, done to protect the patient’s information. By virtue of Republic Act No. 10173, also known as the Data Privacy Act of 2012 [25], no personal information about the subjects in this study will be collected or stored [46]. All protected health information (PHI), which includes all forms of names, contact information, locations or addresses, and any identifiable codes, will be considered highly confidential and will not be collected or stored for the CAD system. Any information that could lead to identifying the patient, including individually identifiable health information, will be stripped to ensure privacy. This will be implemented by running the patient medical file database through anonymizer software that strips all unnecessary information, following the standard protocol and guidelines of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule set by the US Department of Health and Human Services [47]. The only information that will remain is the patient’s age, gender, and related diagnostic data (such as radiological images and diagnosis). A patient ID number will be assigned to each patient for unique identification. These data will be collected for all subjects and will form the anonymized dataset.

Phase 3 is the AI-Powered Radiology (AIR) diagnostic system, a computer-aided diagnosis (CAD) system designed to detect and classify cancer in medical images. Different radiological images will be used depending on the type of cancer:

  • computed tomography (CT) scans for lung cancer,
  • mammogram and ultrasound images for breast cancer, and           
  • computed tomography (CT) or ultrasound scans for thyroid cancer.        

Figure 4 shows the sample pipeline, in which the medical images are passed through AIR for image analysis and the resulting detection and diagnosis are tested. The medical data will first be passed through the data anonymizer, which will generate two digital outputs: an anonymized patient dataset and a masterlist. The anonymized patient dataset will contain the medical images with age, gender, diagnosis, and the newly assigned patient ID. The masterlist, on the other hand, contains the original patient information with the associated patient ID (assigned during anonymization).
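The two outputs of the anonymizer can be illustrated with a short sketch. It is not the project's anonymizer software: the field names, the allow-list KEPT_FIELDS, and the use of random UUIDs as patient IDs are assumptions chosen to mirror the description above (age, gender, diagnosis, and images are kept, and everything else stays only in the masterlist).

```python
import uuid

# Assumed allow-list: only these fields survive anonymization.
KEPT_FIELDS = ("age", "gender", "diagnosis", "image_files")


def anonymize(records):
    """Split raw patient records into an anonymized dataset and a masterlist.

    Each record receives a random patient ID that is not derived from any
    patient data. The masterlist keeps the original record alongside that ID
    and must be stored separately under access control.
    """
    anonymized, masterlist = [], []
    for record in records:
        patient_id = uuid.uuid4().hex
        kept = {key: record[key] for key in KEPT_FIELDS if key in record}
        anonymized.append({"patient_id": patient_id, **kept})
        masterlist.append({"patient_id": patient_id, **record})
    return anonymized, masterlist


# Hypothetical raw record; every value here is made up.
records = [
    {
        "name": "Jane Doe",
        "hospital_number": "H-0001",
        "address": "Example Street",
        "age": 52,
        "gender": "F",
        "diagnosis": "benign",
        "image_files": ["scan_001.png"],
    }
]

anonymized, masterlist = anonymize(records)
print(sorted(anonymized[0]))  # field names only, no identifying fields
```

Using an allow-list rather than a block-list means that any unexpected field in a source record is dropped by default instead of leaking into the anonymized dataset.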

The AIR system will be developed and tested using different machine learning techniques, such as artificial neural networks, random forests, and others. After testing, the best-performing algorithm for each cancer type will be used, and the system will serve as a computer-assisted tool in cancer diagnosis. Diagnoses made by AIR will then be given to the doctor for further inspection and final diagnosis. The performance of the system will be quantified by specificity, sensitivity, and other performance measures, which are discussed further in Data Collection and Statistical Analysis under Analysis Procedures.
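Sensitivity and specificity follow directly from comparing the system's output against the pathology report, which is the gold standard in this study. The sketch below shows the calculation on made-up labels; the function name and the example predictions are hypothetical and do not represent any result of the study.

```python
def sensitivity_specificity(truth, predicted):
    """Compute sensitivity and specificity from boolean labels.

    truth[i] is True when the pathology report says malignant;
    predicted[i] is True when the system flags the case as malignant.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # malignant, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # malignant, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # benign, cleared
    fp = sum(1 for t, p in pairs if not t and p)      # benign, flagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up example: 4 malignant and 6 benign cases.
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Running the same function on the predictions of each candidate algorithm gives a like-for-like basis for choosing the best performer per cancer type.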