Mexican Deep DICOM
Abstract
Transform the diagnostic delivery process in Mexican public hospitals by identifying and understanding the different stages of the process, with the goal of deploying intelligent traffic light signaling to support the diagnosis of breast cancer using convolutional neural networks and mammograms in a real-world environment.
Code
https://github.com/sanchezcarlosjr/breast-cancer-toolkit
https://github.com/sanchezcarlosjr/breast-cancer-pipeline
Introduction
Breast cancer is the leading cause of cancer death among women in Mexico, which makes it a major public health problem; worldwide, it accounts for 2.26 million cases. In other words, it is the cancer with the highest incidence and mortality in women: every day at least 14 women, chiefly between 50 and 69 years old, die from it. Indeed, as the figure below shows, breast cancer follows an increasing trend compared with other cancers.
Nowadays, doctors at Mexican public hospitals carry out their analyses on traditional 2D mammograms requested for patients. In Ensenada, doctors request private external assistance. Oncologists annotate the medical images and chiefly report the patient's BI-RADS score. "BI-RADS" stands for Breast Imaging Reporting and Data System; it is the scoring standard that radiologists and oncologists use to describe mammogram results. We explain BI-RADS further in the section on the BIRADS scale.
Our goal is to build data mining models that interpret mammograms and predict the risk of developing breast cancer, continuing previous work. Since the model's output concerns a person's future health, we will use both descriptive and predictive methods, applying machine learning algorithms to the datasets. Of course, we do not expect to replace medical doctors but to assist them. Other computer-aided detection systems have been developed for breast cancer detection, but to our knowledge none is applied in regional cities, and they are not free.
We expect the project to help thousands of women get a quicker cancer detection, because deep learning is faster and cheaper than human reading if it achieves good metrics; in that way, we contribute to lowering the death rate. PACS stands for picture archiving and communication system.
Related work
The body of work related to breast cancer diagnostics using mammograms and convolutional neural networks is expansive. Various resources provide complementary perspectives, techniques, and tools.
For instance, the Open Health Imaging Foundation (OHIF) provides an open-source DICOM Viewer available on GitHub. The viewer is a zero-footprint medical image viewer provided as a Meteor package (OHIF, n.d.). It enables practitioners to visualize and navigate medical imaging data directly, enhancing understanding and improving diagnosis accuracy.
The Radiological Society of North America (RSNA) has published numerous papers discussing the importance of certain mammographic findings and terminologies. In one of these papers, they delve into the BI-RADS terminology for mammography reports, explaining what medical residents need to know (RSNA, 2023). This work informs the interpretation and communication of mammography results, which is a crucial step in diagnosing breast cancer.
Methodology
The first step in understanding the methodology for machine learning is familiarizing oneself with the key concepts. Data science is essentially the process of extracting meaningful insights and patterns from large and unstructured datasets. This process leverages various techniques such as machine learning, neural networks, and statistical methodologies to decipher raw data, which can often be vast and complex.
An important tool in this context is Digital Imaging and Communications in Medicine, or DICOM. DICOM is a standard protocol used for the transmission, storage, retrieval, and sharing of medical images. This protocol aids in the visualization and analysis of these images, enabling the identification of potential patterns or traits that might be of particular interest. In our methodology, this step is labeled "DICOM View."
In the context of mammography analysis, one might utilize the Digital Database for Screening Mammography, or DDSM. DDSM is one of the largest publicly available collections of mammograms. As part of the preprocessing step, the mammograms from DDSM can be analyzed and preprocessed to identify and potentially remove any noise or inconsistencies in the data. This process might involve data cleaning, normalization, transformation, and other techniques to prepare the data for further analysis.
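As a rough illustration of the normalization step mentioned above, the following sketch rescales pixel intensities to the range [0, 1] with min-max scaling. It uses only the Python standard library and a made-up toy image; the function name `min_max_normalize` and the choice of min-max scaling are assumptions for illustration, not necessarily the preprocessing used in this project.

```python
from typing import List, Sequence


def min_max_normalize(pixels: Sequence[Sequence[int]]) -> List[List[float]]:
    """Rescale a 2D grid of pixel intensities to the range [0.0, 1.0]."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    low, high = min(flat), max(flat)
    if high == low:
        # A constant image carries no contrast; map every pixel to 0.0.
        return [[0.0 for _ in row] for row in pixels]
    span = high - low
    return [[(value - low) / span for value in row] for row in pixels]


# Toy 2x3 "image" with made-up 12-bit style intensities (illustrative only).
toy_image = [[0, 1024, 2048], [3072, 4095, 512]]
normalized = min_max_normalize(toy_image)
```

Scaling every image to a common range is one simple way to reduce differences between scanners before training; other choices (for example, standardization or windowing) may fit a given dataset better.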
Finally, the methodology wraps up with the engineering process. This is where you design and implement your deep learning or machine learning models based on the preprocessed data. This can involve creating different deep learning architectures and preprocessing tasks. After training the model, you can then validate and test it on a separate dataset to ensure its reliability and effectiveness.
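One way to obtain the separate validation and test data mentioned above is to split by patient, so that the CC and MLO views of the same woman never end up in different subsets. The sketch below is a minimal example under that assumption; the function names, the 15% split sizes, and the image-to-patient mapping are hypothetical and not taken from the project's code.

```python
import hashlib
from typing import Dict, List


def assign_split(patient_id: str, val_pct: int = 15, test_pct: int = 15) -> str:
    """Deterministically map a patient ID to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


def split_by_patient(image_to_patient: Dict[str, str]) -> Dict[str, List[str]]:
    """Group image names by split, keeping each patient's images together."""
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for image_name, patient_id in sorted(image_to_patient.items()):
        splits[assign_split(patient_id)].append(image_name)
    return splits
```

Hashing the patient ID keeps the assignment stable across runs and machines without storing a split file.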
In this process, each step feeds into the next, creating a continuous flow from initial data understanding through to final model creation and evaluation. This framework allows for efficient handling and processing of complex and large-scale medical image data such as mammograms.
Domain Understanding
Breast cancer is a disease characterized by the abnormal and uncontrolled proliferation of cells, often leading to metastasis. It is commonly classified using the TNM staging system, which considers the size and extent of the tumor (T), the involvement of lymph nodes (N), and the presence of metastasis (M). The stage of the cancer is inversely related to survival rates; higher stages generally indicate a shorter lifespan, while lower stages are associated with longer survival.
Early diagnosis is crucial for improving outcomes, and various methods such as regular mammograms, self-exams, and awareness of risk factors are employed for this purpose. Speaking of risk factors, they can range from lifestyle choices like diet and obesity to chronic conditions, as well as environmental, familial, hereditary, benign, hormonal, and reproductive factors.
Additionally, the HER2 gene (Human Epidermal Growth Factor Receptor 2) can play a significant role in the development of breast cancer. Treatments targeting HER2 have been developed and are particularly effective for cases of HER2-positive breast cancer.
Mammography is the best technique for detecting breast microcalcifications.
Medical imaging
https://www.youtube.com/watch?v=DpmF4QZYoH0
Screening and basal
Screening for asymptomatic women can help identify tumors in their early stages. Early detection is crucial in improving the prognosis and survival rates for women diagnosed with breast cancer. Regular screening procedures include mammography, which can detect early signs of breast cancer before symptoms develop.
For more detailed information, you can refer to this publication from the Pontifical Catholic University of Chile: Screening and Early Detection of Breast Cancer.
Mammography
Craniocaudal and Mediolateral Oblique Views
The craniocaudal (CC) and mediolateral oblique (MLO) views are essential in mammography, particularly for identifying lesions that are not typically benign. These standard views help radiologists get a comprehensive look at the breast tissue.
Magnification Mammography
Magnification mammography, along with lateral, focal, and tangential views, is utilized to evaluate small lesions, distortions, and microcalcifications in detail. The purpose of these techniques is to enhance the visualization of specific areas of interest. Based on orthogonal incidences, the radiologist will indicate the area to be magnified for a more precise assessment.
Definition of Calcification
Calcification refers to the accumulation of calcium salts in body tissues, causing the tissue to harden. This process can lead to the formation of bone-like structures within soft tissues.
Breast Calcifications
Breast calcifications are small deposits of calcium that develop within the breast tissue. They are typically identified during mammography and require careful evaluation based on several factors:
- Size: The size of the calcifications can provide important diagnostic information.
- Location: The specific area within the breast where the calcifications are found can help determine their significance.
- Morphology: The shape and form of the calcifications are assessed to distinguish between benign and malignant patterns.
- Distribution: The pattern in which calcifications are spread within the breast tissue is also a critical factor.
These characteristics are described using the BIRADS (Breast Imaging Reporting and Data System) descriptors, which help standardize reporting and guide clinical management.
Microcalcifications and DCIS
Calcifications associated with Ductal Carcinoma In Situ (DCIS) are generally microcalcifications, which are very small (less than 0.5 mm). The presence of these tiny calcifications can indicate early, non-invasive breast cancer.
By evaluating these calcifications based on their size, location, morphology, and distribution, radiologists can make informed decisions about the likelihood of malignancy and the need for further diagnostic procedures.
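The four descriptors above can be captured in a small record type. The following sketch is only an illustration: the class name, the field names, and the example values are assumptions, and the 0.5 mm threshold is the one quoted in the text for microcalcifications.

```python
from dataclasses import dataclass

MICROCALCIFICATION_MAX_MM = 0.5  # threshold quoted in the text above


@dataclass(frozen=True)
class CalcificationFinding:
    """One calcification finding, described by the four factors in the text."""

    size_mm: float
    location: str
    morphology: str
    distribution: str

    @property
    def is_microcalcification(self) -> bool:
        # "Very small (less than 0.5 mm)" per the text.
        return self.size_mm < MICROCALCIFICATION_MAX_MM


# Illustrative values only; not taken from any real report.
finding = CalcificationFinding(
    size_mm=0.3,
    location="upper outer quadrant",
    morphology="fine pleomorphic",
    distribution="grouped",
)
```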
Cancer progression
Risk factors
Risk factors for breast cancer include diet, lifestyle (such as obesity and chronic conditions), environmental factors, family history, hereditary conditions, benign conditions, hormonal factors, and reproductive history.
- HER2 (Human Epidermal Growth Factor Receptor 2): This gene can influence the development of breast cancer.
- Gail Model: A statistical tool used to estimate a woman's risk of developing breast cancer.
- Genetic Mutations: Including BRCA1, BRCA2, PALB2, ATM, and CHEK2, which are associated with a higher risk of breast cancer.
- Types of Carcinomas:
  - Lobular Carcinoma: Originates in the milk-producing lobules.
  - Ductal Carcinoma In Situ (DCIS): A non-invasive cancer where abnormal cells are found in the lining of a breast duct.
- Atypical Ductal Hyperplasia (ADH): A condition where abnormal cells are found in the breast ducts.
- Lobular Hyperplasia: A condition where abnormal cells are found in the lobules of the breast.
Medications and Treatments
- Tamoxifen: 20 mg for premenopausal women.
- Raloxifene: 60 mg for postmenopausal women, taken for 5 years.
- Aromatase Inhibitors: Such as exemestane, supported by evidence from studies like MAP.3 and IBIS-II.
Early Diagnosis and Screening
- Early Diagnosis: Includes screening methods like self-examination and clinical exams.
- Monthly Self-Exams: Starting at age 18, around day 10 of the menstrual cycle.
- Annual Clinical Exams: Starting at age 25.
- Annual Screening Mammograms: Starting at age 40, as recommended by the Mexican Consensus on the Diagnosis and Treatment of Breast Cancer (Tenth Colima Meeting, 2023).
Screening Considerations
- Age and Breast Density: The cutoff age for starting mammograms is 40 years due to technological limitations with dense breast tissue.
- Breast Ultrasound: Recommended for women under 40 with breast pathology. Mammography and ultrasound are complementary studies with a combined sensitivity of 87%.
Additional Considerations
- Age Restrictions: Generally, screening is not recommended for women under 25 unless they have a direct relative with breast cancer, in which case screening should start 10 years earlier than the age at which the relative was diagnosed.
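The starting ages listed above, together with the "10 years earlier than the relative's diagnosis" rule, can be written as a small calculation. This is a sketch for illustration only, not clinical software; the function name is hypothetical, and capping the result at the general starting age of 40 is an assumption about how the two rules combine.

```python
from typing import Optional

SELF_EXAM_START_AGE = 18      # monthly self-exams
CLINICAL_EXAM_START_AGE = 25  # annual clinical exams
MAMMOGRAM_START_AGE = 40      # annual screening mammograms


def mammogram_start_age(relative_diagnosis_age: Optional[int] = None) -> int:
    """Return the age at which screening mammograms would start.

    With a direct relative diagnosed at a known age, screening starts
    10 years before that age, but never later than the general start age
    (this cap is an assumption).
    """
    if relative_diagnosis_age is None:
        return MAMMOGRAM_START_AGE
    return min(MAMMOGRAM_START_AGE, relative_diagnosis_age - 10)
```

For example, a woman whose mother was diagnosed at 45 would start at 35 under this rule, while a relative diagnosed at 60 leaves the start age at 40.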
Mammography
Mammography is the only imaging method that reduces breast cancer mortality, by:
- 40% in women aged 50 to 69 years
- 29% to 48% in women aged 40 to 49 years
Mammography identifies 2 to 8 cases per 1000 studies. Its sensitivity is 30-64% in dense breasts and 98% in fatty breasts.
The likelihood of survival depends on the clinical stage at diagnosis, the available treatment options, and the biology of the disease.
Mexican Consensus on the Diagnosis and Treatment of Breast Cancer, 2021
Recommendation: Annual screening mammograms are recommended for asymptomatic women starting at age 40.
Mexican Official Standard 041
Recommendation: Screening mammograms are recommended for apparently healthy women aged 40 to 69 years, every two years.
American Cancer Society
- Ages 40-44
  - Starting screening mammograms is optional. The recommended frequency is annual.
- Ages 45-54
  - Women in this age range should have annual screening mammograms.
- Ages 55 and older
  - Screening can be annual or biennial, but it should not be discontinued: it continues as long as the woman is in good health.
Mexican Consensus on the Diagnosis and Treatment of Breast Cancer. Tenth Colima Meeting 2023.
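The three guidelines summarized above differ mainly in starting age and interval, which a small lookup makes easy to compare. The sketch below encodes only what the summaries state; the function name and the guideline keys (`mexican_consensus`, `nom_041`, `acs`) are hypothetical labels chosen for this example.

```python
from typing import Optional


def screening_interval(age: int, guideline: str) -> Optional[str]:
    """Return the mammography interval a guideline summary gives for an age.

    Returns None when the summary above gives no recommendation for that age.
    """
    if guideline == "mexican_consensus":
        return "annual" if age >= 40 else None
    if guideline == "nom_041":
        return "every two years" if 40 <= age <= 69 else None
    if guideline == "acs":
        if 40 <= age <= 44:
            return "annual (optional)"
        if 45 <= age <= 54:
            return "annual"
        if age >= 55:
            return "annual or every two years"
        return None
    raise ValueError(f"unknown guideline: {guideline}")
```

For a 50-year-old woman, this yields "annual" under the Mexican Consensus and the American Cancer Society summary, and "every two years" under Mexican Official Standard 041.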
Techniques
graph TD
    A[Different Techniques of Mammography]
    A --> B[Conventional Mammography]
    B --> C[Analog]
    B --> D[Screen-Film Images]
    A --> E[Digital Acquisition]
    E --> F[Through Integrated or External Detectors]
    E --> G[High-Resolution Laser Equipment]

mindmap
  root((Digital Mammography))
    Telemammography
    Tomosynthesis Mammography
    Synthesized Mammography
    Stereotactic Biopsy with Tomosynthesis
    AI-assisted Detection Systems
    Contrast-enhanced Mammography
People
An oncologist is a doctor who treats cancer and provides medical care for a person diagnosed with cancer.
Radiologists are experts in evaluating mammograms and other imaging modalities. BIRADS (Breast Imaging Reporting and Data System) is a common language between radiologists and oncologists. Note: a BIRADS category describes imaging findings before a cancer diagnosis is confirmed; it is not a cancer stage. Cancer can be seen as a "seeding" process. Key imaging methods include digital mammography, breast ultrasound, and magnetic resonance imaging (MRI) for special cases. Prevention and early detection are crucial.
Medical physics is a field that applies physics principles to medicine, primarily in the diagnosis and treatment of diseases. It encompasses various techniques and technologies that are crucial in the fight against cancer, among other medical conditions.
Radiotherapy, also known as radiation therapy, is a treatment that uses high doses of radiation to kill cancer cells and shrink tumors. It is a vital tool in oncology, helping to manage and cure various types of cancer. Radiotherapy can be delivered externally using machines or internally through radioactive substances placed near cancer cells.
A particle accelerator is a complex machine that uses electromagnetic fields to propel charged particles, such as protons or electrons, to high speeds and to contain them in well-defined beams. In medical physics, particle accelerators are used in radiation therapy to generate high-energy beams that target cancer cells with precision, minimizing damage to surrounding healthy tissues.
Cobalt therapy, or cobalt-60 therapy, is a type of radiotherapy that uses gamma rays from the radioactive isotope cobalt-60. It was one of the first widely used radiotherapy methods and remains important in certain types of cancer treatment, particularly in regions where access to advanced technologies may be limited.
Chemotherapy involves the use of drugs to kill cancer cells or slow their growth. Unlike radiotherapy, which targets a specific area, chemotherapy works throughout the whole body. It is often used in combination with radiotherapy and surgery to enhance the overall effectiveness of cancer treatment.
BIRADS Scale: Breast Imaging Reporting and Data System
Category | Recommendations |
---|---|
0 | Insufficient for diagnosis: Evaluation with additional mammographic images or other studies (US) is required, as well as comparison with previous studies. This category should not be used as an indication for MRI. There is a 13% possibility of malignancy. |
1 | Negative: No findings to report. Annual mammography for women over 40 years. |
2 | Benign Findings: Annual mammography for women over 40 years. |
3 | Probably Benign Findings: Less than 2% probability of malignancy. Follow-up with imaging of the affected side with suspicious findings every 6 months, and subsequent bilateral monitoring for 2 years. This category is only recommended for diagnostic mammography. |
4 | Suspicious Abnormality: Needs further evaluation. 4a: Low suspicion of malignancy (2-10%). 4b: Moderate suspicion of malignancy (10-50%). 4c: High suspicion of malignancy (50-95%). Requires biopsy. |
5 | Highly Suggestive of Malignancy: Requires biopsy. Positive predictive value (PPV) >95%. |
6 | Known Biopsy-Proven Malignancy: Awaiting definitive treatment or evaluation of treatment response. |
BIRADS: Breast Imaging Reporting and Data System; US: Ultrasound; MRI: Magnetic Resonance Imaging; PPV: Positive Predictive Value. Source: American College of Radiology, Mammography, 5th ed., 2013
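For software that consumes radiology reports, the table above can be encoded as a lookup. The sketch below stores the malignancy probability ranges and the "requires biopsy" flag exactly as the table lists them; the names `MALIGNANCY_RANGE`, `BIOPSY_REQUIRED`, and `requires_biopsy` are hypothetical, and category 4 is represented only through its subcategories 4a, 4b, and 4c.

```python
from typing import Dict, Optional, Tuple

# (lower, upper) malignancy probability per category, as listed in the table.
# None means the table gives no range (category 0 quotes a single 13% figure).
MALIGNANCY_RANGE: Dict[str, Optional[Tuple[float, float]]] = {
    "0": None,
    "1": None,
    "2": None,
    "3": (0.0, 0.02),
    "4a": (0.02, 0.10),
    "4b": (0.10, 0.50),
    "4c": (0.50, 0.95),
    "5": (0.95, 1.0),
    "6": None,
}

# Categories for which the table states that a biopsy is required.
BIOPSY_REQUIRED = {"4a", "4b", "4c", "5"}


def requires_biopsy(category: str) -> bool:
    """Return True if the table marks this BIRADS category as requiring biopsy."""
    key = category.strip().lower()
    if key not in MALIGNANCY_RANGE:
        raise ValueError(f"unknown BIRADS category: {category!r}")
    return key in BIOPSY_REQUIRED
```

Keeping the ranges in one dictionary makes it easy to check a report parser against the table without hard-coding thresholds in several places.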
Previous work
Datasets
DDSM
CBIS-DDSM
Experiments
In the experiments below, various configurations and methods are explored to train a neural network. The experiments range from basic setups without data preprocessing to more advanced configurations involving techniques such as SAM and transformers, and architectures such as EfficientNet. Finally, the best-performing network is further tested on a hospital dataset to evaluate its applicability in a real-world setting.
TwoViewDensityNet
We’ve reproduced the workflow proposed by [1] in a notebook.
De-noising process with Semantic Segmentation in 2D mammograms
Deployment in Mexican hospital
Current Hospital Process
The current process for breast cancer detection in hospitals involves performing a mammogram, where the patient is fitted with a band for the study. Then, the image is sent to the open-source application K-PACS, installed throughout the hospital. This system currently does not allow for quickly sharing studies with other hospitals or doctors, and there is a plan to implement a national-level electronic record called SINBA to centralize information.
Patients cannot consult the information in the data repository, which includes medical notes, and a unique population registration code (CURP) is required to access it. The images are automatically sent in a ZIP file to an external provider (BERAX), which evaluates them and delivers the results. The hospitals pay this provider for the service.
There is a proposal to replace the sending of ZIP files with uncompressed images to speed up the process. Currently, the results can take between 1 and 6 days to be delivered, which can cause delays in the diagnosis and treatment of cancer. Patients need the results as quickly as possible so that they can undergo biopsies and additional tests.
We propose to implement a traffic light system for the status of the analyses, allowing users to know when the results are available. Additionally, work could be done on the implementation of an artificial intelligence (AI) project to improve the image analysis process, although this is not foreseen in the short or medium term.
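A minimal sketch of how such a traffic light could be computed from a study's dates is shown below. The color semantics, the names `Light` and `study_light`, and the use of 6 days as the "overdue" threshold (the upper end of the 1 to 6 day turnaround mentioned above) are assumptions for illustration, not the deployed design.

```python
from datetime import date
from enum import Enum
from typing import Optional

MAX_EXPECTED_DAYS = 6  # upper end of the 1 to 6 day turnaround mentioned above


class Light(Enum):
    GREEN = "results available"
    YELLOW = "pending, within expected turnaround"
    RED = "pending, overdue"


def study_light(sent_on: date, today: date, results_on: Optional[date] = None) -> Light:
    """Return the traffic light color for one study (hypothetical rules)."""
    if results_on is not None and results_on <= today:
        return Light.GREEN
    waited_days = (today - sent_on).days
    return Light.YELLOW if waited_days <= MAX_EXPECTED_DAYS else Light.RED
```

With these rules, a study sent on June 1 and still pending on June 3 is yellow, the same study still pending on June 10 is red, and it turns green once a results date on or before today is recorded.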
The success of the AI project could attract funding and support for other similar projects in the future. To achieve this, it is important that specific responsibilities are identified and assigned to the people who will feed the necessary information into the system.
DICOM
Our model will be deployed with Gradio so that a background service can call it through a REST API.
DICOM stands for Digital Imaging and Communications in Medicine. It's a global standard for handling, storing, printing, and transmitting information in medical imaging. The standard was created by the National Electrical Manufacturers Association (NEMA) and is widely used in hospitals globally. It includes a file format and a network communications protocol, and it defines data structures for medical images and related information like patient data, image acquisition parameters, and diagnostic findings.
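To make the file format concrete: a DICOM file stored in the standard Part 10 format begins with a 128-byte preamble followed by the four ASCII bytes `DICM`. The sketch below checks for that prefix using only the standard library; the function name `has_dicom_prefix` is hypothetical, and a real pipeline would normally rely on a DICOM library for full parsing.

```python
import io
from typing import BinaryIO

PREAMBLE_LENGTH = 128    # bytes of preamble before the prefix
DICOM_PREFIX = b"DICM"   # 4-byte marker that follows the preamble


def has_dicom_prefix(stream: BinaryIO) -> bool:
    """Return True if the stream starts with a 128-byte preamble plus 'DICM'."""
    header = stream.read(PREAMBLE_LENGTH + len(DICOM_PREFIX))
    return (
        len(header) == PREAMBLE_LENGTH + len(DICOM_PREFIX)
        and header[PREAMBLE_LENGTH:] == DICOM_PREFIX
    )


# In-memory stand-in for a file: zeroed preamble, the prefix, then dummy bytes.
fake_file = io.BytesIO(bytes(PREAMBLE_LENGTH) + DICOM_PREFIX + b"\x02\x00")
looks_like_dicom = has_dicom_prefix(fake_file)
```

A quick check like this can filter out non-DICOM files (for example, stray files inside a ZIP archive) before handing the rest to a parser.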
DICOM enables the integration of scanners, servers, workstations, printers, and network hardware from multiple manufacturers into a picture archiving and communication system (PACS). The different devices come with DICOM conformance statements that state how they support the DICOM standard.
DICOMWeb, on the other hand, is a term used to denote the family of DICOM RESTful web services. These web services are defined as part of the DICOM standard and provide access to a set of fundamental DICOM functions using familiar web technologies. They are an HTTP-based API for the DICOM protocol, making it more accessible to web-based applications.
There are several services defined under the DICOMWeb umbrella, but we’ve implemented QIDO (Query based on ID for DICOM Objects) with ConQuest DICOM server 1.5.0c and OHIF Viewer.
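As an illustration of what a QIDO query looks like, the sketch below builds the URL for a study-level search. The `/studies` resource, attribute keywords such as `PatientID` and `ModalitiesInStudy`, and the `limit` parameter come from the DICOMweb standard; the base URL, the function name, and the example patient ID are hypothetical, and which filters a given ConQuest installation accepts is an assumption to verify against its conformance statement.

```python
from typing import Mapping
from urllib.parse import urlencode


def qido_studies_url(base_url: str, filters: Mapping[str, str], limit: int = 25) -> str:
    """Build a QIDO-RS study search URL from attribute filters."""
    query = dict(filters)
    query["limit"] = str(limit)
    return f"{base_url.rstrip('/')}/studies?{urlencode(query)}"


# Hypothetical server address and patient ID; "MG" is the mammography modality code.
url = qido_studies_url(
    "http://pacs.example.org/dicomweb",
    {"PatientID": "12345", "ModalitiesInStudy": "MG"},
)
```

A client would send a GET request to this URL with an `Accept: application/dicom+json` header and receive the matching studies as JSON.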
The ConQuest DICOM server, released in 1995 by Marcel van Herk, is a widely used, versatile DICOM server. The server can be found on GitHub, and the client is available at https://github.com/sanchezcarlosjr/MexicanPACS.
This version of the server was used by the University of California at Davis for their Personal PACS (Picture Archiving and Communication System). PACS is essential in modern healthcare since it allows for the storage and convenient access of medical images. Personal PACS systems can provide substantial benefits to healthcare professionals by enabling access to patient images and related data from their personal devices.
The ConQuest DICOM server uses a Delphi TCP/IP connection for network communication. Delphi is a programming language and software development kit that supports Windows APIs, including those for establishing and managing TCP/IP connections.
The server software also includes Lua scripting. Lua is a lightweight and efficient scripting language commonly used for extending applications. It provides the capability to incorporate advanced logic into the server's operation without modifying the server's source code.
Overall, the ConQuest DICOM server is an essential tool for managing medical imaging data, particularly in research and clinical contexts where flexibility and customizability are paramount.
On the other hand, medical doctors who want to analyze mammograms use a DICOM viewer. In our case, we’ll customize the OHIF Viewer.
Acknowledgements
We thank the participating women, mammography facilities, and radiologists for the data they have provided. You can learn more about the BCSC at: http://www.bcsc-research.org/.
Conclusions
References
Load and preprocess images  | TensorFlow Core
[1] Busaleh, M., Hussain, M., Aboalsamh, H. A., Fazal-e-Amin, & Al Sultan, S. A. TwoViewDensityNet: Two-View Mammographic Breast Density Classification Based on Deep Convolutional Neural Network.
Open Health Imaging Foundation. (n.d.). OHIF/Viewers. GitHub. Retrieved June 13, 2023, from https://github.com/OHIF/Viewers
Radiological Society of North America. (n.d.). Radiol.211105. RSNA Journals. Retrieved June 13, 2023, from https://pubs.rsna.org/doi/10.1148/radiol.211105
Sanchez Carlos Jr. (n.d.). Breast-Cancer-risk-estimation-system. GitHub. Retrieved June 13, 2023, from https://github.com/sanchezcarlosjr/Breast-Cancer-risk-estimation-system
Society for Imaging Informatics in Medicine. (n.d.). SIIM. Retrieved June 13, 2023, from https://siim.org/
Ray Project. (n.d.). Pipelining Datasets. Ray Documentation. Retrieved June 13, 2023, from https://docs.ray.io/en/latest/data/pipelining-compute.html#pipelining-datasets
Ray Project. (n.d.). ray.data.read_images. Ray Documentation. Retrieved June 13, 2023, from https://docs.ray.io/en/latest/data/api/doc/ray.data.read_images.html#ray.data.read_images
Ray Project. (n.d.). OCR Example. Ray Documentation. Retrieved June 13, 2023, from https://docs.ray.io/en/latest/data/examples/ocr_example.html
Radiological Society of North America. (2023, May 10). BI-RADS Terminology for Mammography Reports: What Residents Need to Know. RSNA Journals. Retrieved June 13, 2023, from https://pubs.rsna.org/do/10.1148/rg.2019180068.pres/full
Sanchez Carlos Jr. (n.d.). breast-cancer-pipeline. GitHub. Retrieved June 13, 2023, from https://github.com/sanchezcarlosjr/breast-cancer-pipeline
https://github.com/Adamouization/Breast-Cancer-Detection-Mammogram-Deep-Learning-Publication
https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00153-X/fulltext
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280841
https://github.com/Project-MONAI/MONAI
https://www.youtube.com/@residenciaimageneshigasanm7739
https://www.youtube.com/watch?v=-KGaoQX6OVQ
https://github.com/SysCV/sam-hq
https://www.imss.gob.mx/sites/all/statics/guiasclinicas/240GRR.pdf
Annexes
Big Data
Updating the ljpeg library was a crucial step in our workflow for downloading DDSM (Digital Database for Screening Mammography) images and other images in the LJPEG format for further algorithmic processing. The ljpeg library handles and decodes images encoded in the LJPEG format efficiently.
By updating the library, we make sure we work with the latest version, including any bug fixes or performance improvements.
https://github.com/sanchezcarlosjr/ljpeg