
Mexican Deep DICOM

Abstract

Transform the diagnostic delivery process in Mexican public hospitals by identifying and understanding the different stages of the process, with the goal of deploying intelligent traffic light signaling to support the diagnosis of breast cancer using convolutional neural networks and mammograms in a real-world environment.

Code

https://github.com/sanchezcarlosjr/breast-cancer-toolkit

https://github.com/sanchezcarlosjr/breast-cancer-pipeline

Introduction

Breast cancer is the leading cause of cancer death among women in Mexico and has become a major public health problem; worldwide, it accounts for 2.26 million cases. In other words, it is the type of cancer with the highest incidence and mortality in women: every day at least 14 women, chiefly between 50 and 69 years old, die. Indeed, as the figure below shows, breast cancer follows an increasing trend compared to other cancers.

Today, doctors at Mexican public hospitals carry out their analyses on traditional 2D mammograms requested for patients. In Ensenada, doctors request private external assistance. Oncologists annotate the medical images and chiefly report the patient's BI-RADS score. BI-RADS stands for Breast Imaging Reporting and Data System; it is the scoring standard that radiologists and oncologists use to describe mammogram results. We explain BI-RADS further in the BIRADS Scale section.

Our goal is to build data mining models that interpret mammograms and predict the risk of developing breast cancer, continuing previous work. Since the model output concerns a person's future health, we use both descriptive and predictive methods; specifically, we apply machine learning algorithms to mammography datasets. We do not expect to replace medical doctors but to assist them. Other computer-aided detection systems have been developed for breast cancer detection, but to our knowledge none of them is applied in regional cities and they are not free.

We expect the project to help thousands of women get a faster cancer detection, because deep learning is faster and cheaper than human reading, provided the model achieves good metrics; in this way we contribute to decreasing the death rate. PACS stands for picture archiving and communication system.

Related work

The body of work related to breast cancer diagnostics using mammograms and convolutional neural networks is expansive. Various resources provide complementary perspectives, techniques, and tools.

For instance, the Open Health Imaging Foundation (OHIF) provides an open-source DICOM Viewer available on GitHub. The viewer is a zero-footprint medical image viewer provided as a Meteor package (OHIF, n.d.). It enables practitioners to visualize and navigate medical imaging data directly, enhancing understanding and improving diagnosis accuracy.

The Radiological Society of North America (RSNA) has published numerous papers discussing the importance of certain mammographic findings and terminologies. In one of these papers, they delve into the BI-RADS terminology for mammography reports, explaining what medical residents need to know (RSNA, 2023). This work informs the interpretation and communication of mammography results, which is a crucial step in diagnosing breast cancer.

Methodology

The first step in understanding the methodology for machine learning is becoming familiar with the key concepts. Data science is essentially the process of extracting meaningful insights and patterns from large, unstructured datasets. This process leverages techniques such as machine learning, neural networks, and statistical methods to make sense of raw data, which can often be vast and complex.

An important tool in this context is the Digital Imaging and Communications in Medicine, or DICOM. DICOM is a standard protocol used for the transmission, storage, retrieval, and sharing of medical images. This protocol aids in the visualization and analysis of these images, enabling the identification of potential patterns or traits that might be of particular interest. In the world of data mining, this step is often referred to as "DICOM View."

In the context of mammography analysis, one might utilize the Digital Database for Screening Mammography, or DDSM. DDSM is one of the largest publicly available collections of mammograms. As part of the preprocessing step, the mammograms from DDSM can be analyzed and preprocessed to identify and potentially remove any noise or inconsistencies in the data. This process might involve data cleaning, normalization, transformation, and other techniques to prepare the data for further analysis.
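The normalization step mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing: it assumes the decoded mammogram is available as a nested list of pixel intensities, and the function name `min_max_normalize` is hypothetical. A real pipeline would more likely operate on array types from an imaging library.

```python
from typing import List


def min_max_normalize(pixels: List[List[float]]) -> List[List[float]]:
    """Rescale pixel intensities linearly into the [0, 1] range."""
    flat = [value for row in pixels for value in row]
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        # Constant image: avoid division by zero and map every pixel to 0.
        return [[0.0 for _ in row] for row in pixels]
    scale = highest - lowest
    return [[(value - lowest) / scale for value in row] for row in pixels]


# Hypothetical 2x2 image with 12-bit intensities (assumed values).
image = [[0, 1024], [2048, 4095]]
normalized = min_max_normalize(image)
```

Rescaling to a fixed range is one common choice; other normalizations (for example per-dataset mean and standard deviation) may suit a given model better.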

Finally, the methodology wraps up with the engineering process. This is where you design and implement your deep learning or machine learning models based on the preprocessed data. This can involve creating different deep learning architectures and preprocessing tasks. After training the model, you can then validate and test it on a separate dataset to ensure its reliability and effectiveness.
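The idea of validating and testing on a separate dataset can be sketched as a split by patient. This is an assumption on our part rather than the method used in the repositories: grouping by patient keeps all images of one person (for example both views of the same breast) in a single subset. The function name, the fractions, and the patient identifiers are hypothetical.

```python
import random
from typing import Iterable, Set, Tuple


def split_by_patient(
    patient_ids: Iterable[str],
    val_frac: float = 0.2,
    test_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return disjoint (train, validation, test) sets of patient IDs."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique)
    n_test = int(len(unique) * test_frac)
    n_val = int(len(unique) * val_frac)
    test = set(unique[:n_test])
    val = set(unique[n_test:n_test + n_val])
    train = set(unique[n_test + n_val:])
    return train, val, test


# Hypothetical patient identifiers, repeated as if each patient had two images.
ids = [f"P{i:03d}" for i in range(10)] * 2
train_ids, val_ids, test_ids = split_by_patient(ids)
```

Each image is then assigned to the subset that contains its patient ID, so no patient contributes to both training and evaluation.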

In this process, each step feeds into the next, creating a continuous flow from initial data understanding through to final model creation and evaluation. This framework allows for efficient handling and processing of complex and large-scale medical image data such as mammograms.

Domain Understanding

Breast cancer is a disease characterized by the abnormal and uncontrolled proliferation of cells, often leading to metastasis. It is commonly classified using the TNM staging system, which considers the size and extent of the tumor (T), the involvement of lymph nodes (N), and the presence of metastasis (M). The stage of the cancer is inversely related to survival rates; higher stages generally indicate a shorter lifespan, while lower stages are associated with longer survival.

Early diagnosis is crucial for improving outcomes, and various methods such as regular mammograms, self-exams, and awareness of risk factors are employed for this purpose. Speaking of risk factors, they can range from lifestyle choices like diet and obesity to chronic conditions, as well as environmental, familial, hereditary, benign, hormonal, and reproductive factors.

Additionally, the HER2 gene (Human Epidermal Growth Factor Receptor 2) can play a significant role in the development of breast cancer. Treatments targeting HER2 have been developed and are particularly effective for cases of HER2-positive breast cancer.

Mammography is the best technique for detecting breast microcalcifications.

Medical imaging

https://monai.io/

https://www.youtube.com/watch?v=DpmF4QZYoH0

Screening and basal

Screening for asymptomatic women can help identify tumors in their early stages. Early detection is crucial in improving the prognosis and survival rates for women diagnosed with breast cancer. Regular screening procedures include mammography, which can detect early signs of breast cancer before symptoms develop.

For more detailed information, you can refer to this publication from the Pontifical Catholic University of Chile: Screening and Early Detection of Breast Cancer.

Mammography

Special cassette holder: it increases the distance between the breast and the plate. https://www.youtube.com/watch?v=-louFNyRJhw

Craniocaudal and Mediolateral Oblique Views

The craniocaudal (CC) and mediolateral oblique (MLO) views are essential in mammography, particularly for identifying lesions that are not typically benign. These standard views help radiologists get a comprehensive look at the breast tissue.

http://med_physics.i-do.science/topics/diagnose_breast/

Magnification Mammography

Magnification mammography, along with lateral, focal, and tangential views, is used to evaluate small lesions, distortions, and microcalcifications in detail. The purpose of these techniques is to enhance the visualization of specific areas of interest. Based on the orthogonal projections, the radiologist indicates the area to be magnified for a more precise assessment.

Definition of Calcification

Calcification refers to the accumulation of calcium salts in body tissues, causing the tissue to harden. This process can lead to the formation of bone-like structures within soft tissues.

Breast Calcifications

Breast calcifications are small deposits of calcium that develop within the breast tissue. They are typically identified during mammography and require careful evaluation based on several factors: size, location, morphology, and distribution.

These characteristics are described using the BIRADS (Breast Imaging Reporting and Data System) descriptors, which help standardize reporting and guide clinical management.

Microcalcifications and DCIS

Calcifications associated with Ductal Carcinoma In Situ (DCIS) are generally microcalcifications, which are very small (less than 0.5 mm). The presence of these tiny calcifications can indicate early, non-invasive breast cancer.

By evaluating these calcifications based on their size, location, morphology, and distribution, radiologists can make informed decisions about the likelihood of malignancy and the need for further diagnostic procedures.
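To relate the 0.5 mm size threshold to image resolution, the following sketch converts a physical size into a pixel extent. The pixel spacing used here is a hypothetical value chosen for illustration, not a property of any dataset in this document; real values would come from the image metadata.

```python
def size_in_pixels(size_mm: float, pixel_spacing_mm: float) -> float:
    """Convert a physical size in millimetres to an extent in pixels."""
    if pixel_spacing_mm <= 0:
        raise ValueError("pixel spacing must be positive")
    return size_mm / pixel_spacing_mm


MICROCALCIFICATION_MAX_MM = 0.5      # size limit stated in the text
HYPOTHETICAL_PIXEL_SPACING_MM = 0.1  # assumed detector resolution

max_extent_px = size_in_pixels(
    MICROCALCIFICATION_MAX_MM, HYPOTHETICAL_PIXEL_SPACING_MM
)
```

Under this assumed spacing a microcalcification spans only a handful of pixels, which is why aggressive downsampling during preprocessing can erase it.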

Cancer progression

Risk factors

Risk factors for breast cancer include diet, lifestyle (such as obesity and chronic conditions), environmental factors, family history, hereditary conditions, benign conditions, hormonal factors, and reproductive history.

Medications and Treatments

Early Diagnosis and Screening

Screening Considerations

Additional Considerations

Mammography

Mammography is the only imaging method that reduces breast cancer mortality.

Mammography identifies 2 to 8 cases per 1000 studies. Its sensitivity is 30-64% in dense breasts and 98% in fatty breasts.
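As a worked illustration of what these sensitivity figures imply, the sketch below computes how many cancers would be expected to go undetected in a group of 100 women who all have cancer. The group size and the function name are assumptions made for the example; only the sensitivity values come from the text.

```python
def expected_missed(cancers: int, sensitivity: float) -> float:
    """Expected number of cancers not detected, given test sensitivity."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return cancers * (1.0 - sensitivity)


# Sensitivity values quoted above; keys are illustrative labels.
SENSITIVITY = {"dense_low": 0.30, "dense_high": 0.64, "fatty": 0.98}

# Hypothetical group of 100 women with cancer.
missed = {name: expected_missed(100, s) for name, s in SENSITIVITY.items()}
```

With these figures, between roughly 36 and 70 of 100 cancers would be missed in dense breasts, against about 2 in fatty breasts.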

The likelihood of survival depends on the clinical stage at diagnosis, the available treatment options, and the biology of the disease.

Mexican Consensus on the Diagnosis and Treatment of Breast Cancer, 2021

Recommendation: Annual screening mammograms are recommended for asymptomatic women starting at age 40.


Mexican Official Standard 041

Recommendation: Screening mammograms are recommended for apparently healthy women aged 40 to 69 years, every two years.
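The two recommendations above differ in interval and age range. A minimal sketch of how they could be encoded is shown below; the function name and the guideline keys are hypothetical, and the rules cover only what is stated in this section (asymptomatic or apparently healthy women).

```python
from typing import Optional


def screening_interval_years(age: int, guideline: str) -> Optional[int]:
    """Return the recommended interval in years, or None if not covered."""
    if guideline == "consensus_2021":
        # Mexican Consensus 2021: annual screening starting at age 40.
        return 1 if age >= 40 else None
    if guideline == "nom_041":
        # Mexican Official Standard 041: every two years, ages 40 to 69.
        return 2 if 40 <= age <= 69 else None
    raise ValueError(f"unknown guideline: {guideline}")


interval_consensus = screening_interval_years(45, "consensus_2021")
interval_nom = screening_interval_years(45, "nom_041")
```

Returning `None` for ages outside a guideline's range keeps "no recommendation stated here" distinct from any numeric interval.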


American Cancer Society


Mexican Consensus on the Diagnosis and Treatment of Breast Cancer. Tenth Colima Meeting 2023.

Techniques

graph TD
    A[Different Techniques of Mammography]

    A --> B[Conventional Mammography]
    B --> C[Analog]
    B --> D[Screen-Film Images]

    A --> E[Digital Acquisition]
    E --> F[Through Integrated or External Detectors]
    E --> G[High-Resolution Laser Equipment]

mindmap
  root((Digital Mammography))
    Telemammography
    Tomosynthesis Mammography
    Synthesized Mammography
    Stereotactic Biopsy with Tomosynthesis
    AI-assisted Detection Systems
    Contrast-enhanced Mammography

People

An oncologist is a doctor who treats cancer and provides medical care for a person diagnosed with cancer.

Radiologists are experts in evaluating mammograms and other imaging modalities. BIRADS (Breast Imaging Reporting and Data System) is a common language between radiologists and oncologists. Note: a BIRADS assessment comes before a cancer diagnosis. Cancer can be seen as a "seeding" process. Key imaging methods include digital mammography, breast ultrasound, and magnetic resonance imaging (MRI) for special cases. Prevention and early detection are crucial.

Medical physics is a field that applies physics principles to medicine, primarily in the diagnosis and treatment of diseases. It encompasses various techniques and technologies that are crucial in the fight against cancer, among other medical conditions.

Radiotherapy, also known as radiation therapy, is a treatment that uses high doses of radiation to kill cancer cells and shrink tumors. It is a vital tool in oncology, helping to manage and cure various types of cancer. Radiotherapy can be delivered externally using machines or internally through radioactive substances placed near cancer cells.

A particle accelerator is a complex machine that uses electromagnetic fields to propel charged particles, such as protons or electrons, to high speeds and to contain them in well-defined beams. In medical physics, particle accelerators are used in radiation therapy to generate high-energy beams that target cancer cells with precision, minimizing damage to surrounding healthy tissues.

Cobalt therapy, or cobalt-60 therapy, is a type of radiotherapy that uses gamma rays from the radioactive isotope cobalt-60. It was one of the first widely used radiotherapy methods and remains important in certain types of cancer treatment, particularly in regions where access to advanced technologies may be limited.

Chemotherapy involves the use of drugs to kill cancer cells or slow their growth. Unlike radiotherapy, which targets a specific area, chemotherapy works throughout the whole body. It is often used in combination with radiotherapy and surgery to enhance the overall effectiveness of cancer treatment.

BIRADS Scale: Breast Imaging Reporting and Data System

| Category | Recommendations |
|---|---|
| 0 | Insufficient for diagnosis: Evaluation with additional mammographic images or other studies (US) is required, as well as comparison with previous studies. This category should not be used as an indication for MRI. There is a 13% possibility of malignancy. |
| 1 | Negative: No findings to report. Annual mammography for women over 40 years. |
| 2 | Benign Findings: Annual mammography for women over 40 years. |
| 3 | Probably Benign Findings: Less than 2% probability of malignancy. Follow-up with imaging of the affected side with suspicious findings every 6 months, and subsequent bilateral monitoring for 2 years. This category is only recommended for diagnostic mammography. |
| 4 | Suspicious Abnormality: Needs further evaluation. 4a: Low suspicion of malignancy (2-10%). 4b: Moderate suspicion of malignancy (10-50%). 4c: High suspicion of malignancy (50-95%). Requires biopsy. |
| 5 | Highly Suggestive of Malignancy: Requires biopsy. Positive predictive value (PPV) >95%. |
| 6 | Known Biopsy-Proven Malignancy: Awaiting definitive treatment or evaluation of treatment response. |

BIRADS: Breast Imaging Reporting and Data System; US: Ultrasound; MRI: Magnetic Resonance Imaging; PPV: Positive Predictive Value. Source: American College of Radiology, Mammography, 5th ed., 2013
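One way the BIRADS categories above could feed a traffic light signal is sketched below. The color assignment is purely an assumption made for illustration (green for categories 1 and 2, yellow for 0 and 3, red for 4 to 6); it is not a clinical rule and not taken from the source, and the function name is hypothetical.

```python
VALID_CATEGORIES = {"0", "1", "2", "3", "4", "4a", "4b", "4c", "5", "6"}


def birads_to_light(category: str) -> str:
    """Map a BIRADS category string to a hypothetical traffic light color."""
    key = category.strip().lower()
    if key not in VALID_CATEGORIES:
        raise ValueError(f"unknown BIRADS category: {category!r}")
    base = key[0]  # "4a", "4b", "4c" all share the base category "4"
    if base in {"1", "2"}:
        return "green"   # negative or benign findings
    if base in {"0", "3"}:
        return "yellow"  # more imaging or short-term follow-up needed
    return "red"         # categories 4, 5, 6: biopsy or known malignancy


lights = {c: birads_to_light(c) for c in ["1", "3", "4B", "5"]}
```

Any real mapping would need to be agreed with the radiologists and oncologists who act on the signal.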

Previous work

Datasets

DDSM

CBIS-DDSM

Experiments

In the experiments below, various configurations and methods are explored to train a neural network. The experiments range from basic setups without data preprocessing to more advanced configurations involving techniques such as SAM and transformers and architectures such as EfficientNet. Finally, the best-performing network is further tested on a hospital dataset to evaluate its applicability in a real-world setting.

TwoViewDensityNet

We’ve reproduced the workflow proposed by [1] in a notebook.

Denoising process with Semantic Segmentation in 2D mammograms

Deployment in Mexican hospital

Current Hospital Process

The current process for breast cancer detection in hospitals involves performing a mammogram, where the patient is fitted with a band for the study. Then, the image is sent to the open-source application K-PACS, installed throughout the hospital. This system currently does not allow for quickly sharing studies with other hospitals or doctors, and there is a plan to implement a national-level electronic record called SINBA to centralize information.

We propose this process to deploy our model


Patients cannot consult the information in the data repository, which includes medical notes, and a unique population registration identity code (CURP) is required to access it. The images are automatically sent in a ZIP file to an external provider which evaluates them and delivers the results (BERAX). The hospitals pay this provider for the service.


There is a proposal to replace the sending of ZIP files with uncompressed images to speed up the process. Currently, the results can take between 1 and 6 days to be delivered, which can cause delays in the diagnosis and treatment of cancer. Patients need the results as quickly as possible so that they can undergo biopsies and additional tests.


We propose to implement a traffic light system for the status of the analyses, allowing users to know when the results are available. Additionally, work could be done on the implementation of an artificial intelligence (AI) project to improve the image analysis process, although this is not foreseen in the short or medium term.
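The status traffic light proposed above could be modeled as a small lookup from study status to color. The status names and the color assignment are hypothetical; the source only states that users should be able to see when results are available.

```python
from enum import Enum


class StudyStatus(Enum):
    """Hypothetical stages of a study sent for external evaluation."""
    SENT = "sent"
    IN_REVIEW = "in_review"
    RESULTS_AVAILABLE = "results_available"


# Assumed color per status; green means the results can be consulted.
STATUS_COLOR = {
    StudyStatus.SENT: "red",
    StudyStatus.IN_REVIEW: "yellow",
    StudyStatus.RESULTS_AVAILABLE: "green",
}


def status_color(status: StudyStatus) -> str:
    """Return the traffic light color shown to users for a study."""
    return STATUS_COLOR[status]


current = status_color(StudyStatus.IN_REVIEW)
```

Using an enumeration keeps the set of valid statuses closed, so a typo in a status name fails early instead of silently showing no light.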

Prototype


The success of the AI project could attract funding and support for other similar projects in the future. To achieve this, it is important that specific responsibilities are identified and assigned to the people who will feed the necessary information into the system.

DICOM

Our model will be deployed with Gradio so that a background service can call it through a REST API.

DICOM stands for Digital Imaging and Communications in Medicine. It's a global standard for handling, storing, printing, and transmitting information in medical imaging. The standard was created by the National Electrical Manufacturers Association (NEMA) and is widely used in hospitals globally. It includes a file format and a network communications protocol, and it defines data structures for medical images and related information like patient data, image acquisition parameters, and diagnostic findings.
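The file format part of the standard can be illustrated with a quick signature check: a DICOM Part 10 file starts with a 128-byte preamble followed by the four bytes `DICM`. The sketch below only tests for that prefix and does not parse any data elements; the function name is hypothetical.

```python
import io
from typing import BinaryIO


def looks_like_dicom(stream: BinaryIO) -> bool:
    """Check for the 128-byte preamble followed by the 'DICM' prefix."""
    header = stream.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


# In-memory stand-ins for files, so the example needs no data on disk.
fake_dicom = io.BytesIO(b"\x00" * 128 + b"DICM" + b"rest of file")
not_dicom = io.BytesIO(b"PK\x03\x04 some zip archive bytes")

is_dicom = looks_like_dicom(fake_dicom)
is_zip_dicom = looks_like_dicom(not_dicom)
```

A check like this can serve as a cheap filter before handing files to a full DICOM parser, for instance to reject a ZIP archive that was not unpacked.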

DICOM enables the integration of scanners, servers, workstations, printers, and network hardware from multiple manufacturers into a picture archiving and communication system (PACS). The different devices come with DICOM conformance statements that state how they support the DICOM standard.

DICOMWeb, on the other hand, is a term used to denote the family of DICOM RESTful web services. These web services are defined as part of the DICOM standard and provide access to a set of fundamental DICOM functions using familiar web technologies. They are an HTTP-based API for the DICOM protocol, making it more accessible to web-based applications.

There are several services defined under the DICOMWeb umbrella, but we’ve implemented QIDO (Query based on ID for DICOM Objects) with ConQuest DICOM server 1.5.0c and OHIF Viewer.
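A QIDO query is an HTTP GET against a resource such as `studies`, with attribute filters passed as query parameters. The sketch below only builds such a URL; the base URL is a placeholder, the function name is hypothetical, and the exact path exposed by a given ConQuest installation is an assumption to verify against its configuration.

```python
from typing import Optional
from urllib.parse import urlencode


def qido_studies_url(
    base_url: str,
    patient_id: Optional[str] = None,
    modality: Optional[str] = None,
    limit: int = 25,
) -> str:
    """Build a QIDO-RS study search URL with optional attribute filters."""
    params = {}
    if patient_id:
        params["PatientID"] = patient_id
    if modality:
        # MG is the DICOM modality code for mammography.
        params["ModalitiesInStudy"] = modality
    params["limit"] = str(limit)
    return f"{base_url.rstrip('/')}/studies?{urlencode(params)}"


# Placeholder server address, not a real endpoint.
url = qido_studies_url(
    "https://pacs.example.org/dicomweb/",
    patient_id="P001",
    modality="MG",
    limit=10,
)
```

A client would then send a GET request to this URL, typically asking for `application/dicom+json` in the `Accept` header, and receive a JSON list of matching studies.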

The ConQuest DICOM server, released in 1995 by Marcel Van Herk, is a widely used, versatile DICOM server. The server can be found on GitHub, and our client is available at https://github.com/sanchezcarlosjr/MexicanPACS.

ConQuest DICOM

This version of the server was used by the University of California at Davis for their Personal PACS (Picture Archiving and Communication System). PACS is essential in modern healthcare since it allows for the storage and convenient access of medical images. Personal PACS systems can provide substantial benefits to healthcare professionals by enabling access to patient images and related data from their personal devices.

The ConQuest DICOM server uses a Delphi TCP/IP connection for network communication. Delphi is a programming language and software development kit that supports Windows APIs, including those for establishing and managing TCP/IP connections.

The server software also includes Lua scripting. Lua is a lightweight and efficient scripting language commonly used for extending applications. It provides the capability to incorporate advanced logic into the server's operation without modifying the server's source code.

Overall, the ConQuest DICOM server is an essential tool for managing medical imaging data, particularly in research and clinical contexts where flexibility and customizability are paramount.

On the other hand, medical doctors who wish to analyze mammograms use a DICOM viewer. In our case, we’ll customize the OHIF Viewer.

Default OHIF Viewer

Acknowledgements

We thank the participating women, mammography facilities, and radiologists for the data they have provided. You can learn more about the BCSC at: http://www.bcsc-research.org/.

Conclusions

References

Load and preprocess images  |  TensorFlow Core

[1] Busaleh, M., Hussain, M., Aboalsamh, H. A., Fazal-e-Amin, & Al Sultan, S. A. TwoViewDensityNet: Two-View Mammographic Breast Density Classification Based on Deep Convolutional Neural Network.

Open Health Imaging Foundation. (n.d.). OHIF/Viewers. GitHub. Retrieved June 13, 2023, from https://github.com/OHIF/Viewers

Radiological Society of North America. (n.d.). Radiol.211105. RSNA Journals. Retrieved June 13, 2023, from https://pubs.rsna.org/doi/10.1148/radiol.211105

Sanchez Carlos Jr. (n.d.). Breast-Cancer-risk-estimation-system. GitHub. Retrieved June 13, 2023, from https://github.com/sanchezcarlosjr/Breast-Cancer-risk-estimation-system

Society for Imaging Informatics in Medicine. (n.d.). SIIM. Retrieved June 13, 2023, from https://siim.org/

Ray Project. (n.d.). Pipelining Datasets. Ray Documentation. Retrieved June 13, 2023, from https://docs.ray.io/en/latest/data/pipelining-compute.html#pipelining-datasets

Ray Project. (n.d.). ray.data.read_images. Ray Documentation. Retrieved June 13, 2023, from https://docs.ray.io/en/latest/data/api/doc/ray.data.read_images.html#ray.data.read_images

Ray Project. (n.d.). OCR Example. Ray Documentation. Retrieved June 13, 2023, from https://docs.ray.io/en/latest/data/examples/ocr_example.html

Radiological Society of North America. (2023, May 10). BI-RADS Terminology for Mammography Reports: What Residents Need to Know. RSNA Journals. Retrieved June 13, 2023, from https://pubs.rsna.org/do/10.1148/rg.2019180068.pres/full

Sanchez Carlos Jr. (n.d.). breast-cancer-pipeline. GitHub. Retrieved June 13, 2023, from https://github.com/sanchezcarlosjr/breast-cancer-pipeline

https://github.com/Adamouization/Breast-Cancer-Detection-Mammogram-Deep-Learning-Publication

https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00153-X/fulltext

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280841

https://github.com/Project-MONAI/MONAI

https://www.youtube.com/@residenciaimageneshigasanm7739

https://www.youtube.com/watch?v=-KGaoQX6OVQ

https://github.com/SysCV/sam-hq

https://www.imss.gob.mx/sites/all/statics/guiasclinicas/240GRR.pdf

https://github.com/luca-medeiros/lang-segment-anything

Annexes

Big Data

Updating the ljpeg library was a crucial step in our workflow for downloading DDSM (Digital Database for Screening Mammography) images and other images in the LJPEG format for further algorithmic processing. The ljpeg library handles and decodes images encoded in the LJPEG format.

By updating the library, we ensure that users have the latest version with any bug fixes or performance improvements.

https://github.com/sanchezcarlosjr/ljpeg

Spanish slides

Management