Guest post by Maxi Schulz
I met Maxi at a statistics conference and watched her talk. It was so inspiring that I asked her if she wanted to write a guest post for my newsletter. Here is the result.
Deep learning (DL) models in medicine promise to support doctors in making better choices for their patients, such as detecting skin cancer more reliably. In our own work, we set out to develop a model to help doctors diagnose different stages of cervical cancer. Cervical cancer is one of the most common cancers among women, and a leading cause of cancer-related death in women in most parts of Africa (1). To diagnose and grade the different stages of cervical cancer, colposcopy is a widely used technique: a medical imaging procedure that allows doctors to visually examine the cervix using a specialized microscope. The images captured during colposcopy make the procedure an ideal candidate for DL models.
We found that 39 other studies had already used DL models to analyze colposcopy images, and 8 more had provided data for these analyses. Instead of starting from scratch, we wanted to build on the existing research and models that had already been developed. To do this, we needed to test how well these models work with our local patient group. We could do this in two ways:
- Using their original image data and running their code.
- Using their trained model's weights directly. Weights are not the model's output but its learned parameters, like the numbers you plug into a math equation; combined with the model architecture, they let us compute the probability that an image shows a certain stage of cancer.
The great thing about sharing model weights is that DL models can be applied elsewhere without needing the original training data, as the sketch below illustrates. This is really important in medicine, because the sensitive nature of patient information has always been a major concern for data sharing. That's why we asked for access to the model weights or the original images from the other studies.
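To make this concrete, here is a minimal sketch of what validating a shared model on local images could look like in PyTorch. Everything specific in it is an assumption for illustration: the ResNet-18 backbone, the checkpoint file name, the preprocessing, and the five-class grading scheme all stand in for whatever the original authors actually used.

```python
# Minimal sketch: validating a shared model on local images, assuming the
# original authors published (a) their architecture and (b) a weights file.
# The checkpoint name, class labels, and ResNet-18 backbone are hypothetical.
import torch
from torchvision import models, transforms
from PIL import Image

CLASSES = ["normal", "CIN1", "CIN2", "CIN3", "cancer"]  # hypothetical grading scheme

# Rebuild the published architecture and load the shared weights.
# No original training images are needed for this step.
model = models.resnet18(num_classes=len(CLASSES))
model.load_state_dict(torch.load("shared_weights.pt", map_location="cpu"))
model.eval()

# Preprocessing must match what the original authors used during training.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Run a local colposcopy image through the model and read off class probabilities.
image = preprocess(Image.open("local_colposcopy.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(image), dim=1).squeeze(0)

for label, p in zip(CLASSES, probs.tolist()):
    print(f"{label}: {p:.3f}")
```

The key point: this step needs only the weights file plus a description of the architecture and preprocessing; not a single patient image from the original cohort ever has to change hands.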
Of the 46 articles in total, 24 included a data sharing statement; 21 of these promised to share their data upon 'reasonable' request, and none made the data directly available. We emailed the corresponding authors and asked for their model weights (or their original data), regardless of whether the article included a data sharing statement. The first problem we encountered was that some email addresses no longer worked, so we also tried reaching the authors through ResearchGate. We contacted a total of 92 authors, but the result was upsetting. We received answers for only 14 studies: most of the responses (11 of 14) declined to share, and the remaining three deferred our request. The authors cited a lack of resources or patient privacy concerns as reasons for not sharing, and some gave no reason at all. In the end, not one of the publications we contacted shared their data.
What's really concerning is that we couldn't get access to datasets that had already been shared with others, like data from the Guanacaste project (2). This dataset had been used before to train AI models (3–7), but the person in charge said they didn't have the resources to prepare and share it with us. It's even more surprising that researchers who created datasets explicitly to help train AI models didn't share their data or even respond to our requests (8, 9).
This points to a persistent lack of a data sharing culture. The key takeaways are:
- A major barrier to data sharing is that authors often don't respond or their email addresses become invalid.
- Current data sharing policies, such as journals' requirement to include a data sharing statement, are ineffective.
- Scientists are often too busy to share their data, or even to respond to requests for it.
- Patient privacy is still a major concern for data sharing in the medical field. The potential of DL models is not being fully utilized, even though sharing weights alone would address most privacy concerns.
- Not sharing data holds back scientific progress. In our case, it stopped our project from moving forward. It also prevents other researchers from testing each other's findings.
We recognize that sharing data requires more effort than not sharing it, but the current system gives scientists no incentive to make that effort. It's time for a change. To promote data sharing, we discussed the following ideas:
- Journals could require authors to share their data in a repository as part of the publication process, making it easily accessible to all and eliminating the need for individual requests.
- Funders can play a key role in enhancing data sharing by excluding researchers who don't comply with data sharing rules from future funding opportunities.
Read the full preprint Maxi wrote.
For cited literature, see the bottom of this newsletter post.
In other news...
Open Science Retreat 2026
April 7-11 at the Centre for Alternative Technology in Machynlleth, Wales, UK
Applications for the Open Science Retreat 2026 are now open!
All the best,
Heidi
Literature Cited
1. Arbyn M, Weiderpass E, Bruni L, Sanjosé S de, Saraiya M, Ferlay J et al. Estimates of incidence and mortality of cervical cancer in 2018: a worldwide analysis. Lancet Glob Health 2020; 8(2):e191–e203.
2. Herrero R, Schiffman MH, Bratti C, Hildesheim A, Balmaceda I, Sherman ME et al. Design and methods of a population-based natural history study of cervical neoplasia in a rural province of Costa Rica: the Guanacaste Project. Rev Panam Salud Publica 1997; 1(5):362–75.
3. Pal A, Xue Z, Befano B, Rodriguez AC, Long LR, Schiffman M et al. Deep Metric Learning for Cervical Image Classification. IEEE Access 2021; 9:53266–75.
4. Lemay A, Hoebel K, Bridge CP, Befano B, Sanjosé S de, Egemen D et al. Improving the repeatability of deep learning models with Monte Carlo dropout. NPJ Digit Med 2022; 5(1):174.
5. Ahmed SR, Befano B, Lemay A, Egemen D, Rodriguez AC, Angara S et al. Reproducible and clinically translatable deep neural networks for cervical screening. Sci Rep 2023; 13(1):21772.
6. Xu T, Zhang H, Xin C, Kim E, Long LR, Xue Z et al. Multi-feature based Benchmark for Cervical Dysplasia Classification Evaluation. Pattern Recognit 2017; 63:468–75.
7. Hu L, Bell D, Antani S, Xue Z, Yu K, Horning MP et al. An Observational Study of Deep Learning and Automated Evaluation of Cervical Images for Cancer Screening. J Natl Cancer Inst 2019; 111(9):923–32.
8. Li Y, Liu Z-H, Xue P, Chen J, Ma K, Qian T et al. GRAND: A large-scale dataset and benchmark for cervical intraepithelial Neoplasia grading with fine-grained lesion description. Med Image Anal 2021; 70:102006.
9. Yu Y, Ma J, Zhao W, Li Z, Ding S. MSCI: A multistate dataset for colposcopy image classification of cervical cancer screening. Int J Med Inform 2021; 146:104352.