One of the biggest challenges of this fellowship has been obtaining data. As an informatics fellow, I find data quite essential to my training. This problem is not unique to me, my current institution or healthcare in general. And it brings me to my next blog topic – data sharing.
What is data sharing and why is it important? Data sharing is the practice of making data available to others through a variety of mechanisms, and it has become increasingly important in healthcare. A wealth of information is growing exponentially every day, including demographic, clinical and even financial data. All of this data has the potential to make healthcare better by decreasing waste, identifying disparities and improving quality.

Sharing Is Caring—Data Sharing Initiatives in Healthcare
Challenges and Barriers for Data Sharing
Although data sharing can certainly improve healthcare, there are many challenges and barriers, along with a general reluctance to share data. One of the most important and obvious reasons is privacy. Healthcare data is heavily regulated to protect sensitive health information, particularly identifying data such as name, birthdate or any other unique identifiers. This type of information is referred to as protected health information, or PHI. These privacy concerns are not trivial and should be addressed. A study performed in 2000 found that simple demographics could identify a person. More specifically, it found that 87% of Americans could be identified based on only three pieces of information – zip code, gender and birthdate. This is a privacy issue, especially when data containing this information is publicly available. Employing encryption to secure sensitive information while still allowing for data sharing is very important.
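To make the re-identification risk concrete, here is a minimal sketch of how one might check what share of records in a dataset is singled out by a combination of demographic fields. The records, the field names and the `unique_fraction` helper are all made up for illustration; this is not the method used in the study mentioned above.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the fraction of records whose combination of `keys` values
    appears exactly once in the dataset (i.e. the record stands alone)."""
    if not records:
        return 0.0
    # Count how many records share each combination of values.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    # A record is "unique" when nobody else shares its combination.
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical toy data, not real patients.
records = [
    {"zip": "10001", "gender": "F", "birthdate": "1980-03-14"},
    {"zip": "10001", "gender": "F", "birthdate": "1980-03-14"},
    {"zip": "10001", "gender": "M", "birthdate": "1975-07-02"},
    {"zip": "10002", "gender": "F", "birthdate": "1990-11-30"},
]

quasi_identifiers = ["zip", "gender", "birthdate"]
print(unique_fraction(records, quasi_identifiers))
print(unique_fraction(records, ["gender"]))
```

Running the same check on fewer or coarser fields (for example, birth year instead of full birthdate) is one simple way to see how much each field contributes to making a person identifiable.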
There is also a fear of misuse. This fear often limits data access even within institutions. It is important that data be used and analyzed correctly. As I touched on in my previous blog post, data can sometimes have an illusion of objectivity. Those who analyze and present the data should have knowledge of the field. Different projects will require different expertise depending on the data and the goal of the project. For example, a study that looks at a treatment for a particular disease should involve those who treat the disease. Sounds pretty simple, right? Similarly, if financial data is to be analyzed, financial experts should be included in the project.
Other challenges are rooted in credit for scientific discoveries and the potential for revenue. Data is powerful and can bring about new, exciting and innovative changes. However, collaboration and data sharing can be viewed as squashing dreams of being recognized for creating the next big thing in medicine. Data is also often viewed as a gold mine. Predictive models, novel treatments and the like can be built with data and then used as a source of revenue. Data sharing may limit these possibilities. A counterargument to these points is the reproducibility problem. A survey published in 2016 found that more than 70% of researchers had tried and failed to reproduce another scientist’s experiments, and more than half had failed to reproduce their own. By sharing data and methodologies, studies can be repeated to see if the same or similar results are achieved.
There are other considerations as well, such as debates over who owns healthcare data and other reasons that data sharing may be seen as not in the best financial interests of healthcare institutions.
Data Sharing Initiatives
The importance of data sharing and its potential in healthcare was really highlighted by the pandemic. A spotlight was cast on all the shortcomings and deficiencies in the flow of information, with siloed data sitting at different institutions across the country and around the world. As COVID spread, there was a demand for and expectation of collaboration between healthcare institutions and public health that didn’t necessarily exist. These bridges and tunnels of data sprang up quickly but arguably should have already been in place. But even before the pandemic, there was recognition that data sharing could be powerful, especially in the realm of public health.
One great example of data sharing for public health is the New York City Macroscope, which was developed in 2013. A team of public health professionals was able to leverage medical data stored in primary care electronic medical records from multiple health institutions in NYC. This data was then used to track chronic conditions such as hypertension and obesity in adults throughout the city. Researchers are now able to answer targeted questions and estimate the health of NYC residents easily.
Another interesting data-driven public health initiative is the Childhood Obesity Data Initiative (CODI). The Centers for Disease Control and Prevention (CDC) partnered with MITRE to tackle childhood obesity with data. They recognized that this is a complex issue that encompasses many different factors, both individual and community. This initiative set out to bring together different types of data that would give information not only on demographics and clinical health outcomes but also on participation in community programs. This is another great example of how data sharing can help track chronic diseases in different populations and help us understand and prevent them.
Healthcare institutions have recognized the potential of regularly collected data, but how data can be obtained varies substantially. I recently learned of an “umbrella protocol” that was implemented at the University of Texas Health Science Center at Houston. This protocol was submitted as “Clinical and administrative data reuse for research and quality improvement” to the Committee for the Protection of Human Subjects (CPHS). This “protocol stipulates that no project-specific CPHS (IRB) approval is required as long as the following criteria are satisfied: 1. Data do not leave the servers maintained by the Biomedical Informatics Group, 2. No identifiable data are shared with individual researchers or published, and 3. No contact is made with patients (e.g., to collect additional data)”. By setting up this protocol, they were able to provide data to investigators within 1 day for internal requests and 3 days for researchers. As someone who has previously waited weeks to months for data, I find this amazing.
Large improvements in healthcare can be made with data sharing within and between healthcare institutions, as well as through collaborations with other institutions such as public health agencies. As mentioned above, there are many challenges and barriers that need to be overcome, and good, reliable data still has to be identified. Strides have been made in making data more accessible, but we still have a long way to go. More information about data sharing in the healthcare industry can be found in the review article: Sharing is Caring – Data Sharing Initiatives in Healthcare.