Researchers from Johns Hopkins Medicine and The Johns Hopkins University have created and preliminarily tested what they believe may be one of the first models for predicting who has the highest probability of being resistant to COVID-19 in spite of exposure to SARS-CoV-2, the virus that causes it. The study is reported online in the journal PLOS ONE.
“If we can identify which people are naturally able to avoid infection by SARS-CoV-2, we may be able to learn — in addition to societal and behavioral factors — which genetic and environmental differences influence their defense against the virus,” said lead study author Karen (Kai-Wen) Yang, a biomedical engineering graduate student in the Translational Informatics Research and Innovation Lab at The Johns Hopkins University. “That insight could lead to new preventive measures and more highly targeted treatments.”
For its study, the research team set out to determine if a machine-learning statistical model could use health characteristics stored in electronic health records — providing patient data such as comorbidities (other medical conditions) and prescribed medications — as a means to pinpoint people with a natural ability to avoid SARS-CoV-2 infection. Those persons, said Yang, could then be studied to better understand the factors enabling their resistance.
A machine-learning model is a computer program or system that uses mathematical algorithms to find statistical patterns in data and then applies those patterns to new data. This allows such systems to imitate aspects of human thinking and reasoning and, similar to the brain, to learn over time.
“Using a machine-learning system to recognize complex patterns in large numbers of people with COVID-19 enabled another team of Johns Hopkins Medicine researchers in 2021 to predict the course of an individual patient’s case and determine the likelihood that it would become severe,” said co-senior study author Stuart Ray, M.D., vice chair of medicine for data integrity and analytics, and professor of medicine at the Johns Hopkins University School of Medicine. “Based on their success, our team wondered if the same approach also might be applied to predicting who could be exposed to SARS-CoV-2 in close quarters and still not get infected.”
To demonstrate the model’s ability to predict COVID-19 resistance, the researchers first acquired data from a clinical registry called the Johns Hopkins COVID-19 Precision Medicine Analytics Platform Registry (JH-CROWN). The registry contains information for patients seen within the Johns Hopkins Health System who have been suspected of, or confirmed as, having a SARS-CoV-2 infection.
For their resistance study, the researchers included only individuals who received a COVID-19 test between June 10, 2020, and Dec. 15, 2020, and who reported “potential exposure to the virus” as the reason for testing. The ending date was the point at which large-scale COVID-19 vaccination efforts started in the United States. Choosing this date, the researchers said, kept their findings from being skewed by infections that were prevented by vaccines rather than by natural resistance.
The 8,536 study participants who reported exposure as their reason for getting tested for COVID-19 were divided into two groups: those who did not share a residence (called a “household” in this study) with any COVID-19 patients, or whose residence had 10 or more patients; and those who shared a residence with 10 or fewer people, with at least one being a COVID-19 patient. The first group, with 8,476 of the participants, was designated as the Training and Testing Set, while the second group, called the Household Index (HHI) Set, had 60 members and was used as a separate testing set.
Keeping the household number to 10 or fewer, the researchers said, excluded people living in apartment complexes, dormitories and other higher-density, multi-unit living areas where exposure to a particular person positive for SARS-CoV-2 would be less intense.
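The selection and grouping rules described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the `TestRecord` fields, the `"potential exposure"` label and the function names are hypothetical, the date window is assumed to be inclusive, and every eligible person who does not meet the household criteria is simply assigned to the Training and Testing Set.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout; the article does not describe the JH-CROWN schema.
@dataclass
class TestRecord:
    patient_id: str
    test_date: date
    reason: str            # self-reported reason for testing
    household_size: int    # people sharing the residence
    household_cases: int   # COVID-19 patients in that residence

STUDY_START = date(2020, 6, 10)
STUDY_END = date(2020, 12, 15)   # assumed inclusive
MAX_HOUSEHOLD_SIZE = 10

def is_eligible(rec: TestRecord) -> bool:
    """Tested inside the study window because of a reported exposure."""
    in_window = STUDY_START <= rec.test_date <= STUDY_END
    return in_window and rec.reason == "potential exposure"

def split_cohort(records):
    """Return (training_and_testing_set, household_index_set)."""
    train_test, hhi = [], []
    for rec in records:
        if not is_eligible(rec):
            continue
        # HHI: residence of 10 or fewer people with at least one COVID-19 patient.
        if rec.household_size <= MAX_HOUSEHOLD_SIZE and rec.household_cases >= 1:
            hhi.append(rec)
        else:
            train_test.append(rec)
    return train_test, hhi

# Synthetic records, made up purely for illustration.
records = [
    TestRecord("p1", date(2020, 7, 1), "potential exposure", 4, 1),   # HHI
    TestRecord("p2", date(2020, 8, 15), "potential exposure", 3, 0),  # training/testing
    TestRecord("p3", date(2021, 1, 5), "potential exposure", 2, 1),   # outside window
    TestRecord("p4", date(2020, 9, 9), "symptoms", 5, 1),             # other reason
]
train_test, hhi = split_cohort(records)
print(len(train_test), len(hhi))
```

Applying the date cutoff before the household split mirrors the order in the article: the vaccination-era tests are excluded first, and only then are the remaining participants grouped.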
To identify patterns and cluster participants so that those naturally resistant to SARS-CoV-2 stand out, both study sets were analyzed using the Maximal-frequent All-confident pattern Selection Pattern-based Clustering (MASPC) algorithm. MASPC is specifically designed for analyzing electronic health record data, and it combines patient demographic information (age, sex and race), the International Statistical Classification of Diseases and Related Health Problems (ICD) medical diagnostic codes relevant to each case, outpatient medication orders and the number of comorbidities (other diseases) present.
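As a rough illustration of the kind of pattern the algorithm's name refers to, the sketch below scores pairs of diagnosis codes by support (how often the pair occurs) and by all-confidence (the pair's count divided by the count of its most common member). It is not the MASPC algorithm, which also selects maximal patterns and clusters patients on them. The function name, the thresholds and the four synthetic patients are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def all_confident_pairs(patients, min_support, min_all_conf):
    """Return {(code_a, code_b): (support, all_confidence)} for qualifying pairs.

    patients: list of sets of diagnosis codes, one set per patient.
    """
    n = len(patients)
    # Number of patients carrying each single code.
    item_counts = Counter(code for codes in patients for code in codes)
    # Number of patients carrying each pair of codes together.
    pair_counts = Counter(
        pair for codes in patients for pair in combinations(sorted(codes), 2)
    )
    result = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # All-confidence: pair count over the count of its most frequent member.
        all_conf = count / max(item_counts[a], item_counts[b])
        if support >= min_support and all_conf >= min_all_conf:
            result[(a, b)] = (support, all_conf)
    return result

# Synthetic patients; the ICD-10 codes are used only as example labels.
patients = [
    {"E11", "I10"},
    {"E11", "I10", "J45"},
    {"I10"},
    {"J45"},
]
pairs = all_confident_pairs(patients, min_support=0.5, min_all_conf=0.6)
print(pairs)
```

Here only the pair `("E11", "I10")` passes both thresholds: it appears in two of the four patients, and its all-confidence is 2/3 because `I10` appears in three patients.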
Limitations to the study, said Ray, include potential bias from self-reporting of COVID-19 exposure by participants, the small number of participants in the HHI group, the possibility that participants tested for SARS-CoV-2 using home kits or at facilities outside the Johns Hopkins system (and therefore, the tests were not recorded in the JH-CROWN database), and the short timeframe of the study itself. He added that future trials using national patient data are needed to validate the model’s ability.