COVID-19 Vulnerability Index
|size (curr.)||3160 kB|
Join us for our webinar on the CV19 Index Friday March 20, 2020 at 2PM central time. Registration: https://zoom.us/webinar/register/WN_E6k2QwTuSl6O-rrfaH3PVA
The COVID-19 Vulnerability Index (CV19 Index)
This repository contains the source code, models, and example usage of the COVID-19 Vulnerability Index (CV19 Index). The CV19 Index is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID-19 (commonly referred to as “The Coronavirus”). The CV19 Index is intended to help hospitals, federal / state / local public health agencies and other healthcare organizations in their work to identify, plan for, respond to, and reduce the impact of COVID-19 in their communities.
Full information on the CV19 Index, including the links to a full FAQ, User Forums, and information about upcoming Webinars is available at http://cv19index.com
This repository provides information for those interested in computing COVID-19 Vulnerability Index scores.
Versions of the CV19 Index
There are 3 different versions of the CV19 Index. Each is a different predictive model for the CV19 Index. The models represent different tradeoffs between ease of implementation and overall accuracy. A full description of the creation of these models is available in the accompanying paper, “Building a COVID-19 Vulnerability Index” (http://cv19index.com).
The 3 models are:
Simple Linear - A simple linear logistic regression model that uses only 14 variables. An implementation of this model is included in this package. This model had a 0.731 ROC AUC on our test set.
Open Source ML - An XGBoost model, packaged with this repository, that uses Age, Gender, and 500+ features defined from the CCSR categorization of diagnosis codes. This model had a 0.810 ROC AUC on our test set.
Free Full - An XGBoost model that fully utilizes all the data available in Medicare claims, along with geographically linked public and Social Determinants of Health data. This model provides the highest accuracy of the 3 CV19 Indexes but requires additional linked data and transformations that preclude a straightforward open-source implementation. ClosedLoop is making a free, hosted version of this model available to healthcare organizations. For more information, see http://cv19index.com.
We evaluate the model using a full train/test split. The models are tested on 369,865 individuals. We express model performance using the standard ROC curves, as well as the following metrics:
Computing the CV19 Index for a patient population
This section describes how to run the Simple Linear and Open Source ML versions of the CV19 Index using this package. The models work off of CSV files or Pandas data frames.
The input to the models is a CSV file or Pandas data frame, where each row represents a person and the columns are different computed features, or attributes, for each person. To use the model you need to calculate all features appropriate for that model. See the Data Preparation section below for an example on how to prepare data.
See the example Jupyter notebook for a description of the output data returned.
An Jupyter notebook with a worked example showing how to prep a data set is provided in the examples folder. This folder begins with a sample set of claims data and performs the necessary transformations to generate the input to the model.
pip install cv19index
To execute the xgboost model.
cv19index input.csv output.csv
Using the provided example data files.
cv19index examples/xgboost/example_input.csv examples/xgboost/example_prediction.csv
Using from within python using a pandas dataframe as input and get the predictions out as a pandas dataframe.
from pkg_resources import resource_filename model_fpath = resource_filename("cv19index", "resources/xgboost/model.pickle") model = read_model(model_fpath) predictions_df = run_model(input_df, model)
Contributing to the CV19 Index
We are not allowed to share the data used to train the models and build the index iwth our collaborators, but there are tons of ways you can help. If you are interested in participating, please contact us at firstname.lastname@example.org
A few examples are:
- Helping us build mappings from common claims data formats for this predictor, such as OMAP and CCLF. https://www.ohdsi.org/data-standardization/the-common-data-model/
- Converting CMS BlueBUtton data into a format usable by this model: https://https://bluebutton.cms.gov/
- Providing install instructions and support on more platforms.