Data Resources

Each of the Health System Network (HSN) sites uses and maintains the Health Care Systems Research Network (HCSRN) Virtual Data Warehouse (VDW). Projects conducted by the HSN will draw on the VDW, leveraging the processes and data standardization developed for it. An overview of the HCSRN VDW follows.

The HCSRN VDW currently encompasses twelve data domains. Programmers at each site have transformed EHR and claims data elements from local data systems into a standardized set of VDW variable definitions, names, and codes. The common structure allows programming code developed at one site to be used at other sites to extract and analyze data for a research project. The VDW Operational Committee (VOC) provides direction to each HCSRN site on implementing the VDW. The VOC is also responsible for maintaining current documentation of data availability across sites, including site variations and site-specific issues; quality control evaluation of domain-specific data at each site; and documentation of policies and procedures for initiating and conducting multi-site research within the HCSRN.

The VDW's federated model offers an effective means of protecting the identity of patients, providers, and health plans while allowing researchers and analysts to access data from much larger populations than they would otherwise be able to access within their own institution. The VDW serves as the source of standardized data from a variety of data systems in each HCSRN site.

The VDW includes 1) a series of computerized data sets, stored behind separate security firewalls at participating HCSRN sites, whose variables have identical names, formats, labels, coding, and definitions; 2) a set of informatics tools (hardware and software) that facilitates storing, retrieving, processing, and managing VDW datasets; 3) a set of access policies and procedures governing use of VDW resources; and 4) documentation of all elements of the VDW.

Data standardization involves the following steps: 1) specifying common variable names, labels, coding, and definitions; 2) writing programs to extract and convert variables stored in IDS legacy information systems to the common standards; 3) testing standardized data for consistency and accuracy; 4) standardizing methods by writing macros that are used across projects; and 5) teaching researchers and their analysts how to use the VDW to guide construction of analysis files for approved research projects.
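Steps 1 through 3 can be illustrated with a minimal Python sketch. The local field names (`MRN`, `SEX_CD`, `BIRTH_DT`), the standardized names (`person_id`, `gender`, `birth_date`), and all code values are hypothetical; they are not taken from the actual VDW specification, and real sites may use different tooling.

```python
from datetime import date, datetime

# Step 1 (hypothetical): the common standard defines the allowed gender codes.
STANDARD_GENDER_CODES = {"F", "M", "U"}

# Hypothetical site-specific mapping from local codes to the common codes.
LOCAL_TO_STANDARD_GENDER = {"1": "M", "2": "F", "9": "U"}


def standardize_record(local_row: dict) -> dict:
    """Step 2: convert one row from a local system to the common layout."""
    return {
        "person_id": str(local_row["MRN"]),
        # Local codes with no mapping are recorded as "U" (unknown).
        "gender": LOCAL_TO_STANDARD_GENDER.get(local_row["SEX_CD"], "U"),
        "birth_date": datetime.strptime(local_row["BIRTH_DT"], "%m/%d/%Y").date(),
    }


def check_record(row: dict) -> list:
    """Step 3: return a list of consistency problems found in a standardized row."""
    problems = []
    if row["gender"] not in STANDARD_GENDER_CODES:
        problems.append("gender code not in standard")
    if row["birth_date"] > date.today():
        problems.append("birth_date in the future")
    return problems


local = {"MRN": 1001, "SEX_CD": "2", "BIRTH_DT": "03/15/1970"}
standardized = standardize_record(local)
print(standardized, check_record(standardized))
```

Keeping the site-specific mapping in a separate table means only that table differs between sites, while the conversion and checking logic can be shared.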

Syntactic interoperability means a project programmer can use multiple versions of the same data files, stored in separate systems, to extract and combine information on a selected set of variables, and be assured of having extracted all the available information on those variables and nothing else. Moreover, these data attributes are homogeneous across settings. Syntactic interoperability is achieved by using the same database and analysis software to store, retrieve, transform, and analyze data from multiple sites and time periods.
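One way to picture this property is a conformance check that every site could run against a shared specification. The sketch below is an illustration only: the specification `DEMOGRAPHICS_SPEC`, its variable names, and the sample rows are assumptions, not the real VDW layout.

```python
# Hypothetical specification for one domain: variable name -> expected Python type.
DEMOGRAPHICS_SPEC = {"person_id": str, "gender": str, "birth_date": str}


def conformance_problems(rows, spec):
    """List every way the rows deviate from the shared specification."""
    problems = []
    for i, row in enumerate(rows):
        missing = sorted(set(spec) - set(row))
        extra = sorted(set(row) - set(spec))
        if missing:
            problems.append(f"row {i}: missing {missing}")
        if extra:
            problems.append(f"row {i}: unexpected {extra}")
        # Check the type of every specified variable that is present.
        for name, expected in spec.items():
            if name in row and not isinstance(row[name], expected):
                problems.append(f"row {i}: {name} is not {expected.__name__}")
    return problems


# Made-up rows standing in for two sites' files.
site_a = [{"person_id": "A1", "gender": "F", "birth_date": "1970-03-15"}]
site_b = [{"person_id": 42, "sex": "M", "birth_date": "1982-11-02"}]

print(conformance_problems(site_a, DEMOGRAPHICS_SPEC))
print(conformance_problems(site_b, DEMOGRAPHICS_SPEC))
```

A file that passes such a check can be read by the same extraction program at any site; a file like the hypothetical `site_b` would need its variables renamed and retyped first.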

Semantic interoperability means that observations from different sites and times represent valid, reliable, and consistent measures of the same underlying well-defined concept across all the sources and over time.  This allows investigators to pool the observations on given variables across sites or time. The fundamental rationale for the VDW is to perform all the preparatory work for pooling existing data across multiple sites without creating a single concatenated file stored at one site.
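The pooling idea can be sketched in Python under one loudly stated assumption: each site runs the same program locally and returns only aggregate counts to a coordinating site. The source does not describe the exchange mechanism, so the function names and data below are hypothetical.

```python
from collections import Counter


def site_summary(rows):
    """Runs locally at each site; returns only aggregate counts, no patient rows."""
    return Counter(row["gender"] for row in rows)


def pool(summaries):
    """Runs at the coordinating site; combines the per-site aggregates."""
    total = Counter()
    for summary in summaries:
        total += summary
    return total


# Made-up stand-ins for data that would stay behind each site's firewall.
site_a_rows = [{"gender": "F"}, {"gender": "M"}, {"gender": "F"}]
site_b_rows = [{"gender": "M"}, {"gender": "U"}]

pooled = pool([site_summary(site_a_rows), site_summary(site_b_rows)])
print(dict(pooled))
```

Adding the counts is meaningful only because both sites code `gender` the same way, which is what semantic interoperability provides.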

The data domains within the Virtual Data Warehouse that are available to the HSN are:

  • Demographics contains date of birth, gender, race and ethnicity, and patient language.
  • Enrollment is based on health plan membership enrollment or geographic coverage of patients with indicators of insurance types, benefits, and effective dates of coverage.
  • Encounters characterizes outpatient visits and inpatient stays, including the associated diagnosis and procedure codes, type of encounter, provider seen, facility, and discharge disposition.
  • Procedures consists of all performed procedures, including evaluation and management, surgery, laboratory, radiology, and immunization. Only performed procedures are currently captured, coded in several procedure coding systems (CPT-4, HCPCS, ICD-9-CM, and insurance claims Revenue Codes).
  • Diagnoses includes dates, diagnosis codes and code types, primary diagnosis, principal diagnosis flag, and diagnosing provider.
  • Providers includes information on the providers such as specialty, age, gender, race, and year graduated.
  • Cancer/Tumor Registry is based on the Surveillance, Epidemiology and End Results (SEER) program standards, as many HCSRN sites are SEER sites. The domain consists of detailed stage and grade, date of diagnosis, and dates of treatment initiation, and is one of the most complex domains of the VDW.
  • Pharmacy consists of pharmacy dispensing and claims and includes date of dispensing, National Drug Code or GPI code (to standardize across sites), therapeutic class, days supply, and amount dispensed. These data are widely used to assess pharmacy-based disease and co-morbidity classification systems.
  • Vital Signs are collected at most in-person encounters and include height, weight, and blood pressure readings. Tobacco use and type are also included.
  • Laboratory Results includes clinical laboratory test results for chemistry, hematology, and coagulation. Over 100 different lab test types have been defined for the VDW, but not every test type is available at every site. Individual sites have data for the most common test types as well as the high-priority tests required for the studies in which they participate.
  • Census provides socioeconomic indicators for patient populations based on geocoded patient addresses and public census data, such as education level, income, and poverty.
  • Mortality includes patient dates of death and causes of death. Mortality data is derived from multiple sources including EMR and utilization data, state death certificate data, and federal death data such as Social Security Administration data and the National Death Index.

© Copyright 2017