Cross-Care Dataset

The Cross-Care Dataset provides insights into co-occurrence patterns between diseases and demographic terms in large text corpora. It supports researchers and healthcare professionals seeking to understand how diseases are represented and how those patterns trend.


Explore the cutting-edge features of our project, showcasing the power of data in understanding complex health issues.


>1TB Text Analyzed

Large-Scale Datasets (RedPajama + Pile)

More than 1 trillion tokens analyzed.


Co-Occurrence Patterns

Representational Harm (Demographic-Disease-Drug)

Race and Gender representation across 89 clinical terms.
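
To make the idea of co-occurrence concrete, here is a minimal sketch of counting how often a demographic term appears near a disease term in a set of documents. The term lists, the `count_cooccurrences` function, and the token window are illustrative assumptions, not the project's actual vocabulary or pipeline.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical term lists for illustration only (not the project's real vocabulary).
DEMOGRAPHIC_TERMS = {"female", "male"}
DISEASE_TERMS = {"asthma", "diabetes"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_cooccurrences(documents, window=10):
    """Count (demographic, disease) pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        demo_hits = [(i, t) for i, t in enumerate(tokens) if t in DEMOGRAPHIC_TERMS]
        disease_hits = [(i, t) for i, t in enumerate(tokens) if t in DISEASE_TERMS]
        for (i, demo), (j, disease) in product(demo_hits, disease_hits):
            if abs(i - j) <= window:
                counts[(demo, disease)] += 1
    return counts


docs = [
    "The female patient was diagnosed with asthma last year.",
    "A male patient with diabetes and a female patient with diabetes.",
]
counts = count_cooccurrences(docs)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

A real analysis over corpora of this size would need streaming and a curated term list; the sketch only shows the counting logic.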


Benchmarking Framework

Smart SRO Generation (Subject-Relation-Object)

Create benchmarks and experiments that mirror the real world.
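
As a rough illustration of what Subject-Relation-Object generation could look like, the sketch below fills a simple template with every combination of subject and object. The subjects, relation phrase, objects, and the `generate_sro_statements` helper are all hypothetical and are not taken from the project's code.

```python
from itertools import product

# Hypothetical inputs for illustration only.
SUBJECTS = ["female patients", "male patients"]
RELATION = "are diagnosed with"
OBJECTS = ["asthma", "diabetes"]


def generate_sro_statements(subjects, relation, objects):
    """Build one sentence per (subject, object) pair using a fixed relation phrase."""
    return [
        f"{subject.capitalize()} {relation} {obj}."
        for subject, obj in product(subjects, objects)
    ]


statements = generate_sro_statements(SUBJECTS, RELATION, OBJECTS)
for statement in statements:
    print(statement)
```

Statements generated this way could serve as prompts whose model responses are then compared across demographic groups.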

Our Data Visualizations

Explore our interactive visualizations showcasing key insights and trends derived from our comprehensive data analysis.

Trends Overview

Discover the evolving trends and patterns identified in our datasets, providing valuable insights into emerging topics and focus areas.

Trends Visualization

Race Distribution Analysis

A detailed breakdown of race distribution, highlighting the demographic diversity in our datasets and bringing attention to representation in health data.

Race Distribution Visualization

Proudly Open Source

Our project is open source and powered by open source software.
The code is available on GitHub.

Led by the Bitterman lab at the AIM Program, Mass General Brigham


Learn more about our work at AIM Lab.

Check out the Cross-Care repo on GitHub.