
Navigating the Dataset Landscape

This resource supplements instructional sessions and serves as a reference for researchers, students, and faculty members who want to discover and use datasets in science and engineering.

Discipline-Specific Repositories

National Library of Medicine (NLM) Dataset Catalog - A catalog of biomedical datasets from various repositories, facilitating search and retrieval to accelerate scientific research.
Focus: Biomedical

Vivli - Enables researchers to access anonymized clinical trial datasets from academia, government, and industry. It supports secure search, request, and analysis of individual participant data (IPD), fostering data reuse and discovery. Harvard researchers receive free storage (up to 500 GB) for contributing data.
Focus: Clinical research

Data.CDC.gov - A repository of the CDC's publicly available datasets, accessible through a Socrata Open Data API. Available categories include: Administrative, Biomonitoring, Child Vaccinations, Flu Vaccinations, Health Statistics, Injury & Violence, Motor Vehicle, NCHS, NNDSS, Pregnancy & Vaccination, STDs, Smoking & Tobacco Use, Teen Vaccinations, Traumatic Brain Injury, Vaccinations, and Web Metrics.
Focus: Public health
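
Because Data.CDC.gov exposes its datasets through the Socrata Open Data API (SODA), a query can be expressed as a URL. The Python sketch below is a minimal illustration, not an official client: it assumes the common SODA pattern of `/resource/<dataset-id>.json` with `$limit`, `$select`, and `$where` parameters. The helper names `build_soda_url` and `fetch_rows` are hypothetical, and the dataset identifier `xxxx-xxxx` is a placeholder that must be replaced with the identifier shown on a dataset's page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDC_DOMAIN = "data.cdc.gov"


def build_soda_url(dataset_id, limit=5, select=None, where=None):
    """Build a SODA query URL for a dataset on data.cdc.gov.

    dataset_id is the dataset's identifier as shown on its page
    (the value used in the example call below is only a placeholder).
    """
    params = {"$limit": limit}
    if select:
        params["$select"] = select  # comma-separated column names
    if where:
        params["$where"] = where    # SoQL filter expression
    # Keep "$" unescaped so the parameter names stay readable.
    query = urlencode(params, safe="$")
    return f"https://{CDC_DOMAIN}/resource/{dataset_id}.json?{query}"


def fetch_rows(url):
    """Download the query result and parse it as JSON (a list of row dicts)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # "xxxx-xxxx" is a placeholder, not a real dataset identifier.
    print(build_soda_url("xxxx-xxxx", limit=3))
```

Separating URL construction from the download keeps the query easy to inspect, or to paste into a browser, before any data is requested. Column names accepted by `select` and `where` differ from dataset to dataset, so check each dataset's API documentation first.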

The DANDI Archive - The BRAIN Initiative archive for publishing and sharing neurophysiology data, including electrophysiology, optophysiology, and behavioral time series, as well as images from immunostaining experiments.
Focus: Neurophysiology

Brain Observatory Storage Service & Database - BossDB is a volumetric database for 3D and 4D neuroscience data.
Focus: Neuroscience

The Cancer Imaging Archive - TCIA is a service that de-identifies and hosts a large archive of medical images of cancer, available for public download. The data are organized as “collections”, typically sets of patient images related by a common disease (e.g., lung cancer), image modality or type (MRI, CT, digital histopathology, etc.), or research focus.
Focus: Oncology imaging