Navigating the Dataset Landscape: Essential Resources for Scientific Research

This resource is designed to supplement instructional sessions and serve as a reference for researchers, students, and faculty members interested in discovering and utilizing datasets in the sciences and engineering fields.

Discipline-Specific Repositories

Protein Data Bank (PDB) - A global repository for 3D structural data of large biological molecules, such as proteins and nucleic acids, supporting research in molecular biology and biochemistry.
Focus: Structural biology

GigaDB - Organizes and curates data from individually publishable units into datasets, which are provided openly and in as FAIR a manner as possible for the global research community.
Focus: Life sciences

Biological Magnetic Resonance Data Bank - Provides access to data on biomolecules derived from nuclear magnetic resonance (NMR) spectroscopy.
Focus: Biomolecular spectroscopy

PubChem - A free chemistry database maintained by the National Center for Biotechnology Information (NCBI), providing information on the biological activities of small molecules, including chemical structures, identifiers, and bioactivity data.
Focus: Chemical molecules

UniProt - A resource for protein sequence and annotation data, offering detailed information about protein functions, structures, and interactions.
Focus: Protein sequences and annotation

AlphaFold Protein Structure Database - Provides high-accuracy predictions of protein structures generated by DeepMind's AlphaFold AI system. It offers structural models for a vast array of proteins, facilitating research in molecular biology, biochemistry, and related fields.
Focus: Protein structure prediction

MetaboLights - A database for cross-species and cross-platform metabolomics studies, providing access to metabolite structures, experimental protocols, and associated metadata.
Focus: Metabolomics

Human Metabolome Database (HMDB) - Provides detailed information about small molecule metabolites found in the human body, including chemical data, clinical information, and biochemical data.
Focus: Human metabolites

BioSample Database - Stores structured metadata about biological samples, linking them to NCBI archives (e.g., SRA, GEO, GenBank). It standardizes attributes, enhances data re-use and integration, supports queries, and enables efficient navigation and collaboration across research datasets.
Focus: Biological samples

FlyBase - A bioinformatics database and the primary repository for genetic and molecular data on Drosophila species. It offers genome annotations, mutant data, and expression patterns.
Focus: Drosophila genomics

Ensembl - A genome browser providing access to genomic information for various species, integrating sequence data with functional annotation, gene predictions, and comparative genomics.
Focus: Genomics and comparative genomics