🗄️ Biological Databases

Exam Importance: ⭐⭐⭐⭐ (High)

Key topics to focus on:

- Types of biological databases (Primary, Secondary, Specialized)
- NCBI, GenBank, BLAST
- Protein databases (PDB, UniProt)
- Phylogenetic tree construction
- FASTA format

Introduction to Database

Definition: A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via search criteria.

Key Components:

| Term | Definition |
|------|------------|
| Record/Entry | Contains data items |
| Fields | Hold actual data items (names, addresses, dates) |
| Value | Specific piece of information to search |
| Query | Process of searching and retrieving records |
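
The terms above can be illustrated with a tiny in-memory example. This is only a sketch: the records, field names, and the `query` helper are invented for illustration and do not correspond to any real database.

```python
# Each dict is one record (entry); its keys are fields and its values are the data.
# All entries below are made up for the example.
records = [
    {"id": "rec1", "organism": "Escherichia coli", "length": 1500},
    {"id": "rec2", "organism": "Bacillus subtilis", "length": 1450},
    {"id": "rec3", "organism": "Escherichia coli", "length": 900},
]


def query(records, field, value):
    """Return every record whose `field` holds exactly `value`."""
    return [record for record in records if record.get(field) == value]


hits = query(records, "organism", "Escherichia coli")
print([record["id"] for record in hits])
```

Real biological databases index their fields so that a query does not have to scan every record, but the record/field/value/query vocabulary is the same.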

Biological Databases

Three Main Categories:

| Type | Description | Examples |
|------|-------------|----------|
| Primary | Archives of raw sequence/structural data | GenBank, PDB |
| Secondary | Computationally processed or manually curated information | Swiss-Prot, PIR |
| Specialized | Cater to a particular research interest | FlyBase, HIV database, Ribosomal Database |

Various Omics Studies

| Study | Definition |
|-------|------------|
| Genomics | Study of the complete genome: gene content, structure, variation |
| Proteomics | Large-scale examination of the proteome: structures, functions, interactions |
| Transcriptomics | Analysis of the complete set of RNA transcripts (transcriptome) |
| Metabolomics | Systematic study of small-molecule metabolites |

Relation Between Multi-Omics and Bioinformatics

  • Multi-omics integrates data from multiple biological layers for holistic insights
  • Bioinformatics serves as core computational framework for processing and analyzing omics datasets
  • Integration enables advanced applications in research and medicine

  1. GenBank
  2. BLAST (Basic Local Alignment Search Tool)
  3. PDB (Protein Data Bank)
  4. Swiss-Prot
  5. UniProt

NCBI (National Center for Biotechnology Information)

Key Features:

  • Public biological database platform maintained by U.S. NIH
  • Provides free access to DNA, RNA, protein, and genome data

Major Resources:

  • GenBank
  • Protein
  • Genome
  • PubMed
  • BLAST
  • Taxonomy
  • SRA

Website: https://www.ncbi.nlm.nih.gov/


More Nucleotide Sequence Databases

| Database | Full Name | Link |
|----------|-----------|------|
| DDBJ | DNA Data Bank of Japan | https://www.ddbj.nig.ac.jp/ |
| ENA | European Nucleotide Archive | https://www.ebi.ac.uk/ena/browser/home |

GenBank

Key Features:

  • Comprehensive public database of DNA and RNA sequences
  • Maintained by NCBI
  • Contains specific gene sequences, whole genomes, plasmid sequences
  • Sequences submitted by researchers worldwide
  • Each sequence assigned unique accession number
  • Used for sequence retrieval, annotation, BLAST analysis, comparative genomics

Accession Number and GI Number

| Identifier | Description |
|------------|-------------|
| Accession Number | Unique, permanent alphanumeric identifier for each sequence record |
| GI Number | Unique, sequential numeric identifier for each specific version |

Important

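Any change in the sequence results in a new GI number. The accession number itself stays the same; only its version suffix is incremented (for example, from .1 to .2).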
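
As a rough illustration of the accession.version convention, the sketch below splits an identifier into its stable accession and its version number. The function name `split_accession` and the identifiers used are hypothetical, chosen only for this example.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    The version is None when the identifier carries no numeric suffix.
    """
    base, dot, version = identifier.partition(".")
    if dot and version.isdigit():
        return base, int(version)
    return identifier, None


# Made-up identifiers: the same record before and after a sequence update.
old = split_accession("AB000001.1")
new = split_accession("AB000001.2")
print(old, new)
print(old[0] == new[0])  # the accession itself does not change between versions
```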


Downloading FASTA File from GenBank
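
Besides the website's download button, a FASTA record can be requested programmatically through the NCBI E-utilities `efetch` endpoint. The sketch below only builds the request URL and makes no network call; the helper name `genbank_fasta_url` is an assumption for this example, and the accession is used purely as sample input.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fasta_url(accession):
    """Build an efetch URL that asks for a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession (optionally with version)
        "rettype": "fasta",   # return the record in FASTA format
        "retmode": "text",    # as plain text
    }
    return EFETCH + "?" + urlencode(params)


url = genbank_fasta_url("U00096.3")
print(url)
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return the FASTA text. NCBI asks heavy users to limit their request rate, so anything beyond a handful of downloads needs more care than this sketch shows.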

FASTA Format:

  • Begins with single-line description (defline)
  • Description distinguished by ">" symbol at beginning
  • Followed by lines of sequence data
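
The layout described above is simple enough to parse by hand. The following is a minimal sketch of a FASTA reader, not a replacement for an established library; the function name `parse_fasta` and the two records are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (defline, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new defline closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>seq1 hypothetical example record
ATGCGTAC
GGTA
>seq2 another made-up record
TTGACA
"""
parsed = parse_fasta(example)
for defline, sequence in parsed:
    print(defline, len(sequence))
```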

BLAST (Basic Local Alignment Search Tool)

Purpose:

  • Sequence comparison tool developed by NCBI
  • Identifies sequence similarity between query and database sequences
  • Helps in organism identification, gene annotation, homology analysis
  • Essential tool in genomics, microbiology, bioinformatics

Different Types of BLAST

| Type | Query | Database |
|------|-------|----------|
| BLASTn | Nucleotide | Nucleotide |
| BLASTp | Protein | Protein |
| BLASTx | Translated nucleotide | Protein |
| tBLASTn | Protein | Translated nucleotide |
| tBLASTx | Translated nucleotide | Translated nucleotide |
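
The table can be read as a lookup from (query type, database type) to a program. The sketch below encodes that lookup; the helper `choose_blast` and its `translated_both` flag are hypothetical names for this example, and the lowercase program names follow the spelling used by the BLAST+ command-line tools.

```python
# (query type, database type) -> program, for the four unambiguous cases.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}


def choose_blast(query_type, db_type, translated_both=False):
    """Pick a BLAST program name from the query and database sequence types."""
    if translated_both:
        # tblastx translates both a nucleotide query and a nucleotide database.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


print(choose_blast("nucleotide", "protein"))
print(choose_blast("nucleotide", "nucleotide", translated_both=True))
```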

BLAST Outcomes

[Figure: BLAST results]


Protein Data Bank (PDB)

Key Features:

  • Central, open-access archive of experimentally determined 3D structures of biological macromolecules
  • Contains proteins, DNA, and RNA structures
  • Founded in 1971 at Brookhaven National Laboratory
  • Each structure assigned unique PDB ID

Structure Determination Methods:

  • X-ray crystallography
  • NMR spectroscopy
  • Cryo-EM

UniProt

Features:

  • Comprehensive protein sequence and annotation database
  • Contains information on protein function, domains, pathways, and post-translational modifications

Sections:

| Section | Description |
|---------|-------------|
| UniProtKB/Swiss-Prot | Reviewed, manually annotated entries |
| UniProtKB/TrEMBL | Unreviewed, automatically annotated entries |

Each protein entry has a unique UniProt accession number.


SWISS-MODEL

Purpose:

  • Web-based automated protein structure homology modeling server
  • Predicts 3D structures when experimental structures unavailable
  • Builds models based on sequence similarity with known PDB structures

Applications:

  • Structural bioinformatics
  • Functional analysis
  • Drug discovery

Types of Protein Structures

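| Level | Description | Stabilized By |
|-------|-------------|---------------|
| Primary | Linear sequence of amino acids | Peptide bonds |
| Secondary | Local folding (α-helix, β-sheet) | Hydrogen bonds between backbone atoms |
| Tertiary | Overall 3D structure of a single polypeptide | Hydrophobic interactions, hydrogen bonds, ionic bonds, disulfide bridges |
| Quaternary | Association of multiple polypeptide chains | Protein-protein interactions between subunits |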

AlphaFold 3: ML-Based Tool in Proteomics

Features:

  • Advanced deep learning model for biomolecular structure prediction
  • Extends beyond proteins to complexes:
      • Protein-protein
      • Protein-DNA
      • Protein-RNA
      • Protein-ligand
  • Combines attention-based (transformer-style) neural networks with a diffusion module
  • Predicts high-accuracy 3D structures from sequence

Applications of Biological Databases

Analysis Applications:

| Application | Description |
|-------------|-------------|
| Phylogenetic Inference | Infers evolutionary relationships |
| Comparative Genomics | Compares genomes to identify shared/unique genes |
| AMR Profiling | Detects antibiotic resistance genes |
| Virulence Analysis | Identifies virulence factors |
| Functional Annotation | Assigns biological functions to genes |
| Pathway Reconstruction | Maps genes to metabolic pathways |
| Pan-genome Analysis | Determines core and accessory genes |
| Protein Structure Prediction | Predicts 3D structures |
| Drug Target Identification | Identifies conserved genes for targeting |

Galaxy Server: WGS Analysis Tool

Features:

  • Open-source, web-based platform
  • Enables bioinformatics analyses without programming
  • All analyses through web browser
  • Ensures reproducible research
  • Supports workflow automation

Three Main Public Servers:

  • Galaxy Main (usegalaxy.org)
  • Galaxy Europe (usegalaxy.eu)
  • Galaxy Australia (usegalaxy.org.au)

Commonly Used Galaxy Tools

| Tool | Function |
|------|----------|
| FastQC | Evaluates quality of raw sequencing reads |
| SPAdes | Assembles reads into contigs/draft genomes |
| QUAST | Assesses genome assembly quality |
| Prokka | Rapid annotation of prokaryotic genomes |
| ABRicate | Screens for resistance and virulence genes |
| RGI (CARD) | Identifies antibiotic resistance genes |
| PlasmidFinder | Detects plasmid replicons |
| ISEScan | Detects insertion sequences |
| MLST | Determines sequence types for strain typing |
| Roary | Pan-genome analysis |

Construction of Phylogenetic Tree

Steps:

  1. Find and download sequences - NCBI
  2. Align sequences - Clustal W/X, MEGA
  3. Construct tree - MEGA 5/6/7
  4. Visualize tree - iTOL, FigTree
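
To make steps 2 and 3 more concrete, the sketch below takes sequences that are already aligned, computes pairwise p-distances, and clusters them with UPGMA into a Newick topology. This is a toy stand-in, not what MEGA does internally: UPGMA is used here only because it is short to write, the sequences are invented, and branch lengths are omitted.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(aligned):
    """Cluster aligned sequences with UPGMA; return a Newick topology string."""
    # Each cluster is keyed by its Newick label and lists its member names.
    clusters = {name: [name] for name in aligned}

    def avg_dist(c1, c2):
        # UPGMA distance: mean over all member pairs of the two clusters.
        pairs = [(m, n) for m in clusters[c1] for n in clusters[c2]]
        return sum(p_distance(aligned[m], aligned[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters (ties broken by label order).
        c1, c2 = min(combinations(sorted(clusters), 2),
                     key=lambda pair: avg_dist(*pair))
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters[f"({c1},{c2})"] = merged
    return next(iter(clusters)) + ";"


# Made-up, pre-aligned sequences of equal length.
aligned = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
newick = upgma(aligned)
print(newick)
```

The resulting Newick string can be pasted into a viewer such as iTOL or FigTree (step 4). For real analyses, the alignment and tree-building methods offered by dedicated software are the appropriate choice.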

16S rRNA for Species Identification

Why 16S rRNA?

  • Highly conserved among bacteria
  • Contains variable regions for identification
  • Used for:
      • Identifying bacteria
      • Determining taxonomic position
      • Studying evolutionary relationships

For Eukaryotes

18S rRNA is used for species identification in eukaryotes.


Phylogenetic Tree Concepts

Definition: A phylogenetic tree (evolutionary tree) is the graphical representation of the evolutionary history of biological sequences, visualizing evolutionary relationships.


Understanding Phylogenies

[Figure: Understanding phylogenies]


Types of Phylogenetic Trees

| Type | Description |
|------|-------------|
| Cladogram | Shows only branching pattern without branch length |
| Phylogram | Branch length reflects genetic change |
| Ultrametric Tree | Scaled to time, all taxa equidistant from root |
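
The difference between a phylogram and an ultrametric tree can be checked numerically: in an ultrametric tree every tip is the same distance from the root. The sketch below assumes a simple tuple representation `(name, branch_length, children)` invented for this example, and the branch lengths are arbitrary.

```python
def tip_depths(node, depth=0.0):
    """Yield (tip_name, root-to-tip distance) for every leaf below `node`."""
    name, length, children = node
    depth += length
    if not children:
        yield name, depth
    for child in children:
        yield from tip_depths(child, depth)


def is_ultrametric(tree, tol=1e-9):
    """True when all tips lie at the same distance from the root."""
    depths = [d for _, d in tip_depths(tree)]
    return max(depths) - min(depths) <= tol


# Phylogram: branch lengths reflect change, so tips end at different depths.
phylogram = ("root", 0.0, [
    ("AB", 0.1, [("A", 0.2, []), ("B", 0.4, [])]),
    ("C", 0.3, []),
])

# Ultrametric tree: every root-to-tip path sums to the same value.
ultrametric = ("root", 0.0, [
    ("AB", 1.0, [("A", 2.0, []), ("B", 2.0, [])]),
    ("C", 3.0, []),
])

print(is_ultrametric(phylogram))
print(is_ultrametric(ultrametric))
```

A cladogram would simply drop the branch lengths and keep only the nesting of the children.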

Applications of Phylogenetic Trees

  1. ✅ Study evolutionary relationships between species
  2. ✅ Understand evolutionary processes over time
  3. ✅ Study diversity and distribution of species
  4. ✅ Develop conservation strategies
  5. ✅ Identify origins of pathogens
  6. ✅ Track spread of diseases
  7. ✅ Forensics - identify origins of biological samples
  8. ✅ Organize and classify organisms

📝 Exam Practice Questions

!!! question "Frequently Asked Questions"
    1. Name the types of biological databases with examples
    2. What is GenBank? Explain its uses
    3. Explain the different types of BLAST
    4. What is an accession number?
    5. Describe the steps for constructing a phylogenetic tree
    6. Differentiate between UniProt and PDB
    7. What is the significance of 16S rRNA in species identification?
    8. List the applications of phylogenetic trees
    9. Explain the types of phylogenetic trees (cladogram, phylogram, ultrametric)