🗄️ Biological Databases¶
Exam Importance: ⭐⭐⭐⭐ (High)
Key topics to focus on:
- Types of biological databases (Primary, Secondary, Specialized)
- NCBI, GenBank, BLAST
- Protein databases (PDB, UniProt)
- Phylogenetic tree construction
- FASTA format
Introduction to Database¶
Definition: A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via search criteria.
Key Components:¶
| Term | Definition |
|---|---|
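| Record/Entry | A single entry in the database, made up of related data items |
| Fields | Hold the individual data items within a record (names, addresses, dates) |
| Value | The specific piece of information searched for within a field |
| Query | The process of searching for and retrieving records that match a value |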
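The terms in the table can be illustrated with a minimal Python sketch. The records, field names, and the `query` helper below are hypothetical and exist only for illustration; real biological databases use dedicated database software rather than in-memory lists.

```python
# Each dict is one record (entry); each key is a field; each stored item is a value.
# The gene records below are illustrative only.
records = [
    {"name": "lacZ", "organism": "Escherichia coli", "length": 3075},
    {"name": "recA", "organism": "Escherichia coli", "length": 1062},
    {"name": "GAL4", "organism": "Saccharomyces cerevisiae", "length": 2646},
]


def query(records, field, value):
    """Return every record whose given field equals the search value."""
    return [record for record in records if record.get(field) == value]


hits = query(records, "organism", "Escherichia coli")
print([record["name"] for record in hits])
```

Here the query searches the `organism` field for one value and returns the matching records.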
Biological Databases¶
Three Main Categories:¶
| Type | Description | Examples |
|---|---|---|
| Primary | Archives of raw sequence/structural data | GenBank, PDB |
| Secondary | Computationally processed or manually curated information | Swiss-Prot, PIR |
| Specialized | Cater to a particular research interest | FlyBase, HIV database, Ribosomal Database |
Various Omics Studies¶
| Study | Definition |
|---|---|
| Genomics | Study of complete genome - gene content, structure, variation |
| Proteomics | Large-scale examination of proteome - structures, functions, interactions |
| Transcriptomics | Analysis of the complete set of RNA transcripts (the transcriptome) |
| Metabolomics | Systematic study of small-molecule metabolites |
Relation Between Multi-Omics and Bioinformatics¶
- Multi-omics integrates data from multiple biological layers for holistic insights
- Bioinformatics serves as core computational framework for processing and analyzing omics datasets
- Integration enables advanced applications in research and medicine
Popular Biological Databases & Tools¶
- GenBank
- BLAST (Basic Local Alignment Search Tool)
- PDB (Protein Data Bank)
- Swiss-Prot
- UniProt
NCBI (National Center for Biotechnology Information)¶

Key Features:¶
- Public biological database platform maintained by the U.S. National Library of Medicine (NLM), part of the NIH
- Provides free access to DNA, RNA, protein, and genome data
Major Resources:¶
- GenBank
- Protein
- Genome
- PubMed
- BLAST
- Taxonomy
- SRA
Website: https://www.ncbi.nlm.nih.gov/
More Nucleotide Sequence Databases¶

| Database | Full Name | Link |
|---|---|---|
| DDBJ | DNA Data Bank of Japan | https://www.ddbj.nig.ac.jp/ |
| ENA | European Nucleotide Archive | https://www.ebi.ac.uk/ena/browser/home |
GenBank¶

Key Features:¶
- Comprehensive public database of DNA and RNA sequences
- Maintained by NCBI
- Contains specific gene sequences, whole genomes, plasmid sequences
- Sequences submitted by researchers worldwide
- Each sequence assigned unique accession number
- Used for sequence retrieval, annotation, BLAST analysis, comparative genomics
Accession Number and GI Number¶

| Identifier | Description |
|---|---|
| Accession Number | Unique, permanent alphanumeric identifier for each sequence record |
| GI Number | Unique, sequential numeric identifier for each specific version |
Important
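Any change in the sequence results in a new GI number and an incremented version suffix on the accession (for example, .1 becomes .2); the base accession number itself stays the same.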
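Sequence identifiers are commonly written in `accession.version` form, where the part before the dot is the permanent accession and the number after it is the version. The sketch below splits such an identifier; the helper name `split_accession` and the accession `AB123456` are made up for illustration.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    The version is returned as an int, or None when no version suffix is present.
    """
    accession, dot, version = identifier.partition(".")
    return accession, (int(version) if dot else None)


# "AB123456" is a hypothetical accession used only for illustration.
old = split_accession("AB123456.1")
new = split_accession("AB123456.2")

same_record = old[0] == new[0]       # the base accession is unchanged
sequence_changed = old[1] != new[1]  # a different version means the sequence was updated
print(old, new, same_record, sequence_changed)
```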
Downloading FASTA File from GenBank¶

FASTA Format:¶
- Begins with single-line description (defline)
- Description distinguished by ">" symbol at beginning
- Followed by lines of sequence data
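The format rules above can be turned into a small parser. This is a minimal sketch that assumes well-formed input; the function name `parse_fasta` and the example records are hypothetical, and established libraries handle many more edge cases.

```python
def parse_fasta(text):
    """Return a dict mapping each defline (without '>') to its joined sequence."""
    sequences = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]       # a defline starts a new record
            sequences[header] = []
        elif header is not None:
            sequences[header].append(line)  # sequence data may span several lines
    return {name: "".join(parts) for name, parts in sequences.items()}


# Hypothetical two-record FASTA text.
example = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2 another hypothetical record
GGGCCCAAATTT
"""

parsed = parse_fasta(example)
for name, sequence in parsed.items():
    print(name, len(sequence))
```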
BLAST (Basic Local Alignment Search Tool)¶

Purpose:¶
- Sequence comparison tool developed by NCBI
- Identifies sequence similarity between query and database sequences
- Helps in organism identification, gene annotation, homology analysis
- Essential tool in genomics, microbiology, bioinformatics
Different Types of BLAST¶

| Type | Query | Database |
|---|---|---|
| BLASTn | Nucleotide | Nucleotide |
| BLASTp | Protein | Protein |
| BLASTx | Translated nucleotide | Protein |
| tBLASTn | Protein | Translated nucleotide |
| tBLASTx | Translated nucleotide | Translated nucleotide |
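The table amounts to a lookup from the query and database types to a BLAST program. A minimal sketch of that lookup follows; the function `choose_blast` and its `translate_both` flag are hypothetical names introduced here, not part of any BLAST software.

```python
# Map (query type, database type) to the BLAST program from the table.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated before comparison
    ("protein", "nucleotide"): "tblastn",  # database is translated before comparison
}


def choose_blast(query_type, database_type, translate_both=False):
    """Pick a BLAST program; translate_both selects tblastx for nucleotide vs nucleotide."""
    key = (query_type, database_type)
    if translate_both and key == ("nucleotide", "nucleotide"):
        return "tblastx"
    return BLAST_PROGRAMS[key]


print(choose_blast("nucleotide", "protein"))
print(choose_blast("nucleotide", "nucleotide", translate_both=True))
```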
BLAST Outcomes¶

Protein Data Bank (PDB)¶
Key Features:¶
- Central, open-access archive of 3D structures
- Contains proteins, DNA, and RNA structures
- Founded in 1971 at Brookhaven National Laboratory
- Each structure assigned unique PDB ID
Structure Determination Methods:¶
- X-ray crystallography
- NMR spectroscopy
- Cryo-EM
UniProt¶
Features:¶
- Comprehensive protein sequence and annotation database
- Contains information on protein function, domains, pathways, and post-translational modifications
Sections:¶
| Section | Description |
|---|---|
| UniProtKB/Swiss-Prot | Reviewed entries |
| UniProtKB/TrEMBL | Unreviewed entries |
Each protein entry has a unique UniProt accession number.
SWISS-MODEL¶

Purpose:¶
- Web-based automated protein structure homology modeling server
- Predicts 3D structures when experimental structures unavailable
- Builds models based on sequence similarity with known PDB structures
Applications:¶
- Structural bioinformatics
- Functional analysis
- Drug discovery
Types of Protein Structures¶

| Level | Description | Stabilized By |
|---|---|---|
| Primary | Linear sequence of amino acids | Peptide bonds |
| Secondary | Local folding (α-helix, β-sheet) | Hydrogen bonds |
| Tertiary | Overall 3D structure of single polypeptide | Hydrophobic interactions, hydrogen bonds, ionic bonds, disulfide bridges |
| Quaternary | Association of multiple polypeptide chains | Non-covalent interactions (and sometimes disulfide bonds) between subunits |
AlphaFold 3: ML-Based Tool in Proteomics¶
Features:¶
- Advanced deep learning model for biomolecular structure prediction
- Extends beyond proteins to complexes: protein-protein, protein-DNA, protein-RNA, and protein-ligand
- Uses attention-based (transformer-style) neural networks together with a diffusion-based structure module
- Predicts high-accuracy 3D structures from sequence
Applications of Biological Databases¶
Analysis Applications:¶
| Application | Description |
|---|---|
| Phylogenetic Inference | Infers evolutionary relationships |
| Comparative Genomics | Compares genomes to identify shared/unique genes |
| AMR Profiling | Detects antibiotic resistance genes |
| Virulence Analysis | Identifies virulence factors |
| Functional Annotation | Assigns biological functions to genes |
| Pathway Reconstruction | Maps genes to metabolic pathways |
| Pan-genome Analysis | Determines core and accessory genes |
| Protein Structure Prediction | Predicts 3D structures |
| Drug Target Identification | Identifies conserved genes for targeting |
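As one concrete case from the table, pan-genome analysis separates core genes from accessory genes. The sketch below uses a strict definition (core means present in every genome); the strain names and gene names are hypothetical, and real tools typically cluster genes by sequence similarity and may apply a presence threshold instead.

```python
# Hypothetical gene content for three strains.
genomes = {
    "strain_A": {"gene1", "gene2", "gene3"},
    "strain_B": {"gene1", "gene2", "gene4"},
    "strain_C": {"gene1", "gene2", "gene5"},
}

gene_sets = list(genomes.values())

pan_genome = set().union(*gene_sets)        # every gene seen in any genome
core_genes = set.intersection(*gene_sets)   # genes shared by all genomes
accessory_genes = pan_genome - core_genes   # genes missing from at least one genome

print(sorted(core_genes))
print(sorted(accessory_genes))
```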
Galaxy Server: WGS Analysis Tool¶

Features:¶
- Open-source, web-based platform
- Enables bioinformatics analyses without programming
- All analyses through web browser
- Ensures reproducible research
- Supports workflow automation
Three Main Public Servers:¶
- Galaxy Main (usegalaxy.org)
- Galaxy Europe (usegalaxy.eu)
- Galaxy Australia (usegalaxy.org.au)
Commonly Used Galaxy Tools¶
| Tool | Function |
|---|---|
| FastQC | Evaluates quality of raw sequencing reads |
| SPAdes | Assembles reads into contigs/draft genomes |
| QUAST | Assesses genome assembly quality |
| Prokka | Rapid annotation of prokaryotic genomes |
| ABRicate | Screens for resistance and virulence genes |
| RGI (CARD) | Identifies antibiotic resistance genes |
| PlasmidFinder | Detects plasmid replicons |
| ISEScan | Detects insertion sequences |
| MLST | Determines sequence types for strain typing |
| Roary | Pan-genome analysis |
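To make "assembly quality" more concrete, one widely used metric reported by assembly assessment tools such as QUAST is N50: the contig length at which contigs of that length or longer cover at least half of the total assembly. The sketch below is a plain Python illustration of that definition with made-up contig lengths, not QUAST's own code.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths in base pairs.
contigs = [100, 80, 50, 30, 20, 20]
print(n50(contigs))
```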
Construction of Phylogenetic Tree¶

Steps:¶
1. Find and download sequences (NCBI)
2. Align sequences (ClustalW/ClustalX, MEGA)
3. Construct the tree (MEGA 5/6/7)
4. Visualize the tree (iTOL, FigTree)
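Between alignment and tree construction, distance-based methods first summarize the alignment as pairwise distances. The sketch below computes the simple p-distance (proportion of differing sites, skipping gapped positions) for a hypothetical alignment; it illustrates the idea only and is not how MEGA is implemented.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment columns containing a gap
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0


# Hypothetical aligned sequences of equal length.
alignment = {
    "taxon_A": "ATGCATGC",
    "taxon_B": "ATGCATGA",
    "taxon_C": "ATGAAT-A",
}

names = sorted(alignment)
matrix = {(x, y): p_distance(alignment[x], alignment[y]) for x in names for y in names}

for x in names:
    print(x, [round(matrix[(x, y)], 3) for y in names])
```

Tree-building algorithms then group the taxa with the smallest distances first.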
16S rRNA for Species Identification¶
Why 16S rRNA?¶
- Highly conserved among bacteria
- Contains variable regions for identification
- Used for identifying bacteria, determining taxonomic position, and studying evolutionary relationships
For Eukaryotes
18S rRNA is used for species identification in eukaryotes.
Phylogenetic Tree Concepts¶

Definition: A phylogenetic tree (evolutionary tree) is a graphical representation of the evolutionary relationships among biological sequences or organisms.
Understanding Phylogenies¶

Types of Phylogenetic Trees¶

| Type | Description |
|---|---|
| Cladogram | Shows only branching pattern without branch length |
| Phylogram | Branch length reflects genetic change |
| Ultrametric Tree | Scaled to time, all taxa equidistant from root |
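The difference between a phylogram and an ultrametric tree can be checked numerically: in an ultrametric tree every tip is the same distance from the root. The sketch below assumes a hypothetical tree encoding as nested `(name, branch_length, children)` tuples with made-up branch lengths.

```python
def tip_depths(node, depth=0.0):
    """Yield (tip name, root-to-tip distance) for every leaf under node."""
    name, branch_length, children = node
    depth += branch_length
    if not children:
        yield name, depth
    for child in children:
        yield from tip_depths(child, depth)


def is_ultrametric(tree, tolerance=1e-9):
    """True when all tips lie at the same distance from the root."""
    depths = [distance for _, distance in tip_depths(tree)]
    return max(depths) - min(depths) <= tolerance


# Phylogram: branch lengths reflect genetic change, so tips sit at different depths.
phylogram = ("root", 0.0, [
    ("A", 0.1, []),
    ("inner", 0.2, [("B", 0.3, []), ("C", 0.05, [])]),
])

# Ultrametric tree: branch lengths are scaled to time, so all tips line up.
ultrametric = ("root", 0.0, [
    ("A", 3.0, []),
    ("inner", 2.0, [("B", 1.0, []), ("C", 1.0, [])]),
])

print(is_ultrametric(phylogram), is_ultrametric(ultrametric))
```

A cladogram would carry no branch lengths at all, so only the nesting of the tuples would be meaningful.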
Applications of Phylogenetic Trees¶
- ✅ Study evolutionary relationships between species
- ✅ Understand evolutionary processes over time
- ✅ Study diversity and distribution of species
- ✅ Develop conservation strategies
- ✅ Identify origins of pathogens
- ✅ Track spread of diseases
- ✅ Forensics - identify origins of biological samples
- ✅ Organize and classify organisms
📝 Exam Practice Questions¶
Frequently Asked Questions
1. Name the types of biological databases with examples
2. What is GenBank? Explain its uses
3. Explain the different types of BLAST
4. What is an accession number?
5. Describe the steps for constructing a phylogenetic tree
6. Differentiate between UniProt and PDB
7. What is the significance of 16S rRNA in species identification?
8. List the applications of phylogenetic trees
9. Explain the types of phylogenetic trees (cladogram, phylogram, ultrametric)