Session 15: Introduction to NoSQL Databases

Duration: 2 hours (Lecture: 2 hours, No Lab)

Syllabus Topics:

Lecture:
- Introduction to NoSQL Database
- Features of NoSQL Database
- Structured vs. Semi-structured and Unstructured Data
- Difference between RDBMS and NoSQL Databases
- CAP Theorem
- BASE Model
- Categories of NoSQL Databases: Key-Value Store, Document Store, Column-Oriented, Graph

Lecture Notes (In-Depth)

Introduction to NoSQL Database

Definition: NoSQL ("Not Only SQL") databases are non-relational databases designed for large-scale, diverse data, much of it semi-structured or unstructured. They offer flexible data models, horizontal scalability, and high performance for modern applications.
Purpose:
- Address limitations of traditional RDBMS in handling big data, high traffic, and varied data types.
- Support distributed architectures for cloud-based and web-scale applications.
Key Characteristics:
- Schema-less or flexible schema: No fixed structure, allowing dynamic data models.
- Horizontal scaling: Scale out by adding servers and partitioning data across them (sharding), rather than scaling up a single machine with more powerful hardware (vertical scaling).
- High performance for specific workloads (e.g., key-value lookups, document queries).
- Support for diverse data formats and models (e.g., JSON, XML, graphs).
Examples:
- MongoDB (Document Store), Redis (Key-Value Store), Cassandra (Column-Oriented), Neo4j (Graph).

Features of NoSQL Database

Scalability:
- Horizontal scaling across distributed nodes, ideal for cloud environments.
- Example: Add more servers to handle increased traffic in MongoDB or Cassandra.
Flexible Schema:
- No fixed schema; documents or records can have varying structures.
- Example: A MongoDB collection can store documents with different fields without predefined columns.
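The flexible-schema idea can be illustrated without a database server. The following is a minimal sketch in plain Python, where a list of dictionaries stands in for a document collection. The sample data and the helper names find and field_names are hypothetical and are not part of any MongoDB API.

```python
# A plain Python list stands in for a document collection (hypothetical data).
users = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phones": ["555-0101", "555-0102"]},
    {"_id": 3, "name": "Mei", "address": {"city": "Pune", "zip": "411001"}},
]


def find(collection, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def field_names(collection):
    """Collect the set of field names used anywhere in the collection."""
    names = set()
    for doc in collection:
        names.update(doc)
    return names


# Adding a document with a brand-new field needs no schema change.
users.append({"_id": 4, "name": "Liam", "loyalty_points": 120})

print(find(users, name="Ravi"))
print(sorted(field_names(users)))
```

Each document carries its own field names, so the application, not the database, is responsible for handling missing or extra fields.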
High Availability:
- Replication (keeping copies of data on several nodes) and partitioning help keep data available even when individual nodes fail.
- Example: Cassandra’s masterless, multi-node replication avoids a single point of failure.
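How replication keeps data readable after node failures can be sketched with a toy in-memory store. This is an illustration only: the class ReplicatedStore and its methods are hypothetical, every write is copied to all reachable nodes for simplicity, and real systems such as Cassandra use configurable replication and consistency settings that this sketch does not model.

```python
class ReplicatedStore:
    """Toy key-value store that copies every write to several nodes."""

    def __init__(self, node_count=3):
        # Each node is a dict; `up` tracks which nodes are reachable.
        self.nodes = [{} for _ in range(node_count)]
        self.up = [True] * node_count

    def put(self, key, value):
        # Write to every reachable node (full replication for simplicity).
        for index, node in enumerate(self.nodes):
            if self.up[index]:
                node[key] = value

    def get(self, key):
        # Read from the first reachable node that holds the key.
        for index, node in enumerate(self.nodes):
            if self.up[index] and key in node:
                return node[key]
        raise KeyError(key)

    def fail(self, index):
        # Simulate a node going offline.
        self.up[index] = False


store = ReplicatedStore(node_count=3)
store.put("user:1", "Asha")
store.fail(0)
store.fail(1)
print(store.get("user:1"))  # still readable from the one remaining node
```

The read succeeds as long as at least one replica is reachable. Once every node is down, get raises KeyError.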
Performance:
- Optimized for specific query patterns (e.g., key lookups in Redis, document retrieval in MongoDB).
- Often faster than an RDBMS for certain workloads, because denormalized data avoids joins at read time and storage is organized around the expected access pattern.
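The effect of denormalization on reads can be sketched in plain Python. The data and the function names orders_with_join and orders_embedded are hypothetical. The first layout mimics normalized tables that must be combined at read time, and the second embeds the orders inside the customer document so that a single lookup returns everything.

```python
# Normalized layout: two "tables" that must be combined at read time.
customers = {1: {"name": "Asha"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 250},
    {"order_id": 101, "customer_id": 1, "total": 90},
]


def orders_with_join(customer_id):
    """Build the customer view by scanning the orders table (a join)."""
    customer = customers[customer_id]
    return {
        "name": customer["name"],
        "orders": [
            {"order_id": o["order_id"], "total": o["total"]}
            for o in orders
            if o["customer_id"] == customer_id
        ],
    }


# Denormalized layout: orders are embedded in the customer document.
customer_docs = {
    1: {
        "name": "Asha",
        "orders": [
            {"order_id": 100, "total": 250},
            {"order_id": 101, "total": 90},
        ],
    }
}


def orders_embedded(customer_id):
    """Return the same view with a single key lookup."""
    return customer_docs[customer_id]


print(orders_with_join(1) == orders_embedded(1))
```

Both functions return the same view. The embedded form needs only one lookup, but the data is duplicated whenever the same order details are needed elsewhere, so updates must touch more places.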
Distributed Architecture:
- Data is spread across multiple nodes for load balancing and fault tolerance.
- Example: Sharding in MongoDB distributes data across servers.
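Distributing data across nodes can be sketched with simple hash-based sharding. This is a minimal illustration, not MongoDB's implementation: the names shard_for and ShardedStore are hypothetical, and the modulo scheme used here would move most keys if the number of shards changed.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Toy key-value store that spreads keys across several shards."""

    def __init__(self, shard_count):
        # Each shard is a dict standing in for a separate server.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # The same hash routes reads to the shard that holds the key.
        return self.shards[shard_for(key, len(self.shards))][key]


store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Number of keys held by each shard; roughly even for a good hash.
print([len(shard) for shard in store.shards])
```

Because each key maps to exactly one shard, reads and writes for different keys can be served by different servers, which is what spreads the load.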
Support for Big Data:
- Handles large volumes of data (terabytes to petabytes) and high-velocity data (e.g., real-time analytics).
Key Notes:
- Many NoSQL databases give up or limit some relational features (e.g., complex joins, strict ACID guarantees across multiple records) in exchange for scalability and flexibility.
- Best suited for applications like social media, IoT, e-commerce, and real-time analytics.

Structured vs. Semi-structured and Unstructured Data

Structured Data:
- Highly organized, fits into fixed schemas (e.g., tables with rows and columns).
- Example: RDBMS tables like employees with defined columns (emp_id, name, salary).
- Characteristics: Fixed format, easily queried with SQL, used in traditional business applications.
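The fixed-schema behaviour of structured data can be sketched with Python's built-in sqlite3 module and an in-memory database. The table and column names follow the employees example above. The sample rows and the nickname column are hypothetical and used only for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for an RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees (emp_id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 52000.0), (2, "Ravi", 61000.0)],
)

# Structured data is easy to query with SQL.
high_earners = conn.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name", (55000,)
).fetchall()
print(high_earners)  # [('Ravi',)]

# The schema is fixed: a column that was never declared is rejected.
try:
    conn.execute(
        "INSERT INTO employees (emp_id, name, nickname) VALUES (3, 'Mei', 'M')"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True
print(rejected)  # True
```

Adding the new field would require changing the table definition first, which is the main contrast with the flexible schemas of document stores.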