Characteristics of MongoDB and Its Use Cases
MongoDB is a NoSQL, document-oriented database that stores data in a flexible, JSON-like format called BSON. It is designed for high performance, scalability, and ease of development.
🧩 1. Document-Oriented Storage
Stores data in BSON (Binary JSON) format.
Each document is a flexible, schema-less structure — similar to JSON.
Example document:

```js
{
  "_id": ObjectId("..."),
  "name": "Alice",
  "age": 30,
  "skills": ["Python", "MongoDB"]
}
```
⚡ 2. High Performance
Optimized for fast reads and writes.
Efficient for real-time analytics and large-scale applications.
Indexes on fields improve query speed.
🔄 3. Schema Flexibility (Schema-less)
No rigid schema — each document in a collection can have different fields.
Great for evolving data models or polymorphic data.
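As a sketch of what schema flexibility means in practice, the two documents below could live in the same collection even though their fields differ (plain Python dicts stand in for BSON documents here; the collection name is made up):

```python
# Two documents in the same hypothetical "users" collection.
# MongoDB does not require them to share a schema.
user_v1 = {"_id": 1, "name": "Alice", "age": 30}
user_v2 = {"_id": 2, "name": "Bob", "skills": ["Go", "MongoDB"],
           "address": {"city": "Pune"}}

collection = [user_v1, user_v2]

# The set of fields differs per document -- adding "skills" later
# needs no schema migration.
fields_per_doc = [set(doc) for doc in collection]
```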
🌐 4. Horizontal Scalability
Supports sharding: data is distributed across multiple machines.
Easily scales out to handle large volumes of data.
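The idea behind sharding can be sketched as routing each document to a machine based on its shard key. The toy router below hashes the key and takes it modulo the shard count; real MongoDB maps ranges of hashed key values via a config server, so this is only an illustration:

```python
import hashlib

NUM_SHARDS = 3

def route_to_shard(shard_key_value: str) -> int:
    """Toy router: hash the shard key, map it to one of NUM_SHARDS."""
    digest = hashlib.md5(shard_key_value.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Documents with different shard-key values spread across the shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    shards[route_to_shard(user_id)].append(user_id)
```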
📊 5. Rich Query Language
Powerful query capabilities: filter, sort, project, aggregate, etc.
Supports nested documents and arrays in queries.
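To make "supports arrays in queries" concrete: a filter like `{"skills": "MongoDB"}` matches a document whether the field is a scalar equal to the value or an array containing it. The sketch below mimics that one rule in plain Python; it is not the server's matcher:

```python
def matches(doc: dict, query: dict) -> bool:
    """Tiny subset of MongoDB matching: equality, plus array containment."""
    for field, expected in query.items():
        actual = doc.get(field)
        if isinstance(actual, list):
            if expected not in actual:   # {"skills": "MongoDB"} matches arrays
                return False
        elif actual != expected:
            return False
    return True

docs = [
    {"name": "Alice", "skills": ["Python", "MongoDB"]},
    {"name": "Bob", "skills": ["Java"]},
]
hits = [d["name"] for d in docs if matches(d, {"skills": "MongoDB"})]
```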
🧠 6. Aggregation Framework
Powerful for data transformation and analytics.
Similar to SQL's `GROUP BY`, `HAVING`, and `JOIN` (via `$lookup`).
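A `$group` stage such as `{$group: {_id: "$dept", total: {$sum: "$salary"}}}` behaves like SQL's `GROUP BY` with `SUM`. The sketch below reproduces just that one stage in plain Python (the collection and field names are made up):

```python
from collections import defaultdict

employees = [
    {"dept": "eng", "salary": 100},
    {"dept": "eng", "salary": 120},
    {"dept": "hr",  "salary": 80},
]

# Rough equivalent of:
#   db.employees.aggregate([{$group: {_id: "$dept", total: {$sum: "$salary"}}}])
totals = defaultdict(int)
for e in employees:
    totals[e["dept"]] += e["salary"]
```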
🧵 7. Built-in Replication & High Availability
Uses replica sets for fault tolerance.
Automatically fails over to a secondary node if the primary fails.
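Failover can be sketched as: when the primary stops responding, the remaining healthy members elect the most up-to-date secondary. Real replica sets run a Raft-like election protocol; this toy version just picks the highest oplog timestamp:

```python
members = [
    {"host": "node1", "state": "PRIMARY",   "optime": 105, "healthy": False},  # just failed
    {"host": "node2", "state": "SECONDARY", "optime": 104, "healthy": True},
    {"host": "node3", "state": "SECONDARY", "optime": 101, "healthy": True},
]

def elect_new_primary(members):
    """Toy election: the most up-to-date healthy secondary wins."""
    candidates = [m for m in members if m["healthy"] and m["state"] == "SECONDARY"]
    return max(candidates, key=lambda m: m["optime"])["host"]
```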
🛠️ 8. Indexing
Supports single field, compound, multikey, geospatial, text, and hashed indexes.
Speeds up query performance.
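Why an index helps can be sketched with a dict keyed on the indexed field: a point query becomes a direct lookup instead of a scan over every document. MongoDB's indexes are B-trees rather than hash maps, but the effect on equality queries is similar:

```python
docs = [{"_id": i, "email": f"user{i}@example.com"} for i in range(10_000)]

# "Collection scan": check every document until one matches.
def find_by_scan(email):
    return next((d for d in docs if d["email"] == email), None)

# "Index": one-time build, then direct lookup per query.
email_index = {d["email"]: d for d in docs}

def find_by_index(email):
    return email_index.get(email)
```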
🔐 9. ACID Transactions
Single-document operations are atomic by default.
Supports multi-document transactions (since MongoDB 4.0 on replica sets, extended to sharded clusters in 4.2).
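The all-or-nothing guarantee of a multi-document transaction can be sketched like this: apply every write to a working copy and make them visible together only if all succeed. (Real transactions go through a driver session's transaction API against a live server; this is just the commit/abort idea, with an invented overdraw rule as the failure case.)

```python
import copy

def run_transaction(store: dict, writes):
    """All-or-nothing: apply writes to a copy, commit only if every one succeeds."""
    working = copy.deepcopy(store)
    try:
        for key, value in writes:
            if value < 0:
                raise ValueError("insufficient funds")  # any failure aborts everything
            working[key] = value
    except ValueError:
        return store           # abort: original state untouched
    store.clear()
    store.update(working)      # commit: all writes become visible together
    return store

accounts = {"alice": 100, "bob": 50}
run_transaction(accounts, [("alice", 70), ("bob", 80)])    # transfer 30: commits
run_transaction(accounts, [("alice", -10), ("bob", 160)])  # would overdraw: aborts
```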
☁️ 10. Cloud & Tooling Support
MongoDB Atlas for managed cloud deployment.
Has rich ecosystem: Compass (GUI), connectors for BI tools, drivers for all major languages.
🐯 What is WiredTiger?
WiredTiger is the default storage engine used in MongoDB since version 3.2. It is optimized for performance, concurrency, and compression.
One of the key components of WiredTiger is its in-memory cache, called the WiredTiger Cache.
🧠 What is the WiredTiger Cache?
The WiredTiger Cache is an in-memory area used to:
Hold frequently accessed documents and index entries
Buffer write operations before they're flushed to disk
Improve performance by reducing the need to access the disk frequently
Think of it as MongoDB’s internal “working memory.”
📏 Default Cache Size
By default, MongoDB assigns the larger of 50% of (RAM - 1 GB) or 256 MB to the WiredTiger Cache.
Example:
If your system has 16 GB RAM:
OS reserve = 1 GB
Cache size ≈ 50% of (16 - 1) = 7.5 GB
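The sizing rule above can be written out directly (per the MongoDB docs, the default is the larger of 50% of (RAM - 1 GB) and 256 MB):

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache: max(50% of (RAM - 1 GB), 256 MB)."""
    return max(0.5 * (ram_gb - 1), 0.25)   # 256 MB expressed as 0.25 GB

cache_16gb = default_wiredtiger_cache_gb(16)  # 7.5 GB, matching the example above
cache_tiny = default_wiredtiger_cache_gb(1)   # the 256 MB floor kicks in
```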
You can override this using the setting:

```yaml
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: <value>
```
🔄 How it Works
Reads:
When you read a document, MongoDB tries to fetch it from the WiredTiger Cache.
If it's not in cache, it reads from disk and places it in the cache for future use.
Writes:
Writes are first applied in the cache and logged to the journal for durability.
They are later flushed to disk during checkpoints or evictions.
Eviction Policy:
When the cache is full, the Least Recently Used (LRU) pages are evicted.
Modified (dirty) pages are flushed to disk before eviction.
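The eviction behavior can be sketched with an OrderedDict-based LRU cache in which dirty pages are "flushed" (here, just recorded) before being dropped. This simplifies away WiredTiger's eviction threads and thresholds, but the ordering rule is the same:

```python
from collections import OrderedDict

class ToyCache:
    """LRU sketch: accesses refresh recency; when full, the least recently
    used page is evicted, flushed first if it was modified (dirty)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> (data, dirty)
        self.flushed = []            # stand-in for "written to disk"

    def access(self, page_id, data=None):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)          # mark most recently used
            if data is not None:
                self.pages[page_id] = (data, True)   # write: page becomes dirty
            return self.pages[page_id][0]
        if len(self.pages) >= self.capacity:
            old_id, (_, dirty) = self.pages.popitem(last=False)  # evict LRU page
            if dirty:
                self.flushed.append(old_id)          # flush dirty page first
        self.pages[page_id] = (data, data is not None)
        return data

cache = ToyCache(capacity=2)
cache.access("p1", "v1")   # write p1 (dirty)
cache.access("p2", "v2")   # write p2 (dirty)
cache.access("p1")         # read p1 -> p2 becomes least recently used
cache.access("p3", "v3")   # cache full: evicts dirty p2, flushing it first
```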
You can inspect cache statistics at runtime with:

```js
db.serverStatus().wiredTiger.cache
```
Key fields:
bytes currently in the cache
maximum bytes configured
tracked dirty bytes in the cache
pages read into cache
pages written from cache
In MongoDB, the optimal document size depends on your workload pattern (read-heavy, write-heavy, etc.), hardware constraints, and access patterns. MongoDB does, however, set some hard limits and best practices around size.
📏 1. Maximum Document Size
Hard limit: 16 MB per document. MongoDB enforces this; you cannot exceed it.
✅ 2. Best Practice — Optimal Document Size
For best performance, especially on frequently accessed documents:
🔸 Aim for roughly 1 KB – 100 KB per document
This keeps reads/writes fast and avoids cache misses or large I/O operations.
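As a rough way to check where a document falls in that range, you can measure its serialized size. The sketch below uses JSON byte length as a proxy; exact BSON size differs, and in mongosh you would use `Object.bsonsize()` instead:

```python
import json

doc = {
    "_id": 1,
    "name": "Alice",
    "skills": ["Python", "MongoDB"],
    "bio": "x" * 500,   # made-up payload to give the document some bulk
}

# JSON byte length is only a proxy for BSON size, but close enough to tell
# whether a document sits near the ~1 KB or the ~100 KB end of the range.
approx_size_bytes = len(json.dumps(doc).encode("utf-8"))
within_budget = approx_size_bytes <= 100 * 1024
```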
🎯 3. Why Keep Documents Small?
| Reason | Explanation |
| ----------------------------- | ----------------------------------------------------------------------------------- |
| **Better cache fit** | Smaller documents allow more to fit in WiredTiger cache |
| **Less memory/disk pressure** | Reduces memory consumption and disk I/O |
| **Faster reads/writes** | Smaller payloads = faster network and disk operations |
| **Less update overhead** | WiredTiger rewrites a document on update, so large documents make each update costlier |
🧠 4. When Larger Documents Are Okay
In document modeling, embedding is encouraged (instead of joins), which might increase size.
Larger documents (100 KB – 1 MB) are fine if:
You need all embedded data together often
Access is mostly read-heavy and infrequent
Cache and memory are well-provisioned
🔄 5. Use GridFS for Large Files
If you need to store files larger than 16 MB (e.g., videos, images), use GridFS, MongoDB’s built-in mechanism to store and retrieve large files in chunks.
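GridFS works by splitting a file into fixed-size chunks (255 KB by default) stored as separate documents, plus one metadata document describing the file. The split itself is simple to sketch:

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KB

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split file bytes the way GridFS does: fixed-size chunks, numbered in order."""
    return [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]

file_bytes = b"x" * (600 * 1024)        # a 600 KB "file"
chunks = split_into_chunks(file_bytes)  # two full 255 KB chunks plus a 90 KB tail
```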
🧪 6. Measuring and Monitoring
Use the `Object.bsonsize(doc)` method in mongosh to get the exact BSON size of a document:

```js
Object.bsonsize(db.collection.findOne())
```