What is a document database?

While relational databases rely on rigid structures, document databases are much more natural to work with and can be used for a variety of use cases across industries.


A document database (also known as a document-oriented database or a document store) is a database that stores information in documents. Document databases offer a variety of advantages, including an intuitive data model, a flexible schema, and horizontal scalability.

Because of these advantages, document databases are general-purpose databases that can be used in a variety of use cases and industries.

Document databases are considered to be non-relational (or NoSQL) databases. Instead of storing data in fixed rows and columns, document databases use flexible documents. Document databases are the most popular alternative to tabular, relational databases.

What are documents?

A document is a record in a document database. A document typically stores information about one object and any of its related metadata.

Documents store data in field-value pairs. The values can be a variety of types and structures, including strings, numbers, dates, arrays, or objects. Documents can be stored in formats like JSON, BSON, and XML.

Below is a JSON document that stores information about a user named Tom.

{
  "_id": 1,
  "first_name": "Tom",
  "email": "tom@example.com",
  "cell": "765-555-5555",
  "likes": [
    "fashion",
    "spas",
    "shopping"
  ],
  "businesses": [
    {
      "name": "Entertainment 1080",
      "partner": "Jean",
      "status": "Bankrupt",
      "date_founded": {
        "$date": "2012-05-19T04:00:00Z"
      }
    },
    {
      "name": "Swag for Tweens",
      "date_founded": {
        "$date": "2012-11-01T04:00:00Z"
      }
    }
  ]
}
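As a minimal sketch of how such a document maps onto ordinary language objects, the Python snippet below parses Tom's document with the standard json module and reads a few nested fields. The helper parse_date is a hypothetical name introduced here for illustration; it assumes the "$date" wrapper always holds an ISO 8601 timestamp ending in "Z", as it does in this example.

```python
import json
from datetime import datetime

# Tom's document from the text, as a JSON string.
TOM_JSON = """
{
  "_id": 1,
  "first_name": "Tom",
  "email": "tom@example.com",
  "cell": "765-555-5555",
  "likes": ["fashion", "spas", "shopping"],
  "businesses": [
    {
      "name": "Entertainment 1080",
      "partner": "Jean",
      "status": "Bankrupt",
      "date_founded": {"$date": "2012-05-19T04:00:00Z"}
    },
    {
      "name": "Swag for Tweens",
      "date_founded": {"$date": "2012-11-01T04:00:00Z"}
    }
  ]
}
"""


def parse_date(wrapper):
    """Hypothetical helper: turn {"$date": "...Z"} into an aware datetime."""
    # Swap the trailing "Z" for an explicit UTC offset so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(wrapper["$date"].replace("Z", "+00:00"))


# json.loads turns objects into dicts and arrays into lists.
tom = json.loads(TOM_JSON)

# Nested values are reached with ordinary indexing.
business_names = [business["name"] for business in tom["businesses"]]
first_founded = parse_date(tom["businesses"][0]["date_founded"])

# Fields that a particular sub-document lacks can be read with a default.
second_status = tom["businesses"][1].get("status", "unknown")
```

Note that the second business has no "status" or "partner" field, so the sketch reads it with a default instead of assuming every sub-document has the same shape.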

Collections

A collection is a group of documents. Collections typically store documents that have similar contents.

Not all documents in a collection are required to have the same fields, because document databases have a flexible schema. Note that some document databases provide schema validation, so the schema can optionally be locked down when needed.

Continuing with the example above, the document with information about Tom could be stored in a collection named users. More documents could be added to the users collection to store information about other users. For example, a document that stores information about Donna could be added to the users collection.

Note that the document for Donna does not need to contain the same fields as the document for Tom. The users collection relies on a flexible schema to store whatever information exists for each user.
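The flexible schema described above can be sketched in Python by modeling the users collection as a plain list of dictionaries. This is an in-memory illustration, not a real database. The fields shown for Donna are hypothetical, since the text only gives her name, and the helper names fields_by_id and missing_fields are assumptions introduced here. The second helper hints at how optional schema validation might require a few fields while leaving the rest open.

```python
# A collection modeled as a list of dicts (an in-memory stand-in only).
users = [
    {
        "_id": 1,
        "first_name": "Tom",
        "email": "tom@example.com",
        "cell": "765-555-5555",
        "likes": ["fashion", "spas", "shopping"],
    },
    # Hypothetical document: every field except the name is made up
    # for illustration.
    {
        "_id": 2,
        "first_name": "Donna",
        "email": "donna@example.com",
        "spouse": "Joe",
    },
]


def fields_by_id(collection):
    """Map each document's _id to the set of field names it uses."""
    return {doc["_id"]: set(doc) for doc in collection}


def missing_fields(doc, required):
    """Return the required field names that doc lacks, sorted."""
    return sorted(name for name in required if name not in doc)


# Hypothetical validation rule: only these fields are mandatory.
REQUIRED = ("_id", "first_name", "email")

fields = fields_by_id(users)
only_tom = fields[1] - fields[2]    # fields Tom has and Donna lacks
only_donna = fields[2] - fields[1]  # fields Donna has and Tom lacks
problems = [missing_fields(doc, REQUIRED) for doc in users]
```

Both documents pass the rule even though their field sets differ, which is the behavior a flexible schema with optional validation is meant to allow.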

Key features of document databases

Document databases have the following key features:

Document model: Data is stored in documents (unlike other databases that store data in structures like tables or graphs). Documents map to objects in most popular programming languages, which allows developers to rapidly develop their applications.

Flexible schema: Document databases have a flexible schema, meaning that not all documents in a collection need to have the same fields. Note that some document databases support schema validation, so the schema can be optionally locked down.

Distributed and resilient: Document databases are distributed, which allows for horizontal scaling (typically cheaper than vertical scaling) and data distribution. Document databases provide resiliency through replication.

Querying through an API or query language: Document databases have an API or query language that allows developers to execute CRUD (create, read, update, and delete) operations on the database. Developers can query documents based on unique identifiers or field values.
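To make the CRUD operations and the two lookup styles (by unique identifier and by field value) concrete, here is a toy in-memory collection in Python. It is a sketch only: the class InMemoryCollection and its method names are hypothetical, do not correspond to any particular database's API, and ignore persistence, indexing, and distribution entirely. The updated phone number is also made up.

```python
class InMemoryCollection:
    """Toy stand-in for a document collection; not a real database driver."""

    def __init__(self):
        self._docs = {}  # maps each _id to its document

    def insert(self, doc):
        """Create: store a copy of doc under its unique _id."""
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']}")
        self._docs[doc["_id"]] = dict(doc)

    def get(self, doc_id):
        """Read by unique identifier; returns None if absent."""
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Read by field values: every given field must be present and equal."""
        return [
            doc
            for doc in self._docs.values()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]

    def update(self, doc_id, changes):
        """Update: set or overwrite the given fields on one document."""
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        """Delete: remove one document; True if something was removed."""
        return self._docs.pop(doc_id, None) is not None


users = InMemoryCollection()
users.insert({"_id": 1, "first_name": "Tom", "cell": "765-555-5555"})
users.insert({"_id": 2, "first_name": "Donna"})

tom = users.get(1)                          # lookup by identifier
donnas = users.find(first_name="Donna")     # lookup by field value
users.update(1, {"cell": "765-555-1234"})   # hypothetical new number
removed = users.delete(2)
```

A real document database exposes the same four kinds of operation through its driver or query language, usually with richer filters than the equality match used here.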

What makes document databases different from relational databases?

Three key factors differentiate document databases from relational databases:

  1. The intuitiveness of the data model: Documents map to the objects in code, so they are much more natural to work with. There is no need to decompose data across tables, run expensive joins, or integrate a separate Object Relational Mapping (ORM) layer. Data that is accessed together is stored together, so developers have less code to write and end users get higher performance.
  2. The ubiquity of JSON documents: JSON has become an established standard for data interchange and storage. JSON documents are lightweight, language-independent, and human-readable. Documents are a superset of all other data models so developers can structure data in the way their applications need: rich objects, key-value pairs, tables, geospatial and time-series data, or the nodes and edges of a graph.
  3. The flexibility of the schema: A document’s schema is dynamic and self-describing, so developers don’t need to first pre-define it in the database. Fields can vary from document to document. Developers can modify the structure at any time, avoiding disruptive schema migrations. Some document databases offer schema validation so you can optionally enforce rules governing document structures.
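The first point, decomposing data across tables versus storing it together, can be illustrated with a small Python sketch that uses the standard sqlite3 module for the relational side. The table and column names are assumptions made for this example. It shows only the difference in shape and code, not performance: the relational version needs two tables, a join, and reassembly, while the document version is already the object the application wants.

```python
import sqlite3

# Relational side: Tom's data is split across two tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, email TEXT);
    CREATE TABLE likes (
        user_id INTEGER REFERENCES users(id),
        position INTEGER,
        value TEXT
    );
    INSERT INTO users VALUES (1, 'Tom', 'tom@example.com');
    INSERT INTO likes VALUES (1, 0, 'fashion'), (1, 1, 'spas'), (1, 2, 'shopping');
    """
)

# A join is needed to bring the user and the list of likes back together.
rows = conn.execute(
    """
    SELECT u.id, u.first_name, u.email, l.value
    FROM users AS u
    JOIN likes AS l ON l.user_id = u.id
    WHERE u.id = ?
    ORDER BY l.position
    """,
    (1,),
).fetchall()
conn.close()

# The application then reassembles one object from the repeated rows.
relational_tom = {
    "_id": rows[0][0],
    "first_name": rows[0][1],
    "email": rows[0][2],
    "likes": [row[3] for row in rows],
}

# Document side: the same data is stored and read as a single unit.
document_tom = {
    "_id": 1,
    "first_name": "Tom",
    "email": "tom@example.com",
    "likes": ["fashion", "spas", "shopping"],
}
```

Both paths end with the same object; the difference is how much schema and mapping code sits between storage and the application.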

Summary

Document databases utilize the intuitive, flexible document data model to store data. Document databases are general-purpose databases that can be used for a variety of use cases across industries.

Get started with document databases by creating a database in MongoDB Atlas. Atlas has a generous forever-free tier you can use to kick the tires and explore the document model.
