- Create database, documents and collections
- Documents and collections in a document-oriented database are roughly equivalent to the rows and tables, respectively in RDBMS
- Perform DML operations on collections
- Create and execute queries using aggregation operators and pipelining aggregation operators
- Build, monitor, and analyze queries for performance using indexes
What is MongoDB
- MongoDB is an open source document oriented database written in C++.
- Data in MongoDB is stored in key-value pairs in JSON format called as documents, which are roughly equivalent to rows in RDBMS.
- Multiple documents are stored in what is called a collection in MongoDB. A collection is roughly equivalent to a table in RDBMS.
- MongoDB is considered as schema-less as it can store documents having varying sets of fields with different types of each field.
- MongoDB stores these JSON documents in a binary-encoded format called BSON (Binary JSON) for faster storage and retrieval of data.
Features of MongoDB
- It supports flexible schema at the document level, that can vary from one document to another. It does not have a fixed structure defined at the collection level
- It has rich query language that supports CRUD operations, aggregation, text search and so on
- As documents can be embedded within each other, it does not need complex joins.
- It also provides support for indexes for faster querying.
- MongoDB stores data by using 'key-value' entities and provides flexibility in adding attributes for products of various categories in a single collection
MongoDB
|
RDBMS
|
Non-relational document oriented database
|
Relational table oriented database
|
Supports JSON-like query language
|
Supports SQL query language
|
Collection based
|
Table based
|
Document based
|
Row based
|
Field based, uses key value pairs
|
Column based
|
Schema is dynamic and flexible
|
Schema is predefined
|
Emphasis on CAP theorem
|
Emphasis on ACID properties
|
No need for JOINS
|
Supports JOINS across multiple tables
|
Better performance due to simpler queries
|
Complex queries cause slow performance
|
Create a product catalog :
MongoDB Database: A MongoDB database stores all the related collections.
MongoDB Collection: A collection is a single entity that contains all the data related to a particular aspect. A collection is created inside a database
Commands :
- creates a database : use Zoiva
- To view database : db
- Creates a collection: db.createCollection('product_catalog') output will say <"ok":1>
- To drop collection : db.product_catalog.drop()
- To drop the database currently in use : db.dropDatabase()
Create a product in the catalog :
Commands:
- Inserting one document : db.collection_name.insert({document})
Output : WriteResult({"nInserted":1})
- Inserting multiple documents : db.collection_name.insert( [ {document1}, {document2}] )
Output : BulkWriteResult({"nInserted":3})
enclose the comma separated documents inside square brackets or [ ]
Alternate insert methods
- db.collection_name.insertOne()
eg : db.inventory.insertOne(
{ prodid: 7000001,
prodname: "iphone 7",
manufacturer: "apple",
categories: {main:"electronics",sub:"smartphones"},
year_of_launch: 2017,
price: 60000,
publisher: "Dreamtech",
colors: ["silver","black","gold","rosegold"]}
)
Output : { "acknowledged ": true,
"insertedid : [ objectid ("axx") ]" }
- db.collection_name.insertMany()
eg : db.inventory.insertMany( [
{ prodid: 7000001,
prodname: "iphone 7",
manufacturer: "apple",
categories: {main:"electronics",sub:"smartphones"},
year_of_launch: 2017,
price: 60000,
publisher: "Dreamtech",
colors: ["silver","black","gold","rosegold"]
},
{ prodid: 7000002,
prodname: "iphone X",
manufacturer: "apple",
categories: {main:"electronics",sub:"smartphones"},
year_of_launch: 2018,
price: 90000,
publisher: "Iphone",
colors: ["silver","black","gold","rosegold"]
},
{ prodid: 7000003,
prodname: "wardrobe",
manufacturer: "apple",
categories: {main:"decorative",sub:"wooden storage"},
year_of_launch: 2018,
price: 30000,
quantity : 3,
publisher: "Avanta Grand",
colors: ["marune","black"]}
] )
Output : { "acknowledged ": true,
"insertedid : [
objectid("axx"), objectid("axx1"), objectid("axx2")
]" }
Retrieve product details to create reports :
db.product_catalog.find()
db.product_catalog.find().pretty()
Parameters in find()
db."collection_name".find( {query}, {projection} )
query : db.product_catalog.find( { publisher: "Dreamtech" } )
projection : which fields are to be returned in the documents that match the query selection criteria.
- The projection parameter document is usually written as
{ field1: <value>, field2: <value> ... }
The <value> is always written as 0 or 1. 0 means omit this field and 1 means include this field.
db.product_catalog.find( { publisher: "Dreamtech" }, { prodname: 1, price: 1 } )
it will display only prodname and price
- unique _id value that acts as a primary key.If an inserted document omits the _id field, then MongoDB automatically generates a unique id and sets it to the _id field.
db.product_catalog.find({ }, {_id: 0}).pretty()
- Retrieving product details from an embedded document
db.product_catalog.find({ "categories.main":"electronics" })
- Retrieving details based on an array element
db.product_catalog.find({ colors: "black" },{ prodname: 1 })
- Multiple selection criteria to retrieve documents, Similar to $and, the logical operators $or, $nor and $not are also available.
db.product_catalog.find(
{ $and : [{ "categories.sub": "smartphones" }, { manufacturer: "apple" }] },
{_id: 0, prodname: 1,price: 1}
)
- compare the value of a field with a given value, we can use relational operators
db.product_catalog.find(
{ "categories.main":{$ne:"electronics"} },{ _id: 0, prodname: 1 }
)
- To check if this array has specific colors like black and silver, we can use the $all - array operator. $all operator is enclosed in { }
The syntax of the $all operator is:{ field_name: { $all: [ "value1", "value2" ] }
This operator selects documents where the value of a field is an array, that contains all the specified elements, i.e the field's value must contain an array with value1 and value2 at least. If a document's field has only value1 or only value2 in the array, it is not returned.
- $exists { field: { $exists: <boolean> } } When <boolean> is true, $exists matches the documents that contain the field, including documents where the field value is null. If <boolean> is false, the query returns only the documents that do not contain the field.
db.product_catalog.find( { qty: { $exists: true, $nin: [ 5, 15 ] } } )
This query will select all documents in the inventory collection where the qty field exists and its value does not equal 5 or 15
Update products in the catalog
- The default update() method updates only the first matching document and ignores all other documents.
db.product_catalog.update(
{ prodname : "nosql distilled" },
{ $set: { "prodname" : "NoSQL Distilled - Second Edition" } }
)
Output: WriteResult({"nMatched":1, "nUpserted":0, "nModified":1})
db.product_catalog.update(
{ prodname : "nosql distilled" },
{ "prodname" : "NoSQL Distilled - Second Edition" }
)
The above query will override the existing document and retain only the _id field and the prodname field with the value as "NoSQL Distilled - Second Edition". All other fields are removed.
· Updating multiple documents
db.product_catalog.updateMany(
{"genre.academic": "technical"},
{ $set: {"genre.academic": "Computer Science Technology"} }
)
· Try to update a document that does not exist, upsert to true. This allows a new document to be inserted
db.product_catalog.updateMany(
{ "price" : { $gt : 80000 }, "manufacturer" : "apple" },
{ $set: { "prodname" : "iphone 7 plus" } },
{ upsert: true }
)
· use of $inc field update operator. This operator is used to update only numeric values of a field. This operator can take both, negative and positive values
db.product_catalog.update(
{ ISBN: 18407806 },
{ $inc: { price: 50 } }
)
· $push operator to add an element to an array.This query appends the color 'white' to the end of the array.
If the field does not exist, the $push operator creates the field and sets the value as an array with the element specified
db.product_catalog.update(
{ prodid: 7000001 },
{ $push: { colors: { $each: [ 'midnight blue', 'red' ] } } }
)
both the elements need to be pushed individually to the array, Using $each ensures that both the elements are pushed individually, rather than appending an array containing the new elements.
colors: ["silver","black","gold","rosegold",'midnight blue', 'red']
else it will be append array inside an array
colors: ["silver","black","gold","rosegold",['midnight blue', 'red']]
Delete products from the catalog :
· deleteOne() function is used to delete a single document. By default, this function will delete only the first matching document.
db.collection_name.deleteOne(
{ <> }
)
Output: {"acknowledged":true, "deletedCount":1}
· db.collection_name.deleteMany() is used for deleting multiple documents that match the given criteria.
db.product_catalog.deleteMany(
{
price: { $lt : 1000 }
}
)
Perform inventory analysis :
Aggregate Functions
count() - total number of products and distinct() - to retrieve an array of the distinct values of a particular field
db.product_catalog.count( { "categories.sub": "smartphones" })
db.product_catalog.distinct( "manufacturer" )
The aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline. Pipeline stages may not produce one output document for every input document; e.g., some stages may generate new documents or filter out documents.
Documents enter a multi-stage pipeline that transforms the documents into a final aggregated result.
product_catalog --> $project --> $group --> aggregated results
$project is used to specify which fields are required for performing the aggregation.
$group is used to group the fields by value and then perform some aggregation
Total price of all products created by each manufacturer
db.product_catalog.aggregate( [
{ $project: { _id: 0, manufacturer: 1, price: 1 } },
{ $group: { _id: "$manufacturer", totalPrice: { $sum: "$price" } } }
] )
Get a count of how many products are created by each manufacturer
db.product_catalog.aggregate( [
{ $group: { _id: "$manufacturer", totalProducts: { $sum: 1 }} }
] )
Other functions :
db.product_catalog.aggregate( [
{ $sort : { price: 1 } },
{ $limit: 5},
{ $out: "FiveCheapestMobiles"}
] )
- $sort is used on the price to specify sort by ascending (1) or descending (-1)
- $limit is used to specify only 5 matching documents must be returned
- $out is used to specify the name of the collection in which the matching documents will be stored
db.FiveCheapestMobiles.find().pretty()
Reduce timetaken by search results :
- Consider a query that retrieves products whose price ranges between '5000' and '20000'.this query will search all the documents in the collection to find matching documents and return them. This can lead to considerable waiting time by users and cause an application to perform slowly.The performance for queries based on indexed and non-indexed fields can be analyzed by using the explain("executionStats")
db.product_catalog.find(
{ price: { $gte: 5000, $lte: 20000 } } )
.explain("executionStats")
- Single field index
db.product_catalog.createIndex(
{ price:-1 })
output :
- Compound index : to filter multiple criteria
db.product_catalog.createIndex(
{ price:1, rating:-1 })
indicates whether an ascending index (1) or descending index (-1)
- Text index
- Customers would like to perform text based searches,for example: search for either the main category such as 'electronics' or more specifically sub category such as 'smartphones'.
db.product_catalog.createIndex(
{categories:"text"})
- Now, the $text operator is used to perform text search on the indexed field to retrieve the details of 'smartphones'
db.product_catalog.find(
{ $text:{ $search:"smartphones"}})
- Delete an index
db.product_catalog.dropIndex( "price_-1" ) or db.product_catalog.dropIndex( { price:-1 } )
- Delete all indexes
db.product_catalog.dropIndexes()