MongoDB - Full

|
Divnesh BLOG
Mongo Database

Introduction :

From past few years, the data generation, storage and analysis has been increasing tremendously. Traditional databases cannot deal with flexible data storage and the cost of scaling up is also high.

Enterprises need an infrastructure with the following capabilities to address these challenges:

  • Database to develop faster, deploy easier and scale bigger with flexible schema
  • Free and open source database
  • Rich query language for data read and write operations (CRUD) and aggregations
  • Indexes for faster query retrieval time

This course introduces you to MongoDB which is a free and open source NoSQL document database that provides an infrastructure with the above capabilities. In addition, MongoDB is suitable for heavy read operations.

It mainly focuses on MongoDB query language for document level operations, aggregation framework and indexes.

Business requirements change over time, affecting the data models and associated schemas in relational databases. MongoDB offers flexible schema to tackle this issue.

Appendix :

  • Install and configure MongoDB
  • Design data model for product catalog
  • Perform DML operations on MongoDB collections
  • Execute queries using aggregation operators
  • Build, Monitor and analyze queries for performance using indexes
eCommerce product catalog scenario

To manage an eCommerce system, a product catalog is needed that represents each product category with its respective attributes.

While using RDBMS, you need to create tables for each product category as shown

Also, schema needs to be defined upfront before loading the data.

Observation:

  1. RDBMS handles different data models by creating individual tables for each of them.

  2. This results in huge number of tables in the database. 

Challenge:

RDBMS cannot handle schema(table structure) changes in the tables efficiently.

Solution:

MongoDB addresses this challenge by offering a flexible schema. All the products irrespective of the schema can be stored using a single entity as shown:

The other advantages of MongoDB when compared to RDBMS are:


Popularity of MongoDB

MongoDB gained popularity with its flexible schema, scalability and document oriented NoSQL capabilities. In 2013, MongoDB was acclaimed as the "Database Management System of the Year" by DB-Engines.

The Forrester Wave™: Big Data NoSQL, Q3 2016 whitepaper named MongoDB as a leader amongst popular NoSQL databases.

The whitepaper also lists the following facts about MongoDB:

  • Shortlisted by the Enterprise Architecture (EA) pros for its security, services, tooling and document capabilities
  • Popular choice of more than half of the Fortune 100 companies

Some of the real world scenarios where MongoDB has found its place:

  • MetLife - one of world’s largest insurance companies uses MongoDB to build a single collection of data from 100 million customers, 100 products and more than 70 source systems
  • Shutterfly - an internet based photo sharing and personal publishing company has stored more than 20TB of data and developed dozens of projects on MongoDB
  • Genentech - a biotechnology corporation changed from relational database to MongoDB and has increased its drug research rate from 4 months to 3 weeks
Introduction to MongoDB
  • MongoDB is an open source document oriented database written in C++
  • Data in MongoDB is stored in documents within a collection as sets of name-value pairs
  • The document is a Java Script Object Notation (JSON*) that is schema-less
  • MongoDB uses Binary JSON (BSON**) for storage and retrieval of data

Features of MongoDB

  1. It supports schema at the document level, that can vary from one document to another. It does not have a fixed structure defined at the collection level

  2. It has rich query language that supports CRUD operations, aggregation, text search and so on

  3. As documents can be embedded within each other, it does not need complex joins.

  4. It also provides support for indexes for faster querying

  5. As data is stored across multiple machines, it is horizontally scalable. Sharding distributes the data across multiple machines

*JSON: is represented as set of ordered name/value pairs used to represent data that is easy to read and write.
**BSON: Binary representation of JSON data on disk.

Zovia eCommerce product catalog business component

Zoiva is an eCommerce system that sells products online based on customer needs. It maintains a product catalog consisting of more than 10 million products stored in relational database.

Challenges faced by Zoiva

With RDBMS Zoiva faces the following challenges:

  • Data model of products in the catalog keeps changing. It is difficult to update the RDBMS schema for each change
  • Customers would search products based on various criteria like price, brand and so on. This requires query flexibility without affecting the performance

Solution

Zoiva decided to migrate their product catalog to MongoDB to address these challenges. To achieve the same, the development team needs to implement the following requirements:

Before developing the product catalog, MongoDB needs to be installed.

Installation and configuration of MongoDB

  1. Download MongoDb
  2. Create the database path as data\db that MongoDB uses to store data.  Open Cmd --> md data  -->md data\db
  3. Set enviromnent variable path : C:\Program Files\MongoDB\Server\4.2\bin\
  4. Run mongod to start the server. MongoDB runs on 27017 port by default
  5. Run mongo to start the client. By default, MongoDB connects to the test database
  6. To test the client connectivity, execute the db command as shown above. It should display the output as test.
configuration :

As a part of the exploratory exercises, try out the following:

  1. Find out the version of MongoDB that is running [Hint: Version] : db.version()

  2. Change the default data path where MongoDB stores its data. [Hint: Data path]

  3. In certain cases, if MongoDB server shuts down unexpectedly and is not able to start again. What is the command to recover from such scenarios? [Hint: Repair]


Requirement 1: Designing the data model

The basic building blocks of MongoDB are:

  • Database
A MongoDB database contains all the data generated by the system as collections.
  • Collection (Similar to table in RDBMS)
Collection is a single entity that contains all the data related to a particular aspect.
  • Document (Similar to row in RDBMS)
Data is stored as documents which are represented as a set of fields with associated values of corresponding datatypes.

Datatypes in MongoDB

The datatypes supported in MongoDB are string, number, array, boolean, date, float and so on.

Requirement 1 implementation:

The development team created a collection to represent the product catalog for Zoiva. Here is a sample:


Requirement 2: Performing CRUD operations

CRUD operations
  • Create operation
  • Read operation
  • Update operation
  • Delete operation
  • Bulk write operation

2.1.1 Creating database

First, you need to create a database for Zoiva

use Zoiva

2.1.2 Insert operations

Use insert operation to add new documents to a collection. If a collection does not exist, the insert operation will create one.

Syntax:

db.collection.insert( 
{ 
	field: value, 
	field: value 
})
To check all Collection
 db.getCollectionInfos()

Requirement 2.1 implementation

In this section, you will see how to insert document/s into product_catalog collection.

To insert a single document into the product_catalog collection:

db.product_catalog.insert(   { 
	prodid: 7000001,  
	prodname: "iphone 7",
	manufacturer: "apple",     
	categories: {
	main:"electronics",
	sub:"smartphones"
	},       
	date_of_launch: new Date("2016-09-07"),      
	price: 60000,       
	colors: ["silver","black","gold","rosegold"]  
})

Alternatively, db.collection.insertOne() command can be used to insert one document at a time.

To insert multiple documents into the product_catalog collection:

db.product_catalog.insert ([
    {    
    prodid:7000010,  
    prodname:"nosql distilled",   
    publisher:"Addison-Wesley",   
    ISBN:1234567,  
    price:400
    },  
    {
    prodid:7000011,  
    prodname:"big data: principles and best practices",        
    publisher:"Dreamtech",  
    price:700
    }
] )

Alternatively, db.collection.insertMany() can be used for multiple documents insertion into the collection.

Now that the documents have been inserted, next you will see how to query by applying various conditions.


db.product_catalog.insertMany ( [
{
    "prodid": 7000002, 
    "prodname": "Java", 
    "publisher": "Oracle",
    "version": "1.7.0.55", 
    "price": 400
},
{
    "prodid": 7000003, 
    "prodname": "Sql edveloper", 
    "publisher": "Oracle",
    "description": "to connect to database RDMS"
 },
{
    "prodid": 7000004,
    "prodname": "db visualizer",
    "publisher": "IBM",
    "description": "to connect to database ICOMS",
    "jar": {
		"isJarRequired": true,
		"jar": "As400", 
		"version": 10.1
	},
     "version": 9.11
 } 
])


Read operation

You can use find operation to retrieve documents from the collection.

In this section, you will be doing the following:

  • Retrieving documents
  • Querying using relational operators
  • Working on array operators
  • Querying using logical operators
  • Retrieval using regular expression

Syntax:

    db.collection.find(
    	{ <<query criteria>> },
    	{ <<projection criteria>> }
    )

Consider this sample data:

    {
	prodid: 7000001,
	prodname: "iphone 7",
	manufacturer: "apple",
	categories: {main:"electronics",sub:"smartphones"},
	date_of_launch: new Date("2016-09-07"),
	price: 60000,
	colors: ["silver","black","gold","rosegold"]
}

You will be implementing different read operations on this data next.

To retrieve all the documents from the collection:

  1. db.product_catalog.find()

You can use db.collection.find().pretty() to display the result in a formatted manner.

Projecting only the required fields

Retrieve only the prodname and its price of all the documents

    db.product_catalog.find( 
{ },
{ prodname:1, price:1 }
).pretty()

You would have observed that the _id field will always be displayed. In order to suppress it use the following command:

    db.product_catalog.find( 
{ },
{ _id:0, prodname:1, price:1 }
).pretty()

Display based on criteria

  1. Retrieve the documents having manufacturer as 'apple'
db.product_catalog.find(
{ manufacturer:"apple" }
)

Retrieve the documents that either does not contain colors field or null value for that field

Using equality filter:

    db.product_catalog.find(
{ colors:null }
)

Using existence check: ( to check the key is present or not)

    db.product_catalog.find(
{ colors: {$exists:true} }
)

The categories field is an embedded document. The below query retrieves the details of the products whose main category is 'electronics':

    db.product_catalog.find(
{ "categories.main":"electronics" }
)

The colors field is an array. To retrieve the names of the products that are available in 'gold' color:

    db.product_catalog.find(
{ colors:"gold" }, -- filter
{ prodname:1 } -- project
)

NOTEBy default, 20 documents can be retrieved at a time. To retrieve specific number of documents at a time, for example: 3 documents, set the shell batch size using the following command:

    DBQuery.shellBatchSize=3

Then, use it at the shell to fetch the next batch of documents.

Querying using relational operators

$eq

To retrieve the names of the products that falls under the 'smartphones' sub category:

    db.product_catalog.find(
{ "categories.sub":{$eq:"smartphones"} },
{ _id:0,prodname:1 }
)

$lt

To retrieve the product names having price less than '25000':

    db.product_catalog.find(
{ price:{$lt:25000} },
{ _id:0,prodname:1 }
)

$ne

Retrieve the documents that have values other than 'electronics' for the main category:

    db.product_catalog.find(
{ "categories.main":{$ne:"electronics"} }
)

Working on array operators

Exact array match

Retrieve the products where colors field has exactly two elements – 'white' and 'black' in the order specified:

    db.product_catalog.find( 
{ colors: ["white", "black"] }
)

Array match specifying the number of elements

Query for documents that contain exactly 2 elements for the colors field:

    db.product_catalog.find(
{ colors:{ $size:2 } }
)

Match specified array elements

Retrieve the documents containing both the elements 'white' and 'gold' in the colors field, irrespective of the order or presence of other elements:

    db.product_catalog.find(
{ colors : { $all : ["white", "gold"] } }
)

Match one or more array elements satisfying the criteria

Retrieve the documents where revisedYears array field contains atleast one element that is both greater than '2015' and less than or equal to '2017':

    db.product_catalog.find( 
{ revisedYears: { $elemMatch: { $gt: 2015, $lte: 2017 } } }
)

Querying using logical operators

$and

Retrieve the details of the 'smartphones' whose price is less than '25000'[inclusive]:

    db.product_catalog.find(
{$and:[
{ "categories.sub":"smartphones" },
{ price:{$lte:25000} }
]
}
)

$or

Retrieve the details of the products whose price is greater than or equal to '80000' or products whose manufacturer is 'apple':

    db.product_catalog.find( 
{ $or: [
{ price: { $gte: 80000 } },
{ manufacturer: "apple" }
]
}
)

$not

Retrieve the products where the ISBN value is not equal to 2343454:

    db.product_catalog.find( 
{ ISBN: { $not: { $eq: 2343454 } }
}
)

$not retrieves documents if ISBN field does not exist also.

Retrieval using regular expressions

You can match patterns via regular expressions for string matching during querying.

Syntax:

    { <field>: { $regex: /pattern/, $options: '<options>' } }
OR
{ <field>: { $regex: 'pattern', $options: '<options>' } }
OR
{ <field>: { $regex: /pattern/<options> } }
OR
{ <field>: /pattern/<options> }

The $options clause takes the following as parameters:

  • i – Performs a case insensitive match
  • m – Performs pattern match that consists of anchors i.e. ^ for the beginning, $ for the end
  • x – Ignores all white space characters in the pattern
  • s – Allows dot character ( . ) to match all characters

Search for the products whose prodname starts with iphone:

db.product_catalog.find(
    { prodname:{ $regex:/^iphone/i } }
)
db.product_catalog.find(
    { prodname:{ $regex: /^very/, $options: ‘i’ } } 
) 

Update operation 

Update operation allows users to modify the existing documents in a collection whenever a change occurs. A criteria or filters needs to be specified.

  • Updating documents
  • Using field update operators
  • Working with array update operators

Syntax:

db.collection.update(
    { <<update criteria>> },
    { <<update action>> },
    { <<update option>> }
    )
Update a single document based on condition

Update the prodname of a book from 'hadoop application architectures' to 'Hadoop architecture':

db.product_catalog.updateOne(
{ prodname : "hadoop application architectures" },
{ $set: { "name" : "Hadoop architecture" } } 
)
Update multiple embedded documents

In the categories field, update the value of gender from 'women' to 'female':

    db.product_catalog.updateMany(
        {"categories.gender":"women"},
        { $set: {"categories.gender":"female"} }
    )
Using upsert to update documents else insert
    db.product_catalog.updateMany(
    { "price" : { $gt : 80000 }, "manufacturer" : "apple" },
    { $set: { "prodname" : "iphone 7 plus" } },
    { upsert: true }
    )
The operation updates all the documents with price greater than '80000' and manufacturer  'apple'. If there are no documents matching the given criteria, a new document would be inserted.

Using field update operators

$inc

Decrease the price of the book with ISBN '1979891' by 20:

    db.product_catalog.update(
        { ISBN: 1979891 },
        { $inc: { price: -20 } }
    )
$max

Update the price of the product with prodid '7000022' to '27500' if the given new value is greater than the current value:
       db.product_catalog.update( 
        { prodid: 7000022 },
        { $max: { price: 27500 } }
    )
$unset

To delete the colors field from the document with prodid 7000022
   
db.product_catalog.update{
{ prodid: 7000022 },
{ $unset:
 {colors: ""} 
}
)

$addToSet
Add 'white' color to the colors field of the document with prodid '7000020':
   db.product_catalog.update(
    { prodid: 7000020 },
    { $addToSet: { colors: "white" } }
    )
$addToSet ensures that the values inserted are unique
[Note: white color does not exist in the current set]

$push allows duplicate values inserted into the array.

$pop

Remove the first element of the colors field from prodid '7000020'
(use -1 to remove the value at the beginning of the array):

    db.product_catalog.update(
	{ prodid: 7000020 },
	{ $pop: { colors: -1 } }
    )

$pullAll

Delete all occurences of 'white' and 'gold' from the colors array:

db.product_catalog.update(
{ prodid: 7000020 },
{ $pullAll: { colors: ['white','gold'] } }
)



Delete operation
Delete operation allows you to delete the documents from a collection that is no longer valid.
You can specify criteria or filters to delete documents.
Syntax:

db.collection.remove(
{ <<remove criteria>> }
)


Requirement 2.4 implementation

Delete the product with prodname 'xperia z'
:

db.product_catalog.deleteOne(
{
prodname: "xperia z"
}
)


To delete all the products having price less than '1000':

db.product_catalog.remove( { price: {$lt : 1000 } }     )

To delete all the documents from the product_catalog collection:

db.product_catalog.remove( {} )

Alternatively, db.collection.deleteMany() is used for deleting multiple documents matching the given criteria.
Bulk write operation

You can also do bulk write operation to perform multiple inserts, updates, deletes on a collection through a single query.

The following write operations are supported:

  • insertOne
  • updateOne
  • updateMany
  • replaceOne
  • deleteOne
  • deleteMany

Requirement 2.5 implementation

In the below query, you will insert a new document with prodid '200025', update the price of 'galaxy 7' from '45000' to '40000' and deletes all the mobiles that comes under the 'smartphones' sub category whose price is less than '4000'

    db.product_catalog.bulkWrite(
[
{ insertOne :
    {
       "document" :
        {
          "prodid":200025,"prodname":"MacBook Air","manufacturer":"apple","price":56000
        }
    }
},
{ updateOne :
{
    "filter" : { "prodname" : "galaxy 7" },
    "update" : { $inc: {"price" : -5000 } }
}
},
{ deleteMany :
    { "filter" :
        {
          "categories.sub" : "smartphones",
          "price": { $lt: 4000 }
        }
    }
}
])

CRUD operations: Exercise

Exercise 2 : ProblemStatement:

Insert the product_catalog dataset provided in the artifacts download into a collection and then perform the following:

  • Retrieve only the prodname and price of the products in the 'smartphones' section.
     db.product_catalog.find(
        {"categories.sub":"smartphones"},
        { prodname : 1 , price:1}
    )
  • List the details of the 'washing machine' that ranges between 20000 to 30000 in ascending order of the product name.    
db.product_catalog.aggregate ({ $match:
{
"$and":[
    { price : { $lte:60000 ,$gte : 40000 } },
    { category : "WashingMachine" }
]
}},
{ $sort:
    { prodname : 1 }
})
or
db.product_catalog.find( {
"$and":[
    { price : { $lte:60000 ,$gte : 40000 } },
    {category:"WashingMachine" }
] } )
.sort({prodname:1} )
  • Retrieve the details of mobiles whose manufacturer is either 'apple' or 'samsung' and its price should be within 60000
        db.product_catalog.find({$and : [
        {$or : [
              {publisher:"apple"},{publisher:"Samsung"}
        ]},
        {category:"Mobile"},
        {price:{$lt:60000} }
    ]})   
})
  • Search for the books whose publisher name starts with 'O'Reilly'
             db.product_catalog.find({ prodname:{$regex:/^s/i} })

Exercise 2: CRUD operations

1. db.product_catalog.find( { "categories.sub": "smartphones" }, {_id:0,prodname:1,price:1} )
2. db.product_catalog.find( { $and: [ {"categories.sub": "washing machine" }, { price: { $gte: 10000, $lte: 20000 } } ] } ).sort( {prodname:1} )
3. db.product_catalog.find( { $and: [ { manufacturer: {$in: ["apple","samsung"] } }, { price: {$lte:60000 } } ] } )
4. db.product_catalog.find( { publisher:{ $regex:/^O'Reilly/i } } )
As expensive exercise,
  • Replace a single document containing manufacturer as Mi to the new document as { "prodname": "Xiaomi Mi Max Prime", "manufacturer": "Xiaomi", price: 20000 } [Hint: replaceOne]     
[Note: The document may or may not exist. If it doesn't exist, create a new one]
  • Retrieve the product details in ascending order of price [Hint: sort]
  • Retrieve the documents from position three to position five in the collection [Hint: skip and limit]
  • Count the total number of 'apple' products in the catalog [Hint: count]

Requirement 3: Using Aggregation

Aggregation methods in MongoDB

Similar to SQL, aggregation is used to perform computations on a set of documents to return a single result.

You can perform aggregation in three ways in MongoDB:

  • Map-Reduce programming
  • Single purpose aggregation methods
  • Aggregation pipeline operations
Implement Map-Reduce programming 

Consider the product_catalog collection where you need to find the total price of products for each manufacturer.

Only fields such as manufacturer and price are required to execute the above query. These are emitted as a key-value pair using the map function.

    map = function() {
    emit ( this.manufacturer, this.price );
    }

The reduce function aggregates the results obtained from the map function.

    reduce = function(key,values) {
        var totalPrice = 0;
        for(var i=0; i<values.length; i++)
        totalPrice += values[i];
        return totalPrice;
    }

The above functions are then used in the map-reduce command of MongoDB as shown:

    db.product_catalog.mapReduce( map, reduce, { out:{inline:1} } )

The inline option displays the results on the screen. Instead, results can be stored into another collection as shown below:

    db.product_catalog.mapReduce( map,reduce,{ out: "countPdts" }

Disadvantage:

Using map-reduce programming, you need to write the query logic using javascript for implementing the map and reduce functions.

Single purpose aggregation methods

The query discussed in map-reduce programming for finding the total price of products for each manufacturer can be performed using a single aggregate function group() as shown below:

    db.product_catalog.group( {    
        key: { manufacturer:1, price:1 },    
        reduce: function ( curr, result )
        {         
             result.totalPrice += curr.price;    
         },    
         initial: { totalPrice:0 }
    } )

We have other single purpose functions such as distinct() and count().

Disadvantage:

Here, you can perform only one aggregation operation at a time.

Work with aggregation pipeline operator

Most commonly used pipeline operators supported by the aggregation framework are shown below:
  • $project - Used to select the required fields based on the query
  • $match - Used to filter the documents based on criteria
  • $group - Used to aggregate a set of documents
  • $sort - Used to display query results in a sorted order
The query discussed earlier for finding the total price of products for each manufacturer is performed in two stages using the required pipeline operators as shown below:




db.product_catalog.aggregate( [
{ $project: { _id:0, manufacturer:1, price:1 } },
{ $group: { _id: "$manufacturer", totalPrice: { $sum: "$price" } } }
] )

More examples

  • Sort the collection with ename field ascending.

            db.employee.aggregate( [ { $sort: {ename:1} } ] )

  • Display the total count of products whose price is greater than '5000' and less than '25000'

    db.product_catalog.aggregate( [ { $match : { price : { $gt: 5000, $lt: 25000 } } }, { $group: { _id: null, count: { $sum: 1 } } }, { $project: { _id:0, count:1 } }     ] )
  • Store the first five highest priced products in a new collection named HighPricePdts
    db.product_catalog.aggregate( [ { $sort : { price: -1 } }, { $limit: 5}, { $out: "HighPricePdts"}     ] )
  • Concatenate the prodname and the manufacturer
      db.product_catalog.aggregate( [
    
        $project: { 
         prodman: { $concat:[ "$prodname"," - ","$manufacturer" ] } 
         
      }
    ] )
  • Return a document for each color in the colors array and randomly select 3 documents
    db.product_catalog.aggregate( [ { $unwind : "$colors" } , { $sample: { size: 3 } }     ] )
  • Return the first and the last element in the colors array
     db.product_catalog.aggregate( [     {     $project: {     prodname: 1,     first: { $arrayElemAt: [ "$colors", 0 ] },     last: { $arrayElemAt: [ "$colors", -1] }     }       }     ] )
  • Retrieve the documents with date_of_launch in the format yyyy-mm-dd
     db.product_catalog.aggregate( [ {   $project: {   _id:0, prodname:1, launchdate: { $dateToString: { format: "%Y-%m-%d",
date: "$date_of_launch" } } }   }     ] )


SQL to MongoDB aggregation Mapping chart

ProblemStatement:

Insert the below documents into the employee collection

    { eid:111, ename:"Akaash", loc:"BNG", salary:25000 }     { eid:222, ename:"Alok", loc:"CHD", salary:52000 }     { eid:333, ename:"Amar", loc:"BNG", salary:20000 }     { eid:444, ename:"Akbar", loc:"MYS", salary:60000 }     { eid:555, ename:"Anthony", loc:"CHD", salary:35000 }

Below are the SQL aggregation statements. Write the MongoDB equivalent of it.

  • SELECT loc,SUM(salary) AS total FROM employee GROUP BY loc
    db.employee.aggregate(
        {$project : {loc :1 ,salary:1}},
        { $group: { _id: "$loc",
            total: { $sum: "$salary" }
    }
    })

  • SELECT loc,count(*) FROM employee GROUP BY loc HAVING count(*) > 1
    db.employee.aggregate(
     [ { $group: {
            _id: "$loc",
            count: { $sum: 1 }
     } },
     { $match: { count: { $gt: 1 } } }
    ] )


Requirement 4: Using indexes to enhance performance

Analyze query performance using explain()

Consider a query that retrieves products whose price ranges between '5000' and '20000'. Performance for queries based on indexed and non-indexes fields can be analysed using the explain("executionStats") method as shown below:


    db.product_catalog.find( { price: { $gte: 5000, $lte: 20000 } }     ).explain("executionStats")

Query with no index

The query plan returns the following result without the price field being indexed:

Observation:

As seen above, the entire collection of 24 documents is scanned and only 5 documents have been retrieved.

Query with Index

The query plan returns the following result with the price field being indexed:

Observation:

As seen above, only 5 index entries and documents matching the query criteria are scanned

Result:

Using indexes, only those documents matching the given query are scanned whereas without indexes, every document needs to be scanned.

Work with indexes

Single field index

Create an index on the price field that is frequently used for querying products in the catalog for faster retrieval.

    db.product_catalog.createIndex( { price:-1 }     )

Compound index

The below query retrieves the list of products with price less than '5000' and rating is 4:

    db.product_catalog.find( { price:{ $lt:5000 }, rating: 4 }     )

To support the above query efficiently, you can create an index on the fields used in the above query:

    db.product_catalog.createIndex( { price:1, rating:-1 }     )

Multikey index

For efficiently querying array fields, multikey index can be created.

The below query indexes elements in ascending order of colors field which is an array in the product_catalog collection.

    db.product_catalog.createIndex( { colors:1 }     )

Text index

In the application, customers would like to perform text based searches, for example: search for either the main category such as 'electronics' or more specifically sub category such as 'smartphones'. Text indexes can be used for this purpose.

First, a text index is created and then $text operator is used to perform text search on the indexed field.

    db.product_catalog.createIndex( { categories:"text" }     )

Now, the above index can be used to retrieve the details of 'smartphones':

    db.product_catalog.find( { $text:{ $search:"smartphones" } }     )

Drop index

An index on colors can be dropped using either the index name or the index field.

    db.product_catalog.dropIndex( "colors_1" )
or
    db.product_catalog.dropIndex( { colors:1 } )

Exercise 4 : ProblemStatement:

As an exploratory exercise, perform the following:

  1. Ensure that the product names are not duplicated [UniqueIndex]

  2. Assume, Zoiva has announced a sale for certain set of mobiles for one hour from the time it has started the sale. The saleDate field for the mobile products are set to the current time. Delete the mobile products automatically after the sale time is over. The time limit being 1 hour. [TTLIndex]

MongoDB BSON types

The below table contains the various types of data supported in BSON and their corresponding number and string alias that can be used while querying the documents.

To retrieve the products whose rating field is of 'Null' datatype, either of the below commands can be used:

    db.product_catalog.find( { rating: { $type:10 } }     )

or
    db.product_catalog.find( { rating: { $type:"null" } }     )

Query and projection operators

Below are the various query and projection operators available in MongoDB.

Comparison Query Operators

Logical Query Operators

Element Query Operators

Array Query Operators

Query and projection operators

Exercise 5 : ProblemStatement:

  1. Write the MongoDB equivalent of the following:

    1. SELECT * FROM product_catalog WHERE publisher!= 'Shroff';

    2. SELECT * FROM product_catalog WHERE (price>=1000 AND price<=20000) OR (manufacturer IN('nike','reebok'));

  2. Query for documents in the 'shoes' category that contains exactly 2 colors

  3. Remove books whose ISBN is neither 2323232 nor 1979891
----------------------------------------------------------------------------------------------------------------------------------

Exercise 5: Query and projection operators
============================================

1(1)	db.product_catalog.find( { publisher: {$ne:"Shroff"} } )

1(2)	db.product_catalog.find( { $or: [ { price: {$gte:1000, $lte:20000} }, {manufacturer: { $in: [ "nike", "reebok" ] } } ] } ).pretty()

2.	db.product_catalog.find( { "categories.main": "shoes", colors:{ $size:2 } } )

3.	db.product_catalog.remove( { "categories.main": "books", ISBN: { $nin: [2323232,1979891] } } )

----------------------------------------------------------------------------------------------------------------------------------

Update Operators

There are various update operators available and are as follows:

Field Update Operators

Array Update Operators

Array Update Modifiers

Exercise 6 : ProblemStatement:

  1. Update the color 'black' to 'jet black' in the colors array without specifying its position

  2. Update 'iphone 7' date_of_launch to today's date

  3. Rename the prodname to productname for all the documents in the collection

  4. Add new colors 'red' and 'purple' at the beginning of the array for 'nike' products

-----------------------------------------------------------------------------------

Exercise 6: Update operators
==============================

1.	db.product_catalog.updateMany( {colors:'black'}, { $set:{ 'colors.$': 'jet black' } } )

2.	db.product_catalog.update({prodname:"iphone 7"},{ $currentDate:{date_of_launch:true}})

3.	db.product_catalog.updateMany({},{$rename:{"prodname":"productname"}})

4.	db.product_catalog.updateMany( {manufacturer:"nike"}, {$push:{colors:{$each:["red","purple"],$position:0} } } )
-----------------------------------------------------------------------------------
Solutions :
Exercise 1: Installation and configuration of MongoDB
======================================================

1. db.version()

2. Suppose, the data path is /user/sahana_basavaraja/MongoDBPath
   Then, the command to change the default data path is mongod --dbpath /user/sahana_basavaraja/MongoDBPath

3. mongod --repair

-------------------------------------------------------------------------------
Exercise 2: CRUD operations
===========================

1. db.product_catalog.find( { "categories.sub": "smartphones" }, {_id:0,prodname:1,price:1} )

2. db.product_catalog.find( { $and: [ {"categories.sub": "washing machine" }, { price: { $gte: 10000, $lte: 20000 } } ] } ).sort( {prodname:1} )

3. db.product_catalog.find( { $and: [ { manufacturer: {$in: ["apple","samsung"] } }, { price: {$lte:60000 } } ] } )

4. db.product_catalog.find( { publisher:{ $regex:/^O'Reilly/i } } )

-----------------------------------------------------------------------------

Exercise 3: SQL to MongoDB aggregation mapping chart
=====================================================

1. db.employee.aggregate( [
   {
     $group: {
        _id: "$loc",
        total: { $sum: "$salary" }
     }
   }
] )


2. db.employee.aggregate( [
   {
     $group: {
        _id: "$loc",
        count: { $sum: 1 }
     }
   },
   { $match: { count: { $gt: 1 } } }
] )

-----------------------------------------------------------------------------
Exercise 5: Query and projection operators
============================================
1(1)	db.product_catalog.find( { publisher: {$ne:"Shroff"} } )
1(2)	db.product_catalog.find( { $or: [ { price: {$gte:1000, $lte:20000} }, {manufacturer: { $in: [ "nike", "reebok" ] } } ] } ).pretty()
2.	db.product_catalog.find( { "categories.main": "shoes", colors:{ $size:2 } } )
3.	db.product_catalog.remove( { "categories.main": "books", ISBN: { $nin: [2323232,1979891] } } )
-----------------------------------------------------------------------------
Exercise 6: Update operators
==============================

1.	db.product_catalog.updateMany( {colors:'black'}, { $set:{ 'colors.$': 'jet black' } } )
2.	db.product_catalog.update({prodname:"iphone 7"},{ $currentDate:{date_of_launch:true}})
3.	db.product_catalog.updateMany({},{$rename:{"prodname":"productname"}})
4.	db.product_catalog.updateMany( {manufacturer:"nike"}, {$push:{colors:{$each:["red","purple"],$position:0} } } )

------------------------------------------------------------------------------
Additional references

ProblemStatement:

Real life scenario: Customer entitlement portfolio for a networking major
A major client is a leading provider of networking products. It has multiple RDBMS systems to manage service

contracts for the various hardware and software products it has sold to customers.  


The company’s RDBMS strategy has spawned multiple systems for each region.

Each region’s customers and the products they have purchased are captured in RDBMS.

There are a billion documents of this installed base of products.

Companies have service contracts on these installed products. Each service contract

for the covered products has a list of entitlements that are owed to a customer. Each customer

also has multiple users from its organization, who have various roles assigned to them under

each covered product in a contract. Some of these roles can be 'admin', 'user', 'audit', 'reports'.

This entire system is called Entitlement System.

RDBMS systems was not able to scale to hundreds of millions of contracts, products and

tens of millions of customers. MongoDB is used to merge all this data on a single platform

to achieve a single view of the customer.

In this assignment, you will be working on the sample data sets of the Entitlement System

  1. Create a database named Entitlement and insert the given collections [Use Source code]
  2. Find the number of active contracts [Hint: Active Contract defines any contract which has the end date greater than the current date]

  3. Identify the methods entitled for an 'audit' role

  4. Retrieve the user details where the username starts with 'telecom'

  5. Find the maximum used product [Hint: Use installedbase collection]

  6. Find all the contracts that are expiring in the next one month

  7. Update the address of 'bigdatasoft' customer as address: {city: "BrisbaneIL", country: "Australia"}

  8. Output a document for each element in the methods_entitled array from the roles collection into RoleMethods collection

  9. Add another product 'si-erp-SAP' to the userToProduct collection having '_id=114'

  10. What are the indexes which will improve the performance of queries in this scenario

As an exploratory exercise, perform the following:

[Hint: Use $lookup aggregate operator for the below queries]

  1. Find all active contracts and the methods entitled with it. Display the contractid and the methods_entitled

  2. Find all the products for a given customer

  3. Find all the customers for a given produc

db.getCollection('orders').aggregate(
   [
     {
       $project:
         {
           doc: "$$ROOT",
           year: { $year: "$created" },
           month: { $month: "$created" },
           day: { $dayOfMonth: "$created" }
         }
     },
     { $match : { "month" : 4, "year": 2016 } }
   you want all the orders which are created in April 2016 you can use below query.
db.getCollection('orders').aggregate(
[
  {
    $project:
    {
      doc: "$$ROOT",
      year: { $year: "$created" },
      month: { $month: "$created" },
      day: { $dayOfMonth: "$created" }
    }
},
{ $match : { "month" : 4, "year": 2016 }}]) 
> db.specificMonthDemo.aggregate([ 
{$project: {
StudentName: 1, 
StudentDateOfBirth:{$month: '$StudentDateOfBirth'}}}, 
{$match: {StudentDateOfBirth: 01}}
]).pretty();
Source  co
use Entitlement db.customers.insert({_id:1, custs: {city:"Bangalore", country:"INDIA"} }); db.customers.insert({_id:2, custname: "stayinhotels", address: {city:"Mumbai", country:"INDIA"} }); db.customers.insert({_id:3, custname: "randompublish",address: {city:"NewDelhi", country:"INDIA"} }); db.customers.insert({_id:4, custname: "justpearls", add {city:"Hyderabad", country:"INDIA"} }); db.customers.insert({_id:5, custname: "jaguarmotors", address: {city:"London", country:"UK"} }); db.customers.insert({_id:6, custname: "bigdatasoft", address: {city:"PaloAlto", country:"USA"} }); db.customers.insert({_id:7, custname: "telecomair", address: {city:"Sydney", country:"AUSTRALIA"} }); db.users.insert({_id:101, username: "megaretailadmin", custid: 1 }); db.users.insert({_id:102, username: "megaretailuser", custid: 1 }); db.users.insert({_id:103, username: "stayinhotelsadmin", custid: 2 }); db.users.insert({_id:104, username: "stayinhotelsuser", custid: 2 }); db.users.insert({_id:105, username: "randompublishadmin", custid: 3 }); db.users.insert({_id:106, username: "randompublishuser", custid: 3 }); db.users.insert({_id:107, username: "justpearlsadmin", custid: 4 }); db.users.insert({_id:108, username: "justpearlsuser", custid: 4 }); db.users.insert({_id:109, username: "jaguarmotorsadmin", custid: 5 }); db.users.insert({_id:110, username: "jaguarmotorsuser", custid: 5 }); db.users.insert({_id:111, username: "bigdatasoftadmin", custid: 6 }); db.users.insert({_id:112, username: "bigdatasoftuser", custid: 6 }); db.users.insert({_id:113, username: "telecomairadmin", custid: 7 }); db.users.insert({_id:114, username: "telecomairuser", custid: 7 }); db.roles.insert({_id:11, rolename: "admin", methods_entitled : ["create","drop"] }); db.roles.insert({_id:12, rolename: "user", methods_entitled : ["insert","update","delete"] }); db.roles.insert({_id:13, rolename: "audit", methods_entitled : ["audit"] }); db.roles.insert({_id:14, rolename: "report", methods_entitled : ["read"] }); db.roles.insert({_id:15, rolename: "support", methods_entitled : ["tickets"] }); db.contracts.insert({_id:501, custid: 1, fromdt: new ISODate('2016-01-23'), enddt: new ISODate('2017-01-22')}) db.contracts.insert({_id:502, custid: 1, fromdt: new ISODate('2017-01-15'), enddt: new ISODate('2018-01-14')}) db.contracts.insert({_id:503, custid: 2, fromdt: new ISODate('2016-01-01'), enddt: new ISODate('2017-31-12')}) db.contracts.insert({_id:504, custid: 2, fromdt: new ISODate('2016-01-01'), enddt: new ISODate('2016-31-12')}) db.contracts.insert({_id:505, custid: 2, fromdt: new ISODate('2017-01-01'), enddt: new ISODate('2017-31-12')}) db.contracts.insert({_id:506, custid: 3, fromdt: new ISODate('2014-01-01'), enddt: new ISODate('2014-31-12')}) db.contracts.insert({_id:507, custid: 4, fromdt: new ISODate('2014-01-01'), enddt: new ISODate('2015-31-12') }) db.contracts.insert({_id:508, custid: 5, fromdt: new ISODate('2015-01-01'), enddt: new ISODate('2016-31-12') }) db.contracts.insert({_id:509, custid: 6, fromdt: new ISODate('2016-05-17'), enddt: new ISODate('2017-05-16') }) db.contracts.insert({_id:510, custid: 7, fromdt: new ISODate('2017-02-15'), enddt: new ISODate('2018-02-14')}) db.userToContract.remove({}); db.userToContract.insert({_id:101, contractids: [501, 502] }); db.userToContract.insert({_id:102, contractids: [502] }); db.userToContract.insert({_id:103, contractids: [503,504,505] }); db.userToContract.insert({_id:104, contractids: [504,505] }); db.userToContract.insert({_id:105, contractids: [506] }); db.userToContract.insert({_id:106, contractids: [506] }); db.userToContract.insert({_id:107, contractids: [507] }); db.userToContract.insert({_id:108, contractids: [507] }); db.userToContract.insert({_id:109, contractids: [508] }); db.userToContract.insert({_id:110, contractids: [508] }); db.userToContract.insert({_id:111, contractids: [509] }); db.userToContract.insert({_id:112, contractids: [509] }); db.userToContract.insert({_id:113, contractids: [510] }); db.userToContract.insert({_id:114, contractids: [510] }); db.userRoleContract.insert({_id:101, contractids: [501, 502], roles: ["admin", "audit"] }); db.userRoleContract.insert({_id:102, contractids: [502], roles: ["user", "report", "support"] }); db.userRoleContract.insert({_id:103, contractids: [503,504,505], roles: ["admin", "audit"] }); db.userRoleContract.insert({_id:104, contractids: [504,505], roles: ["user", "report", "support"] }); db.userRoleContract.insert({_id:105, contractids: [506], roles: ["admin","audit","user","support" ] }); db.userRoleContract.insert({_id:106, contractids: [506], roles : ["report"] }); db.userRoleContract.insert({_id:107, contractids: [507], roles : ["admin","audit","user","report", "support" ] }); db.userRoleContract.insert({_id:108, contractids: [507], roles : ["admin","audit","user","report", "support" ] }); db.userRoleContract.insert({_id:109, contractids: [508], roles : ["admin"] }); db.userRoleContract.insert({_id:110, contractids: [508], roles : ["audit","user","report", "support" ] }); db.userRoleContract.insert({_id:111, contractids: [509], roles : ["admin","audit","user","report", "support" ] }); db.userRoleContract.insert({_id:112, contractids: [509], roles : ["admin","audit","user","report", "support" ] }); db.userRoleContract.insert({_id:113, contractids: [510], roles : ["admin" ] }); db.userRoleContract.insert({_id:114, contractids: [510], roles : ["admin","audit","user","report", "support" ] }); db.serviceline.insert ({_id:601, contractid: 501, serviceline :"datacenter" }) db.serviceline.insert ({_id:602, contractid: 502, serviceline :"networks" }) db.serviceline.insert ({_id:603, contractid: 503, serviceline :"analytics" }) db.serviceline.insert ({_id:604, contractid: 504, serviceline :"networks" }) db.serviceline.insert ({_id:605, contractid: 505, serviceline :"datacenter" }) db.serviceline.insert ({_id:606, contractid: 506, serviceline :"networks" }) db.serviceline.insert ({_id:607, contractid: 507, serviceline :"analytics" }) db.serviceline.insert ({_id:608, contractid: 508, serviceline :"systemsintegration" }) db.serviceline.insert ({_id:609, contractid: 509, serviceline :"systemsintegration" }) db.serviceline.insert ({_id:610, contractid: 510, serviceline :"networks" }) db.installedbase.insert({_id:801,serviceline_id: 601, product:"datacenter_bridge" }) db.installedbase.insert({_id:802, serviceline_id: 602,product:"network_router" }) db.installedbase.insert({_id:803,serviceline_id: 603, product:"hadoop-hiveDW" }) db.installedbase.insert({_id:804, serviceline_id: 604,product:"network_router" }) db.installedbase.insert({_id:805,serviceline_id: 605, product:"datacenter_switch" }) db.installedbase.insert({_id:806,serviceline_id: 606, product:"network_router" }) db.installedbase.insert({_id:807,serviceline_id: 607, product:"hadoop-hiveDW" }) db.installedbase.insert({_id:808,serviceline_id: 608, product:"si-eai" }) db.installedbase.insert({_id:809,serviceline_id: 609, product:"si-erp-SAP" }) db.installedbase.insert({_id:810,serviceline_id: 610, product:"network_router" }) db.userToProduct.insert({_id:101, product: ["datacenter_bridge", "network_router"] }); db.userToProduct.insert({_id:102, product: ["network_router"] }); db.userToProduct.insert({_id:103, product: ["hadoop-hiveDW","network_router","datacenter_switch"] }); db.userToProduct.insert({_id:104, product: ["network_router", "datacenter_switch"] }); db.userToProduct.insert({_id:105, product: ["network_router"] }); db.userToProduct.insert({_id:106, product: ["network_router"] }); db.userToProduct.insert({_id:107, product: ["hadoop-hiveDW"] }); db.userToProduct.insert({_id:108, product: ["hadoop-hiveDW"] }); db.userToProduct.insert({_id:109, product: ["si-eai"] }); db.userToProduct.insert({_id:110, product: ["si-eai"] }); db.userToProduct.insert({_id:111, product: ["si-erp-SAP"] }); db.userToProduct.insert({_id:112, product: ["si-erp-SAP"] }); db.userToProduct.insert({_id:113, product: ["network_router"] }); db.userToProduct.insert({_id:114, product: ["network_router"] });




















Featured Post

HTML cheetsheet

List: Link tgs Dropdown

Popular Post

(C) Copyright 2018, All rights resrved InShortView. Template by colorlib