Know your Mongo through Replica Set Oplog and System Profile

At work, recently we noticed that records in a Mongo collection did not have the latest data. Specifically, I was storing the status as NEW -> PROCESSING -> OK, and then expecting to see status of OK, since that was the latest update, but we were seeing PROCESSING instead of OK. We checked the logs of the Java application and it was sending NEW -> PROCESSING -> OK. So based on the application logs we new we should see OK status but we were seeing PROCESSING.

We wanted to see what Mongo was getting. There are a couple ways to do it:

1. Mongo Oplog

To quote Mongo documentation on Oplog1:

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary’s oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database.

So here's how we can use it:

use local  
db.oplog.rs.find();  

This will return you all the records in the op log collection of the format:

/* 0 */  
{
    "ts" : Timestamp(1422739093, 110),
    "h" : NumberLong(-5537608931132355814),
    "op" : "u",
    "ns" : "Comic.ComicBooks",
    "o2" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01")
    },
    "o" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01"),
        "_class" : "com.rackspace.automation.support.mongo.AgentFile",
        "name" : "Spider Man",
        "lastUpdated" : ISODate("2015-01-31T21:18:13.480Z")
    }
}

/* 1 */
{
    "ts" : Timestamp(1422739093, 110),
    "h" : NumberLong(-5537608931132355814),
    "op" : "u",
    "ns" : "Comic.ComicVideos",
    "o2" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01")
    },
    "o" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01"),
        "_class" : "com.rackspace.automation.support.mongo.AgentFile",
        "name" : "Spider Man Video",
        "lastUpdated" : ISODate("2015-01-31T21:18:13.480Z")
    }
}

Notice: They are returning operations on different collections since we have not restricted the query to a particular collection.

Here's what the fields mean:

  • ts: the time this operation occurred.
  • h: a unique ID for this operation. Each operation will have a different value in this field.
  • op: the write operation that should be applied to the slave. n indicates a * no-op, this is just an informational message.
  • ns: the database and collection affected by this operation. Since this is a no-op, this field is left blank.
  • o: the actual document representing the op. Since this is a no-op, this field is pretty useless.

If you want to check the recent operations on a particular collection you can:

db.oplog.rs.find({ns: 'Comic.ComicBooks'}).sort({ ts : -1 });  

In Comic.ComicBooks: Comic is the database name and ComicBooks is the collection name.

/* 0 */  
{
    "ts" : Timestamp(1422739093, 110),
    "h" : NumberLong(-5537608931132355814),
    "op" : "u",
    "ns" : "Comic.ComicBooks",
    "o2" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01")
    },
    "o" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01"),
        "_class" : "com.rackspace.automation.support.mongo.AgentFile",
        "name" : "Spider Man",
        "status: "NEW",
        "lastUpdated" : ISODate("2015-01-31T21:18:13.480Z")
    }
}

/* 1 */
{
    "ts" : Timestamp(1422739093, 110),
    "h" : NumberLong(-5537608931132355814),
    "op" : "u",
    "ns" : "Comic.ComicBooks",
    "o2" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01")
    },
    "o" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01"),
        "_class" : "com.rackspace.automation.support.mongo.AgentFile",
        "name" : "Superman",
        "status: "NEW",
        "lastUpdated" : ISODate("2015-01-31T21:18:13.480Z")
    }
}

If you want to query for a certain record you can get all the logs for it by:

db.oplog.rs.find({ns: 'Comic.ComicBooks', 'o._id': ObjectId("54d3977de4b0ad9775570661")}).sort({ ts : -1 });

Here we are querying for ComicBooks collection in Comic db, for id: 54d3977de4b0ad9775570661

And when I did so for the problem at hand I got the response below:

/* 0 */  
{
    "ts" : Timestamp(1422739093, 110),
    "h" : NumberLong(-5537608931132355814),
    "op" : "u",
    "ns" : "Comic.ComicBooks",
    "o2" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01")
    },
    "o" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01"),
        "_class" : "com.rackspace.automation.support.mongo.AgentFile",
        "name" : "Spider Man",
        "status": "PROCESSING",
        "lastUpdated" : ISODate("2015-01-31T21:18:13.480Z")
    }
}

/* 1 */
{
    "ts" : Timestamp(1422739093, 108),
    "h" : NumberLong(-5537608931132355814),
    "op" : "u",
    "ns" : "Comic.ComicBooks",
    "o2" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01")
    },
    "o" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01"),
        "_class" : "com.rackspace.automation.support.mongo.AgentFile",
        "name" : "Spider Man",
        "status": "OK",
        "lastUpdated" : ISODate("2015-01-31T21:18:13.580Z")
    }
}

/* 2 */
{
    "ts" : Timestamp(1422738003, 107),
    "h" : NumberLong(-5537608931132355814),
    "op" : "u",
    "ns" : "Comic.ComicBooks",
    "o2" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01")
    },
    "o" : {
        "_id" : ObjectId("54caab29368a16b1f1a00c01"),
        "_class" : "com.rackspace.automation.support.mongo.AgentFile",
        "name" : "Spider Man",
        "status": "NEW",
        "lastUpdated" : ISODate("2015-01-31T21:18:13.180Z")
    }
}

The query sorts the records in descending order of time. So Mongo saw: NEW -> OK -> PROCESSING but if you notice the lastUpdated field you will notice that the Java application updated the object/entity in NEW -> PROCESSING -> OK sequence. So it seems like something is going wrong in Spring Data or MongoDB Java driver.

Another way to get research issues with operations is System Profile.

2. Mongo System Profile

MongoDb provides system profiling2 that allows you to profile database queries, operations etc. This can help identify inefficient queries and operations.

You can run the profile command on a particular database:

use Comic  
db.setProfilingLevel(2)  
// Perform whatever operation you want to debug
db.system.profile.find();  

Sample output:

/* 0 */ 
{ "ts" : ISODate("2015-02-05T19:49:19.354Z"), "op" : "update", "ns" : "Comic.ComicBooks", "query" : { "id" : ObjectId("54d3ba5a836c6476930566de") }, "updateobj" : { "id" : ObjectId("54d3ba5a836c6476930566de"), "_class" : "com.foobar.ComicBook", "name" : "Superman", "synchronizationStatus" : "OK", "lastUpdated" : ISODate("2015-02-05T19:49:08.428Z") }, "millis" : 0, "client" : "10.14.15.16"
}

This will return you all the queries, and operations occurring in Comic db. Here's what the fields mean:

  • ts: the time this operation occurred.
  • op: the write operation that should be applied to the slave. n indicates a * ns: the database and collection affected by this operation. Since this is a no-op, this field is left blank.
  • query: the query used to find the record.
  • updateobj: the actual document representing the op. The contents of updateobj are prety self explanatory.
  • millis: The time in milliseconds from the perspective of the mongod from the beginning of the operation to the end of the operation.
  • client: The IP address or hostname of the client connection where the operation originates.

There are other fields that are included which are outside the scope of this post. You can read more about them here3.

NOTE: Please know that system.profile is a capped collection. So you can lose records over time since oldest records are removed if collection is near it's max size. Please set the profiling level to 0 once you are done researching the issue in order to not slow down the database. Also, this will help save hard disk space.

If you would like to query about specific collection you can do so by:
db.system.profile.find({ns: 'Comic.ComicBooks'});

If you want to query about a certain document id in a collection you can do so by:
db.system.profile.find({ns: 'Comic.ComicBooks', 'updateobj._id': ObjectId("54d3ba5a836c6476930566de")})

Basically, you can query this collection as you would query any other collection in Mongo.

Now, I could not find the records I was looking for in system.profile. The possible reason for that (according to team DBA is that we do not have logging on), but once I turn logging on, I would update this post to share if I could find what I was looking for with this. Regardless, I think this is a very handy tool to research issues with queries.

Great! I hope that was helpful.

Please help me out with your feedback/question in the comments section of this blog post. Muchas gracias.


References