There are multiple ways to take backups of MongoDB in different configurations; one configuration I have been involved with recently is replica sets. When MongoDB runs as a replica set, there is a single primary node and multiple secondary nodes. To back up the replica set we can either run mongodump against one of the nodes, or shut down one of the secondary nodes and take file copies, since in a replica set all nodes hold the same data (except the arbiter). Let's see how to handle the mongodump method of taking a backup.
Sync the primary node so that all pending writes are flushed to disk, and lock the database against writes. Doing an fsync ensures that all writes are persisted to disk before the backup starts.
use admin
db.fsyncLock()
Now we can issue the mongodump command. mongodump has many more options that can be changed; the defaults are used here.
mongodump -h node_name --out /data/backups/backup_file_name
Once the mongodump command is done, we can unlock the database so that writes can be issued again.
use admin
db.fsyncUnlock()
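The lock, dump, and unlock steps above can be tied together in a small Ruby wrapper so that the unlock always runs, even if the dump fails. This is only a minimal sketch: backup_with_lock and its runner parameter are hypothetical names that are not part of MongoDB or of the steps above, and it assumes the mongo and mongodump binaries are on the PATH.

```ruby
# Hypothetical helper: run the lock / dump / unlock sequence against one node.
# `runner` is any callable that takes an argv array and returns true on
# success. By default it shells out with Kernel#system.
def backup_with_lock(host, out_dir, runner: ->(argv) { system(*argv) })
  lock   = ['mongo', '--host', host, '--quiet', '--eval',
            "db.getSiblingDB('admin').fsyncLock()"]
  dump   = ['mongodump', '-h', host, '--out', out_dir]
  unlock = ['mongo', '--host', host, '--quiet', '--eval',
            "db.getSiblingDB('admin').fsyncUnlock()"]

  raise "could not lock #{host}" unless runner.call(lock)
  begin
    # The value of this call (true/false) is what the method returns.
    runner.call(dump)
  ensure
    # Always release the lock, even if mongodump failed or raised.
    runner.call(unlock)
  end
end
```

It could be called as backup_with_lock('node_name', '/data/backups/backup_file_name'). Passing a different runner makes it possible to log or dry-run the commands without touching the database.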
The downside of this approach is that the primary cannot accept writes while the lock is held, although reads are fine. If a write does get issued during that window, all subsequent reads are blocked behind it as well, which is pretty drastic. The other option is to operate on one of the secondaries: this keeps the primary available for both writes and reads, while the secondary is used for the backup. Which node is PRIMARY or SECONDARY can be determined dynamically by running some JavaScript from the command line:
#!/usr/bin/env ruby
require 'rubygems'
require 'json'
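# Ask the replica set for each member's name and state, printed as JSON.
# The object literal stays on the same line as `return`, otherwise
# JavaScript's automatic semicolon insertion makes the function return nothing.
js = "printjson(rs.status().members.map(function(m) { " \
     "return {'name': m.name, 'stateStr': m.stateStr}; }))"
mongo_nodes = JSON.parse(`mongo node_name --quiet --eval "#{js}"`)
primary_node = mongo_nodes.detect { |member| member['stateStr'] == 'PRIMARY' }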
This script lets us dynamically find the node we want to take the backup from, either the primary or a secondary. To pick a secondary instead, compare stateStr against 'SECONDARY'.
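As a rough sketch of how the parsed member list could drive the backup, the helpers below prefer a secondary and fall back to the primary, then build the mongodump command for the chosen node. pick_backup_node and mongodump_command are hypothetical names, and the member list in the example is made-up sample data in the same shape that the script above parses.

```ruby
# Hypothetical helper: choose the node to dump from, given the parsed
# array of {'name' => ..., 'stateStr' => ...} hashes.
# Prefers a secondary so the primary stays free for reads and writes.
def pick_backup_node(members, prefer: 'SECONDARY')
  node = members.detect { |m| m['stateStr'] == prefer } ||
         members.detect { |m| m['stateStr'] == 'PRIMARY' }
  node && node['name']
end

# Hypothetical helper: build the mongodump argv with the same default
# options used earlier (-h and --out only).
def mongodump_command(host, out_dir)
  ['mongodump', '-h', host, '--out', out_dir]
end

# Made-up sample data for illustration only.
members = [
  { 'name' => 'node1:27017', 'stateStr' => 'PRIMARY' },
  { 'name' => 'node2:27017', 'stateStr' => 'SECONDARY' },
  { 'name' => 'node3:27017', 'stateStr' => 'ARBITER' }
]
backup_host = pick_backup_node(members)
backup_cmd  = mongodump_command(backup_host, '/data/backups/backup_file_name')
```

The resulting array can be handed to system(*backup_cmd). Arbiters are never selected, since they hold no data.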