MongoDB and sensible replication with a replica set

2: Backing up MongoDB with mongodump

First, let's look at backing up a MongoDB database.

An important argument of the mongodump command is --db, which specifies the name of the database you want to back up. If you don't specify a database name, mongodump backs up all of your databases.

The second important argument is --out, which sets the directory the data is dumped to. As an example, let's back up the newdb database and store it in /var/backups/mongobackups. Ideally, each backup goes into a directory named after the current date (in the form /var/backups/mongobackups/10-29-20).

First, create the required directory /var/backups/mongobackups:

sudo mkdir /var/backups/mongobackups

Now run the mongodump command:

sudo mongodump --db newdb --out /var/backups/mongobackups/`date +"%m-%d-%y"`

You will see output like this:

2020-10-29T19:22:36.886+0000    writing newdb.restaurants to
2020-10-29T19:22:36.969+0000    done dumping newdb.restaurants (25359 documents)

Note that in the directory path above we used date +"%m-%d-%y", which automatically inserts the current date. This lets us keep backups in directories such as /var/backups/mongobackups/10-29-20/.

This is especially convenient when backups are automated.
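If backups are driven from a script instead of the shell, the same directory name can be computed directly. A minimal sketch, assuming Node.js; `backupDirFor` and `BACKUP_ROOT` are hypothetical names that mirror the path used in this guide:

```javascript
// Sketch: build the date-stamped directory that the shell expression
// `date +"%m-%d-%y"` produces, e.g. /var/backups/mongobackups/10-29-20.
// BACKUP_ROOT is an assumption taken from this guide; adjust as needed.
const BACKUP_ROOT = "/var/backups/mongobackups";

function pad2(n) {
  return String(n).padStart(2, "0");
}

// Returns "<root>/MM-DD-YY" for the given date, using local time like `date`.
function backupDirFor(date, root = BACKUP_ROOT) {
  const mm = pad2(date.getMonth() + 1); // getMonth() is zero-based
  const dd = pad2(date.getDate());
  const yy = pad2(date.getFullYear() % 100);
  return `${root}/${mm}-${dd}-${yy}`;
}

console.log(backupDirFor(new Date(2020, 9, 29))); // /var/backups/mongobackups/10-29-20
```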

At this point, you have a full backup of the newdb database in /var/backups/mongobackups/10-29-20/newdb/. This backup contains everything needed to restore newdb correctly and preserve the accuracy and integrity of the data.

As a rule, backups should be created regularly, ideally when the server is least loaded. To do this, you can set up the mongodump command as a cron job so that it runs at a specific time, for example every day at 03:03.

To do this, open crontab, the cron editor:

sudo crontab -e

Note that when you run sudo crontab, you edit the cron jobs of the root user. This is the recommended approach, because if you set up cron jobs for another user, the tasks may not work correctly (especially if your sudo profile requires password verification).

In crontab, add the following mongodump entry:

3 3 * * * mongodump --out /var/backups/mongobackups/`date +"\%m-\%d-\%y"`

In the command above we deliberately omit the --db argument, because you usually want to back up all databases.
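Such a crontab entry can also be generated from a script. A minimal sketch, assuming Node.js and a hypothetical `cronLine` helper; it also illustrates a cron detail worth knowing: an unescaped `%` in a crontab line is treated as a newline, so the date format has to be written as `\%`:

```javascript
// Sketch: build a crontab entry for a daily mongodump run.
// Omitting `db` leaves out --db, so mongodump backs up all databases.
function cronLine({ minute, hour, root = "/var/backups/mongobackups", db }) {
  const dbArg = db ? ` --db ${db}` : "";
  // Inside a crontab, % must be escaped as \% or cron splits the line.
  const dateExpr = '$(date +"\\%m-\\%d-\\%y")';
  return `${minute} ${hour} * * * mongodump${dbArg} --out ${root}/${dateExpr}`;
}

console.log(cronLine({ minute: 3, hour: 3 }));
// 3 3 * * * mongodump --out /var/backups/mongobackups/$(date +"\%m-\%d-\%y")
```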

Depending on the size of your MongoDB databases, you will sooner or later run out of disk space as backups accumulate. That is why it is also recommended to regularly delete or compress old backups.

For example, to delete all backups older than seven days, you can use the following bash command:

find /var/backups/mongobackups/ -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +

As with the mongodump command, you can add this command as a cron job. It should run just before the next backup starts, for example at 03:01. Open crontab again:

sudo crontab -e

Then add the following line:

1 3 * * * find /var/backups/mongobackups/ -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +

Save and close the file.

Together, these jobs keep the backups of your MongoDB databases running properly.
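The find command selects old backups by modification time. If you would rather decide by the date in the directory name, a minimal Node.js sketch could look like this; `expiredBackups` and the sample directory names are hypothetical:

```javascript
// Parses "MM-DD-YY" into a UTC timestamp in ms; returns null for other names.
function parseBackupName(name) {
  const m = /^(\d{2})-(\d{2})-(\d{2})$/.exec(name);
  if (!m) return null; // not a backup directory, ignore it
  return Date.UTC(2000 + Number(m[3]), Number(m[1]) - 1, Number(m[2]));
}

// Returns the names whose date is more than keepDays before todayUtcMs.
// UTC is used throughout so daylight-saving changes cannot shift the result.
function expiredBackups(names, todayUtcMs, keepDays = 7) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return names.filter((name) => {
    const ts = parseBackupName(name);
    return ts !== null && (todayUtcMs - ts) / msPerDay > keepDays;
  });
}

const dirs = ["10-20-20", "10-21-20", "10-22-20", "10-29-20", "notes"];
console.log(expiredBackups(dirs, Date.UTC(2020, 9, 29))); // [ '10-20-20', '10-21-20' ]
```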

How does replication work in MongoDB?

Replication exists primarily to offer data redundancy and high availability. We maintain the durability of data by keeping multiple copies or replicas of that data on physically isolated servers. That’s replication: the process of creating redundant data to streamline and safeguard data availability and durability.

Replication allows you to increase data availability by creating multiple copies of your data across servers. This is especially useful if a server crashes or if you experience service interruptions or hardware failure.

If your data resides on only a single database server, any of these events would make accessing the data impossible. Thanks to replication, however, your applications can stay online if a database server fails, and the replicas also provide disaster recovery and backup options.

With MongoDB, replication is achieved through a replica set. Write operations are sent to the primary server (node), which records them in its operation log (oplog); the secondary servers replicate that oplog and apply the same operations to their own copies of the data.

If the primary server fails (through a crash or system failure), one of the secondary servers takes over and becomes the new primary node via an election. When the former primary comes back online, it rejoins the set as a secondary once it has fully recovered and caught up with the new primary.

Oplog:

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

Let's assume that for our example we have 3 servers called mongoDb-01, mongoDb-02, and mongoDb-03.

In this configuration:

mongoDb-01 => PRIMARY    IP: 10.0.0.11/24
mongoDb-02 => SECONDARY  IP: 10.0.0.12/24
mongoDb-03 => SECONDARY  IP: 10.0.0.13/24

Add Members to Replica Set

To add members to a replica set, start mongod instances on multiple machines. Then start a mongo client and issue the rs.add() command.

The basic syntax of the rs.add() command is as follows:

>rs.add("HOST_NAME:PORT")

Example

Suppose your mongod instance's hostname is mongod1.net and it is running on port 27017. To add this instance to the replica set, issue the rs.add() command in the mongo client:

>rs.add("mongod1.net:27017")

You can add a mongod instance to a replica set only when you are connected to the primary node. To check whether you are connected to the primary, issue the command db.isMaster() in the mongo client (newer MongoDB versions provide db.hello() for the same purpose).
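The check itself comes down to one field of the reply: db.isMaster() reports `ismaster`, while db.hello() reports `isWritablePrimary`. A minimal sketch, assuming Node.js and trimmed, hypothetical reply documents; `isConnectedToPrimary` is a hypothetical helper name:

```javascript
// Sketch: decide from a db.isMaster() or db.hello() reply whether the
// client is talking to the primary.
function isConnectedToPrimary(reply) {
  return reply.isWritablePrimary === true || reply.ismaster === true;
}

console.log(isConnectedToPrimary({ ismaster: true, secondary: false }));          // true
console.log(isConnectedToPrimary({ isWritablePrimary: false, secondary: true })); // false
```

The same function body can be pasted into the mongo shell and called as `isConnectedToPrimary(db.isMaster())` before running rs.add().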


Dealing with replication delay

A major concern when configuring replication is replication delay (lag): the time between an update on the primary node and the moment that update is applied on a secondary node of the replica set.

Some replication lag is normal, especially when replicating large data sets. However, the following factors can increase the delay and undermine the benefit of having up-to-date replicas:

Network latency. During replication, MongoDB instances on different servers communicate primarily over the network. If the network cannot meet the needs of the replication process, data will replicate slowly throughout the replica set. It is therefore best to route replication traffic over a stable network with sufficient bandwidth.
Disk throughput. If the replication nodes use different disk types (e.g., the primary node uses SSDs while the secondary nodes use HDDs), replication will be delayed, since the secondary nodes process writes more slowly than the primary node. This is a common issue in multi-tenant and large-scale deployments.
Heavy workloads. Executing heavy and long-running write operations on the primary node will also lead to delays in the replication process. So, it’s best to configure the MongoDB Write Concern correctly so that the replication process will be able to keep up with the workload without affecting the overall performance of the replica set.
Background tasks. Another important step is to identify the background tasks such as server updates, cron jobs, and security checkups that might have unexpected effects on the network or disk usage, causing delays in the replication process.
Database operations. Some queries are slow or take a considerable time to execute. Using the database profiler, you can identify such queries and optimize them accordingly.

Replication in MongoDB

A replica set is a group of mongod instances that maintain
the same data set. A replica set contains several data bearing nodes
and optionally one arbiter node. Of the data bearing nodes, one and
only one member is deemed the primary node, while the other nodes are
deemed secondary nodes.

The primary node receives all write
operations. A replica set can have only one primary capable of
confirming writes with { w: "majority" } write concern, although in
some circumstances another mongod instance may transiently believe
itself to also be primary.
The primary records all changes to its data
sets in its operation log, i.e. the oplog. For more information on
primary node operation, see Replica Set Primary.

The secondaries replicate the
primary’s oplog and apply the operations to their data sets such that
the secondaries’ data sets reflect the primary’s data set. If the
primary is unavailable, an eligible secondary will hold an election to
elect itself the new primary. For more information on secondary
members, see Replica Set Secondary Members.

In some circumstances (such as when you have a primary and a secondary
but cost constraints prohibit adding another secondary), you may choose
to add a mongod instance to a replica set as an arbiter. An arbiter
participates in elections but does not hold data (i.e. does not
provide data redundancy). For more information on arbiters,
see Replica Set Arbiter.

Socket Exceptions when Rebooting More than One Secondary

When you reboot members of a replica set, ensure that the set is able
to elect a primary during the maintenance. This means ensuring that a
majority of the set's voting members are available.

When a set's active members can no longer form a majority, the set's
primary steps down and becomes a secondary. Starting in MongoDB 4.2,
when the primary steps down, it no longer closes all client
connections. In MongoDB 4.0 and earlier, when the primary steps
down, it closes all client connections.

Clients cannot write to the replica set until the members elect a new
primary.

Example

Given a three-member replica set where every member has
one vote, the set can elect a primary if at least two members
can connect to each other. If you reboot the two secondaries at
once, the primary steps down and becomes a secondary. Until at least
another secondary becomes available, i.e. at least one of the rebooted
secondaries also becomes available, the set has no primary and cannot
elect a new primary.
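The majority rule in this example can be written down as a small sketch, assuming Node.js and that every member has one vote; `majority` and `canElectPrimary` are hypothetical helper names:

```javascript
// Sketch: an election needs a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// reachableVoters: how many voting members can still reach each other.
function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= majority(votingMembers);
}

console.log(majority(3));           // 2
console.log(canElectPrimary(3, 2)); // true: one secondary rebooted
console.log(canElectPrimary(3, 1)); // false: both secondaries rebooted at once
```

The same arithmetic explains why a four-member set tolerates no more failures than a three-member one: it needs three reachable voters instead of two.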

For more information on votes, see Replica Set Elections.

Prerequisites:

1. Install MongoDB on all nodes

Refer to: https://docs.mongodb.com/manual/administration/install-community/

Here I assume that all three nodes run Ubuntu 18.04 LTS; run the following commands on each node.

2. If you have an active DNS server, add A records for all servers; otherwise, modify the /etc/hosts file. Add these entries on all nodes.

$ sudo vim /etc/hosts

10.0.0.11 mongoDb-01
10.0.0.12 mongoDb-02
10.0.0.13 mongoDb-03

3. Set the hostname permanently

$ sudo vim /etc/hostname     ## On Node1
mongoDb-01
$ sudo vim /etc/hostname     ## On Node2
mongoDb-02
$ sudo vim /etc/hostname     ## On Node3
mongoDb-03

4. Generate a keyfile

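On Node 1 (mongoDb-01):

# mkdir -p /etc/mongodb/keyFiles/
# openssl rand -base64 756 > /etc/mongodb/keyFiles/mongo-key
# chmod 400 /etc/mongodb/keyFiles/mongo-key
# chown -R mongodb:mongodb /etc/mongodb

Copy the same keyfile to /etc/mongodb/keyFiles/mongo-key on mongoDb-02 and mongoDb-03 and apply the same chmod and chown there: every member of the replica set must use identical keyfile contents.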
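If Node.js happens to be available on the node, the openssl and chmod steps can be sketched with the built-in crypto and fs modules. `generateKeyFileContent` and `writeKeyFile` are hypothetical helper names, the path is a parameter so the sketch can be tried outside /etc, and the chown step still has to be done separately:

```javascript
// Sketch: a Node.js equivalent of
//   openssl rand -base64 756 > <path>
//   chmod 400 <path>
const crypto = require("crypto");
const fs = require("fs");

function generateKeyFileContent(bytes = 756) {
  // 756 random bytes encode to 1008 base64 characters, which is within
  // the 6 to 1024 character limit MongoDB places on keyfile contents.
  return crypto.randomBytes(bytes).toString("base64");
}

function writeKeyFile(path) {
  const content = generateKeyFileContent();
  // The mode only takes effect when the file is newly created.
  fs.writeFileSync(path, content + "\n", { mode: 0o400 }); // owner read-only
  return content;
}
```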

Hosting a free MongoDB instance for your projects on MongoDB Atlas

To complete the tasks in this guide, you will need to store some data, and a MongoDB database will be used for that.

There are three ways to build web applications that use a MongoDB database:

Use your own computer both to create the MongoDB database and to develop the application. In this case, you need to install Node and the MongoDB database server on your PC.
Use the MongoDB Atlas cloud service to create the MongoDB database, and develop and run the application on your local PC. This approach is covered in this article.
Use the MongoDB Atlas cloud service to create the MongoDB database, and develop and run the application on a cloud service such as Glitch.

To avoid the hassle of installing and configuring MongoDB, we will use the MongoDB Atlas cloud service, which not only simplifies database configuration but also lets you access the database from anywhere at any time. For a guide to setting up a MongoDB Atlas account and connecting to a MongoDB database instance, see this page.

Configuring MongoDB replication

First, stop MongoDB on each server:

Next, set up the directory that will be used to store the data. Create it with the following command:

The data directory is ready. Now edit the configuration file so that it reflects the new replica set configuration.

Several parameters need to be changed in this file. Find the dbpath variable and point it to the directory you just created:

Uncomment the port line to enable the default port:

At the bottom of the file, uncomment the replSet parameter and set it to a value you will easily recognize:

Finally, enable process forking. At the bottom of the file, add:

Save and close the file. Start the replication node:

Note: Perform the instructions in this section on each replication node.

Step 1: Configure MongoDB Replica set

Configure MongoDB to run as a member of a replica set. In the MongoDB configuration file, edit only the following parameters and keep the other settings at their defaults.

On node 1 => mongoDb-01
# network interfaces
net:
  port: 27017
  bindIp: 10.0.0.11
# security
security:
  authorization: enabled
  keyFile: /etc/mongodb/keyFiles/mongo-key
# replication
replication:
  replSetName: "replicaset01"

On node 2 => mongoDb-02
# network interfaces
net:
  port: 27017
  bindIp: 10.0.0.12
# security
security:
  authorization: enabled
  keyFile: /etc/mongodb/keyFiles/mongo-key
# replication
replication:
  replSetName: "replicaset01"

On node 3 => mongoDb-03
# network interfaces
net:
  port: 27017
  bindIp: 10.0.0.13
# security
security:
  authorization: enabled
  keyFile: /etc/mongodb/keyFiles/mongo-key
# replication
replication:
  replSetName: "replicaset01"

To apply the changes, restart the mongod service on all nodes.
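Because the three configuration fragments differ only in bindIp, they can be rendered from one host table so they cannot drift apart (a copied IP address is an easy mistake). A minimal sketch, assuming Node.js; `NODES` and `renderConfig` are hypothetical, only the keys from this step are emitted, and the keyFile path is the one where the key was generated in step 4:

```javascript
// Hypothetical host table matching the layout used in this guide.
const NODES = [
  { name: "mongoDb-01", ip: "10.0.0.11" },
  { name: "mongoDb-02", ip: "10.0.0.12" },
  { name: "mongoDb-03", ip: "10.0.0.13" },
];

// Renders the mongod.conf fragment for one node as YAML text.
function renderConfig(node, {
  replSetName = "replicaset01",
  keyFile = "/etc/mongodb/keyFiles/mongo-key",
  port = 27017,
} = {}) {
  return [
    "# network interfaces",
    "net:",
    `  port: ${port}`,
    `  bindIp: ${node.ip}`,
    "# security",
    "security:",
    "  authorization: enabled",
    `  keyFile: ${keyFile}`,
    "# replication",
    "replication:",
    `  replSetName: "${replSetName}"`,
    "",
  ].join("\n");
}

for (const node of NODES) {
  console.log(`## ${node.name}\n${renderConfig(node)}`);
}
```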

Creating a model

CRUD Part I: Create

CRUD stands for Create, Read, Update and Delete. These are the basic operations for working with databases such as MongoDB.

In mongoose, everything revolves around two key concepts: the Schema, which describes an entity, and the Model, which is the entity itself.

First of all, you need a schema (see https://mongoosejs.com/docs/guide.html).

Create a schema and a model from it.

Copy the following code into the file.

Each field in the schema is defined by a type and can have additional options, including options specific to Number fields, options specific to String fields, and options for indexes. You can read more about types here.

In the model-creation function, the first parameter is the name of the model and the second is the schema from which the model is created.

Schemas are the building blocks for models. A model lets you create instances of your objects, called documents.

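Examples

Note: the copydb command was deprecated in MongoDB 4.0 and removed in MongoDB 4.2, so these examples apply only to earlier versions.

Copy from the Same Instance

To copy from the same host, omit the fromhost field.

The following command copies the test database to a new records
database on the current instance:

db.adminCommand({
   copydb: 1,
   fromdb: "test",
   todb: "records"
})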

Copy from a Remote Host to the Current Host

To copy from a remote host, include the fromhost field.

The following command copies the test database from the remote host
example.net to a new records database on the current instance:

db.adminCommand({
   copydb: 1,
   fromdb: "test",
   todb: "records",
   fromhost: "example.net"
})

To copy from another mongod instance (fromhost) that enforces access
control, you must use the corresponding mongo shell method instead of
this command.

How MongoDB replication works

MongoDB handles replication through a Replica Set, which consists of multiple MongoDB nodes that are grouped together as a unit.

The minimum recommended replica set consists of three MongoDB nodes:

One of the nodes will be considered the primary node that receives all the write operations.
The others are considered secondary nodes. These secondary nodes will replicate the data from the primary node.

Basic replication methodology

While the primary node is the only instance that accepts write operations, secondary nodes in the replica set can also serve read operations. Where reads are sent is configured through the read preference of a supported MongoDB client or driver.

If the primary node becomes unavailable or inoperable, a secondary node takes over the primary role to keep the data continuously available. The new primary is chosen through a process called replica set elections, in which the most suitable secondary node is elected as the new primary.

The Heartbeat process

Heartbeat is the process that identifies the current status of a MongoDB node in a replica set. Replica set members send heartbeats (pings) to each other every two seconds. If a member does not respond within 10 seconds, the other members of the replica set mark it as inaccessible.

This functionality is vital for automatic failover: when the secondary nodes do not receive a heartbeat from the primary within the allotted time frame, an eligible secondary calls an election and, if it wins, becomes the new primary.

Replica set elections

The elections in replica sets are used to determine which MongoDB node should become the primary node. These elections can occur in the following instances:

Loss of connectivity to the primary node (detected by heartbeats)
Initializing a replica set
Adding a new node to an existing replica set
Maintenance of a replica set using the rs.stepDown() or rs.reconfig() methods

During an election, an eligible node calls for an election and nominates itself as the new primary; the other voting members then cast their votes, and the candidate becomes primary once it receives votes from a majority of the voting members. The median time for an election to complete should not typically exceed 12 seconds, assuming the replica configuration settings are at their default values. Network latency is a major factor that can lengthen an election and delay getting your replica set back into operation with a new primary.

The replica set cannot process any write operations until the election is completed. However, read operations can still be served if read queries are configured to run on secondary nodes. Starting in MongoDB 3.6, compatible drivers can be configured to retry writes once after a failover.

Updating documents in the database with model.findOneAndUpdate()

Recent versions of mongoose have methods that simplify updating documents. However, some more advanced features (for example, pre/post hooks and validation) behave differently with this approach, so the classic method is still useful in many situations.

Copy the following code into the file.

The function finds a user matching the condition given in the first parameter, then sets the properties given in the second parameter. The third parameter tells the function whether to return the modified document instead of the original: with this option enabled, the console shows the updated document; with it disabled, the console shows the document as it was before the update. By default, the original document is returned. The fourth parameter is the callback function.

How does MongoDB detect replication lag?

Replication lag is a delay in data being copied to a secondary server after an update on the primary server. Short windows of replication lag are normal, and should be considered in systems that choose to read the eventually-consistent secondary data. Replication lag can also prevent secondary servers from assuming the primary role if the primary goes down.

If you want to check your current replication lag:

In a mongo shell connected to the primary, call the method that prints replication information for the secondaries. For each member, it shows the time when the last oplog entry was written to that secondary server.
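rs.status() also reports each member's `name`, `stateStr` and `optimeDate`, so the lag can be computed directly. A minimal sketch, assuming Node.js and a trimmed, hypothetical rs.status() document; `replicationLagSeconds` is a hypothetical helper name:

```javascript
// Sketch: seconds each secondary is behind the primary, based on the
// optimeDate of the last applied oplog entry reported by rs.status().
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary in rs.status() output");
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

const sampleStatus = {
  members: [
    { name: "mongoDb-01:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-10-29T19:22:40Z") },
    { name: "mongoDb-02:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-10-29T19:22:40Z") },
    { name: "mongoDb-03:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-10-29T19:22:28Z") },
  ],
};
console.log(replicationLagSeconds(sampleStatus));
// { 'mongoDb-02:27017': 0, 'mongoDb-03:27017': 12 }
```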

Replication lag can be due to several things, including:
Network Latency: Check your ping and traceroute to see if there is a network routing issue or packet loss. See: ping diagnostic documentation, troubleshooting replica sets.
Disk Throughput: Sometimes the secondary server is unable to flush data to disk as rapidly as the primary. This is common on multi-tenant systems, especially if the system accesses disk devices over an IP network. System-level tools, like vmstat or iostat can help you find out more. See: production notes, mongostat.
Concurrency: Long-running operations on the primary can block replication. Set up your write concern so that write operations don't return if replication can't keep up with the load. Alternatively, check slow queries and long-running operations via the database profiler. See: Write Concern.
Appropriate Write Concern: If the primary requires a large amount of writes (due to a bulk load operation or a sizable data ingestion), the secondaries may not be able to keep up with the changes on the oplog. Consider setting your write concern to “majority” in order to ensure that large operations are properly replicated.

Check the Size of the Oplog

A larger oplog can give a replica set a greater tolerance for
lag, and make the set more resilient.

To check the size of the oplog for a given member,
connect to the member in the mongo shell and run the
replication info method.

The output displays the size of the oplog and the date ranges of the
operations contained in the oplog. In the following example, the oplog
is about 10 MB and is able to fit about 26 hours (94400 seconds) of
operations:

The oplog should be long enough to hold all transactions for the
longest downtime you expect on a secondary. At a minimum, an oplog
should be able to hold 24 hours of operations; however, many
users prefer to have 72 hours or even a week's worth of operations.


Note

You normally want the oplog to be the same size on all
members. If you resize the oplog, resize it on all members.

To change oplog size, see the Change the Size of the Oplog
tutorial.

Starting in MongoDB 4.0, the oplog can grow past its configured size limit to avoid deleting the majority commit point.

Set Up a Replica Set

In this tutorial, we will convert a standalone MongoDB instance to a replica set. To do so, follow these steps:

Shut down the already running MongoDB server.
Start the MongoDB server with the --replSet option. The basic syntax of --replSet is:

mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"

Example

mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0

This starts a mongod instance on port 27017 as a member of the replica set named rs0.
Now open a command prompt and connect to this mongod instance with the mongo client.
In Mongo client, issue the command rs.initiate() to initiate a new replica set.
To check the replica set configuration, issue the command rs.conf(). To check the status of replica set issue the command rs.status().
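rs.initiate() can also be given an explicit configuration document listing all members, which fits the three-node setup used earlier in this guide. A minimal sketch, assuming Node.js; `replicaSetConfig` is a hypothetical helper, and in the mongo shell you would pass its result to rs.initiate():

```javascript
// Sketch: build a replica set configuration document.
// _id must match replSetName (or --replSet) on every member.
function replicaSetConfig(setName, hosts, port = 27017) {
  return {
    _id: setName,
    members: hosts.map((host, i) => ({ _id: i, host: `${host}:${port}` })),
  };
}

const cfg = replicaSetConfig("replicaset01", ["mongoDb-01", "mongoDb-02", "mongoDb-03"]);
console.log(JSON.stringify(cfg, null, 2));
```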

Test Connections Between all Members

All members of a replica set must be able to connect to every
other member of the set to support replication. Always verify
connections in both directions. Networking topologies and firewall
configurations can prevent normal and required connectivity, which can
block replication.

Changed in version 3.6: Starting in MongoDB 3.6, the MongoDB binaries
mongod and mongos bind to localhost by default. If IPv6 support is
enabled for the binary (through the configuration file setting or the
command-line option), the binary additionally binds to the localhost
IPv6 address.

Previously, starting from MongoDB 2.6, only the binaries from the
official MongoDB RPM (Red Hat, CentOS, Fedora Linux, and derivatives)
and DEB (Debian, Ubuntu, and derivatives) packages bound to localhost
by default.

When bound only to localhost, these MongoDB 3.6 binaries can only
accept connections from clients (including the mongo shell and other
members of your deployment in replica sets and sharded clusters) that
are running on the same machine. Remote clients cannot connect to
binaries bound only to localhost.

To override this and bind to other IP addresses, use the net.bindIp
configuration file setting or the equivalent command-line option to
specify a list of hostnames or IP addresses.

Warning
Before binding to a non-localhost (e.g. publicly accessible)
IP address, ensure you have secured your cluster from unauthorized
access. For a complete list of security recommendations, see
Security Checklist. At minimum, consider enabling authentication and
hardening network infrastructure.
For example, a mongod instance can bind to both localhost and a
hostname that is associated with a specific IP address.

In order to connect to such an instance, remote clients must specify
that hostname or its associated IP address.

Consider the following example of a bidirectional test of networking:
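A bidirectional test means checking every ordered pair of members, not just each pair once. A minimal sketch that enumerates those checks, assuming Node.js; `connectivityChecks` is a hypothetical helper, and each generated mongo shell command is meant to be run on the `from` host:

```javascript
// Sketch: list every ordered (from, to) pair that must be tested,
// since connectivity has to be verified in both directions.
function connectivityChecks(hosts, port = 27017) {
  const checks = [];
  for (const from of hosts) {
    for (const to of hosts) {
      if (from !== to) {
        checks.push({ from, to, command: `mongo --host ${to} --port ${port}` });
      }
    }
  }
  return checks;
}

for (const c of connectivityChecks(["mongoDb-01", "mongoDb-02", "mongoDb-03"])) {
  console.log(`on ${c.from}: ${c.command}`);
}
```

For three members this yields six checks; a firewall rule that allows only one direction shows up as a failure in exactly one of each pair.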

Slow Application of Oplog Entries¶

Starting in version 4.2 (also available starting in 4.0.6), secondary
members of a replica set now log oplog entries that take longer than
the slow operation threshold to apply. These slow oplog messages:

Are logged for the secondaries in the diagnostic log.
Are logged under the replication component.
Do not depend on the log levels (either at the system or component
level).
Do not depend on the profiling level.
May be affected by the slow operation sample rate,
depending on your MongoDB version:
- In MongoDB 4.2 and earlier, these slow oplog entries are not
  affected by the sample rate.
  MongoDB logs all slow oplog entries regardless of the sample rate.
- In MongoDB 4.4 and later, these slow oplog entries are affected by
  the sample rate.

The profiler does not capture slow oplog entries.

Automatic Failover

When a primary does not communicate with the other members of the set
for more than the configured election timeout period
(10 seconds by default), an eligible secondary calls for an election
to nominate itself as the new primary. The cluster attempts to
complete the election of a new primary and resume normal operations.

The replica set cannot process write operations
until the election completes successfully. The replica set can continue
to serve read queries if such queries are configured to run on
secondaries while the primary is offline.

The median time before a cluster elects a new primary should not
typically exceed 12 seconds, assuming default replica configuration
settings. This includes the time required to mark the primary as
unavailable and to call and complete an election.
You can tune this time period by modifying the election timeout
replication configuration option. Factors such as network latency
may extend the time required for replica set elections to complete,
which in turn affects the amount of time your cluster may operate
without a primary. These factors depend on your particular cluster
architecture.

Lowering the election timeout replication configuration option from
the default (10 seconds) can result in faster detection of primary
failure. However, the cluster may call elections more frequently due
to factors such as temporary network latency, even if the primary is
otherwise healthy. This can result in increased rollbacks for w: 1
write operations.

Your application connection logic should include tolerance for automatic
failovers and the subsequent elections. Starting in MongoDB 3.6, MongoDB drivers
can detect the loss of the primary and automatically retry certain write
operations a single time, providing additional built-in handling of
automatic failovers and elections:

MongoDB 4.2+ compatible drivers enable retryable writes by default.
MongoDB 4.0 and 3.6-compatible drivers must explicitly enable
retryable writes in the connection string.

See Replica Set Elections for complete documentation on
replica set elections.


Sharding example

The port allocation for the sharded cluster components is as follows:

Step 1: Start the shard server

Step 2: Start the config server

Note: Here we can start it as a regular mongodb service without adding the --shardsvr and --configsvr options, because these two options only change the default port, and we can specify the port ourselves. This shortcut applies only to older MongoDB versions: in current versions these options also set the member's role in the cluster, and since MongoDB 3.4 config servers must be deployed as a replica set started with --configsvr.

Step 3: Start the routing process

Step 4: Configure sharding

Next, we use the MongoDB Shell to connect to mongos and add the shard nodes.

Step 5: The application code hardly needs to change; just connect to the database on port 40000 exactly as you would connect to a regular mongo database.

What locks are taken by some common client operations?

The following table lists some operations and the types of locks they
use for document level locking storage engines:

Operation	Database	Collection
Issue a query	(Intent Shared)	(Intent Shared)
Insert data	(Intent Exclusive)	(Intent Exclusive)
Remove data	(Intent Exclusive)	(Intent Exclusive)
Update data	(Intent Exclusive)	(Intent Exclusive)
Perform Aggregation	(Intent Shared)	(Intent Shared)
Create an index (Foreground)	(Exclusive)
Create an index (Background)	(Intent Exclusive)	(Intent Exclusive)
List collections	(Intent Shared) Changed in version 4.0.
Map-reduce	(Exclusive) and (Shared)	(Intent Exclusive) and (Intent Shared)
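The table becomes easier to read alongside the compatibility rules for these lock modes. A minimal sketch, assuming Node.js and assuming MongoDB's intent locks follow the standard multiple-granularity compatibility matrix (IS = intent shared, IX = intent exclusive, S = shared, X = exclusive); `COMPATIBLE` and `compatible` are hypothetical names:

```javascript
// Sketch: which requested mode can be granted while another mode is held
// on the same resource (database or collection).
const COMPATIBLE = {
  IS: ["IS", "IX", "S"],
  IX: ["IS", "IX"],
  S: ["IS", "S"],
  X: [],
};

function compatible(held, requested) {
  return COMPATIBLE[held].includes(requested);
}

console.log(compatible("IS", "IX")); // true: a query and an insert can run together
console.log(compatible("IX", "X"));  // false: an insert blocks a foreground index build
```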