
Data Replication With ReductStore

Data replication is the process of copying data from one database to another. ReductStore provides simple and efficient append-only replication that streams data from one bucket to another.

Concepts

Data replication in ReductStore is based on the concept of a replication task. A replication task is a configurable thread that filters records and copies them from a source bucket to a target bucket. The target bucket can belong to the same ReductStore instance or to a different one. For more information on buckets, see the Buckets guide.

Once a replication task is created, the ReductStore instance starts a new thread that waits for new records in the source bucket. When a new record arrives, the HTTP frontend stores it in the source bucket and registers it in a transaction log. The replication task periodically checks the transaction log for new records and replicates them to the target bucket. For efficiency, the task replicates multiple records in a single batch. Once a record has been successfully replicated, the task deletes it from the transaction log. Because a record stays in the log until it has been replicated, data is replicated in real time, and the replication process is fault-tolerant and can recover from failures.
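The loop described above can be pictured with a small in-memory model. This is only a sketch of the idea, not ReductStore's implementation: the class name, the `send_batch` callable, and the batch size are all assumptions made for illustration.

```python
from collections import deque


class ToyReplicationTask:
    """Toy model of the transaction-log loop; not ReductStore's actual code."""

    def __init__(self, send_batch, batch_size=3):
        self.log = deque()            # transaction log: records awaiting replication
        self.send_batch = send_batch  # hypothetical callable that writes a batch to the target
        self.batch_size = batch_size

    def register(self, entry, timestamp):
        """Called by the write path after a record is stored in the source bucket."""
        self.log.append((entry, timestamp))

    def sync_once(self):
        """Replicate one batch; drop it from the log only if sending succeeded."""
        batch = list(self.log)[: self.batch_size]
        if not batch:
            return 0
        try:
            self.send_batch(batch)
        except ConnectionError:
            return 0  # keep the records in the log and retry on the next tick
        for _ in batch:
            self.log.popleft()
        return len(batch)


target = []  # stands in for the target bucket
task = ToyReplicationTask(send_batch=target.extend, batch_size=2)
for ts in (1, 2, 3):
    task.register("sensor-1", ts)

# Keep syncing until the log is drained.
while task.sync_once():
    pass
```

The point of the model is the ordering: a record leaves the log only after the batch that contains it has been sent, so a failed send leaves the log untouched and the next pass retries it.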

info

The replication engine only replicates records that are written or updated in the source bucket after the replication task is created. It doesn't replicate deletions or records that already exist in the source bucket.

Conditional Replication

A replication task can filter records before replicating them to the target bucket. You can specify the following filters:

  • entries (list of strings): A list of entries that the replication task will use to filter records. Only records with these entries will be replicated. If the list is empty, all records will be replicated. You can use the * wildcard to match any entry.
  • when (JSON-like object): A set of conditions that the record must meet to be replicated. The conditions are based on the record labels. For more information on conditional queries, see the Conditional Query Reference.
  • each_s (float): Replicate one record every S seconds.
  • each_n (integer): Replicate only every N-th record.
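To make the `entries` and `each_n` filters concrete, here is a minimal sketch of how such filtering could behave. The function names are hypothetical, and two details are assumptions rather than documented behavior: the counter for `each_n` runs over all records that pass the entry filter (not per entry), and wildcard matching is delegated to Python's `fnmatchcase`, which also treats `?` and `[...]` as special.

```python
from fnmatch import fnmatchcase


def entry_matches(entry, patterns):
    """An empty pattern list accepts every entry; `*` acts as a wildcard."""
    return not patterns or any(fnmatchcase(entry, p) for p in patterns)


def select_records(records, entries=(), each_n=1):
    """Yield every `each_n`-th record among those whose entry passes the filter."""
    seen = 0
    for entry, timestamp in records:
        if not entry_matches(entry, entries):
            continue
        if seen % each_n == 0:
            yield entry, timestamp
        seen += 1


records = [
    ("sensor-1", 1),
    ("sensor-2", 2),
    ("sensor-1", 3),
    ("sensor-1", 4),
    ("camera", 5),
]
# Keep every second record from entries whose name starts with "sensor-".
picked = list(select_records(records, entries=["sensor-*"], each_n=2))
```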

Usage Example

Data replication may seem complex, but a short example shows how little configuration it needs:

Imagine we collect high-frequency vibration sensor data from an engine in the sensor-data bucket. The data from each sensor is stored in a separate entry. We want to replicate only the data from the sensor-1 entry to the remote-data bucket in another ReductStore instance. However, we only want to replicate the records if the engine is working and the sensor data is not corrupted. In this case, the conditional replication settings will be:

entries: ["sensor-1"]
when: { "&rms": { "$gt": 2.0 }, "&quality": { "$eq": "ok" } }

See the next section for more information on how to create a replication task with conditional replication settings.
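To see what the condition in this example selects, the following sketch evaluates it against a record's labels. It is a simplified stand-in for the query engine, supporting only the two operators used here. The `matches` function is hypothetical, label values are assumed to be already typed, and treating a missing label as a non-match is an assumption.

```python
import operator

# Only the two operators used in the example; the real query language has more.
OPERATORS = {"$gt": operator.gt, "$eq": operator.eq}


def matches(condition, labels):
    """Return True if every `&label` clause in `condition` holds for `labels`."""
    for key, clause in condition.items():
        name = key.lstrip("&")  # "&rms" refers to the label named "rms"
        if name not in labels:
            return False  # assumption: a missing label means no match
        for op, expected in clause.items():
            if not OPERATORS[op](labels[name], expected):
                return False
    return True


when = {"&rms": {"$gt": 2.0}, "&quality": {"$eq": "ok"}}

engine_running = matches(when, {"rms": 3.1, "quality": "ok"})
engine_idle = matches(when, {"rms": 0.4, "quality": "ok"})
corrupted = matches(when, {"rms": 3.1, "quality": "bad"})
```

Only the first record would be replicated: the second fails the `rms` threshold and the third fails the `quality` check.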

Managing Data Replication Tasks

Here you will find examples of how to create, list, retrieve, update, and delete replication tasks using the ReductStore SDKs, REST API, CLI and Web Console.

Note that all the examples are written for a local ReductStore instance available at http://127.0.0.1:8383 with the API token my-token.

For more information on setting up a local ReductStore instance, see the Getting Started guide.

Creating a Replication Task

To spin up a new replication task, you must provide the following information:

  • Source Bucket: The name of the bucket in the source database from which data will be replicated.
  • Remote Bucket: The name of the bucket in the target database to which data will be replicated.
  • Remote URL: The URL of the target database.
  • Remote Token: The API token of the target database.
  • Filter Settings: See the Conditional Replication section for more information.
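One way to keep these parameters together before passing them to an SDK or the REST API is a small settings object. The sketch below is illustrative only: the class, its field names, and the validation rules are assumptions and are not taken from the ReductStore SDKs, and the URL and token are placeholders.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ReplicationTaskConfig:
    """Hypothetical container for the parameters listed above."""

    src_bucket: str
    dst_bucket: str
    dst_host: str
    dst_token: str
    entries: list = field(default_factory=list)  # empty list: replicate all entries
    when: dict = field(default_factory=dict)     # empty dict: no label condition

    def validate(self):
        """Reject obviously incomplete settings before they are sent anywhere."""
        if not self.src_bucket or not self.dst_bucket:
            raise ValueError("source and remote bucket names are required")
        if not self.dst_host.startswith(("http://", "https://")):
            raise ValueError("remote URL must start with http:// or https://")
        return self


config = ReplicationTaskConfig(
    src_bucket="sensor-data",
    dst_bucket="remote-data",
    dst_host="https://remote.example.com",  # placeholder URL
    dst_token="remote-token",               # placeholder token
    entries=["sensor-1"],
    when={"&rms": {"$gt": 2.0}, "&quality": {"$eq": "ok"}},
).validate()

# Serialize for logging or for a request body.
payload = json.dumps(asdict(config), indent=2)
```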

Let's create a replication task that replicates all records from a source bucket to a remote bucket by using the ReductStore SDKs, REST API, CLI and Web Console. You can also provision a replication task by using environment variables.

info

A created replication task replicates only new records written to the source bucket after the task is created. It doesn't replicate existing records in the source bucket. However, you can manually replicate existing records using the Manual Data Replication feature.

reduct-cli alias add local -L http://localhost:8383 -t "my-token"
# Create a source bucket
reduct-cli bucket create local/src-bucket
# Create a replication between the source bucket and the demo bucket at https://play.reduct.store
reduct-cli replica create local/my-replication src-bucket https://demo@play.reduct.store/demo

Browse Replication Tasks

You can list all replication tasks and get detailed information about a specific replication task using the ReductStore SDKs, REST API, CLI and Web Console. The detailed information includes status, current settings and statistics of the replication task:

  • Status: The status of the replication task. It can be Active or Inactive. Inactive replication tasks are paused and don't replicate data, usually because the target database is unreachable.
  • Provisioned: Whether the replication task is provisioned or not. Provisioned replication tasks are created using environment variables.
  • Number of Pending Records: The number of records that are waiting to be replicated.
  • Number of Failed Records: The number of records that failed to be replicated in the last hour.
  • Number of Replicated Records: The number of records that were successfully replicated in the last hour.
  • Error List: A list of errors that occurred during the replication process in the last hour.

note

For the first hour, the Number of Failed Records and Number of Replicated Records are interpolated.
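These fields lend themselves to a simple automated check. The sketch below classifies a status snapshot; the dictionary keys and both thresholds are hypothetical and chosen only for illustration, so map them to whatever your SDK or the REST API actually returns.

```python
def replication_health(info):
    """Classify a status snapshot; the dict keys are hypothetical names."""
    if not info["is_active"]:
        return "inactive"  # paused, usually because the target is unreachable
    total = info["ok_last_hour"] + info["failed_last_hour"]
    if total and info["failed_last_hour"] / total > 0.05:
        return "degraded"  # arbitrary threshold: more than 5% of records failed
    if info["pending_records"] > 10_000:
        return "lagging"   # arbitrary threshold: the transaction log keeps growing
    return "healthy"


snapshot = {
    "is_active": True,
    "pending_records": 12,
    "ok_last_hour": 980,
    "failed_last_hour": 20,
}
state = replication_health(snapshot)
```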

reduct-cli alias add local -L http://localhost:8383 -t "my-token"
# List all replications
reduct-cli replica ls local --full
# Browse a specific replication
reduct-cli replica show local/example-replication

Removing a Replication Task

You can remove a replication task by using the ReductStore SDKs, REST API, CLI and Web Console. Once you remove a replication task, the replication process stops immediately, and the transaction log is deleted from the database.

info

You can't remove a provisioned replication task. Before removing it, you need to unset the corresponding environment variables and restart the ReductStore instance.

reduct-cli alias add local -L http://localhost:8383 -t "my-token"
reduct-cli replica rm local/repl-to-remove --yes

Manual Data Replication

You can also replicate data manually if you need to copy specific time periods or records from one bucket to another. To do this, use the cp command of the Reduct CLI. Here we'll copy all records from src-instance/example-bucket to dst-instance/demo that have the label anomaly=true and whose status label is not ok.

reduct-cli alias add src-instance -L http://localhost:8383 -t my-token
reduct-cli alias add dst-instance -L https://play.reduct.store -t reductstore
reduct-cli cp src-instance/example-bucket dst-instance/demo --when '{
"&status": {"$ne": "ok"},
"&anomaly": {"$eq": true}
}'