Translate

Thursday, March 11, 2021

Implement a Distributed Database to Your Java Application

 Source: https://dzone.com/articles/an-introduction-to-interference-implement-a-distri

by Sam Brown  · Mar 01, 21 · Java Zone · Analysis

Brief Description

i.o.cluster, also known as interference open cluster, is a simple Java framework that enables you to launch a distributed database service within your Java application, using JPA-like interface and annotations for structure mapping and data operations. This software inherits its name from the interference project, within which its mechanisms were developed.

 i.o.cluster is a open source, pure Java software.

The basic unit of the i.o.cluster service is a node – it can be a standalone running service, or a service running within some Java application.

Each i.o.cluster node has own persistent storage and can considered and used as a local database with the following basic features:

  • operates with simple objects (POJOs).
  • uses base JPA annotations (@Table@Column@Transient@Index@GeneratedValue) for object mapping directly to persistent storage.
  • supports transactions.
  • supports SQL queries with READ COMMITTED isolation level.
  • uses persistent indices for fast access to data and increase performance of SQL joins.
  • allows flexible management of in-memory data for stable operation of the node in any ratio of storage size/available memory.

 Each of the nodes includes several mechanisms that ensure its operation:

  • core algorithms (supports structured persistent storage, supports indices, custom serialization, heap management, local and distributed sync processes).
  • SQL and CEP processor.
  • event transport, which is used to exchange messages between nodes, as well as between a node and a client application.

Nodes can be joined into a cluster, at the cluster level with inter-node interactions, we get the following features: 

  • allows you to insert data and run SQL queries from any node included in the cluster.
  • support of horizontal scaling SQL queries.
  • support of transparent cluster-level transactions.
  • support of complex event processing (CEP) and simple streaming SQL.
  • i.o.cluster nodes does not require the launch of any additional coordinators. 

i.o.cluster implements the most simple data management model, which is based on several standard JPA-like methods of the Session object: 

  • persist() - placing an object in a storage.
  • find() - find an object by a unique identifier.
  • execute() - execution of a SQL query.
  • commit() - committing a transaction.
  • rollback() - rollback a transaction.

As well, i.o.cluster software includes a remote client that provides the ability to remotely connect to any of the cluster nodes using internal event transport and execute standard JPA-like commands (persist, find, execute, commit, rollback).

Distributed Persistent Model 

The interference cluster is a decentralized system. This means that the cluster does not use any coordination nodes; instead, each node follows a set of formal rules of behavior that guarantee the integrity and availability of data within a certain interaction framework.

Within the framework of these rules, all nodes of the interference cluster are equivalent. This means that there is no separation in the master and slave nodes system — changes to user tables can be made from any node. Also, all changes are replicated to all nodes, regardless of which node they were made on.

Talking about transactions, running commit in a local user session automatically ensures that the changed data is visible on all cluster nodes.

To include a node in the cluster, you must specify the full list of cluster nodes (excluding this current one). 

Node Configuration Parameters

The minimum number of cluster nodes is 2, and the maximum is 64.

After configuration, we may start all configured nodes as clusters in any order. All nodes will be using specific messages (events) to provide inter-node data consistency and horizontal-scaling queries.

Rules of Distribution

  • All cluster nodes are equivalent.
  • All changes on any of the nodes are mapped to other nodes.
  • If replication is not possible (the node is unavailable or the connection is broken), a persistent change queue is created for this node.
  • The owner of any data frame is the node on which this frame has been allocated.
  • The system uses the generation of unique identifiers for entities (@DistributedId) so that the identifier is unique within the cluster and not just within the same node.
  • Data inserts are performed strictly in local structures, and then replicated changes (update/delete) can be performed only on the node-owner of the data frame with this record. 

SQL Horizontal-Scaling Queries

All SQL queries called on any cluster nodes will be automatically distributed among the cluster nodes for parallel processing. Such a decision is made by the node based on the analysis of the volume of tasks (the volume of the query tables is large enough, etc.)

If a node is unavailable during the processing of a request (network fails, service stopped), the task distributed for this node will be automatically rescheduled to another available node.

Complex Event Processing

Interference supports complex event processing using the SELECT STREAM clause in the SQL statement. SELECT STREAM query supports three modes of CEP: 

  • Events are processed as is, without any aggregations.
  • Events are aggregated by column value with using any of group functions (tumbling window).
  • Some window aggregates events for every new record (sliding window).

The basic differences between a streaming query and the usual one are as follows:

  • The execute() method returns a StreamQueue object.
  • The request is executed asynchronously until StreamQueue.stop() method will be called or until the application terminates.
  • The StreamQueue.poll() method returns all records previously inserted into the table and according to the WHERE condition (if exist) and continues to return newly added records.
  • Each StreamQueue.poll() method always returns the next record after the last polled position within the session, so that, provided that the SQL request is stopped and called again within the same session, data retrieve was continued from the last fixed position, in another session data will be retrieved from begin of the table.
  • Unlike usual, a streaming request does not support transactions and always returns actually inserted rows, regardless of the use of the commit() method in a session inserting data (DIRTY READS). 

Further Information

GitHub: https://github.com/interference-project/interference

More info: http://io.digital