Data Nodes

A data node is an appliance that you can add to your event and flow processors to increase storage capacity and improve search performance. You can add an unlimited number of data nodes to your JSA deployment, and they can be added at any time. Each data node can be connected to only one processor, but a processor can support multiple data nodes.

For more information about planning your deployment, see the Juniper Secure Analytics Architecture and Deployment Guide.

Data Rebalancing After a Data Node is Added

When you add a data node, JSA rebalances the data to improve search and overall system performance.

Data rebalancing includes decompressing older data, and moving data that was on the original storage device to evenly distribute it across all connected devices.

For example, your deployment has an event processor that receives 20,000 events per second (EPS). When you add data nodes, JSA automatically distributes the events across the event processor and all data nodes that are available to it. If you add three data nodes, the event processor stores 5,000 EPS and sends 5,000 EPS to each of the attached data nodes. The event processor is still processing all of the events, but the data nodes provide more storage, indexing, and search capabilities to improve the overall performance.
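The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `split_eps` is hypothetical, and the even split is an assumption that matches the example above, not a description of how JSA is implemented.

```python
def split_eps(total_eps: int, data_nodes: int) -> int:
    """Return the EPS stored by each cluster member, assuming an even split.

    Hypothetical helper for illustration only; it mirrors the example of
    one event processor plus its attached data nodes.
    """
    members = data_nodes + 1  # the event processor plus its data nodes
    return total_eps // members


print(split_eps(20_000, 3))  # 5000
```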

How does rebalancing work?

Cluster members consist of one event processor and one or more data nodes. Data can move between any members of the cluster in any direction. Data moves between members of the cluster transactionally by hourly folders. One hour of data is the smallest block of data that moves. If any file from an hourly folder is not copied, the entire transaction is rolled back.
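The all-or-nothing behavior of an hourly folder move can be sketched as follows. This is a minimal illustration under stated assumptions, not JSA's implementation: the function name, the `.partial` staging directory, and the folder layout are all hypothetical.

```python
import shutil
from pathlib import Path


def move_hourly_folder(src: Path, dest_root: Path) -> bool:
    """Move one hourly folder to dest_root, rolling back on any failure.

    Hypothetical sketch: files are copied into a staging directory first,
    and the folder only becomes visible under its final name if every
    file was copied.
    """
    final = dest_root / src.name
    staging = dest_root / (src.name + ".partial")
    if final.exists():
        return False  # never overwrite an hourly folder that already exists
    try:
        staging.mkdir(parents=True)
        for item in sorted(src.iterdir()):
            shutil.copy2(item, staging / item.name)
    except OSError:
        # Roll back the whole transaction: discard the partial copy and
        # leave the source folder untouched.
        shutil.rmtree(staging, ignore_errors=True)
        return False
    staging.rename(final)  # publish the complete folder
    shutil.rmtree(src)     # the source copy is no longer needed
    return True
```

The staging directory is a design choice of this sketch: it keeps a half-copied hour from ever appearing under its final name on the destination.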

Rebalancing does not merge hourly folders. For example, if an hourly folder exists on the destination, rebalancing does not move data from the same hourly folder from other members of the cluster. Before rebalancing starts, the cluster determines its target. The target is the percentage of free space that rebalancing tries to achieve on all members of the cluster. The target doesn't account for absolute free space in gigabytes; it accounts only for the percentage.

After the cluster determines its target, the members that have a higher percentage of free space than the target become destinations, and the members that have a smaller percentage of free space than the target become sources. Each source connects, and pushes data, to each destination. Some components in your JSA deployment might restart and cause the rebalancing process to fail. If that happens, rebalancing restarts itself and continues to completion from the point where it failed. When rebalancing restarts, it does so with a progressively increasing timeout period (5 minutes, 10 minutes, 30 minutes, and so on) to avoid too many failed attempts during full deployment or maintenance. The entire rebalancing process takes place between the Ariel processes on members of the cluster.
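As a rough illustration of how cluster members split into sources and destinations, consider the sketch below. The function name and the member labels are hypothetical, and the target is passed in as a plain number; how JSA represents this internally is not described here.

```python
def classify_members(
    free_pct: dict[str, float], target: float
) -> tuple[list[str], list[str]]:
    """Split cluster members into sources and destinations.

    Hypothetical sketch: members with less free space (in percent) than
    the target push data; members with more free space receive it.
    """
    sources = [m for m, pct in free_pct.items() if pct < target]
    destinations = [m for m, pct in free_pct.items() if pct > target]
    return sources, destinations


members = {"ep": 60.0, "dn1": 100.0, "dn2": 100.0}
print(classify_members(members, target=86.67))  # (['ep'], ['dn1', 'dn2'])
```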

How does scattering work?

Scattering distributes incoming data from the event processor among all members of the cluster. Scattering works with events and flows and is not bound to the smallest hourly block. For example, one hour of events is scattered across all members of the cluster into the same hourly folder.

Scattering distributes events and flows in proportion to the percentage of free space on each member of the cluster. Scattering moves data sequentially to the cluster hosts in round-robin fashion, weighted by the free space percentage.
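One way to picture round-robin distribution weighted by free space is the "smooth weighted round-robin" technique shown below. This is a swapped-in, well-known scheduling algorithm used purely for illustration; the source does not state which algorithm JSA uses, and the member names and percentages are hypothetical.

```python
def weighted_round_robin(free_pct: dict[str, int], count: int) -> list[str]:
    """Pick `count` destinations, weighted by free space percentage.

    Illustrative smooth weighted round-robin, not JSA's actual scheduler.
    Over sum(weights) picks, each member is chosen exactly `weight` times.
    """
    total = sum(free_pct.values())
    current = {member: 0 for member in free_pct}
    picks = []
    for _ in range(count):
        for member, weight in free_pct.items():
            current[member] += weight
        chosen = max(current, key=current.get)  # largest running weight wins
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Hypothetical cluster: the event processor is fuller than its data nodes.
print(weighted_round_robin({"ep": 20, "dn1": 40, "dn2": 40}, 5))
# ['dn1', 'dn2', 'ep', 'dn1', 'dn2']
```

In this sketch the members with more free space are picked more often, yet the picks stay interleaved, so no single host receives a long burst of data.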

If any errors or connectivity issues occur, scattering tries to move the data to the next member of the cluster. If that attempt is also unsuccessful, scattering stores the data locally on the event processor so that no data is lost. Data is scattered between the ecs-ep process (source) and the data node processes (destinations) on the data nodes.
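The fallback behavior can be sketched as a simple loop. Everything named here is hypothetical (`scatter_with_fallback`, the `send` and `store_locally` callbacks, the `"event-processor"` label), and the sketch assumes that every remaining member is tried in order before the data is kept locally, which is one possible reading of the description.

```python
from typing import Callable, Sequence


def scatter_with_fallback(
    batch: list,
    destinations: Sequence[str],
    send: Callable[[str, list], None],
    store_locally: Callable[[list], None],
) -> str:
    """Try each destination in turn; keep the batch locally if all fail.

    Hypothetical sketch: `send` is assumed to raise ConnectionError when a
    member cannot be reached.
    """
    for dest in destinations:
        try:
            send(dest, batch)
            return dest  # delivered, nothing more to do
        except ConnectionError:
            continue  # move on to the next member of the cluster
    store_locally(batch)  # no member accepted the data, so nothing is lost
    return "event-processor"
```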

How is existing data moved between the event processor (source) and the data node (target)?

When you add a data node, JSA calculates a target space. The target space is the sum of the free space percentages on the event processor and on the data nodes, divided by the total number of event processors and data nodes. For example, you have one event processor and two data nodes. If the event processor has 60% free space and both data nodes have 100% free space, the target space is approximately 86.7% ((60 + 100 + 100) / 3). When the target is defined, the data is moved in one-hour blocks until the target space (86.7% in this example) is reached on the cluster hosts.
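The target space calculation is a plain average of the free space percentages, which the following sketch reproduces. The function name `target_free_space` is hypothetical; only the arithmetic comes from the example above.

```python
def target_free_space(free_pct: list[float]) -> float:
    """Average the free space percentages of all cluster members."""
    return sum(free_pct) / len(free_pct)


# One event processor at 60% free and two data nodes at 100% free.
print(round(target_free_space([60, 100, 100]), 2))  # 86.67
```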

How is new data moved between the event processor (source) and the data node (target)?

When the initial balancing is complete, JSA scatters new data across the event processors and data nodes according to the percentage of free space available on each. For example, if an event processor has 25% free space and a data node has 40% free space, the data node receives 40 events for every 25 events that the event processor receives, until both appliances have approximately the same amount of free space.
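To see what fraction of new data each appliance receives in this example, the ratio can be normalized as in the sketch below. The function name `scatter_shares` and the member labels are hypothetical.

```python
def scatter_shares(free_pct: dict[str, float]) -> dict[str, float]:
    """Return each member's share of new data, proportional to free space."""
    total = sum(free_pct.values())
    return {member: pct / total for member, pct in free_pct.items()}


shares = scatter_shares({"ep": 25, "dn": 40})
print({member: round(share, 3) for member, share in shares.items()})
# {'ep': 0.385, 'dn': 0.615}
```

With 25% and 40% free space, the event processor keeps roughly 38.5% of new events and the data node receives roughly 61.5%, which is the 25:40 ratio from the example.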

When is balancing complete?

The balancing process is complete when all source data is processed, or when the target space constraints are reached.

Viewing the Progress of Data Rebalancing

When you add a data node, JSA automatically redistributes the data to balance it across the storage volumes in your deployment.

Search performance improvements are realized only after the data rebalancing is complete. You can view the progress of the data rebalancing, and also see data such as the percentage of disk space that is used.

  1. On the navigation menu, click Admin.

  2. In the System Configuration section, click System and License Management.

  3. In the Display list, select Systems.

  4. In the host table, select the managed host that you want to view more information about.

    • To view information about the cluster of managed hosts, select the top-level host.

    • To view information about a specific data node, select the data node.

  5. On the Actions menu, click View and manage system.

  6. Click the Security Data Distribution tab to view the progress of data rebalancing and the capacity of the Data Node appliance.

    Note:

    You can also view information about the progress of data node rebalancing in the deployment status bar on the Admin tab.

Saving All Event Data to a Data Node Appliance

To improve the performance of an event processor, configure JSA to save all event data on a Data Node appliance. With this configuration, the event processor only processes events; it doesn't store event data locally.

An event processor that is configured to only process events still saves event data locally when no active Data Node appliances are available. When a Data Node appliance becomes available, JSA transfers as much data as possible from the event processor to the Data Node.

  1. On the navigation menu, click Admin.

  2. In the System Configuration section, click System and License Management.

  3. In the Display list, select Systems.

  4. Select the Event Processor from the host table, and on the Deployment Actions menu, click Edit Host.

  5. Click the Component Management settings icon.

  6. Under Event Processor, in the Event Processor Mode field, select Processing-Only.

  7. Click Save, and then click Save again.

  8. On the Admin tab, click Deploy Changes.

Archiving Data Node Content

Configure a Data Node appliance to use Archive mode when you want the Data Node to provide online access to historical data without impacting storage for incoming data.

In Archive mode, the appliance does not receive new data, but existing data is saved.

Note:

No event retention policies are applied on the Data Node appliance in Archive mode.

  1. On the navigation menu, click Admin.

  2. In the System Configuration section, click System and License Management.

  3. In the Display list, select Systems.

  4. Select the Data Node appliance in the host table, and on the Deployment Actions menu, click Edit Host.

  5. Click the Component Management settings icon.

  6. In the Data Node Mode field, select Archive, and then click Save.

  7. On the Admin tab, click Deploy Changes.

To resume storing data on the Data Node appliance, set the mode back to Active.