Data Node Overview
Understand how to use Data Nodes in your Juniper Secure Analytics (JSA) deployment.
Data Nodes enable new and existing JSA deployments to add storage and processing capacity on demand.
Users can scale storage and processing power independently of data collection, which results in a deployment that has the appropriate storage and processing capacity. Data Nodes are plug-n-play and can be added to a deployment at any time. Data Nodes seamlessly integrate with the existing deployment.
Increasing data volumes in deployments require data compression sooner. Data compression slows down system performance as the system must decompress queried data before analysis is possible. Adding Data Node appliances to a deployment allows you to keep data uncompressed longer.
The JSA deployment distributes all new data across the Event and Flow processors and the attached Data Nodes. Figure 1 shows the JSA deployment before and after adding Data Node appliances.
Data Nodes add storage capacity to a deployment and improve performance by distributing data collected on a processor across multiple storage volumes. When the data is searched, multiple hosts, or a cluster, perform the search. The cluster can greatly improve search performance but does not require the addition of multiple event processors. Data Nodes multiply the storage for each processor.
You can connect a Data Node to only one processor at a time, but a processor can support multiple Data Nodes.
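The attachment rule above can be pictured with a small data structure. The following is a minimal sketch only: the `Topology` class, its methods, and the host names are hypothetical and are not part of any JSA interface. It assumes that a Data Node must be detached before it can be attached to a different processor.

```python
from collections import defaultdict


class Topology:
    """Hypothetical in-memory model of Data Node attachments (not a JSA API)."""

    def __init__(self):
        self.processor_of = {}              # Data Node name -> processor name
        self.nodes_of = defaultdict(list)   # processor name -> Data Node names

    def attach(self, data_node, processor):
        # A Data Node can be connected to only one processor at a time.
        if data_node in self.processor_of:
            raise ValueError(
                f"{data_node} is already attached to {self.processor_of[data_node]}"
            )
        self.processor_of[data_node] = processor
        self.nodes_of[processor].append(data_node)

    def detach(self, data_node):
        # Remove the link so the Data Node can be attached elsewhere.
        processor = self.processor_of.pop(data_node)
        self.nodes_of[processor].remove(data_node)


topology = Topology()
topology.attach("dn-1", "ep-1")
topology.attach("dn-2", "ep-1")  # one processor can support several Data Nodes
```

Attaching "dn-1" to a second processor without detaching it first raises a ValueError in this sketch, which mirrors the one-processor-at-a-time rule.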
Data Nodes are available on JSA 2014.2 and later.
Data Nodes perform search and analytic functions similar to those of Event and Flow processors in a JSA deployment. Operations on a cluster are limited by the slowest member of the cluster, so Data Node system performance improves if Data Nodes are sized similarly to the event and flow processors in a deployment. To facilitate similar sizing between Data Nodes and event and flow processors, Data Nodes are available on core appliances.
Data Nodes can be installed as virtual machines (VMs) or on JSA appliances. You can mix both types in a single deployment.
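The sizing advice can be illustrated with a deliberately simplified model. The sketch below assumes that a clustered search completes only when every member has finished its share, so the slowest host sets the total time. The function name, host names, and timings are illustrative assumptions, not measurements or JSA behavior guarantees.

```python
def cluster_search_seconds(member_seconds):
    """Return the search time under the assumption that all members must finish."""
    return max(member_seconds.values())


# Hypothetical per-host search times in seconds; "dn-2" is undersized.
unbalanced = {"ep-1": 12.0, "dn-1": 11.5, "dn-2": 40.0}
# The same cluster with "dn-2" sized like the other hosts.
balanced = {"ep-1": 12.0, "dn-1": 11.5, "dn-2": 12.5}

slow = cluster_search_seconds(unbalanced)
fast = cluster_search_seconds(balanced)
```

In this model a single undersized member dominates the result, which is why similarly sized hosts are recommended.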
Bandwidth and latency
Ensure a 1 Gbps link and less than 10 ms of latency between hosts in the cluster. Searches that yield many results require more bandwidth.
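The link requirement can be checked against measured values with a few lines of arithmetic. This is a minimal sketch that assumes the requirement is read as 1 gigabit per second and strictly less than 10 ms of latency. The function names and the sample result size are hypothetical, and protocol overhead is ignored.

```python
REQUIRED_GBPS = 1.0     # minimum link speed, read as gigabits per second
MAX_LATENCY_MS = 10.0   # latency between hosts must stay below this value


def link_meets_requirements(bandwidth_gbps, latency_ms):
    """Return True when a measured link satisfies both thresholds."""
    return bandwidth_gbps >= REQUIRED_GBPS and latency_ms < MAX_LATENCY_MS


def transfer_seconds(result_bytes, bandwidth_gbps):
    """Estimate the time to move search results across the link, ignoring overhead."""
    bits = result_bytes * 8
    return bits / (bandwidth_gbps * 1e9)


ok = link_meets_requirements(bandwidth_gbps=1.0, latency_ms=4.0)
too_slow = link_meets_requirements(bandwidth_gbps=1.0, latency_ms=10.0)
seconds = transfer_seconds(result_bytes=1_250_000_000, bandwidth_gbps=1.0)
```

The transfer estimate shows why searches that return many results benefit from a faster link: the time grows linearly with the result size at a fixed link speed.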
Data Nodes are compatible with all existing JSA appliances that have an Event or Flow Processor component, including All-In-One appliances.
Data Nodes support high-availability (HA).
Data Nodes use standard TCP/IP networking and do not require proprietary or specialized interconnect hardware. Install each Data Node that you want to add to your deployment as you would install any other JSA appliance. Associate Data Nodes with event or flow processors in the JSA Deployment Editor. See the Juniper Secure Analytics Administration Guide.
You can attach multiple Data Nodes to a single Event or Flow Processor, in a many-to-one configuration.
When you deploy high availability pairs with Data Node appliances, install, deploy, and rebalance data with the high availability appliances before synchronizing the high availability pair. The combined effect of data rebalancing and the replication process used for high availability results in significant performance degradation. If high availability is already configured on the existing appliances to which Data Nodes are being introduced, it is also preferable to break the high availability connection and reestablish it once the rebalance of the cluster is complete.
Remove Data Nodes from your deployment with the Deployment Editor, as with any other JSA appliance. Decommissioning does not erase balanced data on the host. You can retrieve the data for archiving and redistribution.
Adding a Data Node to a cluster distributes data evenly to each Data Node. Each Data Node appliance maintains the same percentage of available space. New Data Nodes added to a cluster initiate additional rebalancing from cluster event and flow processors to achieve efficient disk usage on the newly added Data Node appliances.
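The balancing goal can be worked through as simple arithmetic. The sketch below is one interpretation, not the actual JSA rebalancing algorithm: it assumes that "the same percentage of available space" means every host in the cluster ends up with the same fraction of its capacity in use. The function names, host names, and sizes are hypothetical.

```python
def balanced_targets(capacity_gb, stored_gb):
    """Return the per-host data volume that gives every host the same fill ratio."""
    total_capacity = sum(capacity_gb.values())
    total_stored = sum(stored_gb.get(host, 0) for host in capacity_gb)
    # Each host receives a share of the data proportional to its capacity.
    return {
        host: capacity_gb[host] * total_stored / total_capacity
        for host in capacity_gb
    }


def planned_moves(capacity_gb, stored_gb):
    """Return the change per host; negative values mean data leaves that host."""
    targets = balanced_targets(capacity_gb, stored_gb)
    return {host: targets[host] - stored_gb.get(host, 0) for host in capacity_gb}


# Hypothetical cluster: one event processor and two newly added Data Nodes.
capacity = {"ep-1": 1000, "dn-1": 1000, "dn-2": 2000}
stored = {"ep-1": 800, "dn-1": 0, "dn-2": 0}

targets = balanced_targets(capacity, stored)
moves = planned_moves(capacity, stored)
```

With these sample numbers every host ends at 20 percent full: the processor sheds 600 GB, and the two Data Nodes receive 200 GB and 400 GB respectively.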
Starting in JSA 2014.3, data rebalancing is automatic and concurrent with other cluster activity, such as queries and data collection. No downtime is experienced during data rebalancing.
Data Nodes offer no performance improvement in the cluster until data rebalancing is complete. Rebalancing can cause minor performance degradation during search operations, but data collection and processing continue unaffected.
Management and Operations
Data Nodes are self-managed and require no regular user intervention to maintain normal operation. JSA manages activities, such as data backups, high availability and retention policies, for all hosts, including Data Node appliances.
If a Data Node fails, the remaining members of the cluster continue to process data.
When the failed Data Node returns to service, data balancing resumes. During the downtime, data on the failed Data Node is unavailable.
For catastrophic failures requiring appliance replacement or the reinstallation of JSA, decommission Data Nodes from the deployment and replace them using standard installation steps. Copy any data not lost in the failure to the new Data Node before deploying. The rebalancing algorithm accounts for old data and shuffles only data collected during the failure.
For Data Nodes deployed with a high availability pair, a hardware failure causes a failover, and operations continue to function normally.
For more information about each component, see the Juniper Secure Analytics Administration Guide.