Known Issues

This section lists the known issues in Juniper Routing Director.

Device Life-Cycle Management

Changing the router ID of a device after onboarding might create a duplicate node in the topology.

Workaround: If you need to change the router ID, offboard the device, update the router ID in the configuration, and then onboard the device again.
When a vMX is deployed using I2C ID 161, all commit operations will fail after any subscriber is created.

Workaround: Delete the subscribers and then commit the configuration.
Routing Director triggers the configuration templates included in a device profile and interface profile only during the initial onboarding of the device. You cannot use the configuration templates included in the device profiles and interface profiles to apply additional configuration on a device after the device is onboarded.

Workaround: If you need to apply additional configuration on a device after the device is onboarded, you need to manually apply the configuration using the CLI or by executing the configuration templates through the Routing Director GUI.

Observability

When KPIs continuously oscillate between fixed values in a repeating pattern, the boundary initially adapts as expected, but after a few hours, it begins to readjust even though the oscillation pattern remains unchanged. This behavior can persist even after full adaptation, causing the boundary to continue oscillating unnecessarily.

Workaround: None.
Sometimes, continuous packet drops that happen under some specific trap codes may not be classified as a blackhole.

Workaround: None.
The SPFRun anomaly might not get detected when both levels are enabled in IS-IS.

This issue does not apply to devices where only one level is enabled in IS-IS. You can disable a level in IS-IS by using the set protocols isis level <level-number> disable command.

Workaround: None.
In some cases, when you offboard a device from Routing Director, the BMP configuration on the router may not be removed. This causes any further attempts at onboarding and offboarding devices to fail, and leads to inconsistent entries in the Routing Explorer > Devices and Routing Explorer > Adjacencies tables.

Workaround: After offboarding devices from Routing Director, you might need to completely deactivate the BMP configuration on the affected router by deactivating the application of the paragon-routing-bgp-analytics group, and then onboard the router again.
When a new unmonitored device is added to the IS-IS database for the first time, it takes around 10 minutes for its prefixes to be reflected in the IGP heatmap.

Workaround: Wait for approximately 10 minutes for the changes to be reflected.
In setups with parallel links between two nodes, adjacency or link flaps are reported only on a complete loss or restoration of connectivity between the nodes, rather than on individual link transitions.

Workaround: None.
There is an unexpected delay in reporting an anomaly related to a sudden decrease in nodes. The anomaly is reflected only after the total delay period has passed.

Workaround: You can use the REST API to view information related to this anomaly.
The Device Count column on the IGP Prefixes tab (Observability > Routing > Route Topology) might display inaccurate data. It takes approximately 30 minutes for the device count to be updated when a new device starts originating the prefix or when a device stops originating the prefix.

Workaround: The device count might be inaccurate for up to 30 minutes after a change. We recommend that you wait approximately 30 minutes to get the latest device count.
When the responses are delayed, the LLM Connector chat window auto-scrolls up and down unexpectedly.

Workaround: None.
The Input Traffic boundary is incorrectly displayed for Optical Rx power and Input Traffic KPIs. This issue occurs when you onboard a device and then stream data at a later point.

Workaround: If the data for both Optical Rx power and Input Traffic KPIs start around the same time and are non-zero, then the forecasted boundary also begins at the same time. We recommend onboarding a device with streaming data.
After you upgrade to Release 2.5.0, the following pages might take more than 10 minutes to display device-related information:
1. Inventory page (Observability > Troubleshoot devices > Device-Name)
2. Devices tab on the Topology page (Observability > Topology)
Workaround: After you upgrade, restart services using the kubectl rollout restart -n streams deployment/papi-mon-v2 command.
After a device is onboarded, Routing Director continuously monitors the KPIs related to device health. For each KPI, Routing Director monitors the KPI, forecasts the range, and detects any anomalies that occur. If a KPI value changes, the forecasted range takes approximately two hours to stabilize.

Workaround: None.
If you create an LSP using the REST API and reuse an existing LSP name, the REST API server does not return an error.

Workaround: None.

While adding a device profile for a network implementation plan, if you enable Routing Protocol Analytics, then the routing data is collected for the devices listed in the device profile. When you publish the network implementation plan, even though the onboarding workflow appears to be successful, there might be errors related to the collection of routing data for these devices. Because of these errors, the devices will not be configured to send data to Routing Director, and therefore the routing data will not be displayed on the Route Explorer page of the Routing Director GUI. This issue occurs while offboarding devices as well, where the offboarded devices continue to send data to Routing Director.

This issue also occurs when you have not configured ASN or Router ID on the devices, or when you have locked device configuration for exclusive editing.

Workaround: To fix this issue, do one of the following:

Check the service logs by running the request paragon debug logs namespace routingbot app routingbot service routingbot-apiserver Shell command. Take the necessary action based on the error messages that you see in Table 1.

Table 1: Error Messages
Error Messages	Issue
Failed to get device profile info for dev_id {dev_id}: {res.status_code} - {res.text} Failed to get device info for dev_id {dev['dev_id']}. Skipping device.	The API call to PAPI to get the device information has failed.
No results found in the response for dev_id {dev_id} Failed to get device info for dev_id {dev['dev_id']}. Skipping device.	The API call to PAPI returns a response with no data.
Complete device info not found in the response for dev_id {dev_id} : {device_info}	The API call to PAPI returns a response with incomplete data.
No data found for dev_id {dev_id} from PF	The API call to Pathfinder to get the device information has failed.
Required data not found for dev_id {dev_id} from PF data:{node_data}	The API call to Pathfinder to get device information returns a response with incomplete data.
EMS config failed with error, for config: {cfg_data} or EMS Config push error {res} {res.text} | try: {retries}. Failed to configure BMP on device {mac_id}	BGP configuration has failed.
Invalid format for major, minor, or release version : {os_version}	The device's OS version is not supported.
Error POST {self.config_server_path}/api/v2/config/device/{dev_id}/ {data} {res.json()}	Playbook application has failed.
Error PUT:{self.config_server_path}/api/v2/config/device/{dev_id}/ {data} {res_put.json()}	Playbook removal has failed.
Error PUT:{self.config_server_path}/api/v2/config/device/{dev_id}/ {data} {res_put.json()}	Device or playbook application to device-group has failed.
Error PUT {self.config_server_path}/api/v2/config/device-group/{site_id}/ {data} {res_put.json()}	Device or playbook removal from device-group has failed.

Examine the device configuration to check whether the device shows unexpected absence or presence of the configuration. For example, you can:
- View the configurations present under set groups paragon-routing-bgp-analytics routing-options bmp.
- Check the device configuration in the JTIMON pod.

After resolving the above issues, edit the device profile of the network implementation plan that you have applied for the device. Based on whether you are onboarding or offboarding devices, enable or disable the Routing Protocol Analytics option in the device profile.
Publish the network implementation plan.
Verify whether the required results are seen based on the data that is displayed on the Route Explorer page of the Routing Director GUI.

On the Interfaces accordion, FEC uncorrected errors charts are available only on interfaces that support speeds equal to or greater than 100 Gbps.
After you apply a new configuration for a device, the Active Configuration for Device-Name page (Observability > Troubleshoot Device > Device-Name > Configuration accordion > View active config link) does not display the latest configuration immediately. It takes several minutes for the latest changes to be reflected on the Active Configuration for Device-Name page.

Workaround: You can verify whether the new configurations are applied to the device by logging in to the device using the CLI.
The numbers of unhealthy devices listed on the Troubleshoot Devices and Health Dashboard pages (Observability > Health) do not match.

Workaround: None.
You cannot delete unwanted nodes and links from the Routing Director GUI.

Workaround: Use the following REST APIs to delete nodes and links:
- REST API to delete a link:
  
  [DELETE] https://{{server_ip}}/topology/api/v1/orgs/{{org_id}}/{{topo_id}}/links/{{link_id}}
  
  Note:
  You can follow the procedure described later in this workaround to get the actual URL.
  
  For example,
  - URL: 'https://10.56.3.16/topology/api/v1/orgs/f9e9235b-37f1-43e7-9153-e88350ed1e15/10/links/15'
  - Curl:
- REST API to delete a node:
  
  [DELETE] https://{{Server_IP}}/topology/api/v1/orgs/{{Org_ID}}/{{Topo_ID}}/nodes/{{Node_ID}}
  
  Note:
  You can follow the procedure described later in this workaround to get the actual URL.
  
  For example,
  - URL: 'https://10.56.3.16/topology/api/v1/orgs/f9e9235b-37f1-43e7-9153-e88350ed1e15/10/nodes/1'
  - Curl:
  Use the following procedure to get the actual URL that you use in CURL for deleting a link or a node:
  1. Navigate to the Topology page (Observability > Topology).
  2. Open the developer tools in the browser by pressing Ctrl+Shift+I.
  3. In the developer tools, select Network and then select the XHR filter option.
  4. Identify the link index number or the node number as follows:
    1. On the Topology page of the Routing Director GUI, double-click the link or the node that you want to delete.
      
      The Link Link-Name page or the Node Node-Name page appears.
    2. Navigate to the Details tab and note the link index number or the node number that is displayed.
  5. In the developer tools, click the row that corresponds to the link index number or the node number of the link or the node that you want to delete.
  6. Copy the URL that you need to use to delete the link or node in CURL.
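
The URL patterns above can be sketched in Python as follows. This is a minimal illustration, not the product's client: the helper names topology_delete_url and build_delete_request are hypothetical, the values are the ones from the examples in this section, and the request object is only constructed, never sent. Authentication and certificate handling are not covered in this section and depend on your deployment.

```python
from urllib.request import Request


def topology_delete_url(server_ip, org_id, topo_id, kind, object_id):
    """Build the DELETE URL for a topology link or node (hypothetical helper)."""
    if kind not in ("links", "nodes"):
        raise ValueError("kind must be 'links' or 'nodes'")
    return (
        f"https://{server_ip}/topology/api/v1/orgs/"
        f"{org_id}/{topo_id}/{kind}/{object_id}"
    )


def build_delete_request(url):
    """Create (but do not send) an HTTP DELETE request for the given URL.

    Authentication headers are deployment-specific and intentionally omitted.
    """
    return Request(url, method="DELETE")


ORG_ID = "f9e9235b-37f1-43e7-9153-e88350ed1e15"  # organization ID from the example above

link_url = topology_delete_url("10.56.3.16", ORG_ID, 10, "links", 15)
node_url = topology_delete_url("10.56.3.16", ORG_ID, 10, "nodes", 1)

request = build_delete_request(link_url)
print(request.get_method(), request.full_url)
```

Copying the real URL from the browser developer tools, as described in the procedure, remains the reliable way to confirm the topology ID and the link or node ID before you send anything.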

Not all optics modules support all the optics-related KPIs. See Table 2 for more information.

Workaround: None.

Table 2: KPIs Supported for Optics Modules
Module	Rx Loss of Signal KPI	Tx Loss of Signal KPI	Laser Disabled KPI
SFP optics	No	No	No
CFP optics	Yes	No	No
CFP_LH_ACO optics	Yes	No	No
QSFP optics	Yes	Yes	Yes
CXP optics	Yes	Yes	No
XFP optics	No	No	No
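
As an illustration only, Table 2 can be expressed as a lookup so that a script can check whether a KPI is expected for a given module type before treating missing data as a fault. The dictionary below copies the table values; the names OPTICS_KPI_SUPPORT and supported_kpis are hypothetical and are not part of Routing Director.

```python
# Values copied from Table 2; True means the KPI is supported for the module.
OPTICS_KPI_SUPPORT = {
    "SFP": {"Rx Loss of Signal": False, "Tx Loss of Signal": False, "Laser Disabled": False},
    "CFP": {"Rx Loss of Signal": True, "Tx Loss of Signal": False, "Laser Disabled": False},
    "CFP_LH_ACO": {"Rx Loss of Signal": True, "Tx Loss of Signal": False, "Laser Disabled": False},
    "QSFP": {"Rx Loss of Signal": True, "Tx Loss of Signal": True, "Laser Disabled": True},
    "CXP": {"Rx Loss of Signal": True, "Tx Loss of Signal": True, "Laser Disabled": False},
    "XFP": {"Rx Loss of Signal": False, "Tx Loss of Signal": False, "Laser Disabled": False},
}


def supported_kpis(module):
    """Return the optics KPIs that Table 2 lists as supported for a module type."""
    return [kpi for kpi, ok in OPTICS_KPI_SUPPORT[module].items() if ok]


for module in OPTICS_KPI_SUPPORT:
    print(module, supported_kpis(module))
```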

For PTX100002 devices, the following issues are observed on the Interfaces accordion (Observability > Health > Troubleshoot Devices > Device-Name > Overview):
- On the Pluggables Details for Device-Name page (Interfaces accordion > Pluggables data-link), the Optical Tx Power and Optical Rx Power graphs do not display any data.
- On the Input Traffic Details for Device-Name page (Interfaces accordion > Input Traffic data-link), the Signal Functionality graph does not display any data.

Service Orchestration

The logical tunnel (LT) interface configuration does not persist through Device Profile Placement Resources.

Workaround: Configure the LT interface either through Logical Tunnels in the Placement Resources section of the Add Device page (Inventory > Device Onboarding > Network Implementation Plan) or through Topo resources in the Resource Instances page (Orchestration > Resource Instances).
When there are two VLAN-aware tagged L2VPN services, both L2VPN instances are not listed in the placement section of an L3VPN service with Integrated Routing and Bridging (IRB) interfaces.

Workaround: Click the Update Placements option twice and then check for L2VPN instances. If the L2VPN instances are still not listed, then click Reset Placements to remove existing placement configurations for the service and then click Update Placements.

On the Resource Instances page (Orchestration > Service > Resource Instances), the network-operator:topo resource is a system-managed resource. As a result, the Workflow Run ID column may be empty when the system generates the resources. The Workflow Run ID column is set only if you click the Update button.

Workaround: None.
After the placement of a logical tunnel using either the Update Placements option or through successful provisioning, if you try to assign a possible placement again using the Update Placements option, then all Logical Tunnel interfaces and ranges are not displayed in the Logical Tunnel Placement section.

Workaround: Delete the existing placement configurations using the Reset Placements option and then use the Update Placements option. The Logical Tunnel Placement section will display all Logical Tunnel interfaces and ranges.
The description for the interfaces is missing in the Description column of the Add or Edit Devices section of the Add Network Implementation Plan page (Inventory > Device Onboarding > Network Implementation Plan > +).

Note that the description for sub-units will be the same as that of the main interface description. If the descriptions for sub-units et-0/0/9.100 and et-0/0/9.200 are missing, you can refer to the description of the main interface, et-0/0/9.

Workaround: None.
When you stitch a Layer 2 circuit with an L3VPN service, the Input Traffic Rate and Output Traffic Rate columns on the logical Interfaces accordion of the Passive Assurance tab (Orchestration > Instances > Service-Instance-Name > Service-Instance-Name Details) are blank.

Workaround: None.
The traffic statistics are not displayed on the Passive Assurance tab (Orchestration > Instances > Service-Instance-Name > Service-Instance-Name Details) for an L3VPN service with Integrated Routing and Bridging (IRB) interfaces.

Workaround: None.
In rare high-load scenarios, provisioning of an L2VPN instance may fail due to the unavailability of temporary back-end resources.

Workaround: Try provisioning the service again.
Although the qinq option is listed for the Tag Type field, we do not support qinq in VLAN-aware tagged L2VPN services. We recommend that you do not select the qinq option as the L3VPN provisioning might fail with a validation error.

Workaround: None.
While provisioning multihoming for an EVPN service, if multiple cvlan_ids are provided for the same site network access, then the same ESI ID is allocated to all the access circuits provisioned for that site network access.

Workaround: To ensure a different ESI ID is allocated per VLAN, create a unique site network access for each VLAN, and assign a unique group ID for each site network access.
When a user with an administrator role clears the explicitly configured link speed on a physical interface within an EVPN LAG, the speed resets to 10 Mbps. Currently, only explicit speed values are supported. For help in removing the default value, contact Juniper Networks® Technical Assistance Center (JTAC).

Workaround: None.
In a scaled environment, when you provision multiple VPN instances in parallel, some of the instances may fail with the following error in the workflow:
Workaround: Retry provisioning for the failed VPN instances.
Before you upgrade an L2VPN service instance that was created in a release earlier than Release 2.5.0, ensure that you update the vpn-resources instance to Release 2.5.0.

Workaround: None.
If different L3VPN services run on the same physical interface (IFD) with different MTU values, service provisioning fails.

Workaround: Ensure that the MTU values are the same for L3VPN services that share the same IFD.
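
A pre-check along these lines can flag the condition before you provision. This is a sketch under stated assumptions: the input is a hypothetical list of (service name, IFD, MTU) tuples that you assemble yourself, and find_mtu_conflicts is an illustrative helper, not a Routing Director API.

```python
from collections import defaultdict


def find_mtu_conflicts(services):
    """Return {ifd: sorted MTU values} for every IFD that carries more than one MTU.

    `services` is an iterable of (service_name, ifd, mtu) tuples (hypothetical format).
    """
    mtus_by_ifd = defaultdict(set)
    for _name, ifd, mtu in services:
        mtus_by_ifd[ifd].add(mtu)
    return {ifd: sorted(mtus) for ifd, mtus in mtus_by_ifd.items() if len(mtus) > 1}


# Hypothetical planned services; names and MTU values are illustrative only.
planned = [
    ("l3vpn-a", "et-0/0/9", 9000),
    ("l3vpn-b", "et-0/0/9", 1500),
    ("l3vpn-c", "et-0/0/1", 1500),
]

conflicts = find_mtu_conflicts(planned)
print(conflicts)
```

An empty result means that every IFD in the input carries a single MTU value, which is the condition the workaround asks for.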
The following accordions on the Passive Assurance tab (Orchestration > Instances > Service-Order-Name Details) display incorrect or no data:
- BGP accordion—The VPN State column displays incorrect data for customer edge (CE) or provider edge (PE) devices with IPv4 or IPv6 neighbors.
- OSPF accordion—There are no IPv6 entries in the Neighbor Address column for CE or PE devices with IPv6 neighbors.
- L3VPN accordion—The VPN State column displays incorrect data for OSPF and BGP protocols. The Neighbor Session and VPN State columns are blank for CE or PE devices with static IPv4 or IPv6 address.
This issue occurs only for an L3VPN service.

Workaround: None.
For an MX240 device, the OSPF-related data is not populated on the Passive Assurance tab (Orchestration > Instances > Service-Order-Name Details).

Workaround: Configure OSPF on the customer edge (CE) device.
When you click the Refresh icon on the Service-Instance-Name Details page (Orchestration > Instances > Service-Instance-Name), you may not see the latest events in the Relevant Events section.

Workaround: To view the latest events, instead of using the Refresh icon go to the Service Instance page (Orchestration > Instances) and select the service instance for which you need to see the latest events.
The Order History tab on the L3VPN-Name Details page (Orchestration > Instances > Service-Instance-Name hyperlink) lists all the order history if you deprovision a service instance and later provision a service using the same details as that of the deprovisioned service.

Workaround: None.
In a scaled setup, you cannot upgrade service designs in bulk.

Workaround: We recommend that you upgrade only one service design at a time.

Active Assurance

During the installation of a Test Agent on Junos OS Evolved devices (such as ACX), the installation might, in a few scenarios, remain stuck in the CONFIGURING state for more than 10 to 15 minutes because of router slowness or an environment issue.

Workaround: We recommend that you hard delete the Test Agent and then create or install it again on the device. You can delete Test Agents in the following ways:
- Use the GUI to delete Test Agent. Navigate to Test Agents page (Inventory > Active Assurance) and delete the Test Agent.
- Use the following REST API:
In some rare cases, only Test Agents that are in an offline state and due for a plug-in upgrade are upgraded. The plug-in upgrade may not happen for Test Agents that are online and due for a plug-in upgrade.

Workaround: Change the active version of any plug-in in the system (not necessarily the same plug-in or a plug-in in the same organization). This causes all pending upgrades to be revisited, so the upgrade continues for any online Test Agent that is still pending an upgrade. You can do this in one of the following ways:
- You can use the Plugin Inventory page (Inventory > Active Assurance) to change the Active version of a plug-in back and forth between two plug-in versions.
- Or, alternatively, use the API to re-enable the same plug-in version.
  1. Copy the ID of the plug-in version from the Plugin Inventory page.
  2. Run the following request to re-enable the same plug-in:
The Metrics graph shows No Data for a Test that includes a Step with Measurements if:
- The Test uses a self-governed plug-in.
- You click a Stream that produces Metrics while the Test is executing.
This issue occurs if you set the same start time and end time.

Workaround: Ensure that you manually set the Custom Time Range to something meaningful instead. Once the Test execution is complete, the Metrics are shown correctly.
When you restore a Routing Director instance, you might notice that some data such as Active Assurance Plugins and Packet Capture files may not be backed up. This is because backups are not done on any Kubernetes volumes.

Workaround: We recommend that you download Packet capture files before you restore an instance and store them locally to analyze them. In case of Active Assurance Plug-ins, we recommend that you use the Plugin Inventory page (Inventory > Active Assurance) to upload the latest Plugin again on the new (restored) instance.
After you take a backup and restore a Routing Director instance, some Test Agents might incorrectly display status as Online.

Workaround: After the system is fully restored, perform the following steps:
1. Restart test-agent-gateway one additional time to refresh Test Agents.
2. Execute the kubectl -n paa delete pod -l app=paa-test-agent-gateway command from the Linux root shell.
After you perform the Undelete operation, the commit operation for the Test Agent fails. This issue occurs regardless of the interface involved.

Workaround: Delete the orphan VLAN interfaces. After deletion, the commit functionality is restored for the affected Test Agent.
When you create a Monitor with 600 streams, you might encounter a Monitor Creation Timeout error and the Monitor might automatically stop.

Workaround: Restart the Monitor from the Monitor-Name page (Observability > Active Assurance > Monitors > Monitor-Name) of the Routing Director GUI by clicking More > Start.
When you click the Distribution tab on the Application-Name page (Observability > Health > Health Dashboard > Active Assurance (Tab) > Applications (Accordion) > View Details), the page hangs and you might not be able to see metrics and site-related data for a Measurement.

Workaround: None.
The status of a Test Agent is shown as offline after the device's Routing Engine switches over from the primary Routing Engine to the backup Routing Engine, or vice versa. This issue occurs only if you are using a Junos OS version that is older than 23.4R2.

Workaround: Reinstall Test Agent after the Routing Engine switchover.
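
To decide which devices need the reinstall step, you can compare the Junos OS release string against 23.4R2. The sketch below is an assumption-laden illustration: it only understands release strings that begin with the <major>.<minor>R<number> pattern (for example 23.4R1 or 22.4R3-S2), it ignores any suffix after that, and the helper names are hypothetical.

```python
import re

# Matches the leading "<major>.<minor>R<release>" part of a Junos OS release string.
_RELEASE_PATTERN = re.compile(r"^(\d+)\.(\d+)R(\d+)")

FIXED_RELEASE = (23, 4, 2)  # 23.4R2, the release named in this issue


def parse_junos_release(version):
    """Return (major, minor, release) for strings such as '23.4R1-S2'."""
    match = _RELEASE_PATTERN.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {version!r}")
    return tuple(int(part) for part in match.groups())


def affected_by_switchover_issue(version):
    """True if the release is older than 23.4R2 (suffixes such as -S2 are ignored)."""
    return parse_junos_release(version) < FIXED_RELEASE


for release in ("22.4R3-S2", "23.4R1", "23.4R2", "24.2R1"):
    print(release, affected_by_switchover_issue(release))
```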
When you add a new host to the existing Monitor, the new measurements are not reflected in the Active Assurance tab of the Health Dashboard (Observability > Health).

Workaround: None.

Network Optimization

Junos OS Release 22.4R1 and later have a limitation with SR-TE LSPs. For PCEP sessions to be established, you must disable the multipath feature using the following command:

set protocols pcep disable-multipath-capability

Secondary path is not supported.
The status of SR-TE LSPs might be displayed as down after delegating or provisioning. There might be errors (RPD_SPRING_TE_ROUTE_LSP_MISMATCH) in RPD logs if there are parallel SR-TE LSPs (same source or destination nodes) using both node and adjacency SIDs.

Workaround: All parallel SR-TE LSPs should use either node or adjacency SIDs.
Events listed on the Events page (Observability > Network > Topology > Tunnels tab > Tunnel-Name > View > Event History) do not contain route information of SR and SRv6 LSPs. Due to this issue, the Show Path Changes option might not work for SR and SRv6 LSPs.

Workaround: None.
Some of the columns on the Events page (Observability > Network > Topology > Tunnels tab > Tunnel-Name > View > Event History) are empty.
Workaround: None.
When creating SR tunnels with a Count value greater than 1, the same color is applied to all the tunnels, which results in an error.

Workaround: Do not specify Color if you set a Count value greater than 1.
If the specified Tunnel Delay Violation Interval is less than the LSP Latency Interval (default value is 5 minutes), then rerouting based on the maximum delay threshold will not happen.

Workaround: None.
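
The condition can be checked before you save the tunnel settings. The following is a minimal sketch, assuming that both intervals are expressed in minutes; the function name and the constant are hypothetical and do not correspond to a Routing Director API.

```python
DEFAULT_LSP_LATENCY_INTERVAL_MIN = 5  # default LSP Latency Interval stated in this issue


def delay_rerouting_possible(violation_interval_min,
                             latency_interval_min=DEFAULT_LSP_LATENCY_INTERVAL_MIN):
    """Return False when the violation interval is less than the latency interval.

    Per the issue above, delay-based rerouting does not happen in that case.
    Both arguments are assumed to be in minutes.
    """
    return violation_interval_min >= latency_interval_min


for interval in (3, 5, 10):
    print(interval, delay_rerouting_possible(interval))
```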
Sometimes, there is a mismatch in the severity level that is displayed in the Severity column of the Troubleshoot Devices page (Observability > Health) and the Device tab of the Topology page (Observability > Network).

Workaround: None.
After you perform a backup or restore operation, the traffic is displayed as 0 percent on the Topology page.

Workaround: After the backup or restore operation, either restart the pf-telemetry pod or trigger a device collection, and also restart the pcs pod in the pf- namespace.
When a router that resides on a segment routing-traffic engineering (SR-TE) path raises the overload (OL) bit in its IS-IS LSP, the tunnel does not get rerouted away from that router.

Workaround: Manually specify a large delay on a node with the overload bit.
Due to Kafka message size limitation, you can delete only 200 LSPs at a time.

Workaround: None.
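
If you need to delete more than 200 LSPs, one approach is to split the list into batches of at most 200 and submit each batch separately. The sketch below only shows the batching; the LSP names are hypothetical, and how each batch is submitted (GUI or API) is outside its scope.

```python
MAX_LSPS_PER_DELETE = 200  # limit stated in this issue


def batches(items, size=MAX_LSPS_PER_DELETE):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical LSP names, for illustration only.
lsp_names = [f"lsp-{index}" for index in range(450)]

batch_sizes = [len(batch) for batch in batches(lsp_names)]
print(batch_sizes)
```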
If the configuration database is locked exclusively by a root terminal session, then tunnel provisioning fails and the status is displayed as Unknown.

Workaround: Remove the exclusive lock from the router configuration, and then use the Edit LSP option in the GUI to reprovision the tunnel.
If there are multiple ECMP diverse paths and if you have enabled periodic re-optimization, then the diverse LSPs might switch back and forth between two routing paths.

Workaround: To avoid this behavior, set the Path Type to Preferred on the Modify LSP page.
Sometimes, an LSP provisioning might not be successful, and you might see the PCC_Pending error displayed on the tunnels table of the Topology (Observability > Topology) page.

Workaround: Restart the PCEP session on head-end routers by deactivating and activating the protocols and PCE-related statements in the Junos OS configuration.
When you update the AS number on the Dynamic Topology tab of the Topology Settings Options page (Observability > Topology > Settings icon), the updated AS number is not reflected in the containerized routing protocol process (daemon) (cRPD).

Workaround: In addition to updating the AS number using the Routing Director GUI, you must log in to the cRPD CLI and update the AS number.
If broadcast links exist in the network, Segment Routing (SR) LSPs may not be created.

Workaround: Change broadcast links to point-to-point links in the router configuration.

Network Planner

The admin-group constraint that is set in a tunnel is not considered during the what-if failure simulation.

Workaround: None.
A report is not generated when you run a what-if failure simulation for shared risk link groups (SRLGs).

Workaround: None.
A tunnel and a demand should not have the same name. Otherwise, the status of the tunnel and demand might be displayed as down.
In an offline model, only a primary LSP can be created. You cannot create a secondary LSP or a standby LSP in an existing offline model. You can, however, view the secondary or standby LSP-related details when you import a live network.

Workaround: None.
Planner reports are deleted based on the retention policy settings. However, reports older than the retention period are deleted only when the cleanup scripts are run. The cleanup scripts are scheduled to run at midnight every day.

Workaround: None.

Trust

There are no known issues in this release.

Administration

In the case of a scale deployment, when the system is under resource stress, you may notice that the papi-mon service pod restarts. This happens as part of the self-healing recovery, so that the service is restored.

Workaround: None.
Alert notifications sent through webhook and e-mail do not contain certain essential information, such as which alert triggered the notification, the message, hostname, device ID, deviceMAC, Key, and so on.

Workaround: To get this information, you must fetch the list of active alerts by using the /alert-manager/api/v1/orgs/{org_id}/alerts API.
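
As a rough sketch, the endpoint path above can be assembled as follows. The base URL shown is a placeholder, the organization ID is the example value used earlier in this document, and active_alerts_url is a hypothetical helper. Authentication and the structure of the response are not described in this section, so the sketch stops at building the URL.

```python
def active_alerts_url(base_url, org_id):
    """Build the URL of the active-alerts endpoint named in the workaround."""
    return f"{base_url.rstrip('/')}/alert-manager/api/v1/orgs/{org_id}/alerts"


# Placeholder base URL; replace it with the address of your Routing Director instance.
alerts_url = active_alerts_url(
    "https://routing-director.example.net/",
    "f9e9235b-37f1-43e7-9153-e88350ed1e15",
)
print(alerts_url)
```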
You cannot use a Transport Layer Security (TLS) certificate to onboard Nokia devices.

Workaround: None.
LDAP authentication may not work for users that are not included in the CN=Users container.

Workaround: Add users to the CN=Users container.

Installation and Upgrade

If you have taken a backup of a Juniper Routing Director instance that includes the Active Assurance Victoria Metrics database (used for storing time-series data) and if you are restoring the instance, the GUI will fail to show the restored data. You might see a set of errors in the logs of the metrics-service in the paa Kubernetes namespace.

Workaround: Restart paa-metrics using the kubectl rollout restart deployment paa-metrics -n paa command.
In a multinode setup, taking a backup after a node has failed may cause the backup operation to fail.

Workaround: To perform a backup or restore operation, a fully operational setup with all services running is required. If a node is non-operational for any reason, we recommend that you resolve the issue before executing the backup or restore operation.
When you run the request deployment troubleshooting information command, the output includes syslog files from the previous execution.

Workaround: Before you run this command, ensure that the files generated by the request deployment debug system-logs app common command are manually deleted.

If you are attempting to delete the file from the deployment shell, use the file delete /paragon/troubleshooting/app/<sys-log-file-name> command.

Alternatively, use the rm /root/troubleshooting/app/<sys-log-file-name> command from the Linux root shell.
When the cluster experiences a high load, some components, especially the Victoria Metrics Operator and ArangoDB Operator pods, may be restarted. This will not impact the cluster’s functionality.

Workaround: None.
In an air-gapped installation, the cluster deployment might fail with the following error:

TASK 2376 [jcloud/airflow2 : Install Helm Chart]

Workaround: Ensure that the NTP and DNS servers are reachable from all Routing Director VMs.
You might not be able to increase the disk size on an already installed system.

Workaround: If you encounter this issue, manually increase the disk size by adjusting the disk size at the VM level using the growpart /dev/vda 1 and resize2fs /dev/vda1 commands. On ESXi hosted VMs, use sda instead of vda. For more information, see Increase VM Disk Size.
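
The device-name substitution described above can be sketched as a small helper that prints the two commands for the hypervisor in use. It only formats the command strings given in the workaround and does not run anything; disk_resize_commands is a hypothetical name.

```python
def disk_resize_commands(esxi_hosted=False):
    """Return the growpart and resize2fs commands from the workaround.

    ESXi-hosted VMs use sda; other VMs use vda, as stated in the workaround.
    """
    disk = "sda" if esxi_hosted else "vda"
    return [f"growpart /dev/{disk} 1", f"resize2fs /dev/{disk}1"]


for command in disk_resize_commands(esxi_hosted=True):
    print(command)
```

Review the printed commands against the Increase VM Disk Size procedure before you run them on a VM.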

If the cluster VMs are in different geographical locations or if the latency between VMs is 25 ms or more, you must increase the timeout period.