Step 3: Understand What Happens When Memory Failures Occur
Most Juniper Networks Routing Engines support Error Checking and Correction (ECC) protected memory. There are two types of memory errors: single-bit and multiple-bit.
A single-bit error occurs when a single bit (a 0 or a 1) is incorrect. The system detects and corrects single-bit errors, then logs the event in the /var/log/eccd file. If single-bit errors persist, the Routing Engine controller reboots the Routing Engine. Persistent single-bit errors can be a symptom of bad RAM.
A multiple-bit error occurs when more than one bit is incorrect. By default, when a multiple-bit error is detected, a nonmaskable interrupt (NMI) is generated that interrupts the Routing Engine and panics the kernel. The router then reboots, and the kernel panic leaves a vmcore file.
Multiple-bit parity error detection is available in JUNOS software Release 5.3 and later.
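To illustrate why a single-bit error can be corrected while a multiple-bit error can only be detected, the following Python sketch applies a textbook Hamming code with an extra overall parity bit (SECDED) to 4 data bits. It is an illustration under stated assumptions only: it does not represent the ECC scheme used by Routing Engine memory hardware, and the function names encode and decode are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    data = [(nibble >> i) & 1 for i in range(4)]  # data[0] is the low bit
    code = [0] * 8  # index 0: overall parity; indexes 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = data
    # Parity bits sit at positions 1, 2, and 4 (classic Hamming layout).
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)  # work on a copy
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions that hold a 1
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: treat as a single-bit error and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped.
        # The error is detected but cannot be repaired.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)

    single = list(word)
    single[5] ^= 1  # simulate a single-bit error
    print(decode(single))

    double = list(word)
    double[2] ^= 1  # simulate a multiple-bit error
    double[6] ^= 1
    print(decode(double))
```

In this sketch, the single-bit case is repaired and the original data is returned, which parallels the correct-and-log behavior described above. The double-bit case can only be reported as uncorrectable, which parallels the condition that leads to the NMI and kernel panic.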