Whiteboard Technical Series: Reinforcement Learning

A drawing of a mobile phone on the left showing a Wi-Fi signal detection app. On the right are the following words and numbers: “Band 2.4, 5; Channel 36, 38, 42, 60, 64; Transmit Power 2, 8, 16, 32, 64; Bandwidth 20, 40, 80, 160.”

Thank you AI-driven RRM for the stellar experiences.

Radio frequency environments are inherently complex and challenging to control. But what if the wireless network itself could perform Radio Resource Management (RRM) on its own? The Juniper Mist AI platform does just that by using reinforcement learning to optimize the RF radio waves that transmit traffic. 


You’ll learn

  • Exactly how reinforcement learning works to reinforce optimal behavior

  • The five actions the machine can perform to optimize the value function

  • The many benefits of AI-driven RRM using reinforcement learning 

Who is this for?

Network Professionals, Business Leaders


0:09 Radio frequency environments are inherently complex and therefore challenging to control and optimize for the efficient transmission of data.

0:17 Since the inception of radio frequency, or RF, radio resource management, also known as RRM, has been a long-standing technique used to optimize the RF radio waves that transmit network traffic in wireless LANs.

0:31 However, multiple interference sources, such as walls, buildings, and people, combined with the air serving as the transmission medium, make RRM a challenging technique to master.

0:43 Traditionally, site surveys have been used to determine the optimal placement of Wi-Fi access points and the settings for transmit power, channels, and bandwidth. However, these manual approaches can't account for the dynamic nature of the environment when the wireless network is in use, with people and devices entering, leaving, and moving about.

1:01 Additionally, this challenge is compounded by random RF interference from sources like microwave ovens, radios, and aircraft radar, to name a few.

1:12 But what if the wireless network itself could perform RRM on its own? What if it could detect and respond to both interference sources and the movement of people and devices, and adjust the radio settings in real time to provide the best possible wireless service?

1:29 That's exactly what Juniper has done with the AI-driven Mist wireless solution, using advanced machine learning techniques. Specifically, Mist uses reinforcement learning to perform RRM.

1:42 In a nutshell, a reinforcement learning machine, or agent, learns through an iterative trial-and-error process in an effort to achieve the correct result. It's rewarded for actions that lead to the correct result and receives penalties for actions that lead to an incorrect result. The machine learns by favoring actions that result in rewards.
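The reward-and-penalty loop described above can be illustrated with a minimal epsilon-greedy learner. This is a generic textbook sketch, not Mist's implementation: the per-action success probabilities, the +1/-1 reward scheme, and the function name `run_agent` are all hypothetical, chosen only to show how an agent comes to favor rewarded actions through trial and error.

```python
import random

def run_agent(success_prob, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial-and-error learner (illustrative sketch).

    success_prob: hypothetical per-action probability that the action leads
    to the "correct result" (+1 reward); otherwise it earns a -1 penalty.
    Returns the learned average-reward estimate for each action.
    """
    rng = random.Random(seed)
    n_actions = len(success_prob)
    values = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions    # how often each action has been tried
    for _ in range(steps):
        # Explore occasionally; otherwise favor the action with the best record.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda i: values[i])
        reward = 1.0 if rng.random() < success_prob[action] else -1.0
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values

values = run_agent([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda i: values[i])
print("estimates:", [round(v, 2) for v in values], "favored action:", best)
```

Because penalties pull an action's estimate down and rewards pull it up, the greedy choice drifts toward the action that most often produces the correct result, while the small exploration rate keeps testing the alternatives.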

2:02 With Mist wireless, the reinforcement learning machine's value function is based on three main factors that lead to a good user experience: coverage, capacity, and connectivity.
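One hypothetical way to fold the three factors into the scalar reward that a value function accumulates is a weighted sum of normalized metrics. The video does not describe how Mist measures or weights coverage, capacity, and connectivity, so the `SiteMetrics` fields, their [0, 1] normalization, and the `WEIGHTS` values below are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class SiteMetrics:
    # Hypothetical per-site measurements, each normalized to [0, 1].
    coverage: float      # e.g. fraction of clients with adequate signal
    capacity: float      # e.g. fraction of demanded throughput delivered
    connectivity: float  # e.g. fraction of successful connection attempts

# Illustrative weights only; the real weighting is not described in the video.
WEIGHTS = {"coverage": 0.4, "capacity": 0.3, "connectivity": 0.3}

def user_experience_reward(m: SiteMetrics) -> float:
    """Combine the three factors into a single reward in [0, 1]."""
    for name in WEIGHTS:
        value = getattr(m, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(w * getattr(m, name) for name, w in WEIGHTS.items())

before = SiteMetrics(coverage=0.9, capacity=0.6, connectivity=0.95)
after = SiteMetrics(coverage=0.9, capacity=0.8, connectivity=0.95)
print(user_experience_reward(before), user_experience_reward(after))
```

A weighted sum keeps the three goals on one scale, so an action that improves capacity without hurting coverage or connectivity yields a strictly higher reward.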

2:13 A value function can be thought of as an expected return based on the actions taken.
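In standard reinforcement learning terms, the "return" is the discounted sum of future rewards, $G = r_1 + \gamma r_2 + \gamma^2 r_3 + \dots$, and the value is its expectation. The sketch below computes a return and estimates the expected return by averaging sampled trajectories (a Monte Carlo estimate). The discount factor `gamma`, the three-step coin-flip environment, and the helper names are hypothetical choices for illustration.

```python
import random

def discounted_return(rewards, gamma=0.9):
    """G = r1 + gamma*r2 + gamma^2*r3 + ... for one trajectory."""
    g = 0.0
    # Accumulate backwards so each earlier step adds one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def monte_carlo_value(sample_trajectory, episodes=5000, gamma=0.9, seed=0):
    """Estimate the value V = E[G] by averaging returns of sampled trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        total += discounted_return(sample_trajectory(rng), gamma)
    return total / episodes

# Hypothetical environment: three steps, each paying reward 1 with probability 0.5.
def noisy_trajectory(rng):
    return [1.0 if rng.random() < 0.5 else 0.0 for _ in range(3)]

print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.9 + 0.81
print(monte_carlo_value(noisy_trajectory))  # close to 0.5 * 2.71
```

Discounting makes near-term rewards count more than distant ones, which keeps the expected return finite and favors actions that improve the experience soon.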

2:18 The machine can execute five different actions to optimize the value function. These are: adjusting the band setting between the two wireless bands of 2.4 GHz and 5 GHz; increasing or decreasing the transmit power of the AP's radios; switching to a different channel within the band; adjusting a channel's bandwidth; and switching the BSS color, which is a new knob available to 802.11ax access points.
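The five actions can be modeled as transitions on a radio configuration. In this sketch the 5 GHz channels, transmit power levels, and bandwidths echo the values shown in the whiteboard illustration (whose power units are not stated, so they are treated as abstract levels). The 2.4 GHz channel list, the rule that switching to 2.4 GHz forces 20 MHz, and all names (`RadioConfig`, `Action`, `apply_action`) are hypothetical; channel/bandwidth compatibility rules are deliberately not modeled.

```python
from dataclasses import dataclass, replace
from enum import Enum

# Hypothetical option lists; 5 GHz channels, power levels, and widths echo the
# illustration. 2.4 GHz uses the common non-overlapping channels 1, 6, 11 here.
CHANNELS = {2.4: (1, 6, 11), 5.0: (36, 38, 42, 60, 64)}
POWER_LEVELS = (2, 8, 16, 32, 64)  # abstract levels; units not stated
BANDWIDTHS_MHZ = (20, 40, 80, 160)

class Action(Enum):
    SWITCH_BAND = 1
    CHANGE_POWER = 2
    CHANGE_CHANNEL = 3
    CHANGE_BANDWIDTH = 4
    CHANGE_BSS_COLOR = 5  # knob available on 802.11ax APs

@dataclass(frozen=True)
class RadioConfig:
    band: float         # GHz
    channel: int
    power: int          # abstract transmit power level
    bandwidth_mhz: int
    bss_color: int      # 802.11ax BSS color, 1..63

def _step(options, current, delta):
    """Move `delta` positions through an ordered tuple, clamped at the ends."""
    i = options.index(current) + delta
    return options[max(0, min(len(options) - 1, i))]

def apply_action(cfg: RadioConfig, action: Action, delta: int = 1) -> RadioConfig:
    """Return a new config with exactly one of the five knobs adjusted.

    `delta` is +1/-1 for up/down on ordered knobs; ignored for band switching.
    """
    if action is Action.SWITCH_BAND:
        new_band = 5.0 if cfg.band == 2.4 else 2.4
        # Assumption: 2.4 GHz runs at 20 MHz and starts on the band's first channel.
        width = 20 if new_band == 2.4 else cfg.bandwidth_mhz
        return replace(cfg, band=new_band, channel=CHANNELS[new_band][0], bandwidth_mhz=width)
    if action is Action.CHANGE_POWER:
        return replace(cfg, power=_step(POWER_LEVELS, cfg.power, delta))
    if action is Action.CHANGE_CHANNEL:
        return replace(cfg, channel=_step(CHANNELS[cfg.band], cfg.channel, delta))
    if action is Action.CHANGE_BANDWIDTH:
        return replace(cfg, bandwidth_mhz=_step(BANDWIDTHS_MHZ, cfg.bandwidth_mhz, delta))
    if action is Action.CHANGE_BSS_COLOR:
        # Cycle through 1..63 so the color never becomes 0.
        return replace(cfg, bss_color=(cfg.bss_color - 1 + delta) % 63 + 1)
    raise ValueError(f"unknown action {action}")

cfg = RadioConfig(band=5.0, channel=36, power=16, bandwidth_mhz=80, bss_color=7)
print(apply_action(cfg, Action.CHANGE_POWER, +1))
print(apply_action(cfg, Action.SWITCH_BAND))
```

Keeping the configuration immutable makes each action a pure function from one state to the next, which is the shape a learning agent needs when it evaluates candidate actions before committing to one.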

2:46 RRM will select actions with maximum future rewards for a site. Future rewards are evaluated by a value function.
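Selecting the action with maximum future reward is commonly done with a one-step lookahead: score each action as its immediate reward plus the discounted value of the state it leads to, then take the highest score. The sketch below assumes a known transition `model` and a precomputed `value` table; both, along with the three power states and the small penalty for changing settings, are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable

def select_action(
    state: State,
    actions: Iterable[Action],
    model: Callable[[State, Action], Tuple[float, State]],
    value: Dict[State, float],
    gamma: float = 0.9,
) -> Action:
    """Greedy lookahead: maximize immediate reward + gamma * V(next_state)."""
    def score(a):
        reward, next_state = model(state, a)
        return reward + gamma * value.get(next_state, 0.0)
    return max(actions, key=score)

# Hypothetical toy site: three transmit power states with learned values.
LEVELS = ["low", "mid", "high"]
ACTIONS = ["down", "stay", "up"]
V = {"low": 0.2, "mid": 1.0, "high": 0.5}

def model(state, action):
    i = LEVELS.index(state)
    if action == "up":
        i = min(i + 1, len(LEVELS) - 1)
    elif action == "down":
        i = max(i - 1, 0)
    cost = 0.0 if action == "stay" else -0.1  # hypothetical cost of changing settings
    return cost, LEVELS[i]

print(select_action("low", ACTIONS, model, V))  # moves toward the higher-valued state
print(select_action("mid", ACTIONS, model, V))  # already at the best state
```

The small change penalty illustrates why such a policy does not churn settings needlessly: when the current state already has the highest value, staying put scores best.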

2:55 The various actions taken by the learning machine, such as increasing transmit power or switching the band from 2.4 GHz to 5 GHz, together represent a policy. A policy is a map that the machine builds over multiple trial-and-error cycles as it collects rewards, modeling the actions that maximize the value function.
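Tabular Q-learning is one standard way an agent builds such a map: it estimates the value of every state/action pair through repeated trials, and the policy is then the highest-valued action in each state. The video does not name the algorithm Mist uses, so this is a swapped-in textbook technique on a hypothetical toy environment where five abstract power levels exist and level 2 gives the best user experience.

```python
import random

N_LEVELS = 5           # hypothetical discrete transmit power levels 0..4
TARGET = 2             # hypothetical level that gives the best user experience
ACTIONS = (-1, 0, +1)  # decrease, keep, or increase power

def env_step(level, action):
    """Toy environment: reward is 0 at TARGET and more negative further away."""
    nxt = max(0, min(N_LEVELS - 1, level + action))
    return -abs(nxt - TARGET), nxt

def learn_policy(episodes=2000, steps=10, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_LEVELS) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_LEVELS)
        for _ in range(steps):
            # Trial and error: mostly exploit current knowledge, sometimes explore.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            r, s2 = env_step(s, a)
            best_next = max(q[(s2, act)] for act in ACTIONS)
            # Q-learning update toward reward plus discounted best future value.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    # The policy is the map from each state to its highest-valued action.
    return {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_LEVELS)}

print(learn_policy())
```

The learned map raises power below the target level, holds it at the target, and lowers it above, even though the agent was never told where the target is; it inferred that purely from collected rewards.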

3:13 Again, keep in mind that the value function represents good wireless user experience.

3:18 As time goes on, even if random changes occur in the environment, the machine learns as it strives to maximize the value function.
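Continuing to learn when the environment changes usually means weighting recent rewards more heavily than old ones. The sketch contrasts a sample average, which weights all history equally, with a constant step size, which exponentially forgets old rewards. The reward stream, in which an action's payoff drops when a new interference source appears, is hypothetical and noise-free to keep the comparison clear.

```python
def track(rewards, alpha=None):
    """Estimate an action's value from a stream of rewards.

    alpha=None  -> sample average: every past reward weighted equally.
    alpha=float -> constant step size: recent rewards weighted more, so the
                   estimate can follow a changing environment.
    """
    estimate, n = 0.0, 0
    for r in rewards:
        n += 1
        step = (1.0 / n) if alpha is None else alpha
        estimate += step * (r - estimate)
    return estimate

# Hypothetical: an action's payoff drops from 0.8 to 0.3 when interference appears.
stream = [0.8] * 500 + [0.3] * 500
print(track(stream))             # sample average lags at the overall mean
print(track(stream, alpha=0.1))  # constant step size has tracked the new level
```

The sample average would keep recommending an action based on conditions that no longer hold, whereas the constant step size lets the learned values, and therefore the policy, adapt as the site changes.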

3:26 The benefits of using reinforcement learning are obvious. A Mist wireless network customizes the RRM policy per site, creating a unique wireless coverage environment akin to a well-tailored suit.
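A per-site policy means the same learning procedure, run separately at each site, can settle on different settings. The sketch simplifies learning to an explore-then-commit estimate: try every candidate channel many times, then keep the one with the best average reward. The two site names, their channel interference levels, and the noisy reward formula are hypothetical.

```python
import random

# Hypothetical mean interference per channel at two sites deployed identically.
SITE_INTERFERENCE = {
    "site-a": {36: 0.7, 60: 0.1, 64: 0.4},
    "site-b": {36: 0.2, 60: 0.6, 64: 0.8},
}

def learn_site_policy(interference, trials=200, seed=0):
    """Try each channel repeatedly and commit to the best average reward.

    Reward is a noisy user-experience score: 1 - interference + Gaussian noise.
    """
    rng = random.Random(seed)
    avg = {}
    for channel, level in interference.items():
        samples = [1.0 - level + rng.gauss(0.0, 0.1) for _ in range(trials)]
        avg[channel] = sum(samples) / trials
    return max(avg, key=avg.get)

# Same algorithm, separate learning per site, different resulting choices.
policies = {site: learn_site_policy(levels) for site, levels in SITE_INTERFERENCE.items()}
print(policies)
```

Because each site learns from its own rewards, identical hardware and configuration can still converge to different channels when the local RF conditions differ.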

3:39 While large organizations with multiple sites may replicate their many locations as exact copies, these sites will naturally experience variances despite best efforts. Reinforcement learning easily fixes this, delivering real-time, actively adjusting, custom wireless environments.

3:55 We hope this episode helped to uncover some of the magic and mystery behind our AI-driven network solutions.
