Hitesh Ballani, Senior Principal Researcher, Microsoft Research

Research Talk: Cloud Networking for a Post-Moore’s Law Era

Hitesh Ballani Headshot
Image card of the host Hitesh Ballani, Senior Principal Researcher, Microsoft Research, holding up one of the off-the-shelf tunable lasers that Microsoft used to use

Let’s talk about how to get your network ready for future cloud requirements 

In this lightning talk, Microsoft Research’s Hitesh Ballani discusses how new applications, like large-scale machine learning, are putting a lot of pressure on networks – and how networks can improve their infrastructure to keep pace.  


You’ll learn

  • Why the network infrastructure needs to be an order of magnitude more efficient to cater to future cloud requirements

  • How Microsoft Research is betting on optical innovation to create new network technologies for both switching and transmission 

  • How other emerging technologies could lead to new growth curves for the future of Cloud networks 

Who is this for?

Network Professionals Business Leaders

Host

Hitesh Ballani
Senior Principal Researcher, Microsoft Research

Transcript

0:06 >> I'm Hitesh Ballani. I'm a researcher at Microsoft.

0:10 Today I'm going to tell you about how we're

0:12 using light and optics to

0:14 re-imagine the future for

0:16 a Cloud network as

0:17 we approach the tail end of Moore's Law.

0:19 Now, historically, the power consumed by the network in

0:23 a typical datacenter has been

0:24 around 5 percent of the total datacenter power.

0:28 It's an important number, but

0:30 not something that we lose sleep over.

0:32 However, that may be about to

0:34 change because of new applications and

0:36 new scenarios that are putting

0:38 a lot of pressure on the underlying network.

0:41 A very good example is large-scale machine learning.

0:44 The size of the models that we've been trying to train

0:48 has been growing by more than 10 times every year,

0:51 which is much faster than the gains

0:53 that we are even getting because of Moore's Law.
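To make that gap concrete, here is a back-of-envelope comparison (a sketch only: the 10x-per-year and doubling-every-two-years rates come from the talk, while the five-year horizon is an arbitrary illustrative choice):

```python
# Illustrative arithmetic only: model sizes growing ~10x per year versus
# transistor counts doubling roughly every two years under Moore's law.
years = 5
model_growth = 10 ** years          # ~10x per year
moore_growth = 2 ** (years / 2)     # ~2x every two years

print(f"Model growth over {years} years: {model_growth:,}x")
print(f"Moore's-law growth over {years} years: {moore_growth:.1f}x")
# The demand curve outpaces the supply curve by about four orders of magnitude.
```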

0:56 Actually, since 2019, the size of the largest model has

1:01 exceeded the amount of

1:02 high-speed memory available on

1:04 individual training devices.

1:06 We have to train these models in a distributed fashion,

1:10 which means communication across

1:12 the network during the training phase,

1:14 which in turn necessitates

1:17 the need for an ultra-high bandwidth network.

1:20 If you look at all the

1:21 recent AI training devices in the market,

1:24 the off-package bandwidth here in

1:26 all these cases is more than a terabit per second,

1:29 which is more than 10 times higher than

1:31 the bandwidth available on a typical Cloud server.

1:35 It's not just machine learning.

1:37 Trends like resource disaggregation and

1:40 increasing use of hardware accelerators all require

1:44 a step change in network performance in terms of

1:46 bandwidth and in some cases in

1:48 terms of latency and reliability.

1:51 It's very hard to achieve this step change with

1:55 incremental changes to

1:56 traditional networking technologies.

1:59 If we go back to our power analysis,

2:02 historically, the network has consumed

2:05 5 percent of the total datacenter power.

2:07 As we look ahead to servers being equipped with

2:11 200 gigabit per second and 400 gig network bandwidth,

2:15 this power will start increasing gradually.

2:18 The real worry here is that if I

2:21 want to build an AI supercomputer comprising

2:24 tens of thousands of AI accelerators, with

2:27 more than a terabit per endpoint network bandwidth,

2:31 the power consumed by the network inflates

2:34 to about 50 percent of the total infrastructure power,

2:37 which is of course not sustainable from an

2:40 economical or from an environmental perspective.

2:43 This is the real power wall and

2:46 the associated cost wall that we are very worried about.
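The proportions above can be captured in a toy power model (the wattages below are made-up placeholders chosen only to reproduce the 5 percent and 50 percent figures from the talk, not measured data):

```python
def network_power_share(network_w: float, compute_w: float) -> float:
    """Fraction of total infrastructure power consumed by the network."""
    return network_w / (network_w + compute_w)

# Today: the network is around 5% of total datacenter power.
today = network_power_share(network_w=5, compute_w=95)
assert round(today, 2) == 0.05

# Hypothetical AI supercomputer with >1 Tb/s per endpoint: network power
# inflates until it rivals the compute power itself (~50% of the total).
supercomputer = network_power_share(network_w=95, compute_w=95)
assert round(supercomputer, 2) == 0.50
```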

2:50 Now, if we were to break this down,

2:53 the Cloud network today

2:55 comprises two main components:

2:57 electrical switches and transceivers that take

3:00 digital data and encode it into

3:03 light signals that are transmitted across optical fibers.

3:06 The transceivers consume

3:08 around 60 percent of the network power,

3:10 and the remaining is coming from the electrical switches.

3:13 On the transceiver front

3:16 a big part of the challenge

3:17 comes from the fact that we are

3:18 wasting a lot of energy moving data

3:21 from the switching ASICs to

3:24 the pluggable transceivers that

3:27 are at the edge of the switch front panel.

3:30 There's been a lot of work over the past decade in

3:33 the research community on how we can move the optics much

3:37 closer to the switching ASIC so that

3:39 they ideally sit on the same package,

3:40 and on really optimizing the underlying optical components.

3:45 On this front, at MSR,

3:47 we've actually worked with partners to design

3:51 optimized modulators and photodetectors that have

3:54 the potential to reduce

3:56 transceiver power by an order of magnitude.

3:59 In general, this notion of

4:00 optical co-packaging is getting a lot of traction in

4:03 industry which is great

4:06 because it addresses this blue piece of the pie.

4:09 However, this, in turn,

4:11 converts the switching power to the next bottleneck.

4:14 An obvious question to ask is,

4:16 what do we do about this green elephant in the room?

4:20 As I mentioned, historically,

4:23 we've relied on electrical switches

4:25 that have scaled in line with Moore's law,

4:28 but that trend is becoming harder and harder to sustain.

4:32 Effectively, we have a mainstream technology

4:35 that is approaching the tail end of its S-curve.

4:39 The question we're asking is,

4:41 are there other emerging technologies that

4:45 could lead to new growth curves

4:47 for the future of the Cloud network?

4:49 On this front, optics has a lot of potential.

4:53 Actually, as it turns out,

4:56 there's been more than 20 years of

4:58 research in the optics and physics communities on

5:01 different physical layer technologies that can

5:04 essentially switch light waves at fine granularity.

5:08 The particular technology we

5:11 took a bet on is actually very simple.

5:13 The core of our network is

5:15 just a passive diffraction grating.

5:17 No moving parts, no electricity.

5:20 It's just a piece of glass

5:22 with etchings on it and it behaves like a prism.

5:25 If we send our data on the red wavelength,

5:29 it goes in one direction.

5:30 If we send it on the green wavelength,

5:32 it goes in a different direction.

5:34 We can use this as

5:36 a switching block in our data centers,

5:38 as long as we have a light source

5:40 whose wavelength we can control,

5:41 which is precisely what a tunable laser lets us do.
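As a sketch of the idea (the wavelengths and port count below are invented for illustration, not Microsoft's actual design): the grating is a fixed wavelength-to-port map, so "switching" reduces to the sender's tunable laser picking the right wavelength.

```python
# Hypothetical static map: each wavelength (nm) diffracts to one output port.
GRATING_MAP = {1550.1: 0, 1550.9: 1, 1551.7: 2, 1552.5: 3}

# Inverse map: destination port -> wavelength the laser must tune to.
LASER_TUNING = {port: wl for wl, port in GRATING_MAP.items()}

def send(packet: bytes, dest_port: int) -> int:
    """Tune the laser for dest_port; the passive grating does the rest."""
    wavelength = LASER_TUNING[dest_port]  # the only active step (the laser)
    return GRATING_MAP[wavelength]        # passive diffraction, no electronics

assert send(b"payload", dest_port=2) == 2
```

The switching speed of the whole fabric is then set entirely by how fast the laser can retune, which is why nanosecond-scale tuning became the critical problem.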

5:46 Now in order to support Cloud workloads that can be very

5:50 bursty and may require

5:52 almost packet by packet switching,

5:54 we need these lasers to be

5:56 tuned at nanosecond granularity.

5:59 Unfortunately, when we started

6:01 this project back in 2016, 2017,

6:04 the off-the-shelf tunable lasers we were using

6:07 could only be tuned at millisecond granularity.

6:12 That was six orders of magnitude off.

6:16 That's when the penny dropped for us.

6:18 In order to make this technology viable for prime time,

6:22 we actually have to innovate at the physical layer,

6:25 which is different from the traditional research

6:27 we've done at Microsoft and Microsoft Research.

6:30 We took the dive, we designed

6:33 our own custom optical chips

6:34 and you can see them on the screen here.

6:36 They're tiny chips that can achieve

6:38 this laser tuning in less than a nanosecond.

6:41 In this process, we designed our own chips,

6:44 we got them fabricated at external foundries,

6:47 we were able to test them at the die level,

6:50 we got them packaged.

6:52 While this was a great learning experience,

6:55 it was only the very first step of a very long journey,

6:59 resulting in an end-to-end demonstration of

7:02 a system prototype where we were able to

7:04 transmit real data between endpoints.

7:07 We were able to demonstrate

7:09 the end-to-end reconfiguration latency

7:11 of less than four nanoseconds.

7:14 This is neat because it shows

7:16 the viability of a system that

7:19 can be reconfigured on a packet-by-packet

7:21 basis while relying on our optical technology.

7:25 Now, as you can imagine,

7:27 achieving this required going beyond

7:30 our custom chip and solving

7:31 the associated physical challenges.

7:34 We had to solve challenges across

7:36 the entire data center network stack.

7:39 For example, we designed and implemented

7:42 a time synchronization protocol that can achieve

7:45 synchronization at a granularity

7:47 of less than a nanosecond.

7:49 Yet it's scalable and very robust.
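The talk doesn't describe the protocol's internals, but protocols in this family (NTP, PTP) build on a classic two-way timestamp exchange; here is that core calculation, with made-up nanosecond timestamps:

```python
def estimate_offset(t1: int, t2: int, t3: int, t4: int) -> tuple:
    """Client transmits at t1, server receives at t2 and replies at t3,
    client receives at t4 (t2, t3 are on the server's clock).
    Assuming a symmetric path, clock offset and one-way delay are:"""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = ((t4 - t1) - (t3 - t2)) / 2
    return offset, delay

# Example: server clock runs 7 ns ahead, true one-way delay is 50 ns.
offset, delay = estimate_offset(t1=0, t2=57, t3=60, t4=103)
assert offset == 7.0 and delay == 50.0
```

The hard part at nanosecond granularity is not this arithmetic but timestamping precisely enough and keeping path delays symmetric, which is where the cross-layer hardware work comes in.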

7:53 Similarly, we had to solve

7:54 the longstanding problem of

7:56 scheduling in an optically switched network,

7:58 where coming up with a scheduler that

8:01 can operate at nanosecond timescales,

8:04 yet is practical to deploy is incredibly hard.
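One textbook way to sidestep per-slot computation in an optically switched fabric is a fixed rotating permutation, where every timeslot's schedule is known in advance. Here is a minimal sketch of that classic round-robin idea (not necessarily the scheduler described in the talk):

```python
def rotating_schedule(n_nodes: int, timeslot: int) -> dict:
    """In each timeslot, node i transmits to node (i + shift) mod n;
    the shift rotates every slot so all pairs meet periodically."""
    shift = 1 + (timeslot % (n_nodes - 1))  # shift in [1, n-1]: never to self
    return {src: (src + shift) % n_nodes for src in range(n_nodes)}

# Over n-1 consecutive slots, every source reaches every other destination.
n = 4
dests_from_node0 = {rotating_schedule(n, slot)[0] for slot in range(n - 1)}
assert dests_from_node0 == {1, 2, 3}
```

Because the schedule is oblivious to traffic, it needs no nanosecond-scale decision logic; the trade-off is that bursty, skewed workloads may wait for their slot, which is exactly the tension the talk's cross-layer co-design addresses.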

8:09 The thing I want to point out is that these are not

8:12 new technical problems and these

8:14 are well-known in different communities.

8:16 I have a networking background.

8:19 We love designing scheduling algorithms

8:21 and congestion control protocols.

8:23 That's our bread and butter, but because we've

8:26 gone after these problems in a piecemeal fashion,

8:30 the general prognosis regarding

8:32 the viability of optical switching has been pessimistic.

8:34 The only reason we were able to come up with

8:38 a slightly more optimistic outlook is because

8:41 we went after these problems in a cross-layer fashion.

8:44 We really try to co-design a solution with

8:47 the idiosyncrasies of the Cloud environment

8:49 and of Cloud workloads.

8:51 That's one of the joys of working at Microsoft Research.

8:54 Our team here has computer scientists and

8:57 physicists and optical experts and chip designers

9:00 and mechanical engineers working hand in hand to

9:04 reimagine Cloud infrastructure given the Cloud context.

9:09 That's the message I want to leave with you.

9:12 We have this tsunami of emerging scenarios like

9:17 resource disaggregation and Machine Learning that are

9:20 going to put a lot of pressure on the underlying network.

9:23 At the same time,

9:25 mainstream networking technologies

9:27 are starting to run out of steam.

9:29 We are betting on optical innovation,

9:32 which will allow us to create new forms of

9:34 transceivers and new switching technologies

9:37 which hold the promise of offering

9:40 that step change in

9:41 network performance that we're looking for.

9:44 The key insight here is to co-design these solutions

9:47 across the entire data center stack

9:50 and with Cloud applications in mind.

9:53 This co-design exercise is not

9:55 just a one-way street that benefits the network.

9:59 It has implications for future Cloud applications too.

10:02 For example, imagine designing a distributed system or

10:07 an AI training chip that can assume that

10:10 the underlying Cloud network is completely synchronous.

10:14 It can have massive implications

10:16 for Cloud application design.

10:19 That's the opportunity that I want you to think about.

10:22 I'm going to stop here. Thank you for your time.

10:25 Hope you have a lovely research summit.

10:27 If you'd like to know more about this work,

10:29 I'd love to tell you more. Thank you again.
