James Carroll, Founder, Optical Networks Daily

Next-Gen Data Center Interconnects: The Race to 800G Webcast with Inphi and Microsoft

Industry Voices Trending

Title slide with a name and photo of the webcast host and three guests against a dark colored background with a picture of a skyline at the bottom with blue beams of light shooting into the sky representing invisible networks.

Mark Filer: “People are expecting something higher than 400G and they are expecting it soon.”

Watch this fascinating webcast to gain a new perspective on 800G. Two experts from Inphi, a global leader in optical interconnect technologies, and Mark Filer from Microsoft explore the path to 800G Coherent and the race to push lane speeds to 224G.

You’ll learn

How data centers have evolved and where 800G fits in
If there is a demand for 800G among network operators (hint: it’s affirmative)
The innovation, architectures, and hardware that future data center networks will require

Who is this for?

Network Professionals Business Leaders

Host

James Carroll

Founder, Optical Networks Daily

Guest speakers

Dr. Ilya Lyubomirsky

Senior Technical Director, Inphi

Dr. Radha Nagarajan

CTO & SVP, Platforms, Inphi

Mark Filer

Principal Engineer, Optical System Architecture, Microsoft

Resources

Transcript

0:00 [Music]

0:13 so i've really been looking forward to this webinar on 800g the race to 800 g as as we know 400 g

0:20 400 zr is just starting to come to market now so what we're talking about here is sort

0:26 of the next next thing on the horizon we've got two great companies represented here first infine

0:34 which as many of you may know has been in the news a lot over the past year for both business

0:41 reasons which we can't really talk about in this in this webinar as well as for the technology reasons

0:47 um it was about a year ago that they started sampling their spica 8 800g

0:54 7 nanometer dsp which provides a pathway to things such as 8 by 100 g optical transceiver modules

1:02 very interesting and of course we have the perspective of one of the largest

1:07 hyperscale cloud operators in the world which i think will give us um really a reality check as to

1:15 whether this next next wave on the horizon is something that is coming soon perhaps

1:21 or perhaps a little bit later so we've got a lot to talk about let's kick it off rada um

1:28 take it away hi this is radha nagarajan from infi thank you all for the great introduction

1:36 together with uh ilya liu bemersky we're going to give you a perspective

1:41 on 800g so there are a wide spectrum of

1:47 you in the audience i'm going to take a step back and just look at data centers

1:55 data center architectures how they've evolved and where does 800 gig fit in there's been

2:02 a considerable evolution they listen to interconnect and data centers in general

2:10 have gone from sort of hierarchical structures with a lot of over subscription to

2:16 flatter and flatter uh architectures um it's a little bit of an

2:21 exaggeration this slide is borrowed from microsoft what has happened over the last couple

2:28 of years the elimination of layers very high speed interconnects between layers minimization of over subscription

2:35 and data center aggregation and what has actually driven this is lots of interconnect resulted in

2:42 driven versus interconnect rich environment machine to machine traffic a lot of what

2:48 we call east-west traffic and also a very high radix interconnection

2:54 between servers and sort of elimination of the switching layers

2:59 for applications that require lower and lower latency

3:04 so what i'm going to focus on is ratix and i'm going to tell you why it matters

3:10 and then how it drives data center interconnect and interconnect speeds

3:15 here's a chart that we borrowed from facebook what it shows

3:22 is the switch radix versus the number of levels that you actually

3:28 need to interconnect x number of servers

3:33 for example if your radix is at 32 which is out here and if you wanted to interconnect 100

3:40 000 servers which is a typical size of a fairly large data center you probably need four levels of network

3:48 and if you were to actually go to something like 128

3:53 you need something between two and three levels essentially you need three levels so by going from one radix to another

4:00 radix is essentially in how many small tributaries that your switch gets

4:05 split up that determines what your interconnect speeds are so

4:12 where do we actually care about radix at least in this perspective

4:20 it sort of drives the interconnect speeds i have switch generation here 12.8 is in

4:28 deployment 25.6 coming up and 51t is the one after that

4:33 at 12.8 at radix 32 sort of a simplified view though even for a given radix the data center

4:39 architectures are quite a bit different for for between different operators

4:44 you have a 400 gig interconnect or if you had in 128 the examples of both

4:50 uh your interconnect speeds optical interconnect speeds become 100 giga and as you go to 51t uh at 25 6 you go

4:58 to 800 and 1.6 at radix of 128 you go from 1

5:04 2 and 4. as you can well imagine we have the whole breadth of interconnects out there and deployed

5:11 differently under different architectures all the way from 1 to 400 and 800 is coming trust me i can

5:19 say who is deploying and who isn't that isn't the focus of the talk but we are getting there so

5:26 we want to go 2x and speed scaling what do we do let's start with 400 gig

5:32 today 400 gig uh two flavors one is between data center interconnects

5:38 60 gigabyte 16 qualm quan 16 one lambda or 53 gigabyte pam4

5:45 four lambdas this is what a pam signal looks like and we take this and put them in two axes this is what a

5:51 qualm 16 signal looks like the easiest or the easier way to scale

5:57 is why don't you just increase the number of wavelengths 2x number of lambdas and that is the

6:04 path at 800 gig eight landers um the first implementation of 800 gig inside data center is following

6:10 the issue usually is power size cost and it's not really scalable so what you

6:16 do after eight even eight is a bit of a challenge the next axis is when we just increase

6:24 the baud rate by 2x and the challenges there are the analog bandwidth but it looks like we have a

6:31 handle on analog bandwidth because the next axis is even more

6:36 complicated for a given bandwidth can we just increase the

6:42 modulation format complexity snr you don't really want to be dealing with

6:48 pam16 that's what it takes from pam4 to increase the double the baud

6:53 rates or qualm 256 and then let's look at some

6:58 engineering tradeoffs with a quant 64 90 gigabyte work or a pan

7:03 six 90 gigabyte work uh the consensus that's building it 800 gig single lambda would most

7:10 likely be uh 12 120 gigabyte

7:16 com16 and about 112 gigabyte pan 4. we'll get into the

7:22 details in a moment all right technology choices in data center interconnects

7:28 view at 100 gig inside data center 802.3

7:34 psm4 cwdm4 and 2 to 10 kilometers lr4 also cwdm

7:41 wavelengths 10 to 100 kilometers there were two options spam 4 in which invite deployed

7:47 together with microsoft and coherent between 100 to 600 gig coherent dwdm

7:53 and a variety of coherent formats going all the way up to 10 000 kilometers

7:58 at 400 gig there was a real opportunity and has been exploited um to collapse

8:06 a lot of these into a common format although i wouldn't claim that you can go up to 10 000 kilometers in commercial deployments

8:13 but zr zr plus they are making an impact um although at really uh long distances

8:20 there are still uh custom-made deployments but zr and zr

8:26 plus would would cover things all the way up to about a thousand kilometers and um two things happened uh one is the

8:33 collapse of the coherent formats and into one and also the dsp type processing moved

8:41 inside the data center a 200 gig and 400 gig is essentially m4 and 400 gig is 500 gig per lambda

8:50 and a 200 gig is 50 gig per lambda dr4 and fr4 and it an

8:58 800 gig uh view to the future um we think it's going to be 200 gig per

9:05 lander we'll make a case for it in terms of power consumption in terms of complexity

9:10 200 gig per lambda makes a whole lot more sense than 800 100 gig per lambda and eight

9:16 wavelengths when it gets 200 gig per lambda you may be you are probably going to be

9:23 dispersion limited even at 13 10 so it might just be limited to

9:28 two kilometers and you'll probably have coherent moving closer and closer to the data center we don't think coherent is

9:35 going to move inside the data center anytime soon and just for

9:41 to get some discussions going there is a possibility that we may actually have a cwdm

9:47 lr coherent with a distance optimized and a lower power dsp

9:55 so 400 gig four lambda 800 gig eight lambda 800 gig for lambda the variety of uh

10:03 trade-offs here going from 50 gig pam to 100 gig cam the number of host uh lanes remain the

10:10 same that's sort of important going from 100 gig to 200 gig what do we think this transition would happen

10:16 it's got to do with uh dsp area dsp power although the

10:24 analog bandwidth requirements are going to go up from power consumption perspective

10:31 i think this transition would happen uh that that we would go from 100 gig pam

10:39 to 200 gig pound on the line slide and we could potentially accommodate all of these

10:45 in a qsfpddra osfp form factor the question that always

10:51 comes up um also another point is moving to 800 gig

10:56 for lambda you need 200 gig per lambda to get to 1.6 t

11:01 modules which i showed you um in the slide on ratings

11:07 why wouldn't you do an 800 gig coherent one lambda inside data center the number of issues here as to why

11:15 coherent may not move inside or will not move inside data center 800 gig power the power is still going

11:22 to be high for 800 gig coherent one lambda backward compatibility to pam4 very

11:29 important uh the established infrastructure right now inside data center spam 4

11:34 and backward compatibility is will matter and synergy with copper applications

11:41 there are some challenges for pam4 i said reach extension beyond 10 kilometers i would say reach extension

11:47 beyond two kilometers is a challenge but a variety of other solutions can be

11:53 brought to bear on this but we think it's spam inside data center 800 and 16 com

12:00 outside all right for those of you on the audience who do not deal with

12:06 pluggable modules um here is a primer

12:12 the pan 400 gig that we deployed for dci between data center is what we

12:17 call a qsfp28 module and for the 400g zrs either a

12:24 qsfpdd or osfp they're slightly wider for qsfp and slightly longer for

12:32 qsfp dd and for a one ru which is typically one and

12:39 three quarters of an inch in height and a 19 inch with rack both of these formats will accommodate

12:44 32 modules which fits neatly into a 32 radix type data center architecture so

12:50 they're slightly different um in in physical dimensions but both of these two both of these formats uh will work

12:59 so um we have been focused on silicon photonics uh we have been developing silicon

13:05 photonics for the 400 gzr and 400g um dr4 modules

13:11 uh this is a typical performance of the max under modulators with um over about about 40 gigahertz bandwidth

13:18 and high high-speed photo detectors the first two are prerequisites for 400 gig inside data center

13:24 and outside data center you really have to have uh polarization

13:29 maintaining components as well polarization beam splitters polarization combiners low loss and high

13:35 extinction ratio before i pass on to my colleague i'd like to show you some of the work that

13:41 we've done and deployed uh something that jim referred to earlier on uh we announced about a year

13:49 back 400 gig zr modules gsrp dd form factor which is a similar form

13:55 factor to dr4 and fr4 inside data center using an advanced 7 nanometer node for

14:01 the dsp an advanced silicon photonics node from by cmos foundry

14:06 it's 8 by 26 gigabyte pan 4 coming in 802.3 bs format standard

14:13 output is a one channel one lambda 59 60 gigabyte quantum 16 nominally 80 to

14:21 120 kilometer um span reach we have a dsp and a silicon photonics engine inside the module and um we've

14:29 shown this data earlier on polarization scramble 50 kilo radians less than half db penalty and the spec

14:36 is 26 db for zr and this is a typical 400 g switch that you can buy in

14:44 a market and a fully loaded spectrum and you can

14:49 plug these modules directly into a router or a switch and convert that uh into a line

14:56 system and that is the uh the beauty of pluggable modules either

15:02 for 400g zr or 400g dr4 fr4

15:08 and uh the same trend would sort of follow at 800 gig

15:14 where the modules would be switch pluggable and of course there's a challenge

15:20 for cooling but that would be the same trend that

15:25 would follow uh for 800g as well i am going to pass on to ilia uh

15:34 to look at uh the details of 200 gig per lambda modulation equalization and fpc

15:40 ilia great thank you rada great uh set up and background for uh the rest of

15:46 the presentation where um we'll focus on uh 200 gig per lambda pam technology

15:52 and specifically i'll try to give you a flavor on some of the important trade-offs with modulation

15:58 formats equalization techniques and forward error correction um

16:03 radar if we can go to the next slide um

16:12 okay great so i i'd like to start uh first to give a little bit of background uh on

16:17 the existing 400 gig standards because i think there's some interesting trends there that perhaps

16:23 could give us some insights on on 800 gig so here i've

16:28 organized the ieee and msa standards according to

16:33 kind of the generation and the reach at the top i have the 400 gig

16:38 standards based on eight lanes of 50 gig pam4 and you can see that these standards

16:46 pretty much cover all the key data center applications from the short reach multi-mode at 100 meter

16:52 uh based on parallel uh multimode fiber uh you know the fr8 eight lanes of

16:58 a single mode lwdm going up two kilometers uh lra going out to 10 kilometers and

17:05 even er8 uh going out to 40 kilometers uh the current generation of standards

17:11 is based on a hundred gig per lane so four lanes uh to give a 400 g

17:16 um rate and as radar pointed out one of the important trends is to reduce

17:22 the number of optical uh lanes to reduce the optics cost and the power of the modules

17:28 uh so that's the current generation based on four lanes of 100 gig per lane pam technology and again we're

17:33 able to cover most of the key data center applications you know the short reach uh multi-mode sr4 uh dr4 at 500 meters

17:42 you know fr4 going out to two kilometers and lr4 one i think interesting trend

17:49 here though is that we're starting to see some of the limitations uh of going to higher baud rate uh using

17:55 you know intensity modulation and direct detection uh and especially uh the impact of fiber

18:00 chromatic dispersion is starting to uh you know uh starting to have an impact on the reach so for example the

18:06 ieee decided to limit the lr4 reach from the traditional 10 kilometer to six

18:11 kilometers uh to uh to mitigate some of the dispersion penalty uh and even though we have uh an msa

18:18 that has extended this reach to 10 kilometers essentially kind of leveraging the industry advancements in

18:24 dsp technology and equalization that can mitigate this dispersion penalty uh still we're

18:30 starting to see the impact of fiber impairments on on scaling the

18:36 baud rate and i'll talk a lot more about that in the following slides in the context of 200 gig per lambda

18:42 now uh the ieee recently started a new project uh beyond 400 gigabit ethernet

18:50 and although this project is just starting off and we're going to start working on the objectives

18:55 i can speculate here that most likely we will have an 800 gig objective and very

19:01 likely we will look at pam technology for a 200 gig per lane

19:07 and going out to one to two kilometers uh fr4 type of distances uh and um

19:13 i believe you know i will need a stronger fact uh to deal with you know going to higher

19:19 baud rates um and most likely uh we'll need you know to leverage the dsp technology to

19:25 mitigate uh some of the impairments that we're going to have to deal with when we scale to higher baud

19:31 rates so in the next slide i'd like to actually go over the key

19:36 impairments in in in an optical 200 gig per lambda system

19:43 so here i show a little cartoon um just describing the key components uh in

19:49 a pam4 optical interconnect the key components are the dac at the transmitter

19:55 the driver and the laser modulator we have of course the fiber link and at the receiver the photodiode the

20:02 tiaa and the afe and adc and some of the key impairments that we have to consider

20:10 are the transmitter analog bandwidth limitations and electrical reflections so as an example of that i borrowed a

20:17 picture from an 800 gig pluggable msa white paper shown below here

20:22 which shows two uh designs for for 200 gig per lambda transmitter

20:28 based on eml technology and you can see in the frequency response uh the the bandwidth

20:35 analog bandwidth is around 50 gigahertz and there are lots of ripples in the frequency response

20:40 perhaps due to reflections in these uh interconnects and and components so that's something that's very important to to consider in

20:47 in our designs and i'll focus in more on that in the following slides

20:52 uh in the optics we have to carefully consider the laser rin noise it's a

20:58 level dependent noise which has a very important impact on higher order modulation and

21:03 in particular it would impact pem6 more than pam4 and i'll show you some of the ramifications of this uh in in the

21:10 following slides uh and for the fiber propagation the fibrochromatic dispersion as i mentioned

21:15 is a key impairment and we're already seeing that at 100 gig per lane the fiber chromatic dispersion penalty

21:22 actually scales quadratically with baud rate so we expect this to be even even more of a limiter for 200 gig per

21:28 lambda and i'll have a separate slide where i'll just focus in on on the chromatic dispersion impact um there are other other you know

21:36 optical propagation effects like pmd uh and at the receiver uh the key component there is the tia

21:44 and we have to carefully consider you know the specifications of the tiaa noise the linearity and analog bandwidth

21:50 so i'll provide a slide which gives an analysis on on ta performance as well

21:58 okay so um i talked about the impairments what what are the tools that we have uh in our toolbox to deal

22:05 with these impairments uh here i show um kind of a simplified receiver dsp architecture

22:12 and try to highlight some of the key components and tools that we have so first of all in the analog part the

22:17 afe the most important component is the adc and this is really the component that enables dsp

22:24 architecture and and today we have adc technology that can go out to beyond you

22:29 know 100 gig a giga sample per second and analog bandwidth so 50 gigahertz or better

22:35 uh in the dsp domain we have some very powerful tools starting with linear equalization based

22:42 on the ffe and today we can uh you know implement

22:47 ffes with 10 to 30 taps or more and the ffv is extremely useful for both

22:53 mitigating you know analog bandwidth imperfections as well as reflections and all kinds of

22:58 imperfections in the frequency response we can implement a one type dfe which as

23:04 i'll show is very useful for kind of extending uh the ban you know mitigating the bandwidth limitations and

23:10 it also helps a little bit with chromatic dispersion and uh when we're really stressed with a

23:16 very bad channel uh then the most powerful equalization is based on what's called maximum likelihood sequence detection

23:23 and we've implemented kind of a simplified low power mlsd and i'll show you some

23:30 results on mlsd as well and finally a very important tool is the forward air correction um

23:38 you know in in the existing standards uh kp4 fec is the kind of the workfor workhorse

23:45 that we use for both 50 gig and 100 gig per lane pam technology and it's really important to enable

23:53 the reaches and also enable low-cost optics and as we scale to higher baud rates

24:00 i believe we're going to need stronger fec but at the same time we have to keep that fact you know low power and low

24:05 complexity uh and low latency uh one of the advantages of the dsp architecture

24:11 that i show here is that we have available in the effect module not only the heart decisions but also the soft

24:16 information the reliability information and we can exploit that information to

24:22 design you know stronger facts while keeping the complexity fairly low so what i'd like to do in the

24:28 following slides is to discuss a little bit on fact and then i'll move on to the equalization and modulation

24:34 trade-offs here i show effect architecture based on

24:39 so-called concatenated codes this is actually the architecture that we standardized in the 400 gigs gr

24:46 standard it's a very flexible and powerful architecture because it allows you to start with a

24:52 with an existing faq so in this case we can start with the kp4 fact that already exists in the switch host

24:58 asic and we can build on it by adding a simpler uh code inside the dsp of the optical

25:05 module and we can also you know as i mentioned previously exploit the soft information

25:11 in the dsp of the optical module which is very close to the channel and by combining these two effects

25:17 together we we can get substantial improvements in performance in the coding gain while keeping the effect fairly simple

25:23 in the optical module to keep that power low and the latency fairly low

25:28 so this is the architecture we've been working on and uh on the next slide um i'd like to show

25:34 you some simulation results on performance of these concatenated codes

25:40 so these curves show uh several designs of a uh kp4 effect for the auto code and

25:47 different levels of complexity of the of the fact for the inner code uh starting with

25:52 the purple curve which is just kp4 fact by itself uh then the yellow curve we're adding

25:57 a fairly like the most the simplest effect you can think of the just a single parity check code inside

26:03 the optical module dsp and with soft decision decoding we already get about a one

26:08 one and a half db enhanced coding gain uh then the red curve uh we're using a more sophisticated

26:14 hamming code extended hamming code and we get an additional 1db improvements and we can keep going using

26:20 more sophisticated uh you know inner codes like the bch code uh twirl correcting code in the

26:25 blue curve and what we find is that as we use more and more sophisticated and

26:31 complicated inner codes we kind of uh have a diminishing return in terms of the trade-off between complexity power

26:38 and coding gain and that we found a good trade-off is actually this red curve which is um is using a hamming code for the inner

26:46 code and interestingly this is also the design uh we used for the 400 gigs er effect

26:52 standard and with this code we'll have an overall coding gain of about 9.4 db

26:58 while keeping the the complexity and the latency fairly low so i'm going to assume in the following

27:03 slides that will have one of these stronger effects available to us and next i'd like to focus more on the

27:08 modulation and equalization you know trade-offs so here i show

27:14 some system simulation results on the bid errate as a function of the analog component

27:20 bandwidth so the anal component bandwidth is a very important kind of parameter we have to consider um

27:26 and here i'm sweeping the the bandwidth of all the components you know the the driver the the dac the modulator the ka and the

27:34 adc and as we sweep the analog component bandwidth here we're simulating the bit air rate

27:40 for for both the pam4 system at 113 gigabyte uh and a pam-6 at 90 gigabyte

27:47 the horizontal dash curve is is the fact limit for one of the more uh you know advanced facts that i

27:53 just discussed and you can see some interesting trends here so for example with ffe only

28:00 uh which is uh blue curve versus black curve we see that there's a kind of uh

28:06 crossing crossover point uh where uh when you go beyond 45 gigahertz

28:12 you know pam4 gives a better performance while for for really lower analog bandwidth

28:18 m6 gives better performance due to its lower baud rate with the dfe when we add the dfi which

28:24 is shown in the red curve and the green curve that crossover point shifts to lower bandwidths and we

28:30 see that then um you know pam4 performs better even at 40 gigahertz and beyond and when

28:36 we add the mlsd equalization which is shown in the pink dash curve then really pam4 shows a very very good

28:44 performance much more robust to to analog component bandwidth limitations and and good bit error rate uh so based

28:50 on these results we we feel that pem4 is a good choice for modulation format for

28:56 uh for future 200 gig interconnects uh and in this slide uh it just confirms the

29:02 analysis that i just showed and here we're just focusing more on the ta and looking at

29:07 the impact of ka noise and k bandwidth so sweeping out that parameter space uh and comparing pam four to pam six and

29:14 you can see here uh you know red is bad you know it's below the effect threshold uh we wanna be in the green and blue region

29:20 and and this slide shows uh that pam4 has a much wider kind of good region of operation which

29:25 shows it's more robust and and a good choice for modulation uh so if you go to the next slide radar here i

29:33 just want to kind of finish the technical discussion focusing on the chromatic dispersion as i mentioned that's the key

29:39 optical impairment as we go to 200 gig per lambda so here i'm focusing

29:44 on pam 4 technology and and looking at the impact of equalization on the performance and i'm plotting the

29:51 simulated optical power penalty as a function of chromatic dispersion and what we can see here is that with

29:57 just with linear equalization which is the red curve we really don't have enough equalization

30:03 power to deal with a two kilometer fr4 spec and we need to go to at least an ffe dfe

30:09 architecture which should work the the dispersion penalty would be 1db or lower and with mlsd we really

30:17 are doing very well the dispersion penalty is below half a db and we could potentially even go to

30:22 longer distances beyond two kilometers uh so i'm running out of time so i'll just go to the conclusion slide

30:29 so in conclusion you know next generation 800g data center optical interconnect should be designed considering the unique application

30:36 requirements of data center operators the choice of modulation format dsp

30:41 architecture and fec requires careful analysis on the engineering trade-offs analog component technology and optical

30:47 channel specs and our optical system simulations uh show that pam4

30:52 is a good choice for 200 gig per lambda it's a feasible technology for 800g fr4 and dr4

30:58 synergistic with low latency ai breakout and copper and i will coherent technology on the

31:05 other hand may be the most uh most useful for longer reach applications such as uh 40 kilometers and maybe even 10 kilometers

31:12 where coherent dispersion tolerance and lower fiber loss at 1550 nanometers are the key

31:17 advantages um rather do you want to add anything to these i know this is great ilia um we'll we'll yield the floor to mark

31:24 and then take questions in another session okay thank you so um yeah thanks uh for for uh

31:32 inviting me to to speak here today and uh radha and ilya thanks for the uh interesting uh presentation there

31:40 um i wanted to kind of come around um with all the talk about the race to

31:46 800 gig and um you know the next gen uh solutions being uh just on the horizon um and just kind

31:54 of hedge that in with a reality check for i think um you know at least where we are at

32:01 microsoft um may not be indicative of of plans for the other hyperscalers

32:07 um and end users of these devices but at least gives a gives one uh counterpoint um

32:14 as uh as to where we we see things going um in our data

32:20 center ecosystem and so um

32:25 uh like with uh like with anything whenever we start talking about uh something more than six

32:33 months out in a public forum uh it's probably a good idea to caveat it with a disclaimer like like the one

32:40 shown here um it's meant to be a little more tongue-in-cheek than uh specific legal

32:46 legal language but uh just all that to say that you know uh at least that microsoft plans can change

32:53 quickly um but at the same time change is slow to happen it's a big shift so um uh you know

33:00 take take what i say here uh when i'm projecting forward with a grain of salt

33:06 um first i want to start out with the question um is there demand for 800 gig

33:13 you know i think that's the premise of the webinar and then rata nilia would have you believe that there is and the answer is uh pretty

33:20 straightforward at least as far as i can tell yes there is demand um i participate in

33:26 the oif and i'm on the board there and uh back uh the first part of last year we issued

33:33 a survey to network operators we issued it to over 25 network operators or so

33:38 and had them answer questions about beyond 400 gig um and and particularly in scope there

33:45 was uh kind of beyond 400 zr like the 400 gig coherent applications but the answers to

33:51 the questions shed some light on on what would be happening inside the data center as well

33:58 one of the relevant questions there i have in the top right is when are you expecting to need a standard coherent

34:03 wide side interface with capacity larger than 400 gig per lambda um and if you check out the answers uh

34:11 uh over 70 of the respondents came back with 2023 or before um and

34:18 and you know uh another um 25 or so uh said said sometime after

34:24 2025 um but you know big takeaway for there is uh yeah people are expecting uh

34:33 something higher than 400 gig and they're expecting it pretty soon that's two years away uh the next question what router port

34:40 speed do you plan to deploy after 400 gig and uh you know the vast majority here

34:45 um almost 50 percent said 800 gig you know uh some others said one

34:50 points one one t or 1.6 t i think it's pretty unclear what

34:55 the next mac rate uh would look like beyond a 800 gig perhaps but um and then some

35:02 answered other i'm not really sure what they had in mind um but uh you know one of the interesting things that came back to

35:08 in this survey response is what people expect the power of these modules to consume and as you

35:15 can see there the powers that people are saying are about where 400 zr is today and um you know as we know from

35:22 from history and early engineering projections from from what people are looking at for

35:28 what would be maybe an 800 zr um application uh that's going to be quite

35:34 a tall order so um to me this tells me that people are already pretty power constrained

35:40 and they kind of want to have their cake and eat it too but you know we'll just have to see how these things play out

35:46 so you know all of that is backdrop for what i'm about to show you which is the the very microsoft centric view uh i

35:54 would say um and i think it's helpful first and i apologize if if you've seen this

36:01 in other forms before we've been showing uh the next few slides in a few different

36:06 forums like ofc and e-cock for the last couple years but i always feel like it's helpful to level

36:12 set with the audience on how we build things today and that better informs you know our use cases

36:18 and requirements um for the future so looking at what we do at 100

36:24 gig sorry 100 gig and 400 gig um how we build our data centers today

36:29 so my slides will advance uh there we go um so you know we start with the notion

36:36 of iraq which is uh comprised of a bunch of servers tens of servers

36:42 and those servers are connected into the network by a

36:48 top of rack switch and the connections are made with dac cables that direct attached copper

36:56 that are about two meters in length and uh you know in our 100 gig ecosystem they're running off of 50 gig interfaces

37:02 from the servers um there's typically some over subscription of the tor device and that's usually a single switch asic

37:09 device um that has some over subscription

37:14 as it goes to the next tier of switches and those are connected in a class

37:21 fabric across uh multiple you know redundant switches with aoc cables

37:30 typically about 30 meters or less in link the aocs run at 100 gig and dissipate

37:37 around two to two and a half watts per per inch um

37:42 and tens of these racks um are um changed together across these tier

37:49 one switching devices and that makes up what we call a row or a pod set so this is

37:55 kind of the one of the main building blocks that you'd see in our in our physical

38:00 data centers um and other thing here is the tier one and the tor devices are typically um

38:08 uh both single switch devices um kind of the one ru pizza box type of switch uh commodity switch you

38:15 might have seen on the market um going to uh how we then build our data centers

38:23 um connect our data centers together um sorry i'm jumping ahead of myself how we

38:30 connect the the pod sets together within the data center a single data center facility

38:35 um we have uh uh what we call our tier two routers that

38:42 are um connected to our tier one devices across the different pod sets with parallel

38:49 single mode fiber and that fiber is actually a permanent part of the infrastructure it's amortized over the

38:54 lifetime of the data center to mitigate the cost but also because we

38:59 know we will use these fibers across generations those are about a kilometer in length uh

39:07 maximum and um the the tier twos are you note they're a different color

39:13 here that's to denote that they're multi-chip um modular switches that um

39:19 have fabric cards and um line cards for separate line cards for northbound and

39:25 southbound connected ports and um the optics that connect into those

39:31 devices on either end are uh typically psm4 although in some

39:37 legacy data centers we might use cwdm4 to cover a longer distance where we don't have

39:42 that parallel infrastructure and my note about these being modular swift is switches is significant because

39:49 it allows us to mix and match clusters of different generations 100 gig cluster or 40 gig cluster

39:55 and we do the impedance matching uh the term we like to use of of those different generations using the

40:02 line cards in those tier two devices and everything you see here is what

40:07 makes up what we call a data center now the way we connect our

40:12 data centers together within a metropolitan area is

40:18 maybe a little unique today but i think people will see more of this type of

40:23 approach going forward with 400 zr coming onto the market um but what we have is we have

40:29 two um hubs within a region that we identify as rng rngs regional

40:36 network gateways and the data centers are connected um

40:42 again uh in a in a class fabric across the metropolitan area

40:47 over dwdm line systems using pam4 technology which rodder referred to

40:52 earlier on in his deck um and this is um this is

40:58 uh the distances here that um we're talking about are typically

41:04 latency constrained we say 100 kilometers on average it's it's actually

41:09 a bit shorter than that but the it's to ensure that the round trip time from any

41:15 server and any one data center to another data center is bounded by uh a certain amount on the

41:21 order of microseconds that uh makes at the application layer um to the user

41:27 the the um uh if you had storage served out of one data center and

41:32 compute out of another for the same application it should feel seamless like it's coming out of one mega

41:38 data center facility and so that's where this distance comes from um again the the pam4 modules are

41:46 directly plugged into the routers here and and uh dissipate uh about what a

41:52 gray optics does uh that generation around four and a half watts and um the capacity that we would have

41:59 is a single percentage of the total server capacity for a given uh given region for our largest regions

42:06 today this is petabits per seconds of cross-sectional bandwidth going across the metro so just summarizing what this

42:15 looks like for the 100 gig ecosystem i'm at 100 gig down at the bottom it's the dex

42:21 uh connecting the servers to tours two meters uh the tourist to tier ones that's 100

42:27 gig aoc is 30 meters tier 1 to tier 2 100 gig psn fours

42:32 at a kilometer and then 100 gig dwdm pam4 um connecting

42:39 the data centers to the regional hubs and the regional hubs to each other we egress the regional hubs um

42:47 going toward the capital i internet and toward our um private backbone to connect uh to

42:53 other data centers across the the world um using kind of more traditional

42:59 long-haul types of solutions there so this is what it looks like at 100 gig

43:04 going to 400 gig you have to watch closely here uh boom 400 gig

43:12 um so if you'll notice uh basically nothing changed topologically the only thing that

43:17 changed was we scaled up um the the bandwidth of all of these links that are interconnecting the data

43:24 the switching tiers by factor of four and um this is no worthy um because

43:33 of the point that i'll get to on my next slide um the elephant in the room here and

43:40 something that hasn't been much of a major consideration up to this point in time and this point in time meaning the 400

43:46 gig ecosystem is power um the equipment uh network equipment power consumption

43:53 is already problematic for the 400 gig ecosystem and the charts at the right uh are are

43:59 highlighting this phenomena um switches in the 400 gig time frames are projected to be around 3x the power

44:06 of the 100 gig switches uh the optics are coming in around three to four x

44:12 100 gig and this challenge is the the total power envelopes available in the data center facilities

44:18 and so the plots on the right um the one on the bottom uh we generated based on some internal

44:23 data um showing over the generations of the server server length bandwidth or you

44:28 can think of it it really is the mac rates um the uh the power uh

44:34 the relative power of the networking devices versus the servers um uh is ever encroaching and once you

44:42 scale to um 800 gig uh it's it's up around 20 of the total

44:48 data center power being consumed just by moving bits not by the revenue generating servers um

44:55 likewise the plot above it is facebook's data on uh you know without calling out

45:01 the specific generations but showing the same phenomenon that as you go to

45:06 quote next generation um the network power is a big slice of that pie and that's bad

45:13 news when you're um uh your services is cloud right

45:18 um so as i mentioned it's using power that that could be generating revenue it's it's

45:23 uh capacity that is lost uh on the servers uh to the servers and uh

45:30 it costs money uh obviously it's just burning power that that could otherwise be done more efficiently and it's not

45:36 green it doesn't um support our our own sustainability initiatives internally to

45:42 to um build things this way and so you know looking at it like that it

45:47 makes the trajectory of a transition transition to anything beyond 400 gig um you know doing things the same way we

45:55 are now up here all but impossible uh or at least very inefficient um so um looking at possible solutions

46:05 um how do we um really start to chip away at this problem one is photonic innovation and a big

46:12 piece of that um we believe at microsoft uh is uh co-package optics

46:18 and coke package optics if you're not aware is just the notion of uh um packaging together uh

46:25 the switch silicon or you know any type of system on chip really with uh the photonic engines that are um

46:33 primarily based on silicon photonics on a common substrate and by you know this is talked about a

46:39 lot in the industry but by moving the the optical engines closer to the uh the

46:45 asic itself you can reduce the the trace length and and certainty's power and really

46:51 uh drive efficiencies and power and you know what we're seeing is that compared to kind of state of the art plugable

46:57 technologies when you're looking at link efficiency uh the power efficiency of 25

47:02 picojoules per bit on average for a plugable going on board

47:08 like a kobo approach you know improves that by moving the optics closer and eliminating re-timers but really

47:14 uh you see some great efficiency gains by um um co-packaging you know almost uh 30 to

47:21 50 reduction in in the uh the power that's dissipated by the optics

47:26 and even with an eye towards second gen co-package optics where some additional optimizations could perhaps be made

47:33 really driving that down into the sub five people joule per bit range so we we see that as a big uh a big area

47:41 but we also um have to attack it in other ways on for example um collapsing tears as rada

47:49 touched on earlier by using multi-home nics on our servers fanning out

47:55 horizontally and bypassing tours and i'll have a little more detail on that later um

48:00 having simplified forwarding requirements on our switching asics uh you know less fib less buffer equals

48:08 less work and less power um additional integration things like

48:13 integrating the encryption directly on the switching a6 and lastly uh some types of alternate

48:18 cooling you know i showed liquid cooling uh specifically immersion cooling there but we're actually studying uh several

48:25 different approaches to that at microsoft but but the big takeaway is we can't

48:30 just keep scaling link bandwidths like what i showed going from 100 gig to 400

48:36 gig that's the last time um we can do it like that and continue building things

48:41 the same way next-gen systems are going to require all of the above um

48:46 additionally i wanted to just provide this retrospective here if you look at the rollout of our what

48:53 i'll call our 40 gig ecosystem or 100 gig ecosystem and then kind of projecting forward into

48:58 400 gig and you overlap those with ga dates for you know i'll just say optical modules

49:06 at 100 gig 200 gig 400 gig and 800 gig you can see that we tend to move pretty

49:12 slowly we're not at the bleeding edge um deploying things right when they are available

49:17 and once we deploy an ecosystem we sit in that ecosystem for a good

49:23 several years before we start ramping that down and ramping up a new technology

49:29 and just so just looking at the the turnover and churn in our networks we we skipped

49:34 entirely any type of 200 gig solution we're just starting to to look at

49:41 rolling out and deploying our first 400 gig ports in the early part of 2021 the first half

49:47 but won't be really deploying 400 gig in earnest until 2022 and we expect that to take some

49:53 time to uh reach full maturity in our own internal data center ecosystems

49:58 so by the time an 800 gig solution is ready as others have called it out we won't be ready uh at microsoft so

50:06 it's just you know again helpful historical context to to to kind of frame um

50:13 how real is the need for 800g at least for us um and especially in light of the

50:20 power challenges uh we mentioned before um so i want to revisit rada's slide here who

50:27 i guess he stole from facebook but then added uh this table down here and i'll

50:32 say um just that he's he's spot on um with the uh with the radix argument but i would

50:39 argue he didn't quite take it far enough when we project out to 51.2 t

50:45 switches um if you look at the radix there and consider that the lane speed

50:50 uh of those is natively 100 gig uh that gives you you know 512 radex on

50:57 that switch and that allows you to build a network that services over 100 000 servers uh in a

51:04 data center um by um uh leveraging the 100 gig um building

51:12 blocks so 100 gig being the native electrical lane speed is really the simplest uh network that you can you can build uh

51:20 it's one mac uh equals one electrical lane equals one optical lane and the idea here being that you you you

51:28 fan out these um these hundred gig interfaces horizontally so that you're able to to build

51:36 you know a highly power efficient and very flat uh network kind of the errata was getting it um and

51:43 and you know 100 000 servers is nothing to shake a stick at that's that's effectively a 100 megawatt data center

51:49 that's just massive capacity um and you know so i want to show here

51:55 just how that looks at least for this first tier of the network where again today this is revisiting what we do with the

52:01 dac cables going to the tier 1 devices using aocs with a tor in between

52:07 um what we envision doing when we're enabled uh by you know these

52:14 advances in photonics including cpo and and um um

52:19 the 400 gig uh availability for for the servers we would have here at the server a 400

52:27 gig dr4 interface and that would fan out across these tier 1 switches

52:35 um providing that that horizontal scaling that i was referring

52:40 to um and in that way allow you to completely and

52:45 eliminate the tour at the top of rack and just have some type of patch panel or shuffle panel as a d mark

52:51 at the top of rack now and you know when you when you go and compare

52:57 the efficiencies of the old way of doing things on the left um with you know this what i i'm we're

53:04 calling toward bypass um with 100 gig lanes um you can see

53:09 there's a host of advantages um one being the failure domain you know a tour is a single point of

53:15 failure for an entire rack and again if you're serving content maybe that's not a big deal but if

53:20 you're uh providing um compute as a service uh and uh that that could be huge right

53:28 um whereas if you're multi-homing your nick you have that four-way resiliency or i would say

53:34 in-way resiliency depending on the number of uh uh interfaces or planes on your server

53:42 i'm looking at the switch asic count uh four to eight x um in this case when compared to the

53:48 number of switch asics it would take to build this network here um space and power

53:54 again uh reduce space obviously saving the space of the tour and then a third the power um primarily

54:01 by um uh reducing the number of switch a6 but also running more efficient

54:06 uh co-package optics on on the tier one devices uh uh in terms of switch radix um

54:14 on these especially when you get to these high radix switches that are that are um coming up in the future you

54:19 you can't leverage the higher radix which is fully you leave stranded ports on your tors here because again we're

54:26 just a few tins of servers per rack but some of these uh some of these um

54:31 switch chips are 64 or 128 radix um whereas uh here um by fanning out

54:39 again and leveraging the full 100 gig lanes on on these switches you you um can leverage the full radix of

54:47 all the switches in the tier ones on the switch chips um again uh uh i mentioned before

54:54 that we typically run some over subscription going from the tour to the tier one um whereas uh in this scenario it's

55:02 fully non-blocking or i guess non-oversubscribed um and then lastly

55:08 reach limitations of dac cables and eocs are looming whereas you know common

55:14 optical solution like a dr4 based interface would allow you you know

55:19 up to 200 kilometers or more of reach um so that goes away

55:25 so um apologies if i went a little bit over in time um but it's a bit of a nuanced

55:31 discussion so i wanted to to try try and uh be sure i covered it well

55:36 um but the the power um uh in summary the power is the main limiter for

55:41 for beyond 400 gig data centers uh as we see it um we can't continue to simply scale the

55:49 link bandwidths while building the networks exactly as we do today um and additionally the historical

55:56 ecosystem life cycles would indicate that we won't be ready for 800 gig when the industry is

56:04 and really a 32 by 100 gig cpo will suit our needs better um 100 gig electrical lanes will be a

56:12 foundational building block for power efficient data center designs for the foreseeable future

56:18 just that that idea of what i was showing with the tor bypass and and lastly um future data center

56:25 networks are going to require a combination of photonic innovation which we lump cpo

56:30 into that optimize network architectures and advanced hardware implementations

56:37 and i think that's it so thanks for your time

56:42 all right um thank you mark uh we really appreciate that especially

56:47 that um you're you're going green and imperative i think many people across the industry

56:53 are really uh working hard on that problem um so i also wanted to say that these

57:01 slides will be available to everyone we'll send them out in an email to all the uh registered

57:06 attendees probably within a couple of days um at this point we're going to take questions for about 15 minutes we have a

57:13 lot of great questions that have come in um and we'll you know try to answer as many of them as we can get to

57:19 if we don't um you know we'll follow up on that as well so first brought up um

57:26 the question of power in dsps um that's that's the first one we're hearing some considerations for pam5 or

57:33 pam6 are there inefficiencies expected in dsps when

57:38 pam signaling goes up to pam 8 or 16. so

57:45 um this is a two-part answer i'll get elia to chime in as well

57:51 so the when you go to a higher complexity in modulation format uh

57:58 you're working against two things one is your snr is going up exponentially uh

58:03 the good old shannon and secondly your linearity requirements are going up

58:11 uh just as bad um so yes if you go to uh a higher order

58:19 modulation format let's let's call it for what it is uh you suffer snr uh the game you're

58:24 trying to beat and you consume a lot more power and the tia and driver trying to linearize them and that's uh

58:32 you know partially one of the reasons why we think uh pam4 um is is

58:39 a good compromise ilya yeah i think you summarized it really nicely so

58:45 so just to make it more quantitative you know when you go to a pam six versus

58:50 a pan four the the eye opening for your pam six is is decreased uh by a factor of five

58:57 thirds so that's already a penalty in snr of like uh about four db or so also you know higher order

59:04 modulation is more sensitive to to many impairments like for example rin i mentioned earlier

59:10 uh rin is a is a very important impairment in in lasers and it impacts higher order

59:17 modulation more because it's a it's a level dependent noise uh so pam6 suffers more from rin and you

59:23 actually saw that in my plots where um pam6 kind of exhibited an air

59:28 floor at higher analog bandwidths uh and and other impairments isi

59:34 it's it's more sensitive to isi so there are a lot of advantages uh if you can get a high enough uh

59:39 analog bandwidth uh for pam4 it's definitely uh there's a big advantage there for robustness

59:45 for implementation purposes it's more complexity many advantages and also backward

59:51 compatibility of course all right great another question for

59:57 rada on the symbol transmit rate is it still capped around 25 gigabyte owing to

1:00:02 limitations on direct modulation of lasers okay this is an interesting question for

1:00:08 dmls uh it is probably capped at about gigabyte although there are

1:00:16 announcements from several vendors of uh dmls operating at uh at the 53 gigabyte

1:00:25 uh data rate but largely at 53 gigabyte or 100 gig per lane

1:00:30 dr4 fr4 implementations have mostly been uh been eml

1:00:38 and silicon photonics yes there seems to be a natural stop at dmls i wouldn't count

1:00:45 them out but but there seems to be okay um ilya what kind of specs are

1:00:50 required for the adc to enable the dsp for the pam4 receiver

1:00:56 yeah i mean that's a that's a great question so so the adc is a key component uh of uh of the system um

1:01:04 i went over the the requirements on analog bandwidth and i think those are valid for the adc

1:01:09 um so so you need uh analog bandwidth of better than 45 gigahertz for pam4

1:01:15 ideally better than 50 gigahertz um and of course there are many other specs

1:01:20 like specs on uh you know enob you know various uh

1:01:27 distortions um and it really probably would require a separate presentation and discussion just on that

1:01:33 but i think the key spec you know analog bandwidth i would say around 50 gigahertz okay um could you also talk a little bit

1:01:40 about the latency um the latency of you know 300 nanoseconds is it too large for 800g in a 2k

1:01:48 application and i guess latency of of the fact in general here yeah

1:01:55 that's another very good question uh has a lot of implications and trade-offs so um you

1:02:00 know when designing facts there's kind of a fundamental um trade-off between

1:02:06 uh coding gain uh effect overhead and effect latency uh so it's it's very

1:02:12 difficult to to uh to have a very large coding gain very low overhead and very low latency

1:02:18 you have to make some trade-offs um so we believe you know to get the kind of coding gain we need

1:02:25 uh we um we try to play as much we try to optimize that that parameter

1:02:31 space as much as we can you know the overhead keeping that overhead at around six percent not not more than six percent and keeping

1:02:37 that latency less than 300 nanoseconds um 300 nanoseconds i think is a

1:02:43 reasonable number um if you think about a two kilometer link uh you know the latency that just

1:02:48 the propagation delay of that link is is you know microseconds so i think uh that latency is still less much less

1:02:55 than the propagation delay uh but it really depends a lot on the application so if we're talking about it like an ai application where

1:03:02 uh you know the the the links are are very short maybe in the ai cluster then you might

1:03:08 need even lower latency and and there are ways to design these facts that i talked about the concatenated effects to have even lower

1:03:14 latency so in general do you think these facts are going to continue to evolve and we're going to see them

1:03:20 you know in the next and the next generation after that i think so yeah i mean i think fec is a

1:03:27 is a critical um technology to enable uh you know higher speeds

1:03:32 to enable lower cost optics uh and i think even maybe lower power i i think

1:03:38 it's it's a consideration there because you know um so um i think it's going to

1:03:43 evolve we're going to have a discussion on fact for sure in the ieee when we talk about 800 gig

1:03:49 uh so yeah it's going to continue to evolve and it's a very interesting design space you know

1:03:54 somewhat different from long haul where you know we're just trying to push the fact to its lit no to the shannon limit

1:04:00 uh uh in uh in data center there are there are interesting trade-offs with latency and power so

1:04:06 uh it's uh it's an exciting field actually for fact designers okay um so

1:04:12 a lot of these questions i think have to do with engineering trade-offs and you know this this next one is uh

1:04:18 looking at pam6 uh 90.7 gigabyte compatible with 75

1:04:23 gigahertz dwm spacing well in data center wdm the the channel

1:04:30 spacing is actually much wider than than the dwdm

1:04:36 systems that are used for long haul and metro so inside data centers we we typically use what's called cwdm

1:04:43 and where the channel spacing is i think it's around four nanometers uh for cwdm so it's really not an issue

1:04:50 uh uh for for for the you know the spectral um bandwidth of pam6 um uh

1:04:57 so i think uh it's not an issue for the for the type of channel spacings we use for inside data center

1:05:04 okay um uh another one for you ilya and that's on mlsd how many taps

1:05:12 are you assuming in your mlsd and can you comment on the complexity and power

1:05:19 um you know i don't want to talk too much about the implementation of the mlsd except to say that that we are designing

1:05:26 um you know kind of um i would say innovative mlsd architectures for very low power

1:05:33 low complexity uh and it's possible to do that there are there are a lot of trade-offs uh for example pem6 versus

1:05:40 pam4 you know pem6 would require more complicated uh mlsd uh you know so-called

1:05:46 trellis architecture um and uh you know there are a lot of implications

1:05:51 uh so i probably don't want to get too deep into it but just to say that the type of mlsds that we design

1:05:57 are low power mlsds for data center applications okay uh does the additional dfe

1:06:04 tap provide improvements because of equalization without noise amplification or are there other reasons yeah so the

1:06:12 so uh so to think about the dfe first consider just the ffv by itself right so the ffv by itself

1:06:19 can in principle invert the channel and and completely correct uh for the bandwidth limitation but by

1:06:24 inverting the channel the ffe will also enhance noise and that's where the ffe runs out of

1:06:30 juice when you have a very severe bandwidth limitation the dfe helps by canceling the post

1:06:37 cursor isi without noise enhancement so it makes the job of the ffe easier less noise enhancement and hence

1:06:44 better tolerance to to bandwidth limitations and we saw that in the in the um kind of the simulations that i showed

1:06:50 when we added the dfp both pam-6 and pam-4 but especially pm4 because of its higher baud rate

1:06:56 uh benefited a lot from that in terms of its uh robustness to bandwidth limitations okay so one more clarification on sort

1:07:04 of on that line is there any particular reason why you didn't investigate m6 with ffe

1:07:10 and mlssd i think yes i touched on that already so so um

1:07:17 there's several reasons first of all pam6 because of its lower baud rate doesn't really benefit much

1:07:22 from mlsd as you go to say as long as the analog bandwidth is say greater than 40 or 45 gigahertz

1:07:29 you don't get much benefit for pem6 but on the other hand the complexity of the mlsd scales up very rapidly uh

1:07:37 for pam6 so these are the reasons why it's not it's not really useful i think

1:07:43 for pem6 okay a question for mark uh considering power and thermal performance

1:07:51 would there be some case to adopt um cpo at 800g somewhere in your topology at least to

1:07:58 test it out um yeah i guess um

1:08:05 cpo chiplets based uh in 800 gig denominations is kind of what's

1:08:10 in scope there like i mentioned an 800 gig mac itself may not be a good fit

1:08:16 for for what we're looking to do anytime soon but um uh but yeah as a test vehicle or

1:08:22 something we could see you know or like proof of concept um and prototyping we could see you know

1:08:29 leveraging some of the 800 gig photonic developments uh in a cpo

1:08:34 application you know to to just get some hands-on time with the technologies okay and how about kobo

1:08:42 in general um could you be testing that as an opportunity of reducing power and cost yeah

1:08:50 i was just typing a response to that one um yeah so kobo um as of now there are some

1:08:57 you know tinted programs inside microsoft that uh are evaluating kobo and some of

1:09:04 those applications um but for our main stream networking uh

1:09:10 deployments there is not currently a plan to um to roll out kobo we will be

1:09:17 focused on transitioning to co-package optics um in the 2023-2024

1:09:24 time frames there was also a question on the timelines so that that's answering that one okay um you had another question about

1:09:31 how two servers in the same rack can communicate with each other or in the case of a tour bypass scheme

1:09:39 yeah they they just uh through the tier one switch now so

1:09:44 there's not a tour anymore but the the ship uh the tier one would be able to talk to

1:09:50 all the servers um in the same same row in the same rock

1:09:56 okay um let's see i i guess there's a there's a still kind

1:10:01 of a wider question here that i'm going to uh give back to radha and ilya and

1:10:06 that's kind of to see if you guys together or individually want to give up an answer to mark um

1:10:14 to basically uh you know address those power power

1:10:20 concerns okay um as as mark said uh power is the

1:10:27 elephant in the room uh there is no denying uh if you go to higher and higher baud

1:10:33 rates uh is simply we are consuming uh more power

1:10:39 and mark has um very elegantly pointed out uh why uh core packaging yeah essentially

1:10:46 reducing the the 30's reach between the switch and the and the package uh would reduce

1:10:54 it and in this case is also advocating for a lower data rate on our side

1:11:00 we have a multi-pronged approach i'll let elia comment as well and elia had said a number of times a

1:11:07 low paraffaq low power mlsd so the first approach is just just

1:11:13 architecture there's several architecture um unfortunately a lot of which are

1:11:20 proprietary that leads to a lower optimal power consumption secondly

1:11:28 uh this is this never ending march uh hopefully never ends on um

1:11:34 smaller cmos notes our current generation cmos a6 7 nanometer going to five

1:11:43 and uh going to five it's not so much on density uh the chip on the the dsp areas

1:11:49 don't scale so much but the power goes down um i have to go back and take a look uh

1:11:56 how much uh the power reductions we're getting between um seven and five but we are uh designing

1:12:02 chips at the five nanometer node uh that that is one and uh these will be

1:12:09 the two uh approaches that we are taking to get power reduction down and then mark focused on power dissipation which is a

1:12:16 flip side a power consumption if we didn't consume so much you don't have to dissipate or are cool as much uh but um ilia

1:12:26 yes i i agree with you radha so so for us like in the dsp and architecture team in infi low power is really kind of a

1:12:34 passion for us so with every generation of technology you know the some of the key innovations that we work

1:12:39 on are lowering the power of the dsp architecture and effect uh i

1:12:44 think that is actually one of the big differences between say long haul and data center is uh

1:12:50 the in data center uh dsp we really focus a lot on low power architectures

1:12:57 and there's a lot that was done that we've done but we cannot publish because it's proprietary implementations

1:13:03 but uh i mean there are uh thing you know you can read up read about in the literature various approaches for for low power

1:13:09 architectures um that uh that uh you can think about now the other thing is um uh you know with

1:13:17 each generation we can also reduce power by going to um a lo you know smaller uh node right uh

1:13:24 cmos node that also helps especially for uh dsp heavy architectures right

1:13:29 they scale um scale well to lower power at uh as we go to a smaller node so that

1:13:35 that should help as well okay all right well fantastic and with that we're just about out of time here so we're not

1:13:42 going to be able to take any more of these great questions that have come in but we will do our best to answer them offline

1:14:04 you

Watch