Help us improve your experience.

Let us know what you think.

Do you have time for a two-minute survey?

 
 

SGLang Router and Worker Behavior

In SGLang-based tests, each AMD Instinct MI300X system runs SGLang in data-parallel serving mode. The selected LLM is loaded once per GPU, resulting in eight local GPU-backed model instances per server. Each MI300X server runs one SGLang Router, which receives incoming inference requests and distributes them across local SGLang worker processes.

The SGLang Router listens for incoming inference requests on the service port used by the test, shown as port 30000 in the test diagram. After receiving a request, the router forwards the request internally to one of the local GPU-backed workers.

Each worker is associated with one GPU and hosts one local model instance. In the example topology, workers listen locally using loopback addressing and ports in the 3100X range, where X represents the local worker index.

Worker traffic is local to the MI300X server and is not frontend fabric traffic. The frontend fabric carries requests to the SGLang Router; after that, the router distributes each request internally. This distinction is important because the frontend fabric validation focuses on the client-to-router or Envoy-to-router traffic path, not on loopback traffic inside the inference server.

Table 9: SGLang Router and Worker Behavior

Layer Function
GenAI-Perf Generates inference benchmark traffic toward either a direct inference endpoint or an Envoy endpoint.
Envoy Load Balancer Optionally distributes incoming requests across multiple MI300X inference servers.
SGLang Router Receives inference requests on the MI300X server and routes them to local GPU-backed workers.
SGLang Workers Run model instances on GPUs and process inference requests.
GenAI-Perf Generates inference benchmark traffic toward either a direct inference endpoint or an Envoy endpoint.

Table 10: Example SGLang Worker Mapping

Worker Local Address Local Port
Worker 0 127.0.0.1 31000
Worker 1 127.0.0.1 31001
Worker 2 127.0.0.1 31002
Worker 3 127.0.0.1 31003
Worker 4 127.0.0.1 31004
Worker 5 127.0.0.1 31005
Worker 6 127.0.0.1 31006
Worker 7 127.0.0.1 31007