dse.proto
A low-level data model and APIs for describing a Keysight AI/ML experiment
Request to abort a trial
Currently, the server tries to abort only the running configuration, which is already stored in DSE.
Contains information about a trial abort
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| state | AbortState |  | The state of the abort request |
| message | string | optional | A message containing relevant information about the abort. |
Wrapper message for gRPC response.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| collective_implementations | common.CollectiveImplementation | repeated | List of collective implementations |
Request to create a binding.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| infrastructure_profile | profiles.InfraProfile |  | The infrastructure profile to use for the binding. |
| prev_binding | bind.Binding | optional | Previous binding (if any), useful for incremental changes to existing bindings |
| platform_regions | bind.PlatformRegion | repeated | Assigns platforms to different regions of the infrastructure. The DSE server raises an error if this list is non-empty and the onearm feature flag is not in use. |
| platform | common.PlatformType |  | The platform type. Typically obtained from CreateBinding() |
Response containing the created binding.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| binding | bind.Binding |  | The created binding object |
Request to get diagnostic files for specified trials.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| result_ids | string | repeated | List of result IDs to include in the diagnostic file. Result IDs can be obtained from the trial report returned by the ListTrialReports API call. If result_ids is empty, diagnostics are collected for the currently configured trial. |
Response containing the location of the diagnostic file.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| filepath | string |  | Path to the archived log files (zip, tar, etc.) |
| url | string |  | URL to download the diagnostic file |
Message in a stream of updates returned while running a trial.
The client can print the log messages returned by each update to
have a live indication of progress.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| log_messages | string |  | The log message text carried by this update. |
| timestamp | google.protobuf.Timestamp |  | The time at which the log message was generated. |
| severity_level | SeverityLevel |  | The severity level of the log message. |
| component_name | string |  | The name of the component emitting the log message. |
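The live-progress pattern described for RunLogs can be sketched in Python without the generated gRPC stubs. This is a minimal sketch under stated assumptions: RunLogsUpdate is a hypothetical stand-in for a RunLogs message, the stream is modelled as any iterable, SeverityLevel simply mirrors the enum numbers documented for dse.proto, and the min_level filter is an illustrative choice rather than part of the API.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum
from typing import Iterable, List


class SeverityLevel(IntEnum):
    # Mirrors the SeverityLevel enum numbers documented for dse.proto.
    LEVEL_UNSPECIFIED = 0
    LEVEL_DEBUG = 1
    LEVEL_INFO = 2
    LEVEL_WARNING = 3
    LEVEL_ERROR = 4
    LEVEL_CRITICAL = 5


@dataclass
class RunLogsUpdate:
    # Hypothetical stand-in for one RunLogs message; not the generated class.
    log_messages: str
    timestamp: datetime
    severity_level: SeverityLevel
    component_name: str


def format_updates(updates: Iterable[RunLogsUpdate],
                   min_level: SeverityLevel = SeverityLevel.LEVEL_INFO) -> List[str]:
    """Render every update at or above min_level as one progress line."""
    lines = []
    for update in updates:
        if update.severity_level < min_level:
            continue  # skip messages below the chosen threshold
        level = update.severity_level.name[len("LEVEL_"):]
        lines.append(f"{update.timestamp.isoformat()} [{level}] "
                     f"{update.component_name}: {update.log_messages}")
    return lines
```

With real stubs, the same loop would iterate over the stream returned by RunTrial or ConfigureTrial and print each formatted line as it arrives.
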
Trial is a message that contains all the required messages used to define a trial run.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| workspace | WorkspaceSpec |  | The workspace that the trial will be stored under. If the workspace does not exist, it will be created. |
| tags | string | repeated | A list of tags associated with this trial. |
| platform | common.PlatformType |  | The type of platform to run the workload on |
| nccl_config | common.NcclConfig |  | Configuration settings specific to NCCL |
| tcp | common.TcpTransport |  | TCP transport configuration |
| rocev2 | common.Rocev2Transport |  | RoCEv2 transport configuration |
| falcon | common.FalconTransport |  | Falcon transport configuration |
| kccb | kccb.Config |  | KCCB configuration |
| workload_replay | workload_replay.Config |  | Workload replay configuration |
| binding | bind.Binding | optional | The binding, typically obtained from CreateBinding() |
| trial_meta | TrialMeta |  | Holds version information |
| impairments | impairment.Impairments |  | List of impairment metrics |
Holds the model version (and possibly more information in the future).
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| model_version | string |  | The model version |
| is_readonly | bool |  | Set if the configuration cannot be used to run a trial, for example because parts of the configuration not exposed by the UI were removed before saving. Such a configuration can be inspected in the UI, but running a trial with it will fail. |
Contains information about a trial that has run, is running, or has yet to be started.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| timestamp | google.protobuf.Timestamp |  | The time at which the trial run was started. In storage, each trial directory is named after this ISO 8601 timestamp, in the format YYYY-MM-DDTHH:mm:SS.ms:timezone |
| workspace | string |  | Workspace name |
| path | string |  | The storage path of the trial directory. |
| tags | string | repeated | Tags associated with the trial. |
| description | string |  | Description of the trial |
| state | TrialState |  | The current state of this trial. |
| system_tags | string | repeated | System tags associated with the trial. |
| end_timestamp | google.protobuf.Timestamp |  | The time at which the trial run was completed. |
| kccb_summary | app_common.SummaryTable |  | A table of NCCL-like summary results |
| workload_replay_summary | app_common.SummaryTable |  | A table of workload replay summary results |
| kccb_artifacts | app_common.TrialArtifacts |  | KCCB artifacts generated by the trial and saved in storage. |
| workload_replay_artifacts | app_common.TrialArtifacts |  | Workload replay artifacts generated by the trial and saved in storage. |
| use_report_v2 | bool | optional | Whether to use the v2 report format |
| info_report_v2 | storage_v2.TrialReportDetailInfo | optional | Detailed report information in v2 format |
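The trial-directory naming rule for timestamp can be illustrated with Python's datetime module. This sketch assumes the directory name is the ISO 8601 rendering of the start time with millisecond precision and a UTC offset; the function names are hypothetical, and the server's exact rendering is not specified beyond the pattern above.

```python
from datetime import datetime


def trial_dir_name(started: datetime) -> str:
    """ISO 8601 name for a trial directory, with milliseconds and a UTC offset."""
    if started.tzinfo is None:
        raise ValueError("started must be timezone-aware so the offset can be included")
    # timespec="milliseconds" truncates microseconds to three digits.
    return started.isoformat(timespec="milliseconds")


def parse_trial_dir_name(name: str) -> datetime:
    """Inverse of trial_dir_name; datetime.fromisoformat accepts this layout."""
    return datetime.fromisoformat(name)
```

For example, a trial started at 12:30:45.123456 UTC on 1 May 2024 would map to 2024-05-01T12:30:45.123+00:00.
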
Describes a workspace to be created or updated.
If the workspace already exists, any listed tags that are not already attached to it are added to it.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| name | string |  | Workspace name |
| tags | string | repeated | List of tags to allow filtering trial results |
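The tag-merge behaviour described for WorkspaceSpec (add only the tags that are not already attached) amounts to an order-preserving set union. The sketch below is a hypothetical client-side model of that rule, not the server implementation; it assumes existing tags stay first, in their original order.

```python
from typing import List, Sequence


def merge_workspace_tags(existing: Sequence[str], requested: Sequence[str]) -> List[str]:
    """Append requested tags that are not already attached, keeping order."""
    merged = list(existing)
    seen = set(existing)
    for tag in requested:
        if tag not in seen:  # already-attached tags are left untouched
            merged.append(tag)
            seen.add(tag)
    return merged
```

Requesting the same tag twice, or a tag the workspace already has, leaves the list unchanged.
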
State of an abort request.
| Name | Number | Description |
| ---- | ------ | ----------- |
| ABORT_UNDEFINED | 0 | Undefined state |
| ABORT_INITIATED | 1 | The abort request has been initiated |
| ABORT_ERROR | 2 | An error occurred during the abort process |
Contains information about the severity level of a log message.
| Name | Number | Description |
| ---- | ------ | ----------- |
| LEVEL_UNSPECIFIED | 0 | Unspecified severity level |
| LEVEL_DEBUG | 1 | Debugging information |
| LEVEL_INFO | 2 | Informational messages |
| LEVEL_WARNING | 3 | Warning messages |
| LEVEL_ERROR | 4 | Error messages |
| LEVEL_CRITICAL | 5 | Critical error messages |
State of a trial run.
| Name | Number | Description |
| ---- | ------ | ----------- |
| UNSPECIFIED | 0 | Trial state not specified, invalid value |
| UNCONFIGURED | 1 | Trial is not configured |
| CONFIGURATION_IN_PROGRESS | 2 | Trial configuration is in progress |
| CONFIGURATION_SUCCESSFUL | 3 | Trial configuration completed successfully |
| RUN_IN_PROGRESS | 4 | Trial run is in progress |
| RUN_SUCCESSFUL | 5 | Trial run completed successfully |
| ERROR | 6 | Trial has encountered an error |
| ABORTED | 7 | Trial has been aborted |
| TERMINATED | 8 | Trial has been terminated |
| ABORT_IN_PROGRESS | 9 | Trial abort is in progress |
DSE service definition
| Method Name | Request Type | Response Type | Description |
| ----------- | ------------ | ------------- | ----------- |
| CreateBinding | CreateBindingRequest | CreateBindingResponse | Creates rank and physical bindings for use with a Trial object in ConfigureTrial/RunTrial. |
| ConfigureTrial | Trial | RunLogs stream | Sets up the trial based on the parameters provided. If the low-level config is not included, the server uses the high-level spec to generate the corresponding low-level config, which is returned as part of the response. |
| AbortTrial | AbortTrialRequest | AbortTrialResponse | Aborts the currently running trial and returns the state of the abort. Returns an error if no trial has been configured. |
| RunTrial | .google.protobuf.Empty | RunLogs stream | Runs the currently configured trial and streams progress updates. Returns an error if no trial has been configured. |
| GetTrial | .google.protobuf.Empty | Trial | Returns the currently configured trial. If no trial is configured, the returned object is uninitialized. |
| GetTrialReport | .google.protobuf.Empty | TrialReport | Returns a TrialReport, which contains state information (not started, in progress, successful, error). The pattern is modeled after https://cloud.google.com/apis/design/design_patterns#long_running_operations, although it does not match it completely. |
| GetTrialReportDetails | .google.protobuf.Empty | .storage_v2.TrialReportDetailInfo | Returns the trial report of the most recently run trial. If no trial has been run, returns an empty message. |
| GetCollectiveImplementations | .google.protobuf.Empty | CollectiveImplementations | Returns a list of all (CC operation, algorithmic implementation) pairs available on the DSE server. |
| GetDiagnosticFile | GetDiagnosticFileRequest | GetDiagnosticFileResponse | Gets diagnostic files for the specified trials. |
| Echo | .common.EchoRequest | .common.EchoResponse | Test API that echoes a message back |
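Because GetTrialReport follows a long-running-operation style, a client typically polls it until the trial settles. A minimal sketch, assuming that RUN_SUCCESSFUL, ERROR, ABORTED and TERMINATED are the final states (the proto does not say which states are terminal) and that get_state is a caller-supplied function returning the state field of a TrialReport; wait_for_trial and its parameters are hypothetical.

```python
import time
from enum import IntEnum
from typing import Callable


class TrialState(IntEnum):
    # Mirrors the TrialState enum numbers documented for dse.proto.
    UNSPECIFIED = 0
    UNCONFIGURED = 1
    CONFIGURATION_IN_PROGRESS = 2
    CONFIGURATION_SUCCESSFUL = 3
    RUN_IN_PROGRESS = 4
    RUN_SUCCESSFUL = 5
    ERROR = 6
    ABORTED = 7
    TERMINATED = 8
    ABORT_IN_PROGRESS = 9


# Assumption: these states are final; the proto itself does not say so.
TERMINAL_STATES = frozenset({TrialState.RUN_SUCCESSFUL, TrialState.ERROR,
                             TrialState.ABORTED, TrialState.TERMINATED})


def wait_for_trial(get_state: Callable[[], int],
                   poll_interval_s: float = 1.0,
                   timeout_s: float = 600.0) -> TrialState:
    """Poll get_state() until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = TrialState(get_state())  # accepts raw enum numbers too
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"trial still in state {state.name}")
        time.sleep(poll_interval_s)
```

With generated stubs, get_state could wrap a GetTrialReport call with an empty request and return its state field; the stub class name is not given in this document.
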
common.proto
Common data models
Specifies either a system-provided expander or a user-provided custom implementation.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| system | AlgorithmType |  | A system-supplied collective algorithm |
| custom | string |  | A path of the form package.module.classname to a class that inherits from keys_ai_ml_chakra.Expander. See EXPANDERS.md in the keys_ai_ml_chakra package for details on how to create your own custom expander class. |
| flow_control_config | FlowControl |  | Configuration for the flow control mechanism used by this algorithm. |
Information about the chassis being used for emulation
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| address | string |  | Chassis IP address or FQDN |
| port | string |  | Chassis port, in the format <front-panel-port> or <front-panel-port>.<fanout>. Examples: '1' or '1.4' |
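The two port formats accepted for Chassis.port can be split mechanically. A minimal sketch, assuming both components are decimal integers as in the examples '1' and '1.4'; parse_chassis_port is a hypothetical helper, not part of the API.

```python
from typing import Optional, Tuple


def parse_chassis_port(port: str) -> Tuple[int, Optional[int]]:
    """Split '<front-panel-port>' or '<front-panel-port>.<fanout>' into integers."""
    front, sep, fanout = port.partition(".")
    # Reject empty parts, non-numeric parts, and extra dots such as '1.4.2'.
    if not front.isdigit() or (sep and not fanout.isdigit()):
        raise ValueError(f"invalid chassis port: {port!r}")
    return int(front), (int(fanout) if sep else None)
```

A fanout of None means the whole front-panel port is used.
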
Used to specify a (collective type, collective algorithm) pair.
The collective algorithm determines how a collective communication operation is expanded into
a set of peer-to-peer operations.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| type | keysight_chakra.mlcommons.CollectiveCommType |  | The type of collective communication operation. |
| algorithm | Algorithm |  | A system-provided or user-provided algorithm specification used to expand a collective communication operation into a set of peer-to-peer operations. |
Congestion control mechanisms
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| ecn | ExplicitCongestionNotifications |  | ECN configuration |
| pfc | PriorityFlowControl |  | PFC configuration |
| dcqcn_rate_control | DCQCNRateControl |  | DCQCN rate control configuration |
Data Center Quantized Congestion Notification rate control settings
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| enabled | bool | optional | Enable/disable DCQCN |
| alpha_factor | int32 | optional | Factor used to update Alpha every update period (fixed-point fraction of 2^10) |
| alpha_interval | int32 |  | Alpha update period (microseconds) |
| initial_alpha | int32 |  | Initial Alpha value |
| rate_after_first_cnp | int32 |  | Current and target rate limit set after the first CNP (Mbps) |
| rate_decrement_factor | float | optional | Maximal ratio of rate decrease in a single event (percentage) |
| min_rate_limit | int32 |  | Minimal rate limit of the QP (Mbps) |
| rate_decrement_coefficient | int32 |  | The coefficient between Alpha and the rate reduction factor |
| rate_decrement_interval | int32 |  | The minimum time period between rate reductions (microseconds) |
| clamp_target_rate | bool | optional | If enabled, whenever a CNP is processed, the target rate is updated to the current rate |
| rate_increment_interval | int32 |  | The time period between rate increase events (microseconds) |
| rate_increment_byte_counter | int32 |  | The sent-bytes counter between rate increase events (in units of 64 bytes) |
| rate_increment_threshold | int32 |  | The threshold of rate increase events for moving to the next rate increase phase |
| additive_rate_increment | int32 |  | The rate increase value in the Additive Increase phase (Mbps) |
| hyper_rate_increment | int32 |  | The rate increase value in the Hyper Increase phase (Mbps) |
| time_between_cnps | int32 |  | Minimal time between two consecutive CNPs sent (microseconds) |
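alpha_factor is stored as a fixed-point fraction of 2^10, so dividing the integer field value by 1024 gives the fraction. A sketch of both conversions; the function names are hypothetical, and limiting the fraction to [0, 1] is an assumption matching the usual DCQCN gain range rather than a documented constraint.

```python
ALPHA_SCALE = 1 << 10  # alpha_factor is a fixed-point fraction of 2^10


def alpha_factor_to_float(raw: int) -> float:
    """Convert the int32 alpha_factor field to its fractional value."""
    return raw / ALPHA_SCALE


def float_to_alpha_factor(fraction: float) -> int:
    """Encode a fraction in [0, 1] as the nearest fixed-point alpha_factor."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return round(fraction * ALPHA_SCALE)
```

For example, a gain of 1/256 is encoded as 4, since 4 / 1024 = 1/256.
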
A request to echo a simple message
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| message | string |  | The message to echo |
The echo response from a service
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| message | string |  | The echoed message |
ECN configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| cnp_dscp | int32 | optional | DSCP of CNP packets |
| data_ecn_bits | EcnBits | optional | Configures the ECN bits for data packets |
| control_ecn_bits | EcnBits | optional | Configures the ECN bits for control packets, e.g. RoCEv2 ACKs |
| cnp_ecn_bits | EcnBits | optional | Configures the ECN bits for CNP packets |
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| rdma_message_size | int32 |  | (Maximum) RDMA message size in bytes |
| qps_per_rankpair | int32 |  | Number of Queue Pairs per rank pair |
| qp_negotiation | RoCEv2QPNegotiationMethod |  | Queue Pair negotiation method |
| verb | RDMAVerb |  | RDMA verb to use for data transfers in collective communication operations |
| tcp_store_host | string |  | TCPStore hostname or IP address (only applicable when qp_negotiation is METHOD_TCP_STORE) |
| tcp_store_port | uint32 |  | TCPStore port number (only applicable when qp_negotiation is METHOD_TCP_STORE) |
Used to specify details needed to enable flow control feature
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| enable_flow_control | bool |  | Enable the flow control mechanism, in which the receiver grants credits to the sender to control the data flow. |
| max_inflight_credits | uint32 |  | Maximum number of credits allowed in flight. |
| compute_delay | uint32 |  | Delay between data reception and credit transmission (microseconds) |
| credit_distribution | FlowControlDistributionType |  | Determines how credits are distributed across streams. One credit can either unblock one stream at a time or all streams simultaneously |
IPv4 Addressing configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| ip_address | string |  | IP address |
| ip_prefix | uint32 |  | IP prefix length |
| ip_gateway_address | string |  | Gateway IP address |
IPv6 Addressing configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| ip_address | string |  | IP address |
| ip_prefix | uint32 |  | IP prefix length |
| ip_gateway_address | string |  | Gateway IP address |
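The IPv4 and IPv6 addressing messages share the same three fields, so one check covers both. A sketch using Python's standard ipaddress module, assuming ip_prefix is the prefix length and that the gateway must lie inside the interface subnet (an on-link gateway, which the proto does not state); check_addressing is hypothetical.

```python
import ipaddress


def check_addressing(ip_address: str, ip_prefix: int, ip_gateway_address: str) -> str:
    """Return the interface subnet; raise ValueError if the gateway lies outside it."""
    iface = ipaddress.ip_interface(f"{ip_address}/{ip_prefix}")  # validates both parts
    gateway = ipaddress.ip_address(ip_gateway_address)
    if gateway.version != iface.version:
        raise ValueError("gateway and address use different IP versions")
    if gateway not in iface.network:
        raise ValueError(f"gateway {gateway} is outside {iface.network}")
    return str(iface.network)
```

An invalid prefix (for example 33 for IPv4) is rejected by ip_interface itself with a ValueError.
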
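| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| custom_env_vars | IxperfConfig.CustomEnvVarsEntry | repeated | Custom Ixperf environment variables |
IxperfConfig.CustomEnvVarsEntry
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | string |  |  |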
Layer 1 configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| speed_mode | SpeedMode |  | Speed, modulation and FEC mode |
| auto_negotiate | bool |  | Enable/disable auto-negotiation |
| link_training | bool |  | Enable/disable link training |
| ieee_defaults | bool |  | Enable/disable IEEE defaults. This setting takes precedence over auto-negotiation and link training |
NCCL configuration parameters
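| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| custom_env_vars | NcclConfig.CustomEnvVarsEntry | repeated | Custom NCCL environment variables |
NcclConfig.CustomEnvVarsEntry
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | string |  |  |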
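custom_env_vars is rendered as a repeated CustomEnvVarsEntry with key and value, which is how protobuf represents a map<string, string> field; in generated Python code such a field behaves like a dict. A sketch of how a client might preview the resulting environment, assuming custom values override inherited ones (the proto does not state the precedence); build_env is hypothetical, and NCCL_DEBUG is only an example of a standard NCCL variable.

```python
import os
from typing import Dict, Mapping, Optional


def build_env(custom_env_vars: Mapping[str, str],
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Overlay custom_env_vars (a proto map<string, string>) on a base environment."""
    env = dict(os.environ if base is None else base)
    env.update(custom_env_vars)  # assumption: custom values win over inherited ones
    return env
```

Passing an explicit base keeps the preview independent of the caller's own environment.
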
Network Interface Card settings
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| ethernet_mtu | int32 |  | Ethernet maximum transmission unit (MTU) |
| ipv4_addressing | Ipv4Addressing |  | IPv4 addressing specifics |
| ipv6_addressing | Ipv6Addressing |  | IPv6 addressing specifics |
| qos | Qos | optional | Quality of service |
| congestion_control | CongestionControl | optional | Congestion control |
| packet_capture | PacketCapture | optional | Packet capture |
| mac_address | string | optional | MAC address |
| vlan | Vlan | optional | VLAN tags |
| roce_transport_settings | Rocev2TransportSettings |  | RoCEv2 transport-specific settings |
Packet capture configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| enabled | bool |  | Enable/disable packet capture |
| capture_max_file_size | uint32 | optional | Maximum size of the capture file, in bytes. |
| buffer_full_action | BufferFullAction | optional | What to do when the capture buffer is full |
| packet_slice_size | uint32 | optional | Size of each packet slice in bytes. If set to 0, the entire packet is captured. |
PFC configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| enabled | bool | optional | Enable/disable Priority Flow Control |
| priorities | int32 | repeated | List of priorities |
Quality of Service configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| priority_trust_mode | PrioTrustMode |  | Priority trust mode |
| map_dscp_to_prio | Qos.MapDscpToPrioEntry | repeated | DSCP to priority mapping |
| map_prio_to_traffic_class | Qos.MapPrioToTrafficClassEntry | repeated | Priority to traffic class mapping |
Qos.MapDscpToPrioEntry
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | int32 |  |  |
| value | int32 |  |  |
Qos.MapPrioToTrafficClassEntry
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | int32 |  |  |
| value | int32 |  |  |
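With priority_trust_mode set to TRUST_DSCP, a packet's traffic class is found through the two Qos maps: DSCP to priority, then priority to traffic class. A minimal lookup sketch; the function is hypothetical, and raising on an unmapped value is an assumption (a real device may apply a default instead).

```python
from typing import Mapping


def traffic_class_for_dscp(dscp: int,
                           map_dscp_to_prio: Mapping[int, int],
                           map_prio_to_traffic_class: Mapping[int, int]) -> int:
    """Resolve DSCP -> priority -> traffic class, as with DSCP trust mode."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    prio = map_dscp_to_prio[dscp]  # KeyError if this DSCP value is not mapped
    return map_prio_to_traffic_class[prio]
```

For example, with DSCP 26 mapped to priority 3 and priority 3 mapped to traffic class 1, DSCP 26 resolves to traffic class 1.
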
RoCEv2 Transport configuration parameters
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| rdma_message_size | uint32 |  | (Maximum) RDMA message size in bytes |
| qps_per_rankpair | int32 |  | Number of Queue Pairs per rank pair |
| qp_negotiation | RoCEv2QPNegotiationMethod |  | Queue Pair negotiation method |
| verb | RDMAVerb |  | RDMA verb to use for data transfers in collective communication operations |
| tcp_store_host | string |  | TCPStore hostname or IP address (only applicable when qp_negotiation is METHOD_TCP_STORE) |
| tcp_store_port | uint32 |  | TCPStore port number (only applicable when qp_negotiation is METHOD_TCP_STORE) |
| retx_retry_interval_ms | int32 | optional | Retransmission retry interval in milliseconds (the AckTimeout of the RoCEv2 protocol) |
| retx_retry_count | int32 | optional | Retransmission retry count (the RetransRetryCount of the RoCEv2 protocol) |
| max_retry_on_rnr_nak | int32 | optional | Maximum number of retransmissions triggered when an RNR NAK is received |
| ack_request_interval | int32 | optional | Request an ACK after every N packets |
| reuse_qps | bool | optional | Enable/disable reuse of Queue Pairs (QPs) for RDMA messages between a rank pair |
| support_rx_reordering | bool | optional | Enable/disable Rx reordering |
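Since tcp_store_host and tcp_store_port only apply when qp_negotiation is METHOD_TCP_STORE, a client can flag inconsistent settings before sending a Trial. A sketch of such a check; treating an empty host or a zero port as an error for METHOD_TCP_STORE is an assumption (the proto only says the fields are "applicable"), and check_tcp_store_fields is hypothetical.

```python
from enum import IntEnum
from typing import List


class RoCEv2QPNegotiationMethod(IntEnum):
    # Mirrors the RoCEv2QPNegotiationMethod enum numbers from common.proto.
    METHOD_UNSPECIFIED = 0
    METHOD_KEYS_PROPRIETARY = 1
    METHOD_TCP = 2
    METHOD_TCP_STORE = 3
    METHOD_RDMA_CM = 4
    METHOD_AUTOMATIC = 5


def check_tcp_store_fields(method: RoCEv2QPNegotiationMethod,
                           tcp_store_host: str = "",
                           tcp_store_port: int = 0) -> List[str]:
    """List problems with tcp_store_host/tcp_store_port for the chosen method."""
    problems = []
    if method == RoCEv2QPNegotiationMethod.METHOD_TCP_STORE:
        if not tcp_store_host:
            problems.append("tcp_store_host is empty")
        if not 0 < tcp_store_port <= 65535:
            problems.append("tcp_store_port must be in 1-65535")
    elif tcp_store_host or tcp_store_port:
        # Proto3 scalars default to "" and 0, so a non-default value was set by the caller.
        problems.append(f"tcp_store_* fields have no effect with {method.name}")
    return problems
```

An empty list means the TCPStore fields are consistent with the chosen negotiation method.
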
RoCEv2 Transport specific settings
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| ack_dscp | int32 | optional | Configures ACK DSCP value for RoCEv2 protocol |
| nack_dscp | int32 | optional | Configures NACK DSCP value for RoCEv2 protocol |
| data_dscp | int32 | optional | Configures DSCP for all data traffic |
Information about servers being used for emulation
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| address | string |  | Server IP address or FQDN |
| nic_interface | string |  | Server NIC interface |
TCP Transport configuration parameters
VLAN Configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| enabled | bool |  | Enable/disable VLAN |
| vlan_tags | VlanTag | repeated | VLAN tags. Currently, one VLAN tag is required. |
VLAN tag configuration
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| priority | int32 |  | VLAN priority |
| vlan_id | int32 |  | VLAN ID |
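Assuming priority and vlan_id map to the PCP and VID subfields of the IEEE 802.1Q tag, they combine into the 16-bit Tag Control Information (TCI) field as PCP (3 bits), DEI (1 bit) and VID (12 bits). The sketch below packs them; vlan_tci is hypothetical, the dei argument is not part of VlanTag and defaults to 0, and VID values 0 and 4095 are reserved by 802.1Q even though the 12-bit range admits them.

```python
def vlan_tci(priority: int, vlan_id: int, dei: int = 0) -> int:
    """Pack an IEEE 802.1Q Tag Control Information field: PCP | DEI | VID."""
    if not 0 <= priority <= 7:
        raise ValueError("priority (PCP) is a 3-bit value")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID is a 12-bit value")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    return (priority << 13) | (dei << 12) | vlan_id
```

For example, priority 3 on VLAN 100 gives TCI 0x6064.
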
System-provided collective algorithms, selected through the system field of Algorithm
| Name | Number | Description |
| ---- | ------ | ----------- |
| ALGO_UNSPECIFIED | 0 | Algorithm type not specified |
| ALGO_ALL_TO_ALL_PARALLEL | 1 | All To All collective with chunk transfer in parallel |
| ALGO_ALL_TO_ALL_PXN | 2 | All To All collective that leverages message aggregation and rail-optimized topologies. |
| ALGO_ALL_REDUCE_UNIDIRECTIONAL_RING | 10 | All Reduce collective with chunk transfer in a unidirectional ring |
| ALGO_ALL_REDUCE_BIDIRECTIONAL_RING | 11 | All Reduce collective with chunk transfer in a bidirectional ring |
| ALGO_ALL_REDUCE_VECTOR_HALVING_DOUBLING | 12 | All Reduce collective using a vector halving doubling algorithm |
| ALGO_ALL_REDUCE_DOUBLE_BINARY_TREE | 13 | All Reduce collective with chunk transfer in a double binary tree |
| ALGO_ALL_GATHER_RING | 20 | All Gather collective with chunk transfer in a unidirectional ring |
| ALGO_REDUCE_SCATTER_RING | 30 | Reduce Scatter collective with chunk transfer in a unidirectional ring |
| ALGO_BROADCAST_PARALLEL | 40 | Broadcast collective with chunk transfer in parallel |
| ALGO_GATHER_PARALLEL | 50 | Gather point to point collective (all to one) with chunk transfer in parallel |
| ALGO_ALL_TO_ALL_PARALLEL_PP | 60 | All To All collective with chunk transfer in parallel MSCCL++ implementation |
| ALGO_ALL_REDUCE_UNIDIRECTIONAL_RING_PP | 70 | All Reduce collective with chunk transfer in a unidirectional ring MSCCL++ implementation |
Action to take when the packet capture buffer is full
| Name | Number | Description |
| ---- | ------ | ----------- |
| BUFFER_ACTION_OVERRIDE | 0 | Override the current buffer and save its contents |
| BUFFER_ACTION_STOP | 1 | Stop the capture and dump the data to a file |
ECN field within an IP packet
| Name | Number | Description |
| ---- | ------ | ----------- |
| ECN_UNSPECIFIED | 0 | ECN bits not specified |
| ECN_DISABLED | 1 | 00 - Not-ECT - Packets are marked as not ECN-capable |
| ECN_ECT1 | 2 | 01 - ECT(1) - ECN-capable transport |
| ECN_ECT0 | 3 | 10 - ECT(0) - ECN-capable transport |
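The EcnBits enum numbers are not the on-wire codepoints: the bits written into the packet are the two-bit values from RFC 3168 shown in each description. A sketch that converts an EcnBits value and a DSCP into the IPv4 TOS / IPv6 Traffic Class byte (DSCP in the upper six bits, ECN in the lower two); tos_byte is a hypothetical helper.

```python
from enum import IntEnum


class EcnBits(IntEnum):
    # Enum numbers as listed for EcnBits in common.proto.
    ECN_UNSPECIFIED = 0
    ECN_DISABLED = 1
    ECN_ECT1 = 2
    ECN_ECT0 = 3


# Two-bit ECN codepoints from RFC 3168; note they differ from the enum numbers.
ECN_CODEPOINT = {
    EcnBits.ECN_DISABLED: 0b00,  # Not-ECT
    EcnBits.ECN_ECT1: 0b01,      # ECT(1)
    EcnBits.ECN_ECT0: 0b10,      # ECT(0)
}


def tos_byte(dscp: int, ecn: EcnBits) -> int:
    """Combine a 6-bit DSCP and an ECN setting into the TOS / Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    if ecn not in ECN_CODEPOINT:
        raise ValueError("ECN_UNSPECIFIED has no codepoint")
    return (dscp << 2) | ECN_CODEPOINT[ecn]
```

For example, DSCP 26 with ECN_ECT0 yields 0x6A, while DSCP 46 with ECN_DISABLED yields 0xB8.
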
Defines the way credits are distributed when flow control feature is used
| Name | Number | Description |
| ---- | ------ | ----------- |
| FLOW_CONTROL_DISTRIBUTION_TYPE_UNSPECIFIED | 0 | Credit distribution type not specified |
| ALL_STREAMS | 1 | Credits distributed to all available streams. |
| SINGLE_STREAM | 2 | Credits distributed to a single stream at a time. |
Platform types available for executing workloads
| Name | Number | Description |
| ---- | ------ | ----------- |
| PLATFORM_UNSPECIFIED | 0 | Platform type not specified |
| PLATFORM_KEYS_NCCL_TEST | 1 | Use Keysight orchestrated NCCL test to execute the workload |
| PLATFORM_KEYS_SW_AGENT | 2 | Use Keysight software agent to execute the workload |
| PLATFORM_KEYS_HW | 3 | Use Keysight hardware to execute the workload |
| PLATFORM_EXTERNAL | 10 | EXPERIMENTAL: external platform |
| SCP_SIMULATION | 50 | Use the SCP Simulation platform to execute the workload (simulation platforms are numbered from 50) |
Configures trust state
| Name | Number | Description |
| ---- | ------ | ----------- |
| TRUST_UNSPECIFIED | 0 | Priority trust mode not specified |
| TRUST_DSCP | 1 | L3 trust, based on the Differentiated Services Code Point (DSCP) |
RDMA verbs supported for data transfers
| Name | Number | Description |
| ---- | ------ | ----------- |
| VERB_UNSPECIFIED | 0 | RDMA verb not specified |
| VERB_WRITE | 1 | RDMA Verb Write |
| VERB_SEND | 2 | RDMA Verb Send |
Methods available for RoCEv2 Queue Pair negotiation
| Name | Number | Description |
| ---- | ------ | ----------- |
| METHOD_UNSPECIFIED | 0 | Queue Pair negotiation method not specified |
| METHOD_KEYS_PROPRIETARY | 1 | Negotiate queue pairs using Keysight proprietary implementation |
| METHOD_TCP | 2 | Negotiate queue pairs using TCP |
| METHOD_TCP_STORE | 3 | Negotiate queue pairs using a TCP-based store implementation |
| METHOD_RDMA_CM | 4 | Negotiate queue pairs using RDMA communication management implementation |
| METHOD_AUTOMATIC | 5 | Use the default negotiation method for the selected platform |
Receiver Not Ready (RNR) timeout values. Relevant only for RC QPs.
| Name | Number | Description |
| ---- | ------ | ----------- |
| TIMEOUT_655360_MU | 0 | Receiver Not Ready Timeout 655.36 milliseconds delay |
| TIMEOUT_10_MU | 1 | Receiver Not Ready Timeout 0.01 milliseconds delay |
| TIMEOUT_20_MU | 2 | Receiver Not Ready Timeout 0.02 milliseconds delay |
| TIMEOUT_30_MU | 3 | Receiver Not Ready Timeout 0.03 milliseconds delay |
| TIMEOUT_40_MU | 4 | Receiver Not Ready Timeout 0.04 milliseconds delay |
| TIMEOUT_60_MU | 5 | Receiver Not Ready Timeout 0.06 milliseconds delay |
| TIMEOUT_80_MU | 6 | Receiver Not Ready Timeout 0.08 milliseconds delay |
| TIMEOUT_120_MU | 7 | Receiver Not Ready Timeout 0.12 milliseconds delay |
| TIMEOUT_160_MU | 8 | Receiver Not Ready Timeout 0.16 milliseconds delay |
| TIMEOUT_240_MU | 9 | Receiver Not Ready Timeout 0.24 milliseconds delay |
| TIMEOUT_320_MU | 10 | Receiver Not Ready Timeout 0.32 milliseconds delay |
| TIMEOUT_480_MU | 11 | Receiver Not Ready Timeout 0.48 milliseconds delay |
| TIMEOUT_640_MU | 12 | Receiver Not Ready Timeout 0.64 milliseconds delay |
| TIMEOUT_960_MU | 13 | Receiver Not Ready Timeout 0.96 milliseconds delay |
| TIMEOUT_1280_MU | 14 | Receiver Not Ready Timeout 1.28 milliseconds delay |
| TIMEOUT_1920_MU | 15 | Receiver Not Ready Timeout 1.92 milliseconds delay |
| TIMEOUT_2560_MU | 16 | Receiver Not Ready Timeout 2.56 milliseconds delay |
| TIMEOUT_3840_MU | 17 | Receiver Not Ready Timeout 3.84 milliseconds delay |
| TIMEOUT_5120_MU | 18 | Receiver Not Ready Timeout 5.12 milliseconds delay |
| TIMEOUT_7680_MU | 19 | Receiver Not Ready Timeout 7.68 milliseconds delay |
| TIMEOUT_10240_MU | 20 | Receiver Not Ready Timeout 10.24 milliseconds delay |
| TIMEOUT_15360_MU | 21 | Receiver Not Ready Timeout 15.36 milliseconds delay |
| TIMEOUT_20480_MU | 22 | Receiver Not Ready Timeout 20.48 milliseconds delay |
| TIMEOUT_30720_MU | 23 | Receiver Not Ready Timeout 30.72 milliseconds delay |
| TIMEOUT_40960_MU | 24 | Receiver Not Ready Timeout 40.96 milliseconds delay |
| TIMEOUT_61440_MU | 25 | Receiver Not Ready Timeout 61.44 milliseconds delay |
| TIMEOUT_81920_MU | 26 | Receiver Not Ready Timeout 81.92 milliseconds delay |
| TIMEOUT_122880_MU | 27 | Receiver Not Ready Timeout 122.88 milliseconds delay |
| TIMEOUT_163840_MU | 28 | Receiver Not Ready Timeout 163.84 milliseconds delay |
| TIMEOUT_245760_MU | 29 | Receiver Not Ready Timeout 245.76 milliseconds delay |
| TIMEOUT_327680_MU | 30 | Receiver Not Ready Timeout 327.68 milliseconds delay |
| TIMEOUT_491520_MU | 31 | Receiver Not Ready Timeout 491.52 milliseconds delay |
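These values follow the InfiniBand RNR NAK timer encoding (the same 5-bit value that ibverbs exposes as min_rnr_timer in struct ibv_qp_attr), so they can be computed instead of looked up. A sketch that reproduces the table above, assuming the enum number is the encoded value; the TIMEOUT_<n>_MU suffix states the same delay in microseconds. rnr_timeout_us is a hypothetical helper.

```python
def rnr_timeout_us(encoding: int) -> int:
    """Return the RNR NAK timer delay in microseconds for a 5-bit encoding."""
    if not 0 <= encoding <= 31:
        raise ValueError("RNR timer encoding is a 5-bit value (0-31)")
    if encoding == 0:
        return 655_360  # 0 encodes the longest delay, 655.36 ms
    if encoding <= 2:
        return 10 * encoding  # 0.01 ms and 0.02 ms
    # From 3 upward the delays alternate between 30 * 2^k and 40 * 2^k microseconds.
    if encoding % 2:
        return 30 << ((encoding - 3) // 2)
    return 40 << ((encoding - 4) // 2)
```

For example, encoding 14 gives 1280 microseconds (1.28 ms), matching TIMEOUT_1280_MU.
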
Speed, modulation and FEC mode
| Name | Number | Description |
| ---- | ------ | ----------- |
| UNSPECIFIED | 0 | Speed mode not specified |
| MODE_100GE_NRZ_RS_FEC | 9 | 100GE port speed with NRZ modulation and RS FEC |
| MODE_100GE_NRZ_NO_FEC | 10 | 100GE port speed with NRZ modulation and no FEC |
| MODE_100GE_PAM4_53G_KP4_FEC | 8 | 100GE port speed with 53G PAM4 modulation and KP4 FEC |
| MODE_100GE_PAM4_106G_RS_FEC | 7 | 100GE port speed with 106G PAM4 modulation and RS FEC |
| MODE_100GE_PAM4_106G_KP4_FEC | 6 | 100GE port speed with 106G PAM4 modulation and KP4 FEC |
| MODE_200GE_PAM4_53G_KP4_FEC | 5 | 200GE port speed with 53G PAM4 modulation and KP4 FEC |
| MODE_200GE_PAM4_106G_KP4_FEC | 4 | 200GE port speed with 106G PAM4 modulation and KP4 FEC |
| MODE_400GE_PAM4_53G_KP4_FEC | 3 | 400GE port speed with 53G PAM4 lanes and KP4 FEC |
| MODE_400GE_PAM4_106G_KP4_FEC | 2 | 400GE port speed with 106G PAM4 lanes and KP4 FEC |
| MODE_800GE_PAM4_106G_KP4_FEC | 1 | 800GE port speed with 106G PAM4 lanes and KP4 FEC |
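The SpeedMode names follow a regular pattern (port speed, modulation, optional lane rate, FEC), so they can be decoded mechanically. A hypothetical helper; the lane count assumes the usual Ethernet mapping where a 106G PAM4 lane carries 100 Gb/s and a 53G PAM4 lane carries 50 Gb/s. NRZ modes do not name a lane rate, so the lane rate and lane count are left as None.

```python
import re
from typing import NamedTuple, Optional

# MODE_<port>GE_<modulation>[_<lane>G]_<fec>_FEC, e.g. MODE_400GE_PAM4_106G_KP4_FEC
_MODE_RE = re.compile(r"MODE_(\d+)GE_(NRZ|PAM4)(?:_(\d+)G)?_(RS|KP4|NO)_FEC")

# Assumed Ethernet payload per lane for each PAM4 lane rate named in SpeedMode.
_LANE_PAYLOAD_GBPS = {106: 100, 53: 50}


class SpeedModeInfo(NamedTuple):
    port_gbps: int
    modulation: str
    lane_rate_g: Optional[int]  # nominal lane rate from the name; None for NRZ modes
    lanes: Optional[int]
    fec: str  # "RS", "KP4" or "none"


def parse_speed_mode(name: str) -> SpeedModeInfo:
    """Decode a SpeedMode enum name into its components."""
    match = _MODE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a decodable speed mode: {name}")
    port, modulation, lane, fec = match.groups()
    lane_rate = int(lane) if lane else None
    lanes = int(port) // _LANE_PAYLOAD_GBPS[lane_rate] if lane_rate else None
    return SpeedModeInfo(int(port), modulation, lane_rate, lanes,
                         "none" if fec == "NO" else fec)
```

For example, MODE_400GE_PAM4_53G_KP4_FEC decodes to 400 Gb/s over eight 53G PAM4 lanes with KP4 FEC.
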
Network interface speed types
| Name | Number | Description |
| ---- | ------ | ----------- |
| SPEED_UNSPECIFIED | 0 | Speed type not specified |
| SPEED_100G | 1 | 100 Gigabit Ethernet |
| SPEED_200G | 2 | 200 Gigabit Ethernet |
| SPEED_400G | 3 | 400 Gigabit Ethernet |
| SPEED_800G | 4 | 800 Gigabit Ethernet |
app_common.proto
Common data types and artifacts produced during trial execution
Summary table (NCCL-like), available after a trial has finished running.
The table carries its own schema because it is included in the TrialReport for immediate
access, in addition to being available in storage as a Feather file.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| summary_rows | table.TableRow | repeated | Rows of summary data, one per collective operation |
| summary_schema | table.TableSchema |  | Schema defining the columns in the summary table |
List of artifacts (absolute file paths) produced by a single trial run
and saved to persistent storage.
Table schema for each data table.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| configuration_files | string | repeated | Paths to configuration files used in the trial |
| summary_data_files | string | repeated | Paths to summary data files |
| port_metrics_data_files | string | repeated | Paths to port-level metrics files |
| fabric_metrics_data_files | string | repeated | Paths to fabric-level metrics files |
| pcap_files | string | repeated | Paths to packet capture files |
| custom_files | string | repeated | Paths to custom user-generated files |
| error_msg_files | string | repeated | Paths to error message log files |
| flow_metrics_data_files | string | repeated | Paths to flow-level metrics files |
| data_chunk_data_files | string | repeated | Paths to data chunk metrics files |
| impairment_metrics_data_files | string | repeated | Paths to impairment metrics files |
| qp_metrics_data_files | string | repeated | Paths to queue pair metrics files |
| datasize_breakdown_metrics_data_files | string | repeated | Paths to data size breakdown metrics files |
| iteration_metrics_data_files | string | repeated | Paths to per-iteration metrics files |
| emulation_summary_metrics_data_files | string | repeated | Paths to emulation summary metrics files |
| workload_details_data_files | string | repeated | Paths to workload detail files |
| packet_drop_metrics_data_files | string | repeated | Paths to packet drop metrics files |
| packet_reorder_metrics_data_files | string | repeated | Paths to packet reorder metrics files |
Types of artifacts produced during trial execution.
Classifies output files by their content and purpose.
| Name | Number | Description |
| ---- | ------ | ----------- |
| ART_UNSPECIFIED | 0 | Artifact type not specified |
| ART_CONFIGURATION | 1 | Configuration files used for the trial |
| ART_SUMMARY_DATA | 2 | High-level summary data and statistics |
| ART_PORT_METRICS_DATA | 5 | Per-port performance metrics |
| ART_FABRIC_METRICS_DATA | 6 | Fabric-wide performance metrics |
| ART_PCAP | 7 | Packet capture files (pcap format) |
| ART_CUSTOM | 8 | User-defined custom artifact files |
| ART_ERROR_MSG | 9 | Error messages and diagnostics |
| ART_DATA_CHUNK_DATA | 10 | Data chunk transfer metrics |
| ART_FLOW_METRICS_DATA | 11 | Per-flow communication metrics |
| ART_IMPAIRMENT_METRICS_DATA | 12 | Network impairment application metrics |
| ART_QP_METRICS_DATA | 13 | Queue pair (QP) level metrics |
| ART_DATASIZE_BREAKDOWN_METRICS_DATA | 14 | Message size distribution breakdown |
| ART_ITERATION_METRICS_DATA | 15 | Per-iteration performance metrics |
| ART_EMULATION_SUMMARY_METRICS_DATA | 16 | Overall emulation summary statistics |
| ART_WORKLOAD_DETAILS_DATA | 17 | Detailed workload execution information |
| ART_PACKET_DROP_METRICS_DATA | 18 | Packet drop impairment metrics |
| ART_PACKET_REORDER_METRICS_DATA | 19 | Packet reorder impairment metrics |
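Each ArtifactType value other than ART_UNSPECIFIED appears to correspond to a TrialArtifacts field: drop the ART_ prefix, lowercase the rest and append _files (for example, ART_PCAP and pcap_files). That correspondence is inferred from the names, not stated by the proto. A sketch that uses it to group typed file paths; both functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def artifact_field(artifact_type: str) -> str:
    """Map an ArtifactType name to the matching TrialArtifacts field name."""
    if not artifact_type.startswith("ART_") or artifact_type == "ART_UNSPECIFIED":
        raise ValueError(f"no TrialArtifacts field for {artifact_type}")
    return artifact_type[len("ART_"):].lower() + "_files"


def group_artifacts(files: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (artifact_type_name, path) pairs into TrialArtifacts-style lists."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for artifact_type, path in files:
        grouped[artifact_field(artifact_type)].append(path)
    return dict(grouped)
```

The naming rule holds for all seventeen non-default ArtifactType values listed above.
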
dse_infra.proto
Data models for DSE infrastructures, to be used by applications
Describes a fabric where the internal topology is not modeled.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| name | string | optional | The name identifier for this blackbox fabric instance. |
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| host_count | uint32 |  | The number of hosts in the fabric. This value is used to determine the number of racks and switches needed in the fabric based on the tier configurations. |
| host_nic_speed | InfraBandwidth |  | The speed of each host NIC. |
| host_max_ports_up_to_peer | MaxPortsUpToPeer |  | The maximum number of ports from each host up to its peer switch. |
| host_wiring_schema | HostWiringSchema |  | The schema for the host wiring. |
| spine_wiring_schema | SpineWiringSchema |  | The schema for the spine wiring. |
| tier_configs | TierConfig | repeated | The configurations for each tier in the fabric. The first element corresponds to the leaf tier, the second to the tier above it, and so on. |
Describes the overall fabric topology.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| blackbox | BlackboxFabric |  | Generic fabric whose internal topology is not modeled (a blackbox) |
| clos | ClosFabric |  | Multi-stage Clos fabric |
| rackplane | RackPlaneFabric |  | EXPERIMENTAL: network fabric featuring multiple planes within a rack |
Contains the input parameters for the generic host builder.
| Field | Type | Label | Description |
| name | string | optional | The name of the generic host. |
| npu_count | uint32 |  | The number of Neural Processing Units (NPUs) in the host. N.B. Generic hosts have a 1:1 ratio of NICs to GPUs. |
| custom_bandwidth_gbps | uint32 |  | The custom bandwidth for the generic host in Gbps. |
| nvlink_version_bandwidth | NVLinkVersionBandwidth |  | The bandwidth of NVLink by version. |
The Host message is a high-level specification of a low-level device.
The Trial object returned by CreateTrial() includes a representation of each individual device.
| Field | Type | Label | Description |
| count | uint32 |  | The number of hosts in the fabric. |
| zionex | ZionexHost |  | Zionex host type. |
| generic | GenericHost |  | Generic host type. |
| rackplane | RackPlaneHost |  | EXPERIMENTAL: RackPlane host type, intended for use with RackPlaneFabric. |
| use_npu_interconnect | bool |  | Whether to use NPU interconnect for the host. |
The bandwidth specification for infrastructure components.
| Field | Type | Label | Description |
| custom_gbps | uint32 |  | A custom bandwidth for the infrastructure in Gbps. |
| fabric_speed | common.SpeedType |  | The speed of the infrastructure fabric. |
Infrastructure configuration comprising the hosts and the network fabric.
| Field | Type | Label | Description |
| host | Host |  | The type of host used in the infrastructure. |
| fabric | Fabric | optional | Specifies the fabric topology used in the infrastructure. |
| Field | Type | Label | Description |
| port_count | uint32 |  | The maximum number of ports under a device. |
| use_half | bool |  | Whether to use half the maximum ports. |
| Field | Type | Label | Description |
| port_count | uint32 |  | The maximum number of ports from each host up to its peer switch. |
| use_max | bool |  | Whether to use the maximum number of ports. |
Represents an oversubscription ratio of the form downlink_factor:uplink_factor.
| Field | Type | Label | Description |
| downlink_factor | uint32 |  | The downlink factor of the oversubscription ratio. |
| uplink_factor | uint32 |  | The uplink factor of the oversubscription ratio. |
EXPERIMENTAL: A network fabric featuring multiple planes within a rack.
Using a RackPlaneFabric requires the use of a RackPlaneHost.
The number of rack switches for intra-rack communication is driven by RackPlaneHost.scale_up_nic_count.
The number of scale-out switches for inter-rack communication is driven by
rack_count, hosts_per_rack, and RackPlaneHost.scale_out_nic_count.
| Field | Type | Label | Description |
| rack_count | uint32 | optional | The number of racks in the fabric. |
| hosts_per_rack | uint32 | optional | The number of hosts per rack. |
| scale_out_switch_speed | common.SpeedType |  | Ethernet speed for inter-rack communication via scale-out switches. |
| scale_up_switch_speed | common.SpeedType |  | Ethernet speed for intra-rack communication on the scale-up network. |
Host type intended to be used in conjunction with RackPlaneFabric.
It connects to each of the RackPlaneFabric planes with a dedicated scale-up NIC.
| Field | Type | Label | Description |
| npu_count | uint32 | optional | The number of NPUs in the host. |
| scale_up_nic_count | uint32 | optional | The number of NICs for scale-up traffic. |
| scale_out_nic_count | uint32 | optional | The number of NICs for scale-out traffic. |
| name | string | optional | The name identifier for this RackPlane host instance. |
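The sizing rules in the RackPlaneFabric description can be sketched in a few lines of Python. This is an illustrative model only: it assumes one plane per scale-up NIC, one rack switch per plane in each rack, and one link per NIC, and it counts scale-out NIC endpoints rather than scale-out switches, because the switch radix that would determine that count is not part of these messages. RackPlaneSpec and rackplane_summary are hypothetical names, not DSE APIs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RackPlaneSpec:
    """Hypothetical flattened view of RackPlaneFabric and RackPlaneHost fields."""
    rack_count: int
    hosts_per_rack: int
    npu_count: int
    scale_up_nic_count: int
    scale_out_nic_count: int


def rackplane_summary(spec: RackPlaneSpec) -> Dict[str, int]:
    total_hosts = spec.rack_count * spec.hosts_per_rack
    return {
        "total_hosts": total_hosts,
        "total_npus": total_hosts * spec.npu_count,
        # Assumption: one plane per scale-up NIC, one rack switch per plane per rack.
        "rack_switches": spec.rack_count * spec.scale_up_nic_count,
        # Assumption: one link per scale-out NIC, so the scale-out switches
        # must provide at least this many host-facing ports in total.
        "scale_out_nic_endpoints": total_hosts * spec.scale_out_nic_count,
    }
```

For example, two racks of four hosts, each host with four scale-up NICs and two scale-out NICs, yield 8 rack switches and 16 scale-out NIC endpoints under these assumptions.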
| Field | Type | Label | Description |
| switch_radix | uint32 |  | The radix (number of ports) of each switch on this tier. |
| oversubscription_ratio | OversubscriptionRatio |  | The oversubscription ratio for this tier, expressed as downlink_factor:uplink_factor. This determines the ratio of downward-facing to upward-facing port bandwidth. |
| port_speed | InfraBandwidth |  | The port speed (bandwidth) for all switch ports on this tier. Can be specified as a custom value in Gbps or as a standard fabric speed. |
| max_ports_up_to_peer | MaxPortsUpToPeer |  | The maximum number of ports that can connect from a switch on this tier to a single switch on the next higher tier. Controls link aggregation between tiers. |
| max_ports_down_from_device | MaxPortsDownFromDevice |  | The maximum number of ports on each switch that can connect downward to the next lower tier. Typically uses half the switch radix by default. |
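The interaction between switch_radix, oversubscription_ratio, and max_ports_down_from_device can be illustrated with a small helper that splits a switch's ports into downlinks and uplinks in the proportion downlink_factor:uplink_factor. It assumes every port on the tier runs at the same port_speed, so the port-count ratio equals the bandwidth ratio; split_radix and its rounding rule are hypothetical and not necessarily how the DSE server computes port counts.

```python
from typing import Tuple


def split_radix(switch_radix: int, downlink_factor: int, uplink_factor: int) -> Tuple[int, int]:
    """Split a switch radix into (down_ports, up_ports) per downlink:uplink.

    Assumes all ports on the tier share one port_speed, so port counts are
    proportional to bandwidth.
    """
    if downlink_factor <= 0 or uplink_factor <= 0:
        raise ValueError("oversubscription factors must be positive")
    total = downlink_factor + uplink_factor
    # Floor the downlink share; the remaining ports face upward, so the
    # effective ratio never exceeds downlink_factor:uplink_factor.
    down_ports = switch_radix * downlink_factor // total
    return down_ports, switch_radix - down_ports
```

With a 1:1 ratio the helper returns half the radix in each direction, which matches the default noted for max_ports_down_from_device; a 64-port switch at 3:1 would get 48 downlinks and 16 uplinks.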
The Zionex host builder takes no inputs other than an optional instance name.
| Field | Type | Label | Description |
| name | string | optional | The name identifier for this Zionex host instance. |
| Name | Number | Description |
| HOST_WIRING_SCHEMA_UNSPECIFIED | 0 | Undeclared wiring schema. |
| FAIR_DISTRIBUTION | 1 | Fair distribution wiring schema. Connect each host to a different rack switch in round-robin fashion until all hosts have been connected. |
| LEFT_TO_RIGHT | 2 | Left to right wiring schema. Connect hosts to rack switches from left to right. Move to next switch only after a switch's capacity has been filled. |
| RAIL_OPTIMIZED | 3 | Rail optimized wiring schema. The nth NICs of all hosts are connected to the same switches. |
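The three HostWiringSchema strategies can be compared with a hypothetical Python model that maps each (host, NIC) link to a rack-switch index. It assumes every host has the same number of NICs, every rack switch offers ports_per_switch host-facing ports, and links are enumerated host by host; wire_hosts and the local HostWiringSchema class are illustrative stand-ins, not the DSE server's implementation.

```python
import math
from enum import Enum
from typing import Dict, Tuple


class HostWiringSchema(Enum):
    # Local stand-in mirroring the enum values above.
    FAIR_DISTRIBUTION = 1
    LEFT_TO_RIGHT = 2
    RAIL_OPTIMIZED = 3


def wire_hosts(
    host_count: int, nics_per_host: int, ports_per_switch: int, schema: HostWiringSchema
) -> Dict[Tuple[int, int], int]:
    """Map each (host, nic) link to a rack-switch index (hypothetical model)."""
    links = [(h, n) for h in range(host_count) for n in range(nics_per_host)]
    switch_count = math.ceil(len(links) / ports_per_switch)
    wiring: Dict[Tuple[int, int], int] = {}
    if schema is HostWiringSchema.FAIR_DISTRIBUTION:
        # Round-robin: consecutive links land on consecutive switches.
        for i, link in enumerate(links):
            wiring[link] = i % switch_count
    elif schema is HostWiringSchema.LEFT_TO_RIGHT:
        # Fill one switch completely before moving to the next.
        for i, link in enumerate(links):
            wiring[link] = i // ports_per_switch
    elif schema is HostWiringSchema.RAIL_OPTIMIZED:
        # NIC n of every host goes to the switch(es) dedicated to rail n.
        switches_per_rail = math.ceil(host_count / ports_per_switch)
        for h, n in links:
            wiring[(h, n)] = n * switches_per_rail + h // ports_per_switch
    else:
        raise ValueError(f"unsupported schema: {schema}")
    return wiring
```

With four single-NIC hosts and two ports per switch, FAIR_DISTRIBUTION places hosts 0 to 3 on switches 0, 1, 0, 1, while LEFT_TO_RIGHT places them on switches 0, 0, 1, 1.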
The bandwidth of NVLink by version.
| Name | Number | Description |
| NVLINK_VERSION_BANDWIDTH_UNSPECIFIED | 0 | The NVLink version bandwidth is not specified. |
| Name | Number | Description |
| SPINE_WIRING_SCHEMA_UNSPECIFIED | 0 | The schema for the spine wiring is not specified. |
| FULLY_CONNECTED | 1 | The spines are fully connected. Every spine switch is connected to every switch in the previous tier. |
| SPINE_SETS | 2 | The spines are partially connected in sets as determined by the number of hosts, tiers, and switch radix constraints. The spine layer consists of one or more spine sets such that every switch within a spine set is connected to the nth switch of every connectivity group in the previous tier. An example of this wiring scheme can be seen in this article: https://packetpushers.net/blog/demystifying-dcn-topologies-clos-fat-trees-part2/ |
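The difference between the two SpineWiringSchema values can be sketched as link lists. In this hypothetical Python model the previous tier is given as connectivity groups of equally sized switch lists, and the number of spines per set is passed in directly, whereas the description above says DSE derives the spine layout from host count, tiers, and switch radix. Function names and spine names such as set0-spine0 are made up for illustration.

```python
from typing import List, Tuple

Link = Tuple[str, str]  # (spine switch, lower-tier switch)


def fully_connected(spine_count: int, lower_switches: List[str]) -> List[Link]:
    """FULLY_CONNECTED: every spine connects to every switch in the previous tier."""
    return [(f"spine{s}", sw) for s in range(spine_count) for sw in lower_switches]


def spine_sets(spines_per_set: int, groups: List[List[str]]) -> List[Link]:
    """SPINE_SETS: every spine in set n connects to switch n of every group.

    Assumes all connectivity groups have the same size, which yields one
    spine set per switch position within a group.
    """
    if not groups:
        return []
    switches_per_group = len(groups[0])
    if any(len(g) != switches_per_group for g in groups):
        raise ValueError("connectivity groups must all have the same size")
    links: List[Link] = []
    for n in range(switches_per_group):
        for s in range(spines_per_set):
            spine = f"set{n}-spine{s}"
            links.extend((spine, group[n]) for group in groups)
    return links
```

In the SPINE_SETS model each spine needs only one port per connectivity group, whereas a fully connected spine needs one port per lower-tier switch.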
kccb.proto
Data models and APIs for describing a Keysight Collective Communication Benchmark.
The Benchmark message is a high-level abstraction of a Chakra workload.
The message MUST be converted into a list of Chakra et_def.proto Node messages,
and the dse.Experiment.workload field should be populated with the
converted list.
| Field | Type | Label | Description |
| collective_algorithm | common.Algorithm |  | The collective communication algorithm. |
| datasize | Datasize |  | A data size iterator with start, step, and end values. |
| datasize_list | DatasizeList |  | A list of data size definitions. |
| iterations | int32 |  | Number of benchmark runs per data size. |
| channels | channels.ChannelsTopology |  | The channels topology configuration specifying how ranks are distributed and connected. |
| iteration_append_delay | int32 | optional | The delay (in ms) between repeated executions of the benchmark. |
High-level KCCB spec for a trial.
| Field | Type | Label | Description |
| benchmark | Benchmark |  | The benchmark configuration defining the collective communication test. |
The Datasize message is a container for specifying the data sizes for each
benchmark collective run.
| Field | Type | Label | Description |
| start | uint64 |  | The initial data size of a benchmark collective operation. |
| step | uint32 |  | The increment applied to the data size between successive data sizes. |
| end | uint64 |  | The maximum data size, after which the benchmark completes. |
DatasizeList allows for specifying a custom list of data sizes
| Field | Type | Label | Description |
| size_bytes | uint64 | repeated | the size of data in bytes |
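Resolving a Benchmark's data sizes and run count can be sketched as follows. The sketch assumes that datasize and datasize_list are mutually exclusive, that step is added (not multiplied), that end is inclusive, and that sizes are in bytes as in DatasizeList.size_bytes; none of these behaviors is confirmed above, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def expand_datasize(start: int, step: int, end: int) -> List[int]:
    """Expand a Datasize iterator into explicit sizes.

    Assumes an additive step and an inclusive end bound.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, end + 1, step))


def benchmark_sizes(
    datasize: Optional[Tuple[int, int, int]] = None,
    datasize_list: Optional[Sequence[int]] = None,
) -> List[int]:
    """Resolve sizes for a Benchmark that sets datasize or datasize_list."""
    # Assumption: exactly one of the two fields is set.
    if (datasize is None) == (datasize_list is None):
        raise ValueError("set exactly one of datasize or datasize_list")
    if datasize is not None:
        return expand_datasize(*datasize)
    return list(datasize_list)


def total_runs(sizes: Sequence[int], iterations: int) -> int:
    # Benchmark.iterations counts runs per data size.
    return len(sizes) * iterations
```

For instance, start=1024, step=1024, end=4096 expands to four sizes, so iterations=5 gives 20 benchmark runs.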
binding.proto
Protocol Buffers definitions for binding logical infrastructure elements to ranks, NICs, and physical resources.
Binds various settings to logical infrastructure elements.
The type of bound information depends on the type of binding.
| Field | Type | Label | Description |
| custom_binding | CustomBinding |  | Custom binding configuration that maps ranks to NPUs, configures NIC settings, and assigns physical test resources. |
| infrastructure_profile | profiles.InfraProfile |  | The Binding keeps a copy of the infrastructure profile so that Trials can be re-run with the original version of a profiles.InfraProfile, even if that profile has since been modified. |
| infrastructure | keysight_chakra.infra.Infrastructure |  | Low-level Chakra infrastructure matching infrastructure_profile. |
| infra_annotations | keysight_chakra.infra.Annotation | repeated | Annotations for the low-level Chakra infrastructure. |
| platform_regions | PlatformRegion | repeated | Assigns platforms to different regions of the infrastructure. Should be populated by the server and be a copy of the platform_regions sent in CreateBindingRequest. |
Binds:
a) Ranks to logical NPUs
b) NIC settings to logical NICs
c) Physical test resources to logical infrastructure elements
| Field | Type | Label | Description |
| rank_bindings | RankBinding | repeated | List of bindings that map each rank to a specific logical NPU and its available NICs in the infrastructure. |
| nic_bindings | NicBinding | repeated | List of NIC configuration bindings that specify settings and associated physical bindings for each logical NIC. |
| physical_bindings | PhysicalBinding | repeated | List of bindings that associate logical infrastructure elements with physical test resources such as chassis or servers. |
Reference to a logical infrastructure element
| Field | Type | Label | Description |
| device_instance_name | string |  | Name of the logical infrastructure device (e.g., the value that dse_infra.GenericHost.name was set to). |
| device_index | int32 |  | 0-based device index. |
| component_name | string |  | Name of the logical infrastructure component. |
| component_index | int32 |  | 0-based component index. |
Defines a region within the infrastructure by specifying boundary components.
Used to group infrastructure elements for platform assignment or other
purposes.
| Field | Type | Label | Description |
| boundary_refs | InfraRef | repeated | These components mark the boundary of a region within the infrastructure. |
Logical NIC Settings
| Field | Type | Label | Description |
| infra_ref | InfraRef |  | Reference to a NIC defined in the Chakra infrastructure. |
| nic_settings | common.NicSettings |  | Auto-populated; can be overridden by the user. |
| associated_physical_bindings | InfraRef | repeated | Flows from this NIC may be generated by test ports from any of these associated physical bindings. |
Binds a logical infrastructure element to a physical test resource,
specifying the platform type, physical location, and layer 1 settings.
| Field | Type | Label | Description |
| infra_ref | InfraRef |  | Reference to the logical infrastructure element that the physical resource represents. |
| platform | common.PlatformType |  | The platform type for this physical binding (e.g., hardware chassis or software server). |
| chassis_location | common.ChassisInfo |  | Chassis location, used with Keysight hardware platforms. |
| server_location | common.ServerInfo |  | Server location, used with NCCL + Keysight software platforms. |
| layer1 | common.Layer1 |  | Layer 1 physical settings such as link speed, duplex mode, and physical layer parameters. |
| capture | common.PacketCapture | optional | Packet capture settings. |
Assigns a platform to a region in the infrastructure.
Currently for internal use only.
| Field | Type | Label | Description |
| region | InfraRegion |  | The infrastructure region to which a platform will be assigned. Defines the boundary components that comprise the region. |
| platform | common.PlatformType | optional | The platform type to assign to this infrastructure region for testing. |
Assigns ranks to logical NPUs.
| Field | Type | Label | Description |
| infra_ref | InfraRef |  | Reference to the NPU used by this rank. |
| rank_id | int32 |  | The unique identifier for this rank in the distributed workload. Specifies which rank is bound to the referenced NPU. |
| nic_refs | InfraRef | repeated | List of NICs available to the NPU (in the same host). |
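A rank-to-NPU assignment for generic hosts might look like the Python sketch below, which fills hosts in order and gives each rank the NIC with the same local index. Pairing NIC i with NPU i is an assumption drawn from the 1:1 NIC-to-GPU ratio noted for GenericHost; nic_refs could equally list every NIC in the host. The dataclasses are local stand-ins for the InfraRef and RankBinding messages, and the component names "npu" and "nic" are hypothetical placeholders for whatever names the generated Chakra infrastructure actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InfraRef:
    """Local stand-in for the InfraRef message (not the generated class)."""
    device_instance_name: str
    device_index: int
    component_name: str
    component_index: int


@dataclass
class RankBinding:
    """Local stand-in for the RankBinding message."""
    infra_ref: InfraRef
    rank_id: int
    nic_refs: List[InfraRef] = field(default_factory=list)


def bind_ranks_generic(
    host_name: str,
    host_count: int,
    npu_count: int,
    npu_component: str = "npu",  # hypothetical component name
    nic_component: str = "nic",  # hypothetical component name
) -> List[RankBinding]:
    """Bind ranks 0..host_count*npu_count-1 to NPUs, filling one host at a time."""
    bindings = []
    for rank in range(host_count * npu_count):
        host, local = divmod(rank, npu_count)
        npu = InfraRef(host_name, host, npu_component, local)
        # Assumes NIC i in a host is paired with NPU i (1:1 ratio).
        nic = InfraRef(host_name, host, nic_component, local)
        bindings.append(RankBinding(infra_ref=npu, rank_id=rank, nic_refs=[nic]))
    return bindings
```

Here host_name plays the role of device_instance_name, i.e. the value GenericHost.name was set to.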
| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| double |  | double | double | float | float64 | double | float | Float |
| float |  | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool |  | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |
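The encoding notes in the scalar type table follow from the Protocol Buffers varint wire format. The minimal Python sketch below, written for illustration rather than taken from any protobuf runtime, shows why a negative int32 costs 10 bytes, how the ZigZag mapping used by sint32 keeps small negative values short, and why values of 2^28 or more need 5 varint bytes, one more than a fixed32 field's 4.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint encoding used for int32/int64/uint32/uint64 fields."""
    if value < 0:
        # Negative int32/int64 values are sign-extended to 64 bits, hence 10 bytes.
        value &= (1 << 64) - 1
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag32(value: int) -> int:
    """ZigZag mapping used by sint32: 0->0, -1->1, 1->2, -2->3, ..."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
```

For example, -1 encoded as int32 takes 10 bytes, while the same value as sint32 (ZigZag-mapped to 1) takes a single byte.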