Model Registration API Documentation

Endpoint Overview

Endpoint: `https://raiops.azure.com/model/registration`

Description: This endpoint registers your model on the RAI Platform.

HTTP Method

Method: POST

Request Headers

The following headers should be included in the request:

Header	Value	Description
`Authorization`	`Bearer <your-token>`	Authentication token
`Content-Type`	`application/json`	Indicates that the request body is in JSON format.
`ms-request-id`	`uuid`	A unique ID for the request.
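
The headers in the table can be assembled as in the following minimal sketch. `build_headers` is a hypothetical helper, the token is a placeholder, and using a version-4 UUID is an assumption (the table only says `uuid`).

```python
import uuid


def build_headers(token: str) -> dict[str, str]:
    """Return the three request headers listed in the table above."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Assumption: a fresh random (version 4) UUID per request.
        "ms-request-id": str(uuid.uuid4()),
    }


headers = build_headers("<your-token>")
```

Generating the ID inside the helper gives every call its own `ms-request-id`, which should make individual requests easier to trace.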

Request Body

Field	Type	Sample Value	Description
`modelInformation`	`ModelInformation`		model information
`version`	`string`	`v4`	version of the model image
`modelVersion`	`string`	`v3`	version of the model
`sku`	`string`	`Standard_F16s_v2`	target compute instance for model hosting
`acrAddress`	`string`	`sample.azurecr.io/model:tag`	image reference
`sampleRequest`	`string`	`"{\"text\": \"hello world\"}"`	sample request
`sampleResponse`	`string`	`"{\"violence\": 0.9}"`	sample response
`inferenceConfig`	`InferenceConfig`		routes and port for model serving
`probeConfig`	`ProbeConfig`		liveness and readiness probe configuration on AML
`requestSettings`	`RequestSettings`		request settings on AML
`accuracyTestAMLJob`	`string`	`url_to_aml_job_for_accuracy_test`	the link to the AML job for accuracy test
`loadTestAMLJob`	`string`	`url_to_aml_job_for_load_test`	the link to the AML job for load test
`latencyInfo`	`LatencyInfo`		model performance metrics
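
Note that `sampleRequest` and `sampleResponse` are typed `string`, so the sample payloads are JSON documents encoded as strings inside the outer JSON body. The sketch below shows one way to produce that encoding with the standard `json` module; the variable names are illustrative only.

```python
import json

# Encode the inner payloads first, so they become plain strings.
sample_request = json.dumps({"text": "hello world"})
sample_response = json.dumps({"violence": 0.9})

body_fragment = {
    "sampleRequest": sample_request,
    "sampleResponse": sample_response,
}

# Serializing the outer body escapes the inner quotes, as in the table's sample values.
print(json.dumps(body_fragment, indent=1))
```

Encoding the inner payload with `json.dumps` rather than hand-writing the escaped string avoids quoting mistakes.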

Model Information

Field	Type	Sample Value	Description
`name`	`string`	`sample-model`	model name
`owner`	`string`	`Model Owner`	model owner
`contact`	`string`	`model_owner@microsoft.com`	email to the model owner
`description`	`string`	`a text model to detect harmful information`	model description
`harmCategory`	`string`	`violence, sexual, hate, self-harm`	target harm categories, separated by commas
`modality`	`string`	`text`	model modality; currently `image` or `text`. For multimodal models, list all modalities
`capability`	`string`	`annotation`	model capability
`imageSize`	`int`	`20`	size of the model image in GB
`providerId`	`int`	`1`	ID of the model provider; contact the Hosting PoC for this value
`protocol`	`string`	`1P`	set this field to `1P`
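
As a minimal sketch, a `modelInformation` object could be assembled as follows. `build_model_information` is a hypothetical helper, the values are the sample values from the table, and joining categories with a comma followed by a space is an assumption based on the sample value.

```python
def build_model_information(harm_categories: list[str]) -> dict:
    """Assemble a modelInformation object using the table's sample values."""
    return {
        "name": "sample-model",
        "owner": "Model Owner",
        "contact": "model_owner@microsoft.com",
        "description": "a text model to detect harmful information",
        # The table asks for one comma-separated string, not a JSON array.
        "harmCategory": ", ".join(harm_categories),
        "modality": "text",
        "capability": "annotation",
        "imageSize": 20,  # size in GB, typed int in the table
        "providerId": 1,  # typed int in the table; the real value comes from the Hosting PoC
        "protocol": "1P",
    }


model_information = build_model_information(
    ["violence", "sexual", "hate", "self-harm"]
)
```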

Inference Config

Field	Type	Sample Value	Description
`livenessRoute`	`string`	`/liveness`	liveness route
`readinessRoute`	`string`	`/readiness`	readiness route
`scoringRoute`	`string`	`/score`	scoring route
`port`	`int`	`8888`	port to serve

Probe Config

Field	Type	Sample Value	Description	Required
`LivenessProbeConfig`	`InferenceProbeConfig`		liveness probe config on AML
`readinessProbeConfig`	`InferenceProbeConfig`		readiness probe config on AML

Inference Probe Config

Field	Type	Sample Value	Description
`FailureThreshold`	`int`	`5`	failure threshold
`SuccessThreshold`	`int`	`1`	success threshold
`Timeout`	`int`	`100`	timeout in seconds
`Period`	`int`	`10`	period in seconds
`InitialDelay`	`int`	`10`	initial delay
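
The sketch below builds a `probeConfig` object from two probe definitions. `build_probe` is a hypothetical helper, and its threshold check is an assumption added for illustration, not a documented server-side rule. The key `FailureThreshold` follows the spelling used in the sample request, and the casing of `LivenessProbeConfig` and `readinessProbeConfig` is copied from the Probe Config table as written.

```python
def build_probe(failure_threshold: int, success_threshold: int,
                timeout: int, period: int, initial_delay: int) -> dict:
    """Return one InferenceProbeConfig object."""
    # Assumption: a threshold below 1 is never meaningful, so reject it early.
    if failure_threshold < 1 or success_threshold < 1:
        raise ValueError("thresholds must be at least 1")
    return {
        "FailureThreshold": failure_threshold,
        "SuccessThreshold": success_threshold,
        "Timeout": timeout,
        "Period": period,
        "InitialDelay": initial_delay,
    }


probe_config = {
    # Sample values taken from the Inference Probe Config table.
    "LivenessProbeConfig": build_probe(5, 1, 100, 10, 10),
    "readinessProbeConfig": build_probe(5, 1, 100, 10, 10),
}
```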

Request Settings

Field	Type	Sample Value	Description
`MaxConcurrentRequests`	`int`	`24`	maximum concurrent requests
`MaxQueueWaitTime`	`int`	`35`	max queue wait time
`Timeout`	`int`	`5000`	timeout

Latency Info

Field	Type	Sample Value	Description	Required
`payloadSize`	`PayloadSize`		A dictionary that contains performance metrics for various payload sizes. The key of the dictionary is the size of the payload and value contains detailed metrics.

Payload Size

Field	Type	Sample Value	Description	Required
`size_of_payload`	`PerformanceMetrics`		Detailed evaluation metrics

Performance Metrics

Field	Type	Sample Value	Description	Required
`finalResult`	`Metric`		metrics at the maximum RPS used for serving
`details`	`List[Metric]`		metrics under different RPS
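
The nesting of `latencyInfo`, `payloadSize`, and the per-payload metrics can be sketched as follows. `build_latency_info` is a hypothetical helper, the metric dictionaries are abbreviated to two fields, and choosing the entry with the highest target `rps` as `finalResult` is an assumption that mirrors the sample request rather than a documented rule.

```python
def build_latency_info(payload_key: str, details: list[dict]) -> dict:
    """Nest per-RPS metrics under latencyInfo -> payloadSize -> <payload size>."""
    # Assumption: finalResult is the run with the highest target RPS.
    final_result = max(details, key=lambda metric: metric["rps"])
    return {
        "payloadSize": {
            payload_key: {
                "finalResult": final_result,
                "details": details,
            }
        }
    }


# Abbreviated metrics; a real Metric object carries every field of the Metric table.
details = [
    {"rps": 5, "latencyAvg": 49.302048},
    {"rps": 50, "latencyAvg": 86.30696},
]
latency_info = build_latency_info("1024x1024", details)
```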

Metric

Field	Type	Sample Value	Description
`modelCode`	`string`	`sample-model`	model identifier, typically a combination of model name and version connected by a hyphen
`raiModelKey`	`string`	`model-key`	used for ensembled model serving, optional for standalone models
`os`	`string`	`Standard_F16s_v2`	SKU used for serving
`sku`	`string`	`Standard_F16s_v2`	SKU used for serving
`rps`	`int`	`50`	Target RPS in the test
`currentRPS`	`int`	`49`	Actual RPS in the test
`totalRPS`	`int`	`49`	Total RPS in the test
`numRequests`	`int`	`15000`	Total number of requests sent in the test
`failureRate`	`float`	`0`	Ratio of failed requests (non-2xx)
`failureCount`	`int`	`0`	Number of failed requests (non-2xx)
`latencyAvg`	`float`	`94.424`	Average latency in milliseconds
`latencyMax`	`float`	`192.232`	Maximum latency in milliseconds
`latencyMin`	`float`	`34.249`	Minimum latency in milliseconds
`latencyP50`	`float`	`100.292`	P50 (Median) latency in milliseconds
`latencyP75`	`float`	`124.928`	P75 latency in milliseconds
`latencyP90`	`float`	`158.234`	P90 latency in milliseconds
`latencyP95`	`float`	`189.342`	P95 latency in milliseconds
`latencyP99`	`float`	`191.422`	P99 latency in milliseconds
`gpuCountAvg`	`float`	`1`	Number of GPUs used in the test
`gpuUtilPercentage`	`float`	`93.2`	Average GPU utilization
`gpuMemory`	`float`	`24.57`	GPU memory consumption in GB
`maxGPUUtilPercentage`	`float`	`99.4`	Maximum GPU utilization
`maxGPUMemory`	`float`	`62.52`	Maximum GPU memory consumption in GB
`cpuCountAvg`	`float`	`1`	Number of CPU cores used in the test
`logicalCPUCountAvg`	`float`	`1`	Number of logical CPU cores used in the test
`cpuUtilPercentage`	`float`	`94.5`	Average CPU utilization
`virtualMemoryUsed`	`float`	`50`	Virtual memory used in GB
`maxCPUUtilPercentage`	`float`	`95`	Maximum CPU utilization
`maxVirtualMemoryUsed`	`float`	`15`	Maximum virtual memory consumption in GB
`testStartTime`	`datetime`	`2025-02-05T13:34:34.023732Z`	Start time of the test in UTC
`testEndTime`	`datetime`	`2025-02-05T13:39:34.608881Z`	End time of the test in UTC
`testDuration`	`timedelta`	`0:05:00.585149`	Duration of the test
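
Because `testStartTime`, `testEndTime`, and `testDuration` describe the same interval, it can be worth checking that they agree before submitting. The following is a minimal sketch of such a check; `parse_duration` and `duration_matches` are hypothetical helpers, and the code assumes timestamps always carry microseconds with a trailing `Z` and that durations are shorter than one day (the `H:MM:SS.ffffff` form shown in the table).

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '0:05:00.585149' without going through floats."""
    hours, minutes, seconds = text.split(":")
    whole, _, fraction = seconds.partition(".")
    # Pad or trim the fractional part to exactly six digits (microseconds).
    microseconds = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(whole), microseconds=microseconds)


def duration_matches(metric: dict) -> bool:
    """Return True when end minus start equals the reported duration."""
    start = datetime.strptime(metric["testStartTime"], TIME_FORMAT)
    end = datetime.strptime(metric["testEndTime"], TIME_FORMAT)
    return end - start == parse_duration(metric["testDuration"])


metric = {
    "testStartTime": "2025-02-05T13:34:34.023732Z",
    "testEndTime": "2025-02-05T13:39:34.608881Z",
    "testDuration": "0:05:00.585149",
}
print(duration_matches(metric))
```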

Sample Request

{
 "modelInformation": {
  "name": "syd-adult",
  "owner": "Juanyong Duan",
  "contact": "juanyong.duan@microsoft.com",
  "description": "Sydney Adult Classifier model",
  "harmCategory": "string",
  "modality": "image",
  "capability": "annotation",
  "imageSize": 0,
  "providerId": 1,
  "protocol": "1P"
 },
 "version": "v4",
 "modelVersion": "v4",
 "sku": "Standard_F16s_v2",
 "acrAddress": "raimodelacr.azurecr.io/adult_classifier:v3.20250114.v3",
 "sampleRequest": "{\"image\": \"iVBORw0KGgoAAAANSUhEUgAAAAQAAAAECAIAAAAmkwkpAAAAQ0lEQVR4nGK5vkJp13/HZLP0gxu4mTez3HRssmQJrrN+/JKpUfXbfvbp7qVx//meMYges5F5bzLt154HjQsBAQAA//9gxhgz+Boo6wAAAABJRU5ErkJggg==\"}",
 "sampleResponse": "{\"normal\": 0.9869055151939392, \"racy\": 0.002819292014464736, \"adult\": 0.008862475864589214, \"gory\": 0.0014126901514828205}",
 "inferenceConfig": {
  "livenessRoute": "/liveness",
  "readinessRoute": "/readiness",
  "scoringRoute": "/score",
  "port": 8899
 },
 "probeConfig": {
  "LivenessProbeConfig": {
   "FailureThreshold": 5,
   "SuccessThreshold": 1,
   "Timeout": 100,
   "Period": 100,
   "InitialDelay": 600
  },
  "readinessProbeConfig": {
   "FailureThreshold": 5,
   "SuccessThreshold": 1,
   "Timeout": 10,
   "Period": 10,
   "InitialDelay": 10
  }
 },
 "requestSettings": {
  "MaxConcurrentRequests": 24,
  "MaxQueueWaitTime": 35,
  "Timeout": 5000
 },
 "accuracyTestAMLJob": "string",
 "loadTestAMLJob": "string",
 "latencyInfo": {
  "payloadSize": {
   "1024x1024": {
    "finalResult": {
     "modelCode": "syd-adult-v3",
     "raiModelKey": "",
     "payloadSize": "1024x1024",
     "os": "Standard_F16s_v2",
     "sku": "Standard_F16s_v2",
     "rps": 50,
     "currentRPS": 49,
     "totalRPS": 49,
     "numRequests": 15000,
     "failureRate": 0,
     "failureCount": 0,
     "latencyAvg": 86.30696,
     "latencyMax": 86.30696,
     "latencyMin": 86.30696,
     "latencyP50": 75.884445,
     "latencyP75": 144.33386,
     "latencyP90": 129.93126,
     "latencyP95": 154.7285,
     "latencyP99": 212.7833,
     "gpuCountAvg": 0,
     "gpuUtilPercentage": 0,
     "gpuMemory": 0,
     "maxGPUUtilPercentage": 0,
     "maxGPUMemory": 0,
     "cpuCountAvg": 0,
     "logicalCPUCountAvg": 0,
     "cpuUtilPercentage": 0,
     "virtualMemoryUsed": 0,
     "maxCPUUtilPercentage": 0,
     "maxVirtualMemoryUsed": 0,
     "testStartTime": "2025-02-05T13:34:34.023732Z",
     "testEndTime": "2025-02-05T13:39:34.608881Z",
     "testDuration": "0:05:00.585149"
    },
    "details": [
     {
      "modelCode": "syd-adult-v3",
      "raiModelKey": "",
      "payloadSize": "1024x1024",
      "os": "Standard_F16s_v2",
      "sku": "Standard_F16s_v2",
      "rps": 5,
      "currentRPS": 4,
      "totalRPS": 4,
      "numRequests": 1500,
      "failureRate": 0,
      "failureCount": 0,
      "latencyAvg": 49.302048,
      "latencyMax": 49.302048,
      "latencyMin": 49.302048,
      "latencyP50": 47.5619,
      "latencyP75": 57.595535,
      "latencyP90": 55.91107,
      "latencyP95": 59.261154,
      "latencyP99": 67.62917,
      "gpuCountAvg": 0,
      "gpuUtilPercentage": 0,
      "gpuMemory": 0,
      "maxGPUUtilPercentage": 0,
      "maxGPUMemory": 0,
      "cpuCountAvg": 0,
      "logicalCPUCountAvg": 0,
      "cpuUtilPercentage": 0,
      "virtualMemoryUsed": 0,
      "maxCPUUtilPercentage": 0,
      "maxVirtualMemoryUsed": 0,
      "testStartTime": "2025-02-05T12:49:20.801117Z",
      "testEndTime": "2025-02-05T12:54:21.114637Z",
      "testDuration": "0:05:00.313520"
     },
     {
      "modelCode": "syd-adult-v3",
      "raiModelKey": "",
      "payloadSize": "1024x1024",
      "os": "Standard_F16s_v2",
      "sku": "Standard_F16s_v2",
      "rps": 20,
      "currentRPS": 19,
      "totalRPS": 19,
      "numRequests": 6000,
      "failureRate": 0,
      "failureCount": 0,
      "latencyAvg": 55.59504,
      "latencyMax": 55.59504,
      "latencyMin": 55.59504,
      "latencyP50": 52.751766,
      "latencyP75": 74.4868,
      "latencyP90": 67.97182,
      "latencyP95": 74.63064,
      "latencyP99": 96.22183,
      "gpuCountAvg": 0,
      "gpuUtilPercentage": 0,
      "gpuMemory": 0,
      "maxGPUUtilPercentage": 0,
      "maxGPUMemory": 0,
      "cpuCountAvg": 0,
      "logicalCPUCountAvg": 0,
      "cpuUtilPercentage": 0,
      "virtualMemoryUsed": 0,
      "maxCPUUtilPercentage": 0,
      "maxVirtualMemoryUsed": 0,
      "testStartTime": "2025-02-05T13:04:24.873751Z",
      "testEndTime": "2025-02-05T13:09:25.263794Z",
      "testDuration": "0:05:00.390043"
     },
     {
      "modelCode": "syd-adult-v3",
      "raiModelKey": "",
      "payloadSize": "1024x1024",
      "os": "Standard_F16s_v2",
      "sku": "Standard_F16s_v2",
      "rps": 45,
      "currentRPS": 44,
      "totalRPS": 44,
      "numRequests": 13500,
      "failureRate": 0,
      "failureCount": 0,
      "latencyAvg": 76.5693,
      "latencyMax": 76.5693,
      "latencyMin": 76.5693,
      "latencyP50": 69.56816,
      "latencyP75": 122.449356,
      "latencyP90": 110.96486,
      "latencyP95": 129.19327,
      "latencyP99": 175.33055,
      "gpuCountAvg": 0,
      "gpuUtilPercentage": 0,
      "gpuMemory": 0,
      "maxGPUUtilPercentage": 0,
      "maxGPUMemory": 0,
      "cpuCountAvg": 0,
      "logicalCPUCountAvg": 0,
      "cpuUtilPercentage": 0,
      "virtualMemoryUsed": 0,
      "maxCPUUtilPercentage": 0,
      "maxVirtualMemoryUsed": 0,
      "testStartTime": "2025-02-05T13:29:32.426193Z",
      "testEndTime": "2025-02-05T13:34:32.951592Z",
      "testDuration": "0:05:00.525399"
     },
     {
      "modelCode": "syd-adult-v3",
      "raiModelKey": "",
      "payloadSize": "1024x1024",
      "os": "Standard_F16s_v2",
      "sku": "Standard_F16s_v2",
      "rps": 50,
      "currentRPS": 49,
      "totalRPS": 49,
      "numRequests": 15000,
      "failureRate": 0,
      "failureCount": 0,
      "latencyAvg": 86.30696,
      "latencyMax": 86.30696,
      "latencyMin": 86.30696,
      "latencyP50": 75.884445,
      "latencyP75": 144.33386,
      "latencyP90": 129.93126,
      "latencyP95": 154.7285,
      "latencyP99": 212.7833,
      "gpuCountAvg": 0,
      "gpuUtilPercentage": 0,
      "gpuMemory": 0,
      "maxGPUUtilPercentage": 0,
      "maxGPUMemory": 0,
      "cpuCountAvg": 0,
      "logicalCPUCountAvg": 0,
      "cpuUtilPercentage": 0,
      "virtualMemoryUsed": 0,
      "maxCPUUtilPercentage": 0,
      "maxVirtualMemoryUsed": 0,
      "testStartTime": "2025-02-05T13:34:34.023732Z",
      "testEndTime": "2025-02-05T13:39:34.608881Z",
      "testDuration": "0:05:00.585149"
     }
    ]
   }
  }
 }
}
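
A request like the one above could be prepared with Python's standard library as in this minimal sketch. `build_registration_request` is a hypothetical helper, the body is abbreviated to a single field, and the token is a placeholder. The response format is not described in this document, so the sketch only builds the request and does not send it.

```python
import json
import urllib.request
import uuid

ENDPOINT = "https://raiops.azure.com/model/registration"


def build_registration_request(body: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for model registration."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "ms-request-id": str(uuid.uuid4()),
        },
        method="POST",
    )


# Abbreviated body; a real call would pass the full registration payload.
request = build_registration_request({"version": "v4"}, "<your-token>")

# Sending requires a valid token and network access:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes the payload and headers easy to inspect before anything goes over the network.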
