Model Registration API Documentation

Endpoint Overview

Endpoint: https://raiops.azure.com/model/registration

  • Description: Registers your model on the RAI Platform.

HTTP Method

  • Method: POST

Request Headers

The following headers should be included in the request:

| Header | Value | Description | Required |
| --- | --- | --- | --- |
| Authorization | Bearer <your-token> | Authentication token | ✅ |
| Content-Type | application/json | Indicates that the request body is in JSON format | ✅ |
| ms-request-id | uuid | A unique ID for the request | ✅ |
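A minimal sketch of assembling the request described above, using only the Python standard library. The helper name `build_registration_request` is hypothetical, and using a version-4 UUID for `ms-request-id` is an assumption (the table only says `uuid`). The request is built but not sent, because the response format is not documented here.

```python
import json
import urllib.request
import uuid

ENDPOINT = "https://raiops.azure.com/model/registration"


def build_registration_request(body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the three required headers."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Assumption: any freshly generated UUID is acceptable as the request id.
        "ms-request-id": str(uuid.uuid4()),
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(ENDPOINT, data=data, headers=headers, method="POST")


# The body here is a placeholder; a real body needs every required field.
request = build_registration_request({"version": "v4"}, "<your-token>")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it.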

Request Body

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| modelInformation | ModelInformation | | model information | ✅ |
| version | string | v4 | version of the model image | ✅ |
| modelVersion | string | v3 | version of the model | ✅ |
| sku | string | Standard_F16s_v2 | target compute instance for model hosting | ✅ |
| acrAddress | string | sample.azurecr.io/model:tag | image reference | ✅ |
| sampleRequest | string | "{\"text\": \"hello world\"}" | sample request, as a JSON-encoded string | ✅ |
| sampleResponse | string | "{\"violence\": 0.9}" | sample response, as a JSON-encoded string | ✅ |
| inferenceConfig | InferenceConfig | | routes and port for model inference on AML | |
| probeConfig | ProbeConfig | | liveness and readiness probe configuration on AML | ✅ |
| requestSettings | RequestSettings | | request settings on AML | |
| accuracyTestAMLJob | string | url_to_aml_job_for_accuracy_test | link to the AML job for the accuracy test | |
| loadTestAMLJob | string | url_to_aml_job_for_load_test | link to the AML job for the load test | |
| latencyInfo | LatencyInfo | | model performance metrics | ✅ |
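The `sampleRequest` and `sampleResponse` fields are typed as strings, so the sample payloads are JSON documents embedded inside the JSON body. A short sketch of producing them, assuming the payloads start out as Python dictionaries (the helper name `as_json_string` is hypothetical):

```python
import json


def as_json_string(payload: dict) -> str:
    """Serialize a sample payload so it can be embedded as a string field."""
    return json.dumps(payload)


body_fragment = {
    "sampleRequest": as_json_string({"text": "hello world"}),
    "sampleResponse": as_json_string({"violence": 0.9}),
}

# Serializing the outer body escapes the inner quotes, which yields the
# backslash-escaped form shown in the Sample Value column.
serialized = json.dumps(body_fragment)
print(serialized)
```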

Model Information

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| name | string | sample-model | model name | ✅ |
| owner | string | Model Owner | model owner | ✅ |
| contact | string | model_owner@microsoft.com | email address of the model owner | ✅ |
| description | string | a text model to detect harmful information | model description | |
| harmCategory | string | violence, sexual, hate, self-harm | target harm categories, comma-separated | ✅ |
| modality | string | text | model modality; two modalities are currently supported: image and text. For multimodal models, list all the modalities | ✅ |
| capability | string | annotation | model capability | |
| imageSize | int | 20 | size of the model image in GB | |
| providerId | int | 1 | ID of the model provider; contact the Hosting PoC for this value | ✅ |
| protocol | string | 1P | set this field to 1P | ✅ |
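As an illustration of the table above, the following sketch builds a `modelInformation` object from the sample values and checks it against the fields marked as required. The constant and function names are hypothetical, and joining categories with ", " simply mirrors the sample value.

```python
# Fields marked as required in the Model Information table.
REQUIRED_MODEL_INFO_FIELDS = (
    "name", "owner", "contact", "harmCategory",
    "modality", "providerId", "protocol",
)


def missing_model_info_fields(info: dict) -> list:
    """Return the required field names that are absent from `info`."""
    return [field for field in REQUIRED_MODEL_INFO_FIELDS if field not in info]


harm_categories = ["violence", "sexual", "hate", "self-harm"]
model_information = {
    "name": "sample-model",
    "owner": "Model Owner",
    "contact": "model_owner@microsoft.com",
    # harmCategory is a single comma-separated string, not a list.
    "harmCategory": ", ".join(harm_categories),
    "modality": "text",
    "providerId": 1,
    "protocol": "1P",
}
print(missing_model_info_fields(model_information))
```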

Inference Config

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| livenessRoute | string | /liveness | liveness route | ✅ |
| readinessRoute | string | /readiness | readiness route | ✅ |
| scoringRoute | string | /score | scoring route | ✅ |
| port | int | 8888 | port to serve on | ✅ |

Probe Config

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| LivenessProbeConfig | InferenceProbeConfig | | liveness probe config on AML | |
| readinessProbeConfig | InferenceProbeConfig | | readiness probe config on AML | |

Inference Probe Config

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| FailureThreshold | int | 5 | failure threshold | |
| SuccessThreshold | int | 1 | success threshold | |
| Timeout | int | 100 | timeout in seconds | |
| Period | int | 10 | period in seconds | |
| InitialDelay | int | 10 | initial delay | |

Request Settings

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| MaxConcurrentRequests | int | 24 | maximum concurrent requests | |
| MaxQueueWaitTime | int | 35 | maximum queue wait time | |
| Timeout | int | 5000 | timeout | |

Latency Info

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| payloadSize | PayloadSize | | A dictionary of performance metrics for various payload sizes. Each key is the size of the payload and each value contains the detailed metrics. | ✅ |

Payload Size

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| size_of_payload | PerformanceMetrics | | Detailed evaluation metrics | ✅ |

Performance Metrics

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| finalResult | Metric | | metrics at the maximum RPS used for serving | ✅ |
| details | List[Metric] | | metrics under different RPS | ✅ |
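The nesting of Latency Info, Payload Size, and Performance Metrics can be sketched as follows. The function name `build_latency_info` is hypothetical, and picking the run with the highest target `rps` as `finalResult` is an assumption based on the description "maximum RPS used for serving". The metric dictionaries are abbreviated for readability; real ones carry every required Metric field.

```python
def build_latency_info(runs_by_payload_size: dict) -> dict:
    """Group per-RPS Metric dicts into the LatencyInfo structure."""
    payload_size = {}
    for size, runs in runs_by_payload_size.items():
        if not runs:
            raise ValueError(f"no test runs recorded for payload size {size!r}")
        payload_size[size] = {
            # Assumption: the run with the highest target RPS is the final result.
            "finalResult": max(runs, key=lambda metric: metric["rps"]),
            "details": list(runs),
        }
    return {"payloadSize": payload_size}


# Abbreviated Metric dicts, keyed by a payload size label.
runs = [
    {"rps": 5, "latencyP99": 67.62917},
    {"rps": 50, "latencyP99": 212.7833},
]
latency_info = build_latency_info({"1024x1024": runs})
print(latency_info["payloadSize"]["1024x1024"]["finalResult"]["rps"])
```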

Metric

| Field | Type | Sample Value | Description | Required |
| --- | --- | --- | --- | --- |
| modelCode | string | sample-model | model identifier, typically the model name and version joined by a hyphen | ✅ |
| raiModelKey | string | model-key | used for ensembled model serving; optional for standalone models | |
| os | string | Standard_F16s_v2 | SKU used for serving | ✅ |
| sku | string | Standard_F16s_v2 | SKU used for serving | ✅ |
| rps | int | 50 | Target RPS in the test | ✅ |
| currentRPS | int | 49 | Actual RPS in the test | ✅ |
| totalRPS | int | 49 | Total RPS in the test | ✅ |
| numRequests | int | 15000 | Total number of requests sent in the test | ✅ |
| failureRate | float | 0 | Ratio of failed requests (non-2xx) | ✅ |
| failureCount | int | 0 | Number of failed requests (non-2xx) | ✅ |
| latencyAvg | float | 94.424 | Average latency in milliseconds | ✅ |
| latencyMax | float | 192.232 | Maximum latency in milliseconds | ✅ |
| latencyMin | float | 34.249 | Minimum latency in milliseconds | ✅ |
| latencyP50 | float | 100.292 | P50 (median) latency in milliseconds | ✅ |
| latencyP75 | float | 124.928 | P75 latency in milliseconds | ✅ |
| latencyP90 | float | 158.234 | P90 latency in milliseconds | ✅ |
| latencyP95 | float | 189.342 | P95 latency in milliseconds | ✅ |
| latencyP99 | float | 191.422 | P99 latency in milliseconds | ✅ |
| gpuCountAvg | float | 1 | Number of GPUs used in the test | |
| gpuUtilPercentage | float | 93.2 | Average GPU utilization | |
| gpuMemory | float | 24.57 | GPU memory consumption in GB | |
| maxGPUUtilPercentage | float | 99.4 | Maximum GPU utilization | |
| maxGPUMemory | float | 62.52 | Maximum GPU memory consumption in GB | |
| cpuCountAvg | float | 1 | Number of CPU cores used in the test | |
| logicalCPUCountAvg | float | 1 | Number of logical CPU cores used in the test | |
| cpuUtilPercentage | float | 94.5 | Average CPU utilization | |
| virtualMemoryUsed | float | 50 | Virtual memory used in GB | |
| maxCPUUtilPercentage | float | 95 | Maximum CPU utilization | |
| maxVirtualMemoryUsed | float | 15 | Maximum virtual memory consumption in GB | |
| testStartTime | datetime | 2025-02-05T13:34:34.023732Z | Start time of the test in UTC | ✅ |
| testEndTime | datetime | 2025-02-05T13:39:34.608881Z | End time of the test in UTC | ✅ |
| testDuration | timedelta | 0:05:00.585149 | Duration of the test | ✅ |
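Several Metric fields can be derived from others. The sketch below computes `testDuration` from the start and end timestamps and `failureRate` from the request counts. The function names are hypothetical, the timestamp format is inferred from the sample values, and treating `failureRate` as a fraction between 0 and 1 is an assumption based on the word "ratio".

```python
from datetime import datetime

# Format inferred from the sample values, e.g. 2025-02-05T13:34:34.023732Z.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def format_test_duration(start: str, end: str) -> str:
    """Return end - start in the H:MM:SS.ffffff form used by testDuration."""
    delta = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(start, TIMESTAMP_FORMAT)
    return str(delta)


def failure_rate(failure_count: int, num_requests: int) -> float:
    """Fraction of requests that failed; 0.0 when no requests were sent."""
    return failure_count / num_requests if num_requests else 0.0


duration = format_test_duration(
    "2025-02-05T13:34:34.023732Z",
    "2025-02-05T13:39:34.608881Z",
)
print(duration)  # → 0:05:00.585149
print(failure_rate(0, 15000))  # → 0.0
```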

Sample Request

{
    "modelInformation": {
        "name": "syd-adult",
        "owner": "Juanyong Duan",
        "contact": "juanyong.duan@microsoft.com",
        "description": "Sydney Adult Classifier model",
        "harmCategory": "string",
        "modality": "image",
        "capability": "annotation",
        "imageSize": 0,
        "providerId": 1,
        "protocol": "1P"
    },
    "version": "v4",
    "modelVersion": "v4",
    "sku": "Standard_F16s_v2",
    "acrAddress": "raimodelacr.azurecr.io/adult_classifier:v3.20250114.v3",
    "sampleRequest": "{\"image\": \"iVBORw0KGgoAAAANSUhEUgAAAAQAAAAECAIAAAAmkwkpAAAAQ0lEQVR4nGK5vkJp13/HZLP0gxu4mTez3HRssmQJrrN+/JKpUfXbfvbp7qVx//meMYges5F5bzLt154HjQsBAQAA//9gxhgz+Boo6wAAAABJRU5ErkJggg==\"}",
    "sampleResponse": "{\"normal\": 0.9869055151939392, \"racy\": 0.002819292014464736, \"adult\": 0.008862475864589214, \"gory\": 0.0014126901514828205}",
    "inferenceConfig": {
        "livenessRoute": "/liveness",
        "readinessRoute": "/readiness",
        "scoringRoute": "/score",
        "port": 8899
    },
    "probeConfig": {
        "LivenessProbeConfig": {
            "FailureThreshold": 5,
            "SuccessThreshold": 1,
            "Timeout": 100,
            "Period": 100,
            "InitialDelay": 600
        },
        "readinessProbeConfig": {
            "FailureThreshold": 5,
            "SuccessThreshold": 1,
            "Timeout": 10,
            "Period": 10,
            "InitialDelay": 10
        }
    },
    "requestSettings": {
        "MaxConcurrentRequests": 24,
        "MaxQueueWaitTime": 35,
        "Timeout": 5000
    },
    "accuracyTestAMLJob": "string",
    "loadTestAMLJob": "string",
    "latencyInfo": {
        "payloadSize": {
            "1024x1024": {
                "finalResult": {
                    "modelCode": "syd-adult-v3",
                    "raiModelKey": "",
                    "payloadSize": "1024x1024",
                    "os": "Standard_F16s_v2",
                    "sku": "Standard_F16s_v2",
                    "rps": 50,
                    "currentRPS": 49,
                    "totalRPS": 49,
                    "numRequests": 15000,
                    "failureRate": 0,
                    "failureCount": 0,
                    "latencyAvg": 86.30696,
                    "latencyMax": 86.30696,
                    "latencyMin": 86.30696,
                    "latencyP50": 75.884445,
                    "latencyP75": 144.33386,
                    "latencyP90": 129.93126,
                    "latencyP95": 154.7285,
                    "latencyP99": 212.7833,
                    "gpuCountAvg": 0,
                    "gpuUtilPercentage": 0,
                    "gpuMemory": 0,
                    "maxGPUUtilPercentage": 0,
                    "maxGPUMemory": 0,
                    "cpuCountAvg": 0,
                    "logicalCPUCountAvg": 0,
                    "cpuUtilPercentage": 0,
                    "virtualMemoryUsed": 0,
                    "maxCPUUtilPercentage": 0,
                    "maxVirtualMemoryUsed": 0,
                    "testStartTime": "2025-02-05T13:34:34.023732Z",
                    "testEndTime": "2025-02-05T13:39:34.608881Z",
                    "testDuration": "0:05:00.585149"
                },
                "details": [
                    {
                        "modelCode": "syd-adult-v3",
                        "raiModelKey": "",
                        "payloadSize": "1024x1024",
                        "os": "Standard_F16s_v2",
                        "sku": "Standard_F16s_v2",
                        "rps": 5,
                        "currentRPS": 4,
                        "totalRPS": 4,
                        "numRequests": 1500,
                        "failureRate": 0,
                        "failureCount": 0,
                        "latencyAvg": 49.302048,
                        "latencyMax": 49.302048,
                        "latencyMin": 49.302048,
                        "latencyP50": 47.5619,
                        "latencyP75": 57.595535,
                        "latencyP90": 55.91107,
                        "latencyP95": 59.261154,
                        "latencyP99": 67.62917,
                        "gpuCountAvg": 0,
                        "gpuUtilPercentage": 0,
                        "gpuMemory": 0,
                        "maxGPUUtilPercentage": 0,
                        "maxGPUMemory": 0,
                        "cpuCountAvg": 0,
                        "logicalCPUCountAvg": 0,
                        "cpuUtilPercentage": 0,
                        "virtualMemoryUsed": 0,
                        "maxCPUUtilPercentage": 0,
                        "maxVirtualMemoryUsed": 0,
                        "testStartTime": "2025-02-05T12:49:20.801117Z",
                        "testEndTime": "2025-02-05T12:54:21.114637Z",
                        "testDuration": "0:05:00.313520"
                    },
                    {
                        "modelCode": "syd-adult-v3",
                        "raiModelKey": "",
                        "payloadSize": "1024x1024",
                        "os": "Standard_F16s_v2",
                        "sku": "Standard_F16s_v2",
                        "rps": 20,
                        "currentRPS": 19,
                        "totalRPS": 19,
                        "numRequests": 6000,
                        "failureRate": 0,
                        "failureCount": 0,
                        "latencyAvg": 55.59504,
                        "latencyMax": 55.59504,
                        "latencyMin": 55.59504,
                        "latencyP50": 52.751766,
                        "latencyP75": 74.4868,
                        "latencyP90": 67.97182,
                        "latencyP95": 74.63064,
                        "latencyP99": 96.22183,
                        "gpuCountAvg": 0,
                        "gpuUtilPercentage": 0,
                        "gpuMemory": 0,
                        "maxGPUUtilPercentage": 0,
                        "maxGPUMemory": 0,
                        "cpuCountAvg": 0,
                        "logicalCPUCountAvg": 0,
                        "cpuUtilPercentage": 0,
                        "virtualMemoryUsed": 0,
                        "maxCPUUtilPercentage": 0,
                        "maxVirtualMemoryUsed": 0,
                        "testStartTime": "2025-02-05T13:04:24.873751Z",
                        "testEndTime": "2025-02-05T13:09:25.263794Z",
                        "testDuration": "0:05:00.390043"
                    },
                    {
                        "modelCode": "syd-adult-v3",
                        "raiModelKey": "",
                        "payloadSize": "1024x1024",
                        "os": "Standard_F16s_v2",
                        "sku": "Standard_F16s_v2",
                        "rps": 45,
                        "currentRPS": 44,
                        "totalRPS": 44,
                        "numRequests": 13500,
                        "failureRate": 0,
                        "failureCount": 0,
                        "latencyAvg": 76.5693,
                        "latencyMax": 76.5693,
                        "latencyMin": 76.5693,
                        "latencyP50": 69.56816,
                        "latencyP75": 122.449356,
                        "latencyP90": 110.96486,
                        "latencyP95": 129.19327,
                        "latencyP99": 175.33055,
                        "gpuCountAvg": 0,
                        "gpuUtilPercentage": 0,
                        "gpuMemory": 0,
                        "maxGPUUtilPercentage": 0,
                        "maxGPUMemory": 0,
                        "cpuCountAvg": 0,
                        "logicalCPUCountAvg": 0,
                        "cpuUtilPercentage": 0,
                        "virtualMemoryUsed": 0,
                        "maxCPUUtilPercentage": 0,
                        "maxVirtualMemoryUsed": 0,
                        "testStartTime": "2025-02-05T13:29:32.426193Z",
                        "testEndTime": "2025-02-05T13:34:32.951592Z",
                        "testDuration": "0:05:00.525399"
                    },
                    {
                        "modelCode": "syd-adult-v3",
                        "raiModelKey": "",
                        "payloadSize": "1024x1024",
                        "os": "Standard_F16s_v2",
                        "sku": "Standard_F16s_v2",
                        "rps": 50,
                        "currentRPS": 49,
                        "totalRPS": 49,
                        "numRequests": 15000,
                        "failureRate": 0,
                        "failureCount": 0,
                        "latencyAvg": 86.30696,
                        "latencyMax": 86.30696,
                        "latencyMin": 86.30696,
                        "latencyP50": 75.884445,
                        "latencyP75": 144.33386,
                        "latencyP90": 129.93126,
                        "latencyP95": 154.7285,
                        "latencyP99": 212.7833,
                        "gpuCountAvg": 0,
                        "gpuUtilPercentage": 0,
                        "gpuMemory": 0,
                        "maxGPUUtilPercentage": 0,
                        "maxGPUMemory": 0,
                        "cpuCountAvg": 0,
                        "logicalCPUCountAvg": 0,
                        "cpuUtilPercentage": 0,
                        "virtualMemoryUsed": 0,
                        "maxCPUUtilPercentage": 0,
                        "maxVirtualMemoryUsed": 0,
                        "testStartTime": "2025-02-05T13:34:34.023732Z",
                        "testEndTime": "2025-02-05T13:39:34.608881Z",
                        "testDuration": "0:05:00.585149"
                    }
                ]
            }
        }
    }
}
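Before submitting a body like the one above, a quick local check against the fields marked as required can catch omissions early. This is only a sketch: the names are hypothetical, the required list is copied from the Request Body table, and the service may apply validation rules that are not documented here.

```python
# Top-level fields marked as required in the Request Body table.
REQUIRED_TOP_LEVEL_FIELDS = (
    "modelInformation", "version", "modelVersion", "sku", "acrAddress",
    "sampleRequest", "sampleResponse", "probeConfig", "latencyInfo",
)


def find_problems(body: dict) -> list:
    """Return human-readable descriptions of missing required parts."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_TOP_LEVEL_FIELDS
        if name not in body
    ]
    payload_sizes = body.get("latencyInfo", {}).get("payloadSize", {})
    for size, metrics in payload_sizes.items():
        # Each payload size entry needs both parts of PerformanceMetrics.
        for key in ("finalResult", "details"):
            if key not in metrics:
                problems.append(f"latencyInfo.payloadSize[{size!r}] is missing {key}")
    return problems


# A deliberately incomplete body to show what gets reported.
incomplete_body = {
    "version": "v4",
    "latencyInfo": {"payloadSize": {"1024x1024": {"finalResult": {}}}},
}
for problem in find_problems(incomplete_body):
    print(problem)
```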