Model Registration API Documentation
Endpoint Overview
- Endpoint: https://raiops.azure.com/model/registration
- Description: This endpoint registers your model on the RAI Platform.
HTTP Method
- Method: POST
Request Headers
The following headers should be included in the request:
Header | Value | Description | Required |
---|---|---|---|
Authorization | Bearer <your-token> | Authentication token | |
Content-Type | application/json | Indicates that the request body is in JSON format. | |
ms-request-id | uuid | A unique ID (UUID) for the request | |
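A minimal Python sketch of calling the endpoint with the headers above, using only the standard library. The token is a placeholder, the helper names `build_registration_request` and `send_registration` are hypothetical, and because the response format is not documented here the sketch simply returns the raw status code and body.

```python
import json
import urllib.request
import uuid

# Endpoint from this page; the bearer token is a placeholder you must supply.
ENDPOINT = "https://raiops.azure.com/model/registration"


def build_registration_request(body: dict, token: str, url: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the documented headers."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # A fresh UUID per request for the ms-request-id header.
        "ms-request-id": str(uuid.uuid4()),
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send_registration(body: dict, token: str) -> tuple[int, str]:
    """Send the registration request and return (status code, raw response text)."""
    req = build_registration_request(body, token)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the headers (for example, the generated `ms-request-id`) before anything goes over the network.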
Request Body
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
modelInformation | ModelInformation | | Model information (see Model Information) | |
version | string | v4 | Version of the model image | |
modelVersion | string | v3 | Version of the model | |
sku | string | Standard_F16s_v2 | Target compute instance (VM SKU) for model hosting | |
acrAddress | string | sample.azurecr.io/model:tag | Container image reference in Azure Container Registry | |
sampleRequest | string | "{\"text\": \"hello world\"}" | Sample request payload, encoded as a JSON string | |
sampleResponse | string | "{\"violence\": 0.9}" | Sample response payload, encoded as a JSON string | |
inferenceConfig | InferenceConfig | | Routes and port used to serve the model on AML (see Inference Config) | |
probeConfig | ProbeConfig | | Liveness and readiness probe configuration on AML (see Probe Config) | |
requestSettings | RequestSettings | | Request settings on AML (see Request Settings) | |
accuracyTestAMLJob | string | url_to_aml_job_for_accuracy_test | Link to the AML job for the accuracy test | |
loadTestAMLJob | string | url_to_aml_job_for_load_test | Link to the AML job for the load test | |
latencyInfo | LatencyInfo | | Model performance metrics (see Latency Info) | |
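Note that `sampleRequest` and `sampleResponse` are strings that themselves contain JSON, so the example payloads must be serialized before they are placed in the body. A Python sketch of assembling the body under that reading, with a hypothetical helper `build_registration_body` and the sample values from the table:

```python
import json


def build_registration_body(model_information: dict, sample_input: dict,
                            sample_output: dict, **fields) -> dict:
    """Assemble a registration body; sampleRequest/sampleResponse are JSON *strings*."""
    body = {
        "modelInformation": model_information,
        # Serialize the example payloads, matching the escaped sample values above.
        "sampleRequest": json.dumps(sample_input),
        "sampleResponse": json.dumps(sample_output),
    }
    body.update(fields)  # version, modelVersion, sku, acrAddress, inferenceConfig, ...
    return body


body = build_registration_body(
    {"name": "sample-model", "protocol": "1P"},
    {"text": "hello world"},
    {"violence": 0.9},
    version="v4",
    modelVersion="v3",
    sku="Standard_F16s_v2",
    acrAddress="sample.azurecr.io/model:tag",
)
print(json.dumps(body, indent=2))
```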
Model Information
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
name | string | sample-model | Model name | |
owner | string | Model Owner | Model owner | |
contact | string | model_owner@microsoft.com | Email address of the model owner | |
description | string | a text model to detect harmful information | Model description | |
harmCategory | string | violence, sexual, hate, self-harm | Target harm categories, separated by commas | |
modality | string | text | Model modality. Two modalities are currently supported: image and text. For multimodal models, list all modalities. | |
capability | string | annotation | Model capability | |
imageSize | int | 20 | Size of the model image in GB | |
providerId | int | 1 | ID of the model provider; contact the Hosting PoC for this value | |
protocol | string | 1P | Set this field to 1P | |
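Before registering, it can help to check a `modelInformation` object against the table. The sketch below is a hypothetical client-side check, not the platform's validation: it checks the types listed above, the fixed `1P` protocol value, and the two supported modalities, and it assumes multiple modalities are comma-separated like `harmCategory` (the page does not say).

```python
# Hypothetical client-side check; the API's own validation rules are not documented here.
ALLOWED_MODALITIES = {"image", "text"}
STRING_FIELDS = ("name", "owner", "contact", "description",
                 "harmCategory", "modality", "capability", "protocol")
INT_FIELDS = ("imageSize", "providerId")


def split_list(value: str) -> list[str]:
    """Split a comma-separated field such as harmCategory, trimming whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def validate_model_information(info: dict) -> list[str]:
    """Return a list of problems found in a ModelInformation dict (empty if none)."""
    problems = []
    for key in STRING_FIELDS:
        if not isinstance(info.get(key), str):
            problems.append(f"{key}: expected a string")
    for key in INT_FIELDS:
        value = info.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{key}: expected an int")
    if info.get("protocol") != "1P":
        problems.append("protocol: expected '1P'")
    if isinstance(info.get("modality"), str):
        # Assumption: multiple modalities are comma-separated, like harmCategory.
        for modality in split_list(info["modality"]):
            if modality not in ALLOWED_MODALITIES:
                problems.append(f"modality: unsupported value {modality!r}")
    return problems


sample = {
    "name": "sample-model",
    "owner": "Model Owner",
    "contact": "model_owner@microsoft.com",
    "description": "a text model to detect harmful information",
    "harmCategory": "violence, sexual, hate, self-harm",
    "modality": "text",
    "capability": "annotation",
    "imageSize": 20,
    "providerId": 1,
    "protocol": "1P",
}
print(validate_model_information(sample))  # []
print(split_list(sample["harmCategory"]))
```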
Inference Config
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
livenessRoute | string | /liveness | Liveness route | |
readinessRoute | string | /readiness | Readiness route | |
scoringRoute | string | /score | Scoring route | |
port | int | 8888 | Port on which the model is served | |
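To make the three routes and the port concrete, here is a minimal standard-library sketch of a model server that exposes them. The platform's exact contract for these routes (expected status codes and response bodies) is not documented on this page; the sketch assumes a 200 response means healthy and that the scoring route accepts and returns JSON, as `sampleRequest` and `sampleResponse` suggest. The `score` function and `make_server` helper are hypothetical placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Values mirror the sample column above; keep them in sync with inferenceConfig.
LIVENESS_ROUTE = "/liveness"
READINESS_ROUTE = "/readiness"
SCORING_ROUTE = "/score"
PORT = 8888


def score(payload: dict) -> dict:
    """Hypothetical placeholder; replace with real model inference."""
    return {"violence": 0.0}


class ModelHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, obj: dict) -> None:
        body = json.dumps(obj).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # Health routes: answering 200 signals that the process is alive / ready.
        if self.path in (LIVENESS_ROUTE, READINESS_ROUTE):
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != SCORING_ROUTE:
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            self._send_json(400, {"error": "invalid JSON"})
            return
        self._send_json(200, score(payload))

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def make_server(port: int = PORT, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Create (but do not start) the server; call serve_forever() on the result."""
    return ThreadingHTTPServer((host, port), ModelHandler)
```

In a container entrypoint you would run `make_server().serve_forever()`; keeping server creation separate from startup makes the handler easy to exercise on an ephemeral port in tests.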
Probe Config
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
LivenessProbeConfig | InferenceProbeConfig | | Liveness probe configuration on AML | |
readinessProbeConfig | InferenceProbeConfig | | Readiness probe configuration on AML | |
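Inference Probe Config
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
FailureThreshold | int | 5 | Number of consecutive failed probes after which the probe is considered failed | |
SuccessThreshold | int | 1 | Number of consecutive successful probes required to be considered healthy again after a failure | |
Timeout | int | 100 | Probe timeout in seconds | |
Period | int | 10 | Interval between probes, in seconds | |
InitialDelay | int | 10 | Delay before the first probe, in seconds | |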
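As a rough worked example of how these settings interact, the sketch estimates how long a container that never answers its probe lasts before being declared failed. It assumes Kubernetes-style probe semantics, which this page does not spell out, so treat the numbers as approximate. The probe values are those from the sample request; the helper name is hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class InferenceProbeConfig:
    # Field names match the JSON keys used in probeConfig.
    FailureThreshold: int
    SuccessThreshold: int
    Timeout: int       # seconds
    Period: int        # seconds
    InitialDelay: int  # seconds


def worst_case_seconds_to_failure(cfg: InferenceProbeConfig) -> int:
    """Approximate time before a never-responding container is declared failed.

    Assumes Kubernetes-style probing: the first probe starts after InitialDelay,
    later probes start every Period, each probe of an unresponsive container
    waits the full Timeout, and FailureThreshold consecutive failures mark it failed.
    """
    return cfg.InitialDelay + (cfg.FailureThreshold - 1) * cfg.Period + cfg.Timeout


liveness = InferenceProbeConfig(FailureThreshold=5, SuccessThreshold=1, Timeout=100, Period=100, InitialDelay=600)
readiness = InferenceProbeConfig(FailureThreshold=5, SuccessThreshold=1, Timeout=10, Period=10, InitialDelay=10)
print(worst_case_seconds_to_failure(liveness))   # 1100
print(worst_case_seconds_to_failure(readiness))  # 60
print(asdict(readiness))
```

Under these assumptions, with the sample liveness settings a container whose liveness route does not respond within roughly 1100 seconds of starting would be treated as failed, which is worth comparing against your model's load time.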
Request Settings
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
MaxConcurrentRequests | int | 24 | Maximum number of concurrent requests | |
MaxQueueWaitTime | int | 35 | Maximum queue wait time | |
Timeout | int | 5000 | Request timeout | |
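One way to sanity-check `MaxConcurrentRequests` is Little's law: the average number of requests in flight equals the request rate times the time each request spends being served. The sketch below applies it to the sample latency figures. It assumes `MaxConcurrentRequests` caps in-flight requests on a single serving instance, which this page does not confirm; using a tail latency such as P99 instead of the average gives a more conservative estimate. The helper names are hypothetical.

```python
import math


def estimated_concurrency(rps: float, latency_ms: float) -> int:
    """Little's law: requests in flight = arrival rate x time each request spends in the system."""
    return math.ceil(rps * latency_ms / 1000.0)


def fits(max_concurrent_requests: int, rps: float, latency_ms: float) -> bool:
    """True if the estimated in-flight load stays within MaxConcurrentRequests."""
    return estimated_concurrency(rps, latency_ms) <= max_concurrent_requests


# Values from the sample request: 50 target RPS, average and P99 latency at that rate.
print(estimated_concurrency(50, 86.30696))  # 5
print(estimated_concurrency(50, 212.7833))  # 11
print(fits(24, 50, 212.7833))               # True
```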
Latency Info
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
payloadSize | PayloadSize | | Dictionary of performance metrics for various payload sizes. Each key is a payload size and each value holds the detailed metrics for that size. | |
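Payload Size
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
size_of_payload | PerformanceMetrics | | Detailed evaluation metrics for one payload size. The field name is the payload size itself (for example, 1024x1024 in the sample request). | |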
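Because `payloadSize` is keyed by the payload size itself rather than by a fixed field name, client code iterates over the dictionary. A minimal Python sketch that summarizes each payload size, using a trimmed copy of the sample request's `latencyInfo` (the `summarize` helper is hypothetical):

```python
import json

# Trimmed latencyInfo fragment; the values are copied from the sample request.
LATENCY_INFO = json.loads("""
{
  "payloadSize": {
    "1024x1024": {
      "finalResult": {"rps": 50, "currentRPS": 49, "latencyP95": 154.7285},
      "details": [
        {"rps": 5, "currentRPS": 4, "latencyP95": 59.261154},
        {"rps": 50, "currentRPS": 49, "latencyP95": 154.7285}
      ]
    }
  }
}
""")


def summarize(latency_info: dict) -> dict:
    """Map each payload size (the dictionary key) to its final RPS, P95 latency, and tested RPS levels."""
    return {
        size: {
            "rps": metrics["finalResult"]["rps"],
            "latencyP95": metrics["finalResult"]["latencyP95"],
            "testedRPS": [m["rps"] for m in metrics["details"]],
        }
        for size, metrics in latency_info["payloadSize"].items()
    }


print(summarize(LATENCY_INFO))
```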
Performance Metrics
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
finalResult | Metric | | Metrics at the maximum RPS used for serving | |
details | List[Metric] | | Metrics measured at each tested RPS | |
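In the sample request, `finalResult` is identical to the `details` entry with the highest RPS (50). The sketch below assembles a `PerformanceMetrics` object on the assumption that `finalResult` is the highest-RPS run whose failure rate stays within a limit; that selection rule, the `max_failure_rate` parameter, and the helper name are hypothetical, not documented behavior.

```python
def build_performance_metrics(details: list[dict], max_failure_rate: float = 0.0) -> dict:
    """Assemble a PerformanceMetrics object from per-RPS load-test results.

    Assumption: finalResult is the run with the highest target RPS whose
    failureRate does not exceed max_failure_rate.
    """
    acceptable = [m for m in details if m["failureRate"] <= max_failure_rate]
    if not acceptable:
        raise ValueError("no load-test run met the failure-rate limit")
    final = max(acceptable, key=lambda m: m["rps"])
    return {"finalResult": final, "details": sorted(details, key=lambda m: m["rps"])}


# Trimmed runs from the sample request (all four had no failures).
runs = [
    {"rps": 20, "failureRate": 0, "latencyP95": 74.63064},
    {"rps": 5, "failureRate": 0, "latencyP95": 59.261154},
    {"rps": 50, "failureRate": 0, "latencyP95": 154.7285},
    {"rps": 45, "failureRate": 0, "latencyP95": 129.19327},
]
print(build_performance_metrics(runs)["finalResult"]["rps"])  # 50
```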
Metric
Field | Type | Sample Value | Description | Required |
---|---|---|---|---|
modelCode | string | sample-model | Model identifier, typically the model name and version joined by a hyphen | |
raiModelKey | string | model-key | Used for ensembled model serving; optional for standalone models | |
payloadSize | string | 1024x1024 | Payload size used in the test | |
os | string | Standard_F16s_v2 | SKU used for serving | |
sku | string | Standard_F16s_v2 | SKU used for serving | |
rps | int | 50 | Target RPS in the test | |
currentRPS | int | 49 | Actual RPS achieved in the test | |
totalRPS | int | 49 | Total RPS in the test | |
numRequests | int | 15000 | Total number of requests sent in the test | |
failureRate | float | 0 | Ratio of failed (non-2xx) requests | |
failureCount | int | 0 | Number of failed (non-2xx) requests | |
latencyAvg | float | 94.424 | Average latency in milliseconds | |
latencyMax | float | 192.232 | Maximum latency in milliseconds | |
latencyMin | float | 34.249 | Minimum latency in milliseconds | |
latencyP50 | float | 100.292 | P50 (median) latency in milliseconds | |
latencyP75 | float | 124.928 | P75 latency in milliseconds | |
latencyP90 | float | 158.234 | P90 latency in milliseconds | |
latencyP95 | float | 189.342 | P95 latency in milliseconds | |
latencyP99 | float | 191.422 | P99 latency in milliseconds | |
gpuCountAvg | float | 1 | Average number of GPUs used in the test | |
gpuUtilPercentage | float | 93.2 | Average GPU utilization (%) | |
gpuMemory | float | 24.57 | GPU memory consumption in GB | |
maxGPUUtilPercentage | float | 99.4 | Maximum GPU utilization (%) | |
maxGPUMemory | float | 62.52 | Maximum GPU memory consumption in GB | |
cpuCountAvg | float | 1 | Average number of CPU cores used in the test | |
logicalCPUCountAvg | float | 1 | Average number of logical CPU cores used in the test | |
cpuUtilPercentage | float | 94.5 | Average CPU utilization (%) | |
virtualMemoryUsed | float | 50 | Virtual memory used in GB | |
maxCPUUtilPercentage | float | 95 | Maximum CPU utilization (%) | |
maxVirtualMemoryUsed | float | 15 | Maximum virtual memory consumption in GB | |
testStartTime | datetime | 2025-02-05T13:34:34.023732Z | Start time of the test in UTC | |
testEndTime | datetime | 2025-02-05T13:39:34.608881Z | End time of the test in UTC | |
testDuration | timedelta | 0:05:00.585149 | Duration of the test (H:MM:SS.ffffff) | |
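Most `Metric` fields can be derived from raw per-request measurements. The Python sketch below fills the latency, failure, and timing fields. It uses the nearest-rank percentile definition (the platform's method is not documented), treats non-2xx status codes as failures as the table describes, and computes `currentRPS` as requests divided by elapsed seconds, truncated to an int. That last rule is an assumption, although it reproduces the sample runs (for example, 15000 requests over 300.585149 seconds gives 49). The helper names are hypothetical.

```python
import math
import statistics
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches the sample timestamps (UTC, "Z" suffix)


def nearest_rank(sorted_ms: list[float], p: float) -> float:
    """Nearest-rank percentile; the platform's exact percentile method is not documented."""
    rank = math.ceil(p / 100 * len(sorted_ms))
    return sorted_ms[max(rank, 1) - 1]


def build_metric(latencies_ms: list[float], status_codes: list[int], target_rps: int,
                 start: datetime, end: datetime, **identity) -> dict:
    """Fill the latency, failure, and timing fields of a Metric from raw load-test data.

    start/end are naive datetimes already in UTC. Resource fields (GPU/CPU/memory)
    and totalRPS are not computed here and must come from your own tooling.
    """
    lat = sorted(latencies_ms)
    failures = sum(1 for code in status_codes if not 200 <= code < 300)  # non-2xx
    duration = end - start
    metric = dict(identity)  # e.g. modelCode, raiModelKey, payloadSize, os, sku
    metric.update({
        "rps": target_rps,
        # Assumption: actual RPS = requests / elapsed seconds, truncated to an int.
        "currentRPS": int(len(status_codes) / duration.total_seconds()),
        "numRequests": len(status_codes),
        "failureRate": failures / len(status_codes),
        "failureCount": failures,
        "latencyAvg": statistics.fmean(lat),
        "latencyMax": lat[-1],
        "latencyMin": lat[0],
        **{f"latencyP{p}": nearest_rank(lat, p) for p in (50, 75, 90, 95, 99)},
        "testStartTime": start.strftime(TIME_FORMAT),
        "testEndTime": end.strftime(TIME_FORMAT),
        # str(timedelta) renders as H:MM:SS.ffffff (the fraction is omitted if zero).
        "testDuration": str(duration),
    })
    return metric


m = build_metric([10.0, 20.0, 30.0, 40.0], [200, 200, 500, 200], 5,
                 datetime(2025, 2, 5, 13, 34, 34, 23732),
                 datetime(2025, 2, 5, 13, 39, 34, 608881),
                 modelCode="sample-model")
print(m["testDuration"])  # 0:05:00.585149
```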
Sample Request
{
"modelInformation": {
"name": "syd-adult",
"owner": "Juanyong Duan",
"contact": "juanyong.duan@microsoft.com",
"description": "Sydney Adult Classifier model",
"harmCategory": "string",
"modality": "image",
"capability": "annotation",
"imageSize": 0,
"providerId": 1,
"protocol": "1P"
},
"version": "v4",
"modelVersion": "v4",
"sku": "Standard_F16s_v2",
"acrAddress": "raimodelacr.azurecr.io/adult_classifier:v3.20250114.v3",
"sampleRequest": "{\"image\": \"iVBORw0KGgoAAAANSUhEUgAAAAQAAAAECAIAAAAmkwkpAAAAQ0lEQVR4nGK5vkJp13/HZLP0gxu4mTez3HRssmQJrrN+/JKpUfXbfvbp7qVx//meMYges5F5bzLt154HjQsBAQAA//9gxhgz+Boo6wAAAABJRU5ErkJggg==\"}",
"sampleResponse": "{\"normal\": 0.9869055151939392, \"racy\": 0.002819292014464736, \"adult\": 0.008862475864589214, \"gory\": 0.0014126901514828205}",
"inferenceConfig": {
"livenessRoute": "/liveness",
"readinessRoute": "/readiness",
"scoringRoute": "/score",
"port": 8899
},
"probeConfig": {
"LivenessProbeConfig": {
"FailureThreshold": 5,
"SuccessThreshold": 1,
"Timeout": 100,
"Period": 100,
"InitialDelay": 600
},
"readinessProbeConfig": {
"FailureThreshold": 5,
"SuccessThreshold": 1,
"Timeout": 10,
"Period": 10,
"InitialDelay": 10
}
},
"requestSettings": {
"MaxConcurrentRequests": 24,
"MaxQueueWaitTime": 35,
"Timeout": 5000
},
"accuracyTestAMLJob": "string",
"loadTestAMLJob": "string",
"latencyInfo": {
"payloadSize": {
"1024x1024": {
"finalResult": {
"modelCode": "syd-adult-v3",
"raiModelKey": "",
"payloadSize": "1024x1024",
"os": "Standard_F16s_v2",
"sku": "Standard_F16s_v2",
"rps": 50,
"currentRPS": 49,
"totalRPS": 49,
"numRequests": 15000,
"failureRate": 0,
"failureCount": 0,
"latencyAvg": 86.30696,
"latencyMax": 86.30696,
"latencyMin": 86.30696,
"latencyP50": 75.884445,
"latencyP75": 144.33386,
"latencyP90": 129.93126,
"latencyP95": 154.7285,
"latencyP99": 212.7833,
"gpuCountAvg": 0,
"gpuUtilPercentage": 0,
"gpuMemory": 0,
"maxGPUUtilPercentage": 0,
"maxGPUMemory": 0,
"cpuCountAvg": 0,
"logicalCPUCountAvg": 0,
"cpuUtilPercentage": 0,
"virtualMemoryUsed": 0,
"maxCPUUtilPercentage": 0,
"maxVirtualMemoryUsed": 0,
"testStartTime": "2025-02-05T13:34:34.023732Z",
"testEndTime": "2025-02-05T13:39:34.608881Z",
"testDuration": "0:05:00.585149"
},
"details": [
{
"modelCode": "syd-adult-v3",
"raiModelKey": "",
"payloadSize": "1024x1024",
"os": "Standard_F16s_v2",
"sku": "Standard_F16s_v2",
"rps": 5,
"currentRPS": 4,
"totalRPS": 4,
"numRequests": 1500,
"failureRate": 0,
"failureCount": 0,
"latencyAvg": 49.302048,
"latencyMax": 49.302048,
"latencyMin": 49.302048,
"latencyP50": 47.5619,
"latencyP75": 57.595535,
"latencyP90": 55.91107,
"latencyP95": 59.261154,
"latencyP99": 67.62917,
"gpuCountAvg": 0,
"gpuUtilPercentage": 0,
"gpuMemory": 0,
"maxGPUUtilPercentage": 0,
"maxGPUMemory": 0,
"cpuCountAvg": 0,
"logicalCPUCountAvg": 0,
"cpuUtilPercentage": 0,
"virtualMemoryUsed": 0,
"maxCPUUtilPercentage": 0,
"maxVirtualMemoryUsed": 0,
"testStartTime": "2025-02-05T12:49:20.801117Z",
"testEndTime": "2025-02-05T12:54:21.114637Z",
"testDuration": "0:05:00.313520"
},
{
"modelCode": "syd-adult-v3",
"raiModelKey": "",
"payloadSize": "1024x1024",
"os": "Standard_F16s_v2",
"sku": "Standard_F16s_v2",
"rps": 20,
"currentRPS": 19,
"totalRPS": 19,
"numRequests": 6000,
"failureRate": 0,
"failureCount": 0,
"latencyAvg": 55.59504,
"latencyMax": 55.59504,
"latencyMin": 55.59504,
"latencyP50": 52.751766,
"latencyP75": 74.4868,
"latencyP90": 67.97182,
"latencyP95": 74.63064,
"latencyP99": 96.22183,
"gpuCountAvg": 0,
"gpuUtilPercentage": 0,
"gpuMemory": 0,
"maxGPUUtilPercentage": 0,
"maxGPUMemory": 0,
"cpuCountAvg": 0,
"logicalCPUCountAvg": 0,
"cpuUtilPercentage": 0,
"virtualMemoryUsed": 0,
"maxCPUUtilPercentage": 0,
"maxVirtualMemoryUsed": 0,
"testStartTime": "2025-02-05T13:04:24.873751Z",
"testEndTime": "2025-02-05T13:09:25.263794Z",
"testDuration": "0:05:00.390043"
},
{
"modelCode": "syd-adult-v3",
"raiModelKey": "",
"payloadSize": "1024x1024",
"os": "Standard_F16s_v2",
"sku": "Standard_F16s_v2",
"rps": 45,
"currentRPS": 44,
"totalRPS": 44,
"numRequests": 13500,
"failureRate": 0,
"failureCount": 0,
"latencyAvg": 76.5693,
"latencyMax": 76.5693,
"latencyMin": 76.5693,
"latencyP50": 69.56816,
"latencyP75": 122.449356,
"latencyP90": 110.96486,
"latencyP95": 129.19327,
"latencyP99": 175.33055,
"gpuCountAvg": 0,
"gpuUtilPercentage": 0,
"gpuMemory": 0,
"maxGPUUtilPercentage": 0,
"maxGPUMemory": 0,
"cpuCountAvg": 0,
"logicalCPUCountAvg": 0,
"cpuUtilPercentage": 0,
"virtualMemoryUsed": 0,
"maxCPUUtilPercentage": 0,
"maxVirtualMemoryUsed": 0,
"testStartTime": "2025-02-05T13:29:32.426193Z",
"testEndTime": "2025-02-05T13:34:32.951592Z",
"testDuration": "0:05:00.525399"
},
{
"modelCode": "syd-adult-v3",
"raiModelKey": "",
"payloadSize": "1024x1024",
"os": "Standard_F16s_v2",
"sku": "Standard_F16s_v2",
"rps": 50,
"currentRPS": 49,
"totalRPS": 49,
"numRequests": 15000,
"failureRate": 0,
"failureCount": 0,
"latencyAvg": 86.30696,
"latencyMax": 86.30696,
"latencyMin": 86.30696,
"latencyP50": 75.884445,
"latencyP75": 144.33386,
"latencyP90": 129.93126,
"latencyP95": 154.7285,
"latencyP99": 212.7833,
"gpuCountAvg": 0,
"gpuUtilPercentage": 0,
"gpuMemory": 0,
"maxGPUUtilPercentage": 0,
"maxGPUMemory": 0,
"cpuCountAvg": 0,
"logicalCPUCountAvg": 0,
"cpuUtilPercentage": 0,
"virtualMemoryUsed": 0,
"maxCPUUtilPercentage": 0,
"maxVirtualMemoryUsed": 0,
"testStartTime": "2025-02-05T13:34:34.023732Z",
"testEndTime": "2025-02-05T13:39:34.608881Z",
"testDuration": "0:05:00.585149"
}
]
}
}
}
}