Creating an Azure DevOps Build for Signing and Registering
By reading through this doc, you will be able to

- have a high-level understanding of how to use `shrike.build`, and
- create a single-YAML pipeline build in Azure DevOps for validating, signing and registering Azure ML components.
Requirements
To enjoy this tutorial, you need to
- have at least one Azure ML component YAML specification file in your team's repository,
- have an Azure ML service connection set up in your Azure DevOps for your Azure subscription,
- have an ESRP service connection set up in your Azure DevOps, and
- have a basic knowledge of Azure DevOps pipeline YAML schema.
Configuration
Both command line arguments and a configuration YAML file are supported by `shrike.build`. The order of precedence from least to greatest (the last listed variables override all other variables) is: default values, configuration file, command line arguments.
An example of a configuration YAML file:

```yaml
# Choose from two signing modes: aml, or aether
signing_mode: aml
# Two methods are provided to find "active" components: all, or smart
# For the "all" option, all the components will be validated/signed/registered.
# For the "smart" option, only the changed components will be processed.
activation_method: all
# Regular expression that a branch must satisfy in order for this code to
# sign or register components.
compliant_branch: ^refs/heads/main$
# Glob path of all component specification files.
component_specification_glob: 'steps/**/module_spec.yaml'
log_format: '[%(name)s][%(levelname)s] - %(message)s'
# List of workspace ARM IDs (fill in the <> with the appropriate values for your Azure ML workspace)
workspaces:
- /subscriptions/<Subscription-Id>/resourcegroups/<Name-Of-Resource-Group>/providers/Microsoft.MachineLearningServices/workspaces/<Azure-ML-Workspace-Name>
# Boolean argument: what to do when the same version of a component has already been registered.
# Default: False
fail_if_version_exists: False
# Boolean argument: whether the build number will be used as the component version
use_build_number: True
```
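For reference, the `component_specification_glob` above would match a repository layout like the following (the folder and file names are hypothetical):

```
steps/
  train/
    module_spec.yaml    # matched by 'steps/**/module_spec.yaml'
    train.py
  score/
    module_spec.yaml    # matched by 'steps/**/module_spec.yaml'
    score.py
```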
To consume this configuration file, we pass its path on the command line, that is

```bash
python -m shrike.build.commands.prepare --configuration-file PATH/TO/MY_CONFIGURATION_FILE
```

To override parameters such as `activation_method` and `fail_if_version_exists` at runtime, we append them to the command line:

```bash
python -m shrike.build.commands.prepare --configuration-file PATH/TO/MY_CONFIGURATION_FILE --activation-method smart --fail-if-version-exists
```
"Smart" mode
The `shrike` package supports a "smart" `activation_method`. To use it, just include the following line in your build configuration file:

```yaml
activation_method: smart
```

Using this "smart" mode will only register the components that were modified, given a list of modified files. The logic used to identify which components are modified is as follows.
1. The modified file needs to be tracked in git to be picked up by the tool. If it isn't tracked in git, it won't be considered - even if it is listed in a component's `additional_includes` file.
2. If a file located in the component folder is changed, then the component is considered to be modified.
3. If a file listed in the `additional_includes` file (either the file directly, or its parent folder) is changed, then the component is considered to be modified. The paths listed in the `additional_includes` file are all assumed to be relative to the location of that file.
Note: A corollary of point 3 above is that if you modify a function in a helper file listed in the `additional_includes`, your component will be considered as modified even if it does not use that function at all. That is why we use quotes around "smart": the logic is not smart enough to detect only the components truly affected by a change (implementing that logic would be a much more complicated task).

Note: Another corollary of point 3 is that if you want to use the "smart" mode, you need to be as accurate as possible with the files listed in the `additional_includes`, otherwise components might be registered even though the changes didn't really affect them. Imagine the extreme case where you have a huge `utils` directory listed in `additional_includes` instead of the specific list of utils files: every change to that directory, even if not relevant to your component of interest, will trigger the registration. This would defeat the purpose of having a smart mode in the first place.
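For illustration, a well-scoped `additional_includes` file might list only the specific helper files the component needs, rather than a whole `utils` directory (assuming the usual one-relative-path-per-line format; the file names below are hypothetical):

```
../common/io_helpers.py
../common/logging_utils.py
```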
It is worth reiterating that for the tool to work properly, the name of the compliant branch in your config file should be of the form `^refs/heads/<YourCompliantBranchName>$`. (Notice how it starts with `^refs/heads/` and ends with `$`.) However, regular expressions are not supported by the "smart" mode, since there would be some ambiguity in determining the list of modified files when there are several compliant (i.e. reference) branches.
To identify the latest merge into the compliant branch, the tool relies on the Azure DevOps convention that the commit message starts with "Merged PR". If you customize the commit message, please make sure it still starts with "Merged PR", otherwise the "smart" logic will not work properly.
In some (rare) instances, we have seen pull requests being successfully merged, but with the build failing. In these cases, the new components introduced or modified in the problematic PR have not been signed/registered, and unless they are modified by a subsequent PR, they will not be picked up by the "smart" mode. There are two common workarounds to this issue. The most straightforward is to activate the "all" activation mode in a PR following the failed build, then revert to "smart" for the PR after that. This will ensure all components are registered, but will also mess up the component results recycling logic: some components will wrongly be considered as new, hence their results won't be recycled. The second option is to do a mock PR that just bumps up the version numbers or adds a dummy comment to the specification files of the components modified in the problematic PR. This option has the advantage of not interfering with component results recycling, but is harder to implement if the problematic PR affects many components.
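As an illustration of the second workaround, the mock PR only needs to touch the spec files of the affected components, for example by bumping the version; the component name and version numbers below are hypothetical:

```yaml
# component_spec.yaml (excerpt)
name: contoso.train_model
version: 1.0.1   # bumped from 1.0.0 so that the "smart" mode picks this component up again
```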
Preparation step
In this section, we briefly describe the workflow of the `prepare` command in the `shrike` library, that is

- Search all Azure ML components in the working directory by matching the glob path of component specification files,
- Add repo and commit info to the "tags" and "description" sections of `spec.yaml`,
- Validate all "active" components,
- Build all "active" components, and
- Create the files `catalog.json` and `catalog.json.sig` for each "active" component.
Note: While building "active" components, all additional dependency files specified in `.additional_includes` will be copied into the component build folder by the `prepare` command. However, for those dependency files that are not checked into the repository, such as the OdinML JAR (from NuGet packages) and .zip files, we need to write extra "tasks" to copy them into the component build folder.
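As an example of such an extra task, a `CopyFiles@2` step placed after the `prepare` task shown below (and before ESRP signing) could copy a pre-downloaded JAR into a component's build folder; the source and target paths are assumptions to adapt to your repository:

```yaml
- task: CopyFiles@2
  displayName: Copy OdinML JAR into the component build folder
  inputs:
    # Folder where an earlier task downloaded/extracted the NuGet payload (hypothetical)
    SourceFolder: $(Pipeline.Workspace)/odinml_drop
    Contents: 'odinml-*.jar'
    # Component build folder produced by the prepare step (hypothetical path)
    TargetFolder: $(MY_WORK_DIRECTORY)/<path-to-component-build-folder>
```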
A sample YAML script of the preparation step:

```yaml
- task: AzureCLI@2
  displayName: Preparation
  inputs:
    azureSubscription: $(MY_AML_WORKSPACE_SERVICE_CONNECTION)
    scriptLocation: inlineScript
    scriptType: pscore
    inlineScript: |
      python -m shrike.build.commands.prepare --configuration-file PATH/TO/MY_CONFIGURATION_FILE
    workingDirectory: $(MY_WORK_DIRECTORY)
```
Customized validation on components (optional)
At the `prepare` step of the signing and registering build, the `shrike.build.commands.prepare.validate_all_components()` function executes the Azure CLI command `az ml component validate --file ${component_spec_path}` to check whether the given component spec YAML file has any syntax errors and whether it matches the structure of the pre-defined schema.
Apart from the standard validation via the Azure CLI, users can also enforce customized "strict" validation on Azure ML components. There are two parameters - `enable_component_validation` (type: `boolean`, default: `False`) and `component_validation` (type: `dict`, default: `None`) - that can be specified in the configuration file. If `config.enable_component_validation` is `True`, the tool will first check whether the components are compliant, then run the user-provided customized validation.
We expect users to write JSONPath expressions to query Azure ML component spec YAML elements. For example, the path of the component name is `$.name`, while the path of the image is `$.environment.docker.image`.
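For orientation, the hypothetical spec excerpt below shows which elements these JSONPath expressions point to (the values are illustrative):

```yaml
name: smartreply.train_model                      # queried by $.name
environment:
  docker:
    image: mcr.microsoft.com/azureml/base:latest  # queried by $.environment.docker.image
```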
Then, users are expected to translate their specific "strict" validation rules to regular expression patterns. For example, enforcing the component name to start with "smartreply." could be translated to the string pattern `^smartreply.[A-Za-z0-9-_.]+$`.
After that, the JSONPath expressions and corresponding regular expressions will be combined into a dict and assigned to `component_validation` in the configuration file.
Assume we enforce two "strict" validation requirements on the component: (1) the component name starts with `smartreply.`, and (2) all the input parameter descriptions start with a capital letter. Below is an example of a configuration file that specifies the above two validation requirements.
```yaml
activation_method: all
compliant_branch: ^refs/heads/develop$
component_specification_glob: 'components/**/module_spec.yaml'
log_format: '[%(name)s][%(levelname)s] - %(message)s'
signing_mode: aml
workspaces:
- /subscriptions/<Subscription-Id>/resourcegroups/<Name-Of-Resource-Group>/providers/Microsoft.MachineLearningServices/workspaces/<Azure-ML-Workspace-Name>
allow_duplicate_versions: True
use_build_number: True
# strict component validation
enable_component_validation: True
component_validation:
  '$.name': '^smartreply.[A-Za-z0-9-_.]+$'
  '$.inputs..description': '^[A-Z].*'
```
Please refer to this proposal doc for more details on the customized validation.
ESRP CodeSign
After creating the `catalog.json` and `catalog.json.sig` files for each built component in the preparation step, we leverage ESRP, that is, the Engineering Security and Release Platform, to sign the contents of the components. In the sample YAML script below, we need to customize `ConnectedServiceName` and `FolderPath`. In the `TEEGit` repo, the name of the ESRP service connection for the Torus tenant (Tenant Id: cdc5aeea-15c5-4db6-b079-fcadd2505dc2) is `Substrate AI ESRP`. For other repos, if the service connection for ESRP has not been set up yet, please refer to the ESRP CodeSign task Wiki for detailed instructions.
```yaml
- task: EsrpCodeSigning@1
  displayName: ESRP CodeSigning
  inputs:
    ConnectedServiceName: $(MY_ESRP_SERVICE_CONNECTION)
    FolderPath: $(MY_WORK_DIRECTORY)
    Pattern: '*.sig'
    signConfigType: inlineSignParams
    inlineOperation: |
      [
        {
          "KeyCode": "CP-460703-Pgp",
          "OperationCode": "LinuxSign",
          "parameters": {},
          "toolName": "sign",
          "toolVersion": "1.0"
        }
      ]
    SessionTimeout: 20
    VerboseLogin: true
```
Note: This step requires one-time authorization from the administrator of your ESRP service connection. Please contact your manager or tech lead for authorization questions.
Component registration
The last step is to register all signed components in your Azure ML workspaces. The `register` class in the `shrike` library implements the registration procedure by executing the Azure CLI command `az ml component create --file {component}`. The Python call is

```bash
python -m shrike.build.commands.register --configuration-file path/to/config
```

The `register` class can detect signed and built components.
There are five configuration parameters related to the registration step: `--compliant-branch`, `--source-branch`, `--fail-if-version-exists`, `--use-build-number`, and `--all-component-version`. They should be customized in the configuration file according to your specific use case; a configuration sketch covering them follows the list below.
- The `register` class checks whether the value of `source_branch` matches that of `compliant_branch` before starting registration. If the pattern doesn't match, an error message will be logged and the registration step will be terminated.
- If `fail_if_version_exists` is True, an error is raised and the registration step is terminated when the version number of some signed component already exists in the workspace; otherwise, only a warning is raised and the registration step continues.
- If `all_component_version` is not `None`, the value of `all_component_version` is used as the version number for all signed components.
- If `use_build_number` is True, the build number is used as the version number for all signed components (overriding the value of `all_component_version` if `all_component_version` is not `None`).
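For reference, here is a minimal configuration sketch covering these registration parameters; the values are illustrative and should be adapted to your use case:

```yaml
compliant_branch: ^refs/heads/main$
fail_if_version_exists: False     # only warn if the version already exists in the workspace
use_build_number: True            # use the build number as the version for all signed components
# all_component_version: 0.0.1    # pins one version for all components; overridden when use_build_number is True
```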
A sample YAML task for registration is

```yaml
- task: AzureCLI@2
  displayName: AML Component Registration
  inputs:
    azureSubscription: $(MY_AML_WORKSPACE_SERVICE_CONNECTION)
    scriptLocation: inlineScript
    scriptType: pscore
    inlineScript: |
      python -m shrike.build.commands.register --configuration-file PATH/TO/MY_CONFIGURATION_FILE
    workingDirectory: $(MY_WORK_DIRECTORY)
```
Note: The `shrike` library is version-aware. For a component with a product-ready version number (e.g., a.b.c), that version is set as the default version in the registration step; otherwise, for a component with a non-product-ready version number (e.g., a.b.c-alpha), it will not be labelled as default.
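For instance, with illustrative version numbers:

```yaml
version: 1.2.0          # product-ready: set as the default version when registered
# version: 1.2.0-alpha  # non-product-ready: registered, but not labelled as default
```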
Handling components which use binaries
For some components (e.g., Linux/Windows components running .NET Core DLLs or Windows Exes, or HDI components leveraging the ODIN-ML JAR or Spark .NET), the signed snapshot needs to contain some binaries. As long as those binaries are compiled from human-reviewed source code or come from internal (authenticated) feeds, this is fine. Teams may inject essentially arbitrary logic into their Azure DevOps pipeline, either for compiling C# code, or downloading and extracting NuGets from the Polymer NuGet feed.
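For example, a pipeline that compiles C# code could add a step along the following lines so that the published binaries end up in the snapshot to be signed; the task shown is only a sketch, and the project path and output folder are assumptions:

```yaml
- task: DotNetCoreCLI@2
  displayName: Publish .NET Core binaries into the component folder
  inputs:
    command: publish
    publishWebProjects: false
    zipAfterPublish: false
    # Hypothetical project and output location within a component folder
    projects: src/MyComponent/MyComponent.csproj
    arguments: '--configuration Release --output $(MY_WORK_DIRECTORY)/steps/my_component/bin'
```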
Æther-style code signing
This tool also assists with Æther-style code signing. Just write a configuration file like:

```yaml
component_specification_glob: '**/ModuleAutoApprovalManifest.json'
signing_mode: aether
```

and then run a code signing step like this just after the "prepare" command. Note: your ESRP service connection will need to have access to the `CP-230012` key, otherwise you'll encounter the error described in: Got unauthorized to access CP-230012 when calling Aether-style signing service.
```yaml
- task: EsrpCodeSigning@1
  displayName: sign modules
  inputs:
    ConnectedServiceName: $(MY_ESRP_SERVICE_CONNECTION)
    FolderPath: $(MY_WORK_DIRECTORY)
    Pattern: '*.cat'
    signConfigType: inlineSignParams
    inlineOperation: |
      [
        {
          "keyCode": "CP-230012",
          "operationSetCode": "SigntoolSign",
          "parameters": [
            {
              "parameterName": "OpusName",
              "parameterValue": "Microsoft"
            },
            {
              "parameterName": "OpusInfo",
              "parameterValue": "http://www.microsoft.com"
            },
            {
              "parameterName": "PageHash",
              "parameterValue": "/NPH"
            },
            {
              "parameterName": "FileDigest",
              "parameterValue": "/fd sha256"
            },
            {
              "parameterName": "TimeStamp",
              "parameterValue": "/tr \"http://rfc3161.gtm.corp.microsoft.com/TSS/HttpTspServer\" /td sha256"
            }
          ],
          "toolName": "signtool.exe",
          "toolVersion": "6.2.9304.0"
        }
      ]
    SessionTimeout: 20
    VerboseLogin: true
```
Æther does not support "true" CI/CD, but you will be able to use your build drops to register compliant Æther modules following Signed Builds.
For reference, you may imitate this build used by the AML Data Science team.
Note: there is no need to run the Azure ML-style and Æther-style code signing in separate jobs. As long as they both run in a Windows VM, they may share the same job.
Per-component builds
If you want your team to be able to manually trigger "Æther-style" per-component builds from their compliant branches, consider creating a separate build definition with the following changes.
Top of the build definition:

```yaml
name: $(Date:yyyyMMdd)$(Rev:.r)-dev

parameters:
- name: aml_component
  type: string
  default: '**'
```
Inline script portion of your "prepare" and "register" steps (you will need to customize the configuration file name and glob to your repository):

```bash
python -m shrike.build.commands.register --configuration-file sign-register-config-dev.yaml --component-specification-glob src/steps/${{ parameters.aml_component }}/component_spec.yaml
```
Then, members of your team can manually trigger builds via the Azure DevOps UI, setting the `aml_component` parameter to the name of the component they want to code-sign and register.
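As an illustration, the "register" step in such a per-component build could look like the following sketch (the configuration file name and glob come from the example above; adapt them to your repository):

```yaml
- task: AzureCLI@2
  displayName: Register selected component
  inputs:
    azureSubscription: $(MY_AML_WORKSPACE_SERVICE_CONNECTION)
    scriptLocation: inlineScript
    scriptType: pscore
    inlineScript: |
      python -m shrike.build.commands.register --configuration-file sign-register-config-dev.yaml --component-specification-glob src/steps/${{ parameters.aml_component }}/component_spec.yaml
    workingDirectory: $(MY_WORK_DIRECTORY)
```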
Tips
- Another way of achieving similar functionality is to run several "smart mode" builds which trigger against all `compliant/*` branches. To do so, you will need several build config files, one for each compliant branch.
- Name your builds something like `*-dev` so that these versions of the components don't get registered as default.
- See the Search Relevance team's example for something complete: [yaml] SearchRelevance AML Components Signing and Registering - dev.