Creating an Azure DevOps Build for Signing and Registering
By reading through this doc, you will be able to

- have a high-level understanding of how to use `shrike.build`, and
- create a single-YAML pipeline build in Azure DevOps for validating, signing and registering Azure ML components.
Requirements
To enjoy this tutorial, you need to
- have at least one Azure ML component YAML specification file in your team's repository,
- have an Azure ML service connection set up in your Azure DevOps for your Azure subscription,
- have an ESRP service connection set up in your Azure DevOps, and
- have a basic knowledge of Azure DevOps pipeline YAML schema.
Configuration
Both command line arguments and a configuration YAML file are supported by `shrike.build`. The order of precedence from least to greatest (the last listed variables override all other variables) is: default values, configuration file, command line arguments.
An example of a configuration YAML file:

```yaml
# Choose from two signing modes: aml, or aether
signing_mode: aml
# Two methods are provided to find "active" components: all, or smart
# For the "all" option, all the components will be validated/signed/registered.
# For the "smart" option, only the changed components will be processed.
activation_method: all
# Regular expression that a branch must satisfy in order for this code to
# sign or register components.
compliant_branch: ^refs/heads/main$
# Glob path of all component specification files.
component_specification_glob: 'steps/**/module_spec.yaml'
log_format: '[%(name)s][%(levelname)s] - %(message)s'
# List of workspace ARM IDs (fill in the <> with the appropriate values for your Azure ML workspace)
workspaces:
- /subscriptions/<Subscription-Id>/resourcegroups/<Name-Of-Resource-Group>/providers/Microsoft.MachineLearningServices/workspaces/<Azure-ML-Workspace-Name>
# Boolean argument: what to do when the same version of a component has already been registered.
# Default: False
fail_if_version_exists: False
# Boolean argument: whether the build number will be used as the component version
use_build_number: True
```
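For reference, the `component_specification_glob` above would match a repository layout like the following (the folder and file names are hypothetical):

```
steps/
  train/
    module_spec.yaml    # matched by 'steps/**/module_spec.yaml'
    train.py
  score/
    module_spec.yaml    # matched by 'steps/**/module_spec.yaml'
    score.py
```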
To consume this configuration file, we pass its path on the command line, that is

```bash
python -m shrike.build.commands.prepare --configuration-file PATH/TO/MY_CONFIGURATION_FILE
```

To override parameters such as `activation_method` and `fail_if_version_exists` at runtime, we append them to the command line:

```bash
python -m shrike.build.commands.prepare --configuration-file PATH/TO/MY_CONFIGURATION_FILE --activation-method smart --fail-if-version-exists
```
"Smart" mode
The `shrike` package supports a "smart" `activation_method`. To use it, just include the following line in your build configuration file:

```yaml
activation_method: smart
```

Using this "smart" mode will only register the components that were modified, given a list of modified files. The logic used to identify which components are modified is as follows.
1. The modified file needs to be tracked in git to be picked up by the tool. If it isn't tracked in git, it won't be considered - even if it is listed in a component's `additional_includes` file.
2. If a file located in the component folder is changed, then the component is considered to be modified.
3. If a file listed in the `additional_includes` file (either the file directly, or its parent folder) is changed, then the component is considered to be modified. The paths listed in the `additional_includes` file are all assumed to be relative to the location of that file.
Note: A corollary of point 3 above is that if you modify a function in a helper file listed in the `additional_includes`, your component will be considered as modified even if it does not use that function at all. That is why we use quotes around "smart": the logic is not smart enough to detect only the components truly affected by a change (implementing that logic would be a much more complicated task).

Note: Another corollary of point 3 is that if you want to use the "smart" mode, you need to be as accurate as possible with the files listed in the `additional_includes`, otherwise components might be registered even though the changes didn't really affect them. Imagine the extreme case where you have a huge `utils` directory listed in `additional_includes` instead of the specific list of utils files: every change to that directory, even if not relevant to your component of interest, will trigger the registration. This would defeat the purpose of having a smart mode in the first place.
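For illustration, a well-scoped `additional_includes` file might list only the specific helper files the component needs, rather than a whole `utils` directory (assuming the usual one-relative-path-per-line format; the file names below are hypothetical):

```
../common/io_helpers.py
../common/logging_utils.py
```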
It is worth reiterating that for the tool to work properly, the name of the compliant branch in your config file should be of the form `^refs/heads/<YourCompliantBranchName>$`. (Notice how it starts with `^refs/heads/` and ends with `$`.) However, regular expressions are not supported by the "smart" mode, since there would be some ambiguity in determining the list of modified files when there are several compliant (i.e. reference) branches.
To identify the latest merge into the compliant branch, the tool relies on the Azure DevOps convention that the commit message starts with "Merged PR". If you customize the commit message, please make sure it still starts with "Merged PR", otherwise the "smart" logic will not work properly.
In some (rare) instances, we have seen pull requests being successfully merged, but with the build failing. In these cases, the new components introduced or modified in the problematic PR have not been signed/registered, and unless they are modified by a subsequent PR, they will not be picked up by the "smart" mode. There are two common workarounds to this issue. The most straightforward is to activate the "all" activation mode in a PR following the failed build, then revert to "smart" for the PR after that. This will ensure all components are registered, but will also mess up the component results recycling logic: some components will wrongly be considered as new, hence their results won't be recycled. The second option is to do a mock PR that just bumps up the version numbers or adds a dummy comment to the specification files of the components modified in the problematic PR. This option has the advantage of not interfering with component results recycling, but is harder to implement if the problematic PR affects many components.
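As an illustration of the second workaround, the mock PR only needs to touch the spec files of the affected components, for example by bumping the version; the component name and version numbers below are hypothetical:

```yaml
# component_spec.yaml (excerpt)
name: contoso.train_model
version: 1.0.1   # bumped from 1.0.0 so that the "smart" mode picks this component up again
```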
Preparation step
In this section, we briefly describe the workflow of the `prepare` command in the `shrike` library, that is

- Search all Azure ML components in the working directory by matching the glob path of component specification files,
- Add repo and commit info to the "tags" and "description" sections of `spec.yaml`,
- Validate all "active" components,
- Build all "active" components, and
- Create the files `catalog.json` and `catalog.json.sig` for each "active" component.
Note: While building "active" components, all additional dependency files specified in `.additional_includes` will be copied into the component build folder by the `prepare` command. However, for those dependency files that are not checked into the repository, such as the OdinML JAR (from NuGet packages) and .zip files, we need to write extra "tasks" to copy them into the component build folder.
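As an example of such an extra task, a `CopyFiles@2` step placed after the `prepare` task shown below (and before ESRP signing) could copy a pre-downloaded JAR into a component's build folder; the source and target paths are assumptions to adapt to your repository:

```yaml
- task: CopyFiles@2
  displayName: Copy OdinML JAR into the component build folder
  inputs:
    # Folder where an earlier task downloaded/extracted the NuGet payload (hypothetical)
    SourceFolder: $(Pipeline.Workspace)/odinml_drop
    Contents: 'odinml-*.jar'
    # Component build folder produced by the prepare step (hypothetical path)
    TargetFolder: $(MY_WORK_DIRECTORY)/<path-to-component-build-folder>
```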
A sample YAML script of the preparation step:

```yaml
- task: AzureCLI@2
  displayName: Preparation
  inputs:
    azureSubscription: $(MY_AML_WORKSPACE_SERVICE_CONNECTION)
    scriptLocation: inlineScript
    scriptType: pscore
    inlineScript: |
      python -m shrike.build.commands.prepare --configuration-file PATH/TO/MY_CONFIGURATION_FILE
    workingDirectory: $(MY_WORK_DIRECTORY)
```
Customized validation on components (optional)
At the `prepare` step of the signing and registering build, the `shrike.build.commands.prepare.validate_all_components()` function executes the Azure CLI command `az ml component validate --file ${component_spec_path}` to check whether the given component spec YAML file has any syntax errors and whether it matches the structure of the pre-defined schema.
Apart from the standard validation via the Azure CLI, users can also enforce customized "strict" validation on Azure ML components. There are two parameters - `enable_component_validation` (type: `boolean`, default: `False`) and `component_validation` (type: `dict`, default: `None`) - that can be specified in the configuration file. If `config.enable_component_validation` is `True`, the tool will first check whether the components are compliant, then run the user-provided customized validation.
We expect users to write JSONPath expressions to query Azure ML component spec YAML elements. For example, the path of the component name is `$.name`, while the path of the image is `$.environment.docker.image`.
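For orientation, the hypothetical spec excerpt below shows which elements these JSONPath expressions point to (the values are illustrative):

```yaml
name: smartreply.train_model                      # queried by $.name
environment:
  docker:
    image: mcr.microsoft.com/azureml/base:latest  # queried by $.environment.docker.image
```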
Then, users are expected to translate their specific "strict" validation rules to regular expression patterns. For example, enforcing the component name to start with "smartreply." could be translated to the string pattern `^smartreply.[A-Za-z0-9-_.]+$`.
After that, the JSONPath expressions and corresponding regular expressions will be combined into a dict and assigned to `component_validation` in the configuration file.
Assume we enforce two "strict" validation requirements on the component: (1) the component name starts with `smartreply.`, and (2) all the input parameter descriptions start with a capital letter. Below is an example of a configuration file that specifies the above two validation requirements.
```yaml
activation_method: all
compliant_branch: ^refs/heads/develop$
component_specification_glob: 'components/**/module_spec.yaml'
log_format: '[%(name)s][%(levelname)s] - %(message)s'
signing_mode: aml
workspaces:
- /subscriptions/<Subscription-Id>/resourcegroups/<Name-Of-Resource-Group>/providers/Microsoft.MachineLearningServices/workspaces/<Azure-ML-Workspace-Name>
allow_duplicate_versions: True
use_build_number: True
# strict component validation
enable_component_validation: True
component_validation:
  '$.name': '^smartreply.[A-Za-z0-9-_.]+$'
  '$.inputs..description': '^[A-Z].*'
```
Please refer to this proposal doc for more details on the customized validation.
ESRP CodeSign
After creating the `catalog.json` and `catalog.json.sig` files for each built component in the preparation step, we leverage ESRP, that is, the Engineering Security and Release Platform, to sign the contents of the components. In the sample YAML script below, we need to customize `ConnectedServiceName` and `FolderPath`. In the `TEEGit` repo, the name of the ESRP service connection for the Torus tenant (Tenant Id: cdc5aeea-15c5-4db6-b079-fcadd2505dc2) is `Substrate AI ESRP`. For other repos, if the service connection for ESRP has not been set up yet, please refer to the ESRP CodeSign task Wiki for detailed instructions.
```yaml
- task: EsrpCodeSigning@1
  displayName: ESRP CodeSigning
  inputs:
    ConnectedServiceName: $(MY_ESRP_SERVICE_CONNECTION)
    FolderPath: $(MY_WORK_DIRECTORY)
    Pattern: '*.sig'
    signConfigType: inlineSignParams
    inlineOperation: |
      [
        {
          "KeyCode": "CP-460703-Pgp",
          "OperationCode": "LinuxSign",
          "parameters": {},
          "toolName": "sign",
          "toolVersion": "1.0"
        }
      ]
    SessionTimeout: 20
    VerboseLogin: true
```
Note: This step requires one-time authorization from the administrator of your ESRP service connection. Please contact your manager or tech lead for authorization questions.
Component registration
The last step is to register all signed components in your Azure ML workspaces. The `register` class in the `shrike` library implements the registration procedure by executing the Azure CLI command `az ml component create --file {component}`. The Python call is

```bash
python -m shrike.build.commands.register --configuration-file path/to/config
```

The `register` class can detect signed and built components.
There are five configuration parameters related to the registration step: `--compliant-branch`, `--source-branch`, `--fail-if-version-exists`, `--use-build-number`, and `--all-component-version`. They should be customized in the configuration file according to your specific use case; a configuration sketch covering them follows the list below.
- The `register` class checks whether the value of `source_branch` matches that of `compliant_branch` before starting registration. If the pattern doesn't match, an error message will be logged and the registration step will be terminated.
- If `fail_if_version_exists` is True, an error is raised and the registration step is terminated when the version number of some signed component already exists in the workspace; otherwise, only a warning is raised and the registration step continues.
- If `all_component_version` is not `None`, the value of `all_component_version` is used as the version number for all signed components.
- If `use_build_number` is True, the build number is used as the version number for all signed components (overriding the value of `all_component_version` if `all_component_version` is not `None`).
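For reference, here is a minimal configuration sketch covering these registration parameters; the values are illustrative and should be adapted to your use case:

```yaml
compliant_branch: ^refs/heads/main$
fail_if_version_exists: False     # only warn if the version already exists in the workspace
use_build_number: True            # use the build number as the version for all signed components
# all_component_version: 0.0.1    # pins one version for all components; overridden when use_build_number is True
```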
A sample YAML task for registration is

```yaml
- task: AzureCLI@2
  displayName: AML Component Registration
  inputs:
    azureSubscription: $(MY_AML_WORKSPACE_SERVICE_CONNECTION)
    scriptLocation: inlineScript
    scriptType: pscore
    inlineScript: |
      python -m shrike.build.commands.register --configuration-file PATH/TO/MY_CONFIGURATION_FILE
    workingDirectory: $(MY_WORK_DIRECTORY)
```
Note: The `shrike` library is version-aware. For a component with a product-ready version number (e.g., a.b.c), that version is set as the default version in the registration step; otherwise, for a component with a non-product-ready version number (e.g., a.b.c-alpha), it will not be labelled as default.
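For instance, with illustrative version numbers:

```yaml
version: 1.2.0          # product-ready: set as the default version when registered
# version: 1.2.0-alpha  # non-product-ready: registered, but not labelled as default
```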
Handling components which use binaries
For some components (e.g., Linux/Windows components running .NET Core DLLs or Windows Exes, or HDI components leveraging the ODIN-ML JAR or Spark .NET), the signed snapshot needs to contain some binaries. As long as those binaries are compiled from human-reviewed source code or come from internal (authenticated) feeds, this is fine. Teams may inject essentially arbitrary logic into their Azure DevOps pipeline, either for compiling C# code, or downloading and extracting NuGets from the Polymer NuGet feed.
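For example, a pipeline that compiles C# code could add a step along the following lines so that the published binaries end up in the snapshot to be signed; the task shown is only a sketch, and the project path and output folder are assumptions:

```yaml
- task: DotNetCoreCLI@2
  displayName: Publish .NET Core binaries into the component folder
  inputs:
    command: publish
    publishWebProjects: false
    zipAfterPublish: false
    # Hypothetical project and output location within a component folder
    projects: src/MyComponent/MyComponent.csproj
    arguments: '--configuration Release --output $(MY_WORK_DIRECTORY)/steps/my_component/bin'
```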
Æther-style code signing
This tool also assists with Æther-style code signing. Just write a configuration file like:

```yaml
component_specification_glob: '**/ModuleAutoApprovalManifest.json'
signing_mode: aether
```

and then run a code signing step like this just after the "prepare" command. Note: your ESRP service connection will need to have access to the `CP-230012` key, otherwise you'll encounter the error described in: Got unauthorized to access CP-230012 when calling Aether-style signing service.
```yaml
- task: EsrpCodeSigning@1
  displayName: sign modules
  inputs:
    ConnectedServiceName: $(MY_ESRP_SERVICE_CONNECTION)
    FolderPath: $(MY_WORK_DIRECTORY)
    Pattern: '*.cat'
    signConfigType: inlineSignParams
    inlineOperation: |
      [
        {
          "keyCode": "CP-230012",
          "operationSetCode": "SigntoolSign",
          "parameters": [
            {
              "parameterName": "OpusName",
              "parameterValue": "Microsoft"
            },
            {
              "parameterName": "OpusInfo",
              "parameterValue": "http://www.microsoft.com"
            },
            {
              "parameterName": "PageHash",
              "parameterValue": "/NPH"
            },
            {
              "parameterName": "FileDigest",
              "parameterValue": "/fd sha256"
            },
            {
              "parameterName": "TimeStamp",
              "parameterValue": "/tr \"http://rfc3161.gtm.corp.microsoft.com/TSS/HttpTspServer\" /td sha256"
            }
          ],
          "toolName": "signtool.exe",
          "toolVersion": "6.2.9304.0"
        }
      ]
    SessionTimeout: 20
    VerboseLogin: true
```
Æther does not support "true" CI/CD, but you will be able to use your build drops to register compliant Æther modules following Signed Builds.
For reference, you may imitate this build used by the AML Data Science team.
Note: there is no need to run the Azure ML-style and Æther-style code signing in separate jobs. As long as they both run in a Windows VM, they may share the same job.
Per-component builds
If you want your team to be able to manually trigger "Æther-style" per-component builds from their compliant branches, consider creating a separate build definition with the following changes.
Top of the build definition:

```yaml
name: $(Date:yyyyMMdd)$(Rev:.r)-dev

parameters:
- name: aml_component
  type: string
  default: '**'
```
Inline script portion of your "prepare" and "register" steps (you will need to customize the configuration file name and glob to your repository):

```bash
python -m shrike.build.commands.register --configuration-file sign-register-config-dev.yaml --component-specification-glob src/steps/${{ parameters.aml_component }}/component_spec.yaml
```
Then, members of your team can manually trigger builds via the Azure DevOps UI, setting the `aml_component` parameter to the name of the component they want to code-sign and register.
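As an illustration, the "register" step in such a per-component build could look like the following sketch (the configuration file name and glob come from the example above; adapt them to your repository):

```yaml
- task: AzureCLI@2
  displayName: Register selected component
  inputs:
    azureSubscription: $(MY_AML_WORKSPACE_SERVICE_CONNECTION)
    scriptLocation: inlineScript
    scriptType: pscore
    inlineScript: |
      python -m shrike.build.commands.register --configuration-file sign-register-config-dev.yaml --component-specification-glob src/steps/${{ parameters.aml_component }}/component_spec.yaml
    workingDirectory: $(MY_WORK_DIRECTORY)
```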
Tips
- Another way of achieving similar functionality is to run several "smart mode" builds which trigger against all `compliant/*` branches. To do so, you will need several build config files, one for each compliant branch.
- Name your builds something like `*-dev` so that these versions of the components don't get registered as default.
- See the Search Relevance team's example for something complete: [yaml] SearchRelevance AML Components Signing and Registering - dev.