Examples

The simplest way to use this library is with the common PySpark entry script shown below. It expects the command-line arguments --zipFile and --binaryName, whose values identify the archive and the binary inside that archive to invoke with the .NET runner. All other command-line arguments are passed directly to the compiled assembly.

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
Demonstrate minimal use of the Spark .NET runner functionality.
"""

from shrike.spark import run_spark_net


if __name__ == "__main__":
    run_spark_net()

The compiled C# assembly is responsible for prefixing log statements with SystemLog: and for catching exceptions and prefixing their stack traces. Long-term, this library could provide a C# version of the logging functionality, but for now you will have to home-brew your own.
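To make the prefixing convention concrete, here is a minimal sketch of the idea, written in Python purely for illustration; the real implementation would live in your C# assembly. The helper names are hypothetical, and the exact format (a space after the prefix, one prefix per line) is an assumption that this page does not confirm.

```python
import traceback

# Prefix named in the text above; the trailing space added below is an assumption.
LOG_PREFIX = "SystemLog:"


def prefix_lines(text, prefix=LOG_PREFIX):
    """Prefix every non-empty line so multi-line output stays tagged."""
    return "\n".join(
        f"{prefix} {line}" for line in text.splitlines() if line.strip()
    )


def run_with_prefixed_errors(func):
    """Run func; on failure, print the prefixed stack trace and re-raise."""
    try:
        return func()
    except Exception:
        print(prefix_lines(traceback.format_exc()))
        raise


print(prefix_lines("Loaded input files"))
```

Prefixing each line of a stack trace, rather than only the first, keeps the whole trace tagged if the log is later filtered line by line.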

The command-line argument names for the zip file and the assembly name can be customized, for example:

run_spark_net("--zip-file", "--assembly-name")
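The split between the two runner arguments and the pass-through arguments can be pictured with the standard library alone. The sketch below is not shrike's implementation; split_runner_args is a hypothetical helper, and the flag names are parameters so that both the default and the customized spellings can be tried.

```python
import argparse


def split_runner_args(argv, zip_file_flag="--zipFile", binary_name_flag="--binaryName"):
    """Separate the two runner options from everything else.

    Hypothetical helper for illustration; not part of shrike's API.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument(zip_file_flag, dest="zip_file", required=True)
    parser.add_argument(binary_name_flag, dest="binary_name", required=True)
    # parse_known_args returns the unrecognized arguments untouched;
    # these are the ones that would be forwarded to the compiled assembly.
    known, passthrough = parser.parse_known_args(argv)
    return known.zip_file, known.binary_name, passthrough


zip_file, binary_name, rest = split_runner_args(
    ["--zipFile", "dotnet-publish.zip", "--binaryName", "assembly-name", "--input", "/data/in"]
)
print(zip_file, binary_name, rest)
```

Calling the helper with "--zip-file" and "--assembly-name" as the last two arguments mirrors the customized flag names shown above.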

For more advanced configuration, such as customizing the Spark session, use the run_spark_net_from_known_assembly function as shown below.

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
Demonstrate advanced use of the Spark .NET runner functionality.
"""

import argparse
from pyspark.sql import SparkSession
from shrike.spark import run_spark_net_from_known_assembly


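def main(args):
    # Build (or reuse) the Spark session yourself so it can be customized.
    spark = SparkSession.builder.appName(args.app_name).getOrCreate()

    # Pass the session, the zip archive, the assembly name, and the list of
    # arguments to forward to the compiled assembly.
    run_spark_net_from_known_assembly(
        spark, "dotnet-publish.zip", "assembly-name", ["--input", args.input_path]
    )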


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--app-name")
    parser.add_argument("--input-path")
    args = parser.parse_args()
    main(args)
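In the script above, --input-path is optional, so args.input_path is None when the flag is omitted. One way to avoid forwarding a None entry is to build the argument list from only the values that were supplied. This is a sketch under the assumption that the runner expects a flat list of strings; build_assembly_args is a hypothetical helper, not part of shrike.

```python
def build_assembly_args(options):
    """Turn a {flag: value} mapping into a flat argument list.

    Hypothetical helper for illustration; unset (None) values are skipped.
    """
    forwarded = []
    for flag, value in options.items():
        if value is None:
            continue  # option was not supplied on the command line
        forwarded.extend([flag, str(value)])
    return forwarded


print(build_assembly_args({"--input": "/data/in", "--output": None}))
```

The result could then be passed as the last argument to run_spark_net_from_known_assembly in place of the hard-coded list.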