Examples
The simplest way to use this library is with the common PySpark entry script.
It expects the command-line arguments --zipFile and --binaryName, whose values
determine the archive, and the binary inside that archive, to invoke using the
.NET runner. All other command-line arguments are passed directly to the
compiled assembly.
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
Demonstrate minimal use of the Spark .NET runner functionality.
"""
from shrike.spark import run_spark_net


if __name__ == "__main__":
    run_spark_net()
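The argument handling described above can be pictured with a small sketch. This is not shrike's actual implementation; split_runner_args is a hypothetical helper that assumes the two runner arguments are plain "flag value" pairs and that everything else is forwarded untouched.

```python
import argparse
from typing import List, Tuple


def split_runner_args(
    argv: List[str],
    zip_flag: str = "--zipFile",
    binary_flag: str = "--binaryName",
) -> Tuple[str, str, List[str]]:
    """Separate the two runner arguments from everything else.

    Hypothetical helper for illustration only; not part of shrike.
    """
    # Disable prefix matching so pass-through flags are never mistaken
    # for abbreviations of the runner flags.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument(zip_flag, dest="zip_file", required=True)
    parser.add_argument(binary_flag, dest="binary_name", required=True)
    # parse_known_args returns unrecognized arguments instead of rejecting them.
    known, passthrough = parser.parse_known_args(argv)
    return known.zip_file, known.binary_name, passthrough


zip_file, binary_name, rest = split_runner_args(
    ["--zipFile", "publish.zip", "--binaryName", "MyApp", "--input", "/data/in"]
)
print(zip_file, binary_name, rest)
```

In this sketch, rest holds the arguments that would be handed to the compiled assembly. The flag names are parameters, so different names can be supplied in place of the defaults.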
The compiled C# assembly is responsible for prefixing log statements with
SystemLog: and for catching and prefixing exception stack traces. Long-term,
that functionality will be provided by
This library could provide a C# version of the logging functionality,
but for now you will have to home-brew your own.
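To make the logging convention above concrete, here is a rough sketch of the same idea written in Python for illustration only; the real code would live in the C# assembly. It assumes the prefix is applied to every line, including each line of a stack trace, and that a single space follows the prefix. Both details are assumptions, and prefix_lines and run_with_prefixed_errors are hypothetical names.

```python
import traceback
from typing import Callable

PREFIX = "SystemLog:"  # prefix named in the text above


def prefix_lines(text: str, prefix: str = PREFIX) -> str:
    """Prepend the prefix to every line of a possibly multi-line message."""
    # Assumption: one space separates the prefix from the message.
    return "\n".join(f"{prefix} {line}" for line in text.splitlines())


def run_with_prefixed_errors(entry_point: Callable[[], None]) -> int:
    """Run entry_point; on failure print the prefixed stack trace, return 1."""
    try:
        entry_point()
        return 0
    except Exception:
        # format_exc() renders the active exception's traceback as a string.
        print(prefix_lines(traceback.format_exc()))
        return 1


print(prefix_lines("Starting job\nLoaded input"))
```

A C# equivalent would wrap the program's entry point in a try/catch block and route all output through a similar prefixing helper.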
You can customize the names of the command-line arguments for the zip file and the assembly name, for example:
run_spark_net("--zip-file", "--assembly-name")
For more advanced configuration, e.g. customizing the Spark session, use the
run_spark_net_from_known_assembly function, as in the script below.
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
Demonstrate advanced use of the Spark .NET runner functionality.
"""
import argparse

from pyspark.sql import SparkSession

from shrike.spark import run_spark_net_from_known_assembly


def main(args):
    # Build (or reuse) a Spark session with a custom application name.
    spark = SparkSession.builder.appName(args.app_name).getOrCreate()
    # Run the named assembly from the archive, forwarding the input path to it.
    run_spark_net_from_known_assembly(
        spark, "dotnet-publish.zip", "assembly-name", ["--input", args.input_path]
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--app-name")
    parser.add_argument("--input-path")
    args = parser.parse_args()
    main(args)