MMLSpark can be conveniently installed on existing Spark clusters via the --packages option. For example:
spark-shell --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
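The same coordinates also work with spark-submit when submitting a packaged application (MyApp.jar below is a placeholder for your own build):
spark-submit --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 MyApp.jar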
This can be used in other Spark contexts too; for example, you can use MMLSpark in AZTK by adding it to the .aztk/spark-defaults.conf file.
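Concretely, the package can be listed in that file via Spark's standard spark.jars.packages property (a minimal sketch, assuming AZTK's usual spark-defaults.conf format):
spark.jars.packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1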
Step 1: Create a Databricks account
Step 2: Install MMLSpark
To install MMLSpark on the Databricks cloud, create a new library from Maven coordinates in your workspace. For the coordinates, use com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1. Next, ensure this library is attached to your cluster (or to all clusters). Finally, ensure that your Spark cluster runs at least Spark 2.3 with Scala 2.11. You can then use MMLSpark in both your Scala and PySpark notebooks.
![Creating a library from Maven coordinates in Databricks](https://mmlspark.blob.core.windows.net/website/images/databricks_library.jpeg)
Step 3: Load our Examples (Optional)
To load our examples, right-click in your workspace, select "Import", and use the following URL:
https://mmlspark.blob.core.windows.net/dbcs/MMLSpark%20Examples%20v1.0.0-rc1.dbc
The easiest way to evaluate MMLSpark is via our pre-built Docker container. To do so, run the following command:
docker run -it -p 8888:8888 mcr.microsoft.com/mmlspark/release
Then navigate to http://localhost:8888 in your web browser to run the sample notebooks.
To try out MMLSpark on a Python (or Conda) installation, first install PySpark via pip with pip install pyspark. Next, either pass the package with --packages or add it at runtime when building the SparkSession, which pulls in the Scala sources:
import pyspark

# Pull in the MMLSpark package (and its Scala sources) when the session starts
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .getOrCreate()

import mmlspark
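As a quick check that the package resolved correctly, you can train a small model. This is a minimal sketch assuming the session created above; the mmlspark.lightgbm import path follows the 1.0.0-rc1 package layout and may differ in other releases:
from pyspark.ml.feature import VectorAssembler
from mmlspark.lightgbm import LightGBMClassifier  # import path assumed from the 1.0.0-rc1 layout

# Tiny synthetic dataset: two numeric features and a binary label
df = spark.createDataFrame(
    [(0.0, 1.0, 0), (1.0, 0.0, 1), (0.5, 0.5, 0), (0.9, 0.1, 1)],
    ["x1", "x2", "label"])
train = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)

# Fit and apply a small LightGBM model to confirm the JARs loaded
model = LightGBMClassifier(numIterations=10).fit(train)
model.transform(train).select("x1", "x2", "prediction").show()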
If you are building a Spark application in Scala, add the following lines to your build.sbt:
resolvers += "MMLSpark Repo" at "https://mmlspark.azureedge.net/maven"
libraryDependencies += "com.microsoft.ml.spark" %% "mmlspark" % "1.0.0-rc1"