MMLSpark can be conveniently installed on existing Spark clusters via the --packages option. For example:
spark-shell --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
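The same coordinates also work with spark-submit when submitting a packaged application (MyApp.jar below is a placeholder for your own build):
spark-submit --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 MyApp.jar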
This can be used in other Spark contexts too; for example, you can use MMLSpark in AZTK by adding it to the .aztk/spark-defaults.conf file.
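Concretely, the package can be listed in that file via Spark's standard spark.jars.packages property (a minimal sketch, assuming AZTK's usual spark-defaults.conf format):
spark.jars.packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1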
Step 1: Create a Databricks account
Step 2: Install MMLSpark
To install MMLSpark on the Databricks cloud, create a new library from Maven coordinates in your workspace. For the coordinates, use com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1. Next, ensure this library is attached to your cluster (or to all clusters). Finally, ensure that your Spark cluster runs at least Spark 2.3 with Scala 2.11. You can then use MMLSpark in both your Scala and PySpark notebooks.
![Creating a library from Maven coordinates in Databricks](https://mmlspark.blob.core.windows.net/website/images/databricks_library.jpeg)
Step 3: Load our Examples (Optional)
To load our examples, right-click in your workspace, select "Import", and use the following URL:
https://mmlspark.blob.core.windows.net/dbcs/MMLSpark%20Examples%20v1.0.0-rc1.dbc
The easiest way to evaluate MMLSpark is via our pre-built Docker container. To do so, run the following command:
docker run -it -p 8888:8888 mcr.microsoft.com/mmlspark/release
Then navigate to http://localhost:8888 in your web browser to run the sample notebooks.
To try out MMLSpark on a Python (or Conda) installation, first install PySpark via pip with pip install pyspark. Next, either pass the package with --packages or add it at runtime when building the SparkSession, which pulls in the Scala sources:
import pyspark

# Pull in the MMLSpark package (and its Scala sources) when the session starts
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .getOrCreate()

import mmlspark
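As a quick check that the package resolved correctly, you can train a small model. This is a minimal sketch assuming the session created above; the mmlspark.lightgbm import path follows the 1.0.0-rc1 package layout and may differ in other releases:
from pyspark.ml.feature import VectorAssembler
from mmlspark.lightgbm import LightGBMClassifier  # import path assumed from the 1.0.0-rc1 layout

# Tiny synthetic dataset: two numeric features and a binary label
df = spark.createDataFrame(
    [(0.0, 1.0, 0), (1.0, 0.0, 1), (0.5, 0.5, 0), (0.9, 0.1, 1)],
    ["x1", "x2", "label"])
train = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)

# Fit and apply a small LightGBM model to confirm the JARs loaded
model = LightGBMClassifier(numIterations=10).fit(train)
model.transform(train).select("x1", "x2", "prediction").show()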
If you are building a Spark application in Scala, add the following lines to your build.sbt:
resolvers += "MMLSpark Repo" at "https://mmlspark.azureedge.net/maven"
libraryDependencies += "com.microsoft.ml.spark" %% "mmlspark" % "1.0.0-rc1"