10:45 AM Sunday Room: SC-127
Recent releases of Spark's machine learning libraries have shifted focus from the algorithm-by-algorithm approach of the spark.mllib package to the data-driven pipeline approach of spark.ml. We will look at how to structure the stages of an ML process (data loading, modeling, prediction, and the analysis and distribution of results) using the latest spark.ml APIs.
Note: this year's session will focus only on the Scala APIs.
We will touch on one or more of the algorithms in the following areas:
- Dimensionality Reduction / Feature Extraction
- Clustering
- Classification and Regression
Depending on the time available, we may also touch on the following topics:
- Statistical tools
- Data generation and randomization
- Evaluators
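
As a rough illustration of the pipeline approach described above, the following sketch chains feature extraction, a classifier, and an evaluator in a single spark.ml Pipeline. It is not taken from the session material: the toy data, column names, parameter values, and the object name PipelineSketch are all assumptions made for illustration, and it assumes Spark 2.x or later with the spark-sql and spark-mllib modules on the classpath.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and the data are illustrative only.
object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PipelineSketch")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()

    // Data loading: tiny made-up labeled text sets held in memory.
    val training = spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

    val test = spark.createDataFrame(Seq(
      (4L, "spark i j k", 1.0),
      (5L, "l m n", 0.0),
      (6L, "spark hadoop spark", 1.0),
      (7L, "apache hadoop", 0.0)
    )).toDF("id", "text", "label")

    // Feature extraction stages: split text into words, then hash to a vector.
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol("words")
      .setOutputCol("features")

    // Modeling stage: a binary classifier reading the "features" column.
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.001)

    // The pipeline ties the stages together in order.
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Fitting runs every stage on the training data and returns a PipelineModel.
    val model = pipeline.fit(training)

    // Predictions: the fitted model applies the same stages to new data.
    val predictions = model.transform(test)
    predictions.select("id", "text", "probability", "prediction").show(false)

    // Results analysis: score the predictions with an evaluator.
    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setRawPredictionCol("rawPrediction")
      .setMetricName("areaUnderROC")
    println(s"areaUnderROC = ${evaluator.evaluate(predictions)}")

    spark.stop()
  }
}
```

Because the fitted model carries its feature stages with it, the same tokenizing and hashing steps are applied at prediction time without being repeated by hand, which is the main practical difference from calling individual spark.mllib algorithms directly.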