Building Spark ML pipelines with sparklyr

Abstract

We provide an overview of the recently implemented Pipelines API in sparklyr, an R package for interfacing with Apache Spark. This new feature allows users to build and tune data transformation and machine learning pipelines that are interoperable with Scala and Python, simplifying handoffs between data science and data engineering. We go over the components of pipelines and walk through practical examples.

Date

Feb 3, 2018

Event

rstudio::conf

Location

San Diego

Links

Slides, Video