From Prototyping to Deployment at Scale with R and sparklyr
The sparklyr package has enabled data scientists to use familiar R and tidyverse syntax to interactively analyze data and build models at scale with Apache Spark. A common pain point in organizations, however, is operationalizing these models, whether for batch prediction or real-time scoring. With support for Spark ML Pipelines in sparklyr, data scientists can use R to build pipelines that are fully interoperable with Scala through a familiar API.

For real-time scoring, sparklyr provides an R interface to MLeap, an open source engine for serializing and serving Spark ML models. These capabilities facilitate collaboration between data scientists and implementation engineers and shorten the time to production. We discuss the mechanics of sparklyr ML pipelines and demonstrate an end-to-end example.

Spark+AI Summit
San Francisco