Thoughts Resampled

Anatoliy Plastinin's Blog

Spark SQL and Parquet files

This is a short note on how to work with Parquet files in Spark.

Previously I showed how to write Parquet files using just the Parquet library.

Spark SQL, however, has built-in support for the Parquet data format, which makes it easy to process data in Parquet files through the simple DataFrame API.

Reading a DataFrame from Parquet is as simple as:

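// assumes an existing SQLContext named sqlContext (on Spark 2.x+, a SparkSession works the same way)
val df = sqlContext.read.parquet("s3a://bucket/data/")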

And writing data to Parquet files:
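
A minimal sketch of the write side, assuming Spark 2.x or later, where `SparkSession` is the entry point (on Spark 1.4+ the same `write.parquet` call is available on DataFrames obtained through a `SQLContext`). The object name, the sample rows, and the temporary output path are made up for illustration; in practice the path would be your own location, for example an `s3a://` URI.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical helper object, not part of Spark or of the original post.
object ParquetRoundTrip {

  // Writes a tiny DataFrame to `path` as Parquet, reads it back,
  // and returns the number of rows that were read.
  def writeAndRead(spark: SparkSession, path: String): Long = {
    import spark.implicits._

    // Illustrative sample data with two columns: id and name.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Overwrite makes the example safe to re-run against the same path.
    df.write.mode(SaveMode.Overwrite).parquet(path)

    // Parquet stores the schema with the data, so no schema is passed here.
    val restored = spark.read.parquet(path)
    restored.count()
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val spark = SparkSession.builder()
      .appName("parquet-round-trip")
      .master("local[*]")
      .getOrCreate()

    // Temporary directory stands in for a real output location.
    val path = Files.createTempDirectory("parquet-example").resolve("people").toString

    println(s"rows read back: ${writeAndRead(spark, path)}")
    spark.stop()
  }
}
```

Without an explicit save mode, the writer fails if the target path already exists, so choosing `SaveMode.Overwrite` or `SaveMode.Append` is usually a deliberate decision rather than a default.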


More advanced topics like partitioning and schema merging will be covered later.