This is a short note on how to work with Parquet files in Spark.
Previously I showed how to write Parquet files using just the Parquet library.
But Spark SQL has built-in support for the Parquet data format, which makes processing data in Parquet files easy with the simple DataFrame API.
Reading a DataFrame from Parquet is as simple as:
And writing data to Parquet files:
More advanced topics like partitioning and schema merging will be covered later.