The following sample code (in Python and C#) shows how to group a DataFrame and compute aggregate values such as the maximum, minimum, or average.
Continue reading “[Spark By Example] Aggregation”
[Spark By Example] Date & Timestamp
Handling dates and times is one of the most important parts of data processing. Because dates and times come in many complex formats, you need to know how to convert between types and how to manipulate date and time values.
Continue reading “[Spark By Example] Date & Timestamp”
[Spark By Example] DataFrame Columns
The following sample code (in Python and C#) shows how to work with columns in a DataFrame. You can check the number of columns, add a new column, rename an existing column, or remove columns.
Continue reading “[Spark By Example] DataFrame Columns”
[Spark By Example] DataFrame Query
The following sample code (in Python and C#) shows how to query data in a DataFrame. Once you have a DataFrame object, you can query the data with SQL-like syntax, regardless of where the data originally came from.
Continue reading “[Spark By Example] DataFrame Query”
[Spark By Example] Read JSON – Complex Type
The following sample code (in Python and C#) shows how to read a JSON file that contains complex (nested) objects.
CSV handles only flat data structures, but with JSON you can read complex data structures into the DataFrame.
Continue reading “[Spark By Example] Read JSON – Complex Type”
[Spark By Example] Read JSON – Array Type
The following sample code (in Python and C#) shows how to read a JSON file that contains array data.
With JSON, it is easy to specify the schema, so you can parse the array data directly into the DataFrame.
Continue reading “[Spark By Example] Read JSON – Array Type”
[Spark By Example] Read JSON with Schema
The following sample code (in Python and C#) shows how to read a JSON file with an explicit schema. With JSON, it is always a good idea to provide the schema for your data rather than relying on Spark's schema inference.
Continue reading “[Spark By Example] Read JSON with Schema”
[Spark By Example] Read CSV – Array Type
The following sample code (in Python and C#) shows how to read a CSV file that has an array column.
CSV does not support complex objects such as arrays. To make it work, you can store a JSON array in a CSV column and parse it after loading. This post shows you that trick.
A JSON array looks like this:
["Ford", "Toyota", "BMW", "Fiat"]