[Kubernetes] Commands
When you run a container in a pod, you might want to run a command at start-up. The start-up command can be defined at two levels – at the container (Docker) level and at the Kubernetes level.
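As a rough sketch of the Kubernetes-level part (the pod name, image, and command below are illustrative, not from the original post): “command” in the pod spec overrides the image’s ENTRYPOINT, and “args” overrides its CMD.

```yaml
# Minimal pod spec sketch: "command" overrides the image's ENTRYPOINT,
# "args" overrides the image's CMD. Image and values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: startup-command-demo
spec:
  containers:
    - name: demo
      image: busybox
      command: ["sh", "-c"]               # Kubernetes-level override of ENTRYPOINT
      args: ["echo hello; sleep 3600"]    # Kubernetes-level override of CMD
  restartPolicy: Never
```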
[Spark By Example] Spark SQL – UDFs
In Spark SQL, you can define your own custom functions (UDFs) and use them in SQL statements. The following example shows how to create a very simple UDF, register it, and use it in a SQL statement.
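As a minimal PySpark sketch (the function name “to_upper” and the sample data are illustrative, not from the original post), a UDF can be registered and then called from SQL like this:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

df = spark.createDataFrame([("alice",), ("bob",)], ["name"])
df.createOrReplaceTempView("people")

# Register a Python function as a SQL UDF named "to_upper".
spark.udf.register("to_upper", lambda s: s.upper() if s else None, StringType())

# Use the UDF inside a SQL statement.
spark.sql("SELECT name, to_upper(name) AS upper_name FROM people").show()
```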
[Spark By Example] DataFrameReader
DataFrameReader is an interface to load a DataFrame from external sources.
You cannot create a DataFrameReader object directly; instead, you access it through the “SparkSession.read” property.
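A minimal PySpark sketch (the file paths and options are illustrative, not from the original post): the “SparkSession.read” property returns the DataFrameReader, which then loads a DataFrame from an external source such as CSV or JSON.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reader-example").getOrCreate()

# "spark.read" returns a DataFrameReader; you never construct one yourself.
csv_df = (spark.read
          .option("header", "true")       # first line contains column names
          .option("inferSchema", "true")  # guess column types from the data
          .csv("/tmp/people.csv"))

# The same reader can load other formats through format()/load().
json_df = spark.read.format("json").load("/tmp/people.json")

csv_df.printSchema()
json_df.show()
```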
[Spark By Example] Spark SQL – Grouping
Let’s play with Spark SQL some more.
[Note] When the underlying DataFrame’s schema changes, the view needs to be registered again.
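A minimal PySpark sketch of grouping through SQL (the view name, columns, and data are illustrative, not from the original post):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-grouping").getOrCreate()

df = spark.createDataFrame(
    [("Sales", 3000), ("Sales", 4100), ("HR", 2500), ("HR", 2700)],
    ["dept", "salary"])

# Register the DataFrame as a temp view; re-run this if the schema changes.
df.createOrReplaceTempView("employees")

spark.sql("""
    SELECT dept, COUNT(*) AS cnt, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
""").show()
```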
[Spark By Example] Spark SQL – TempView
With Spark SQL, you can use the familiar SQL syntax to query the data.
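A minimal PySpark sketch (the view name and data are illustrative, not from the original post): register a DataFrame as a temporary view, then query it with plain SQL.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-tempview").getOrCreate()

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# The temp view is visible only within the current SparkSession.
df.createOrReplaceTempView("people")

spark.sql("SELECT id, name FROM people WHERE id = 1").show()
```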
[Spark By Example] Aggregation
The following sample code (in Python and C#) shows how to group a DataFrame and compute aggregated values such as max, min, or average.
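A minimal sketch of the Python side (the columns and data are illustrative, not from the original post; the C# version is analogous):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregation").getOrCreate()

df = spark.createDataFrame(
    [("Sales", 3000), ("Sales", 4100), ("HR", 2500), ("HR", 2700)],
    ["dept", "salary"])

# Group by department and compute max, min, and average salary.
(df.groupBy("dept")
   .agg(F.max("salary").alias("max_salary"),
        F.min("salary").alias("min_salary"),
        F.avg("salary").alias("avg_salary"))
   .show())
```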
[Spark By Example] Date & Timestamp
Handling dates and times is one of the most important parts of data processing. But because of their complex formats, you need to know how to convert types and manipulate date and time values.
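A minimal PySpark sketch (the column names, formats, and data are illustrative, not from the original post): converting strings to date and timestamp types, and formatting a timestamp back to a string.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("datetime").getOrCreate()

df = spark.createDataFrame([("2021-03-15", "2021-03-15 10:30:00")],
                           ["date_str", "ts_str"])

result = (df
          .withColumn("date", F.to_date("date_str", "yyyy-MM-dd"))            # string -> date
          .withColumn("ts", F.to_timestamp("ts_str", "yyyy-MM-dd HH:mm:ss"))  # string -> timestamp
          .withColumn("formatted", F.date_format("ts", "MM/dd/yyyy HH:mm")))  # timestamp -> string

result.printSchema()
result.show(truncate=False)
```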
[Spark By Example] DataFrame Columns
The following sample code (in Python and C#) shows how to handle columns in a DataFrame. You can check the number of columns, add a new column, rename an existing column, or even remove columns.
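A minimal sketch of the Python side (the column names and data are illustrative, not from the original post; the C# version is analogous):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("columns").getOrCreate()

df = spark.createDataFrame([(1, "alice", 3000)], ["id", "name", "salary"])

print(len(df.columns))                               # number of columns

df = df.withColumn("bonus", F.col("salary") * 0.1)   # add a new column
df = df.withColumnRenamed("name", "full_name")       # rename an existing column
df = df.drop("id")                                   # remove a column

df.show()
```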