[Kubernetes] Commands
When you run a container in a pod, you might want to run a command at start-up. The start-up command can be defined at two levels – at the container (Docker) level and at the Kubernetes level.
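As a rough sketch of the Kubernetes-level part (the pod name, image, and command below are illustrative, not from the original post): “command” in the pod spec overrides the image’s ENTRYPOINT, and “args” overrides its CMD.

```yaml
# Minimal pod spec sketch: "command" overrides the image's ENTRYPOINT,
# "args" overrides the image's CMD. Image and values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: startup-command-demo
spec:
  containers:
    - name: demo
      image: busybox
      command: ["sh", "-c"]               # Kubernetes-level override of ENTRYPOINT
      args: ["echo hello; sleep 3600"]    # Kubernetes-level override of CMD
  restartPolicy: Never
```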
[Spark By Example] Spark SQL – UDFs
In Spark SQL, you can define your own custom functions (UDFs) and use them in SQL statements. The following example shows how to create a very simple UDF, register it, and use it in a SQL statement.
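As a minimal PySpark sketch (the function name “to_upper” and the sample data are illustrative, not from the original post), a UDF can be registered and then called from SQL like this:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

df = spark.createDataFrame([("alice",), ("bob",)], ["name"])
df.createOrReplaceTempView("people")

# Register a Python function as a SQL UDF named "to_upper".
spark.udf.register("to_upper", lambda s: s.upper() if s else None, StringType())

# Use the UDF inside a SQL statement.
spark.sql("SELECT name, to_upper(name) AS upper_name FROM people").show()
```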
[Spark By Example] DataFrameReader
DataFrameReader is an interface to load a DataFrame from external sources.
You cannot create a DataFrameReader object directly; instead, you access it through the “SparkSession.read” property.
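A minimal PySpark sketch (the file paths and options are illustrative, not from the original post): the “SparkSession.read” property returns the DataFrameReader, which then loads a DataFrame from an external source such as CSV or JSON.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reader-example").getOrCreate()

# "spark.read" returns a DataFrameReader; you never construct one yourself.
csv_df = (spark.read
          .option("header", "true")       # first line contains column names
          .option("inferSchema", "true")  # guess column types from the data
          .csv("/tmp/people.csv"))

# The same reader can load other formats through format()/load().
json_df = spark.read.format("json").load("/tmp/people.json")

csv_df.printSchema()
json_df.show()
```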
[Spark By Example] Spark SQL – Grouping
Let’s play with Spark SQL some more.
[Note] When the underlying DataFrame’s schema changes, the view needs to be registered again.
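A minimal PySpark sketch of grouping through SQL (the view name, columns, and data are illustrative, not from the original post):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-grouping").getOrCreate()

df = spark.createDataFrame(
    [("Sales", 3000), ("Sales", 4100), ("HR", 2500), ("HR", 2700)],
    ["dept", "salary"])

# Register the DataFrame as a temp view; re-run this if the schema changes.
df.createOrReplaceTempView("employees")

spark.sql("""
    SELECT dept, COUNT(*) AS cnt, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
""").show()
```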
[Spark By Example] Spark SQL – TempView
With Spark SQL, you can use the familiar SQL syntax to query the data.
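A minimal PySpark sketch (the view name and data are illustrative, not from the original post): register a DataFrame as a temporary view, then query it with plain SQL.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-tempview").getOrCreate()

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# The temp view is visible only within the current SparkSession.
df.createOrReplaceTempView("people")

spark.sql("SELECT id, name FROM people WHERE id = 1").show()
```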
[Spark By Example] Aggregation
The following sample code (in Python and C#) shows how to group a DataFrame and compute aggregated values such as max, min, or average.
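A minimal sketch of the Python side (the columns and data are illustrative, not from the original post; the C# version is analogous):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregation").getOrCreate()

df = spark.createDataFrame(
    [("Sales", 3000), ("Sales", 4100), ("HR", 2500), ("HR", 2700)],
    ["dept", "salary"])

# Group by department and compute max, min, and average salary.
(df.groupBy("dept")
   .agg(F.max("salary").alias("max_salary"),
        F.min("salary").alias("min_salary"),
        F.avg("salary").alias("avg_salary"))
   .show())
```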
[Spark By Example] Date & Timestamp
Handling dates and times is one of the most important parts of data processing. But because of their complex formats, you need to know how to convert types and manipulate date and time values.
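A minimal PySpark sketch (the column names, formats, and data are illustrative, not from the original post): converting strings to date and timestamp types, and formatting a timestamp back to a string.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("datetime").getOrCreate()

df = spark.createDataFrame([("2021-03-15", "2021-03-15 10:30:00")],
                           ["date_str", "ts_str"])

result = (df
          .withColumn("date", F.to_date("date_str", "yyyy-MM-dd"))            # string -> date
          .withColumn("ts", F.to_timestamp("ts_str", "yyyy-MM-dd HH:mm:ss"))  # string -> timestamp
          .withColumn("formatted", F.date_format("ts", "MM/dd/yyyy HH:mm")))  # timestamp -> string

result.printSchema()
result.show(truncate=False)
```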
[Spark By Example] DataFrame Columns
The following sample code (in Python and C#) shows how to handle columns in a DataFrame. You can check the number of columns, add a new column, rename an existing column, or even remove columns.
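A minimal sketch of the Python side (the column names and data are illustrative, not from the original post; the C# version is analogous):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("columns").getOrCreate()

df = spark.createDataFrame([(1, "alice", 3000)], ["id", "name", "salary"])

print(len(df.columns))                               # number of columns

df = df.withColumn("bonus", F.col("salary") * 0.1)   # add a new column
df = df.withColumnRenamed("name", "full_name")       # rename an existing column
df = df.drop("id")                                   # remove a column

df.show()
```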