[Spark By Example] Read CSV
The following sample code (in Python and C#) shows how to read a CSV file without a schema.
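The post's own Spark code sits behind the link, so what follows is only a rough sketch in plain Python (standard library only, not the PySpark API) of what reading a CSV file without a schema amounts to. When no header or schema is supplied, Spark names the columns positionally (`_c0`, `_c1`, ...) and leaves every value as a string. The helper name `read_csv_without_schema` and the sample data are hypothetical.

```python
import csv
import io


def read_csv_without_schema(text):
    """Parse header-less CSV text with no declared column names or types.

    Hypothetical helper: columns get positional names (_c0, _c1, ...) and
    every value stays a string, mirroring Spark's defaults when no schema
    is supplied.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        rows.append({f"_c{i}": value for i, value in enumerate(record)})
    return rows


# Made-up sample data standing in for the contents of a CSV file.
sample = "1,Alice,34\n2,Bob,45\n"
rows = read_csv_without_schema(sample)
for row in rows:
    print(row)
```

Note that the numeric-looking fields stay strings here; turning them into numbers is exactly what supplying or inferring a schema is for. In PySpark itself, the corresponding call would be `spark.read.csv(path)` on an existing `SparkSession`.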
Author Archives: Pyongwon Lee
[Spark By Example] Word Count
The following sample code (in Python and C#) shows how to count the words in a text file.
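As a rough illustration of the idea, here is a minimal plain-Python sketch (standard library only, not the Spark API and not the post's own code) that follows the same three steps a typical Spark word count uses: split lines into words, pair each word with 1, then sum the pairs by word. The function name `word_count` and the sample lines are hypothetical.

```python
from collections import defaultdict


def word_count(lines):
    """Count whitespace-separated words across an iterable of lines."""
    # Step 1 (flatMap in Spark terms): split every line into words.
    words = [word for line in lines for word in line.split()]
    # Step 2 (map): pair each word with an initial count of 1.
    pairs = [(word, 1) for word in words]
    # Step 3 (reduceByKey): sum the counts for each distinct word.
    totals = defaultdict(int)
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


# Made-up sample lines standing in for the contents of a text file.
lines = ["to be or not to be", "that is the question"]
counts = word_count(lines)
for word, n in sorted(counts.items()):
    print(word, n)
```

This sketch splits on whitespace only and is case-sensitive; normalizing case or stripping punctuation would be an extra step before counting.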
Setting up .NET for Apache Spark
Spark applications are usually written in Scala, Python, or R. But do not panic if you are a C# developer: .NET for Apache Spark provides high-level APIs for using Spark from C#.
https://learn.microsoft.com/en-us/dotnet/spark/
Install PySpark on Windows
The first step in working with big data is to set up your environment. For learning and testing purposes, you can set up the environment on a single machine. Let’s install PySpark on Windows 10 or Windows 11.
[DevOps] Deployment Strategies
Deployment strategies define how you want to deliver your software.
Deployment Strategies – Introduction to DevOps on AWS (amazon.com)
Docker Logging Drivers
A logging driver is a pluggable framework for accessing log data from Docker services and containers.
Kubernetes – Networking
Kubernetes manages many pods and containers, and managing networking across a cluster is not a simple matter. Kubernetes handles this through CNI (Container Network Interface) plugins. There are many CNI providers, such as Flannel, Calico, Canal, and Weave Net.
Docker – Working with Images
In this post, let’s play with images through a couple of tutorials.
- [Tutorial 1] Create a Docker image (Ubuntu 20.04 + Python3) with a “hello-world” application written in Python.
- [Tutorial 2] Create an nginx image and connect a host port to the nginx port in the container.
