Kubernetes Architecture: Mastering Container Orchestration

Kubernetes, often called K8s, is an open-source platform that automates the deployment, scaling, and management of containerized applications. Understanding its architecture is fundamental to managing and deploying applications effectively at scale. This guide walks through the Kubernetes architecture in detail, exploring each component and its role within the ecosystem.

Introduction

Kubernetes takes its name from the Greek word for "helmsman," and it serves as the ship's captain in the world of container orchestration. Originating at Google, Kubernetes has become the de facto standard for managing containerized applications. It provides a resilient framework for running distributed systems efficiently, enabling high availability, scalability, and ease of deployment.

Before diving deeper into the Kubernetes architecture, let's first understand two important communication concepts: REST and gRPC.

What are REST and gRPC?

In Kubernetes, REST (Representational State Transfer) and gRPC (gRPC Remote Procedure Call) are the two common protocols used for communication between services and with the Kubernetes API server (the API server, a Master Node component, is explained in detail later).

REST in Kubernetes:

REST (Representational State Transfer) is a widely used architectural style for designing networked applications, particularly in Kubernetes. The Kubernetes API server exposes a RESTful interface, allowing users and internal components to perform operations on Kubernetes resources such as pods, services, and deployments. These operations are carried out over HTTP/HTTPS protocols, using standard HTTP methods like GET, POST, PUT, and DELETE. REST's simplicity and ubiquity make it a popular choice for integrating with the Kubernetes API, enabling seamless management and automation of Kubernetes clusters. Furthermore, RESTful APIs facilitate service-to-service communication within a Kubernetes cluster, where microservices can expose their functionality via standardized REST endpoints, promoting interoperability and ease of use.
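The RESTful structure of the API can be seen in how request paths are built. The sketch below constructs Kubernetes-style resource paths in Python; the path layout follows the real API conventions (`/api/v1` for core resources, `/apis/<group>/<version>` for others), but the sample resource names are illustrative.

```python
# Sketch: building Kubernetes-style REST request paths (no cluster needed).
# The path layout follows Kubernetes API conventions; the resource names
# below are illustrative examples.

def api_path(resource, namespace=None, name=None, core=True):
    """Build a RESTful path for a Kubernetes resource."""
    prefix = "/api/v1" if core else "/apis/apps/v1"
    parts = [prefix]
    if namespace:
        parts += ["namespaces", namespace]
    parts.append(resource)
    if name:
        parts.append(name)
    return "/".join(parts)

# GET -> read, POST -> create, PUT -> replace, DELETE -> remove
list_pods  = ("GET",    api_path("pods", namespace="default"))
create_pod = ("POST",   api_path("pods", namespace="default"))
delete_pod = ("DELETE", api_path("pods", namespace="default", name="web-0"))
get_deploy = ("GET",    api_path("deployments", namespace="default",
                                 name="web", core=False))

print(list_pods[1])   # /api/v1/namespaces/default/pods
print(get_deploy[1])  # /apis/apps/v1/namespaces/default/deployments/web
```

Tools like kubectl and client libraries ultimately translate commands into exactly this kind of HTTP verb + path combination.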

gRPC in Kubernetes:

gRPC (gRPC Remote Procedure Call) is a high-performance, open-source remote procedure call framework that offers an efficient alternative to REST for service-to-service communication within Kubernetes. Developed by Google, gRPC uses HTTP/2 for transport and Protocol Buffers (protobuf) for serialization, and provides features such as bi-directional streaming, flow control, and request multiplexing. These capabilities make gRPC well suited to microservices architectures that demand low latency and high throughput. Within a Kubernetes cluster, gRPC can make communication between microservices more efficient than REST, especially where real-time data exchange and high-speed communication are critical. By leveraging gRPC, developers can achieve more robust and scalable interactions between services, enhancing the overall efficiency of their Kubernetes deployments.
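One reason for gRPC's efficiency is its binary wire format. The sketch below gives a rough sense of the size difference between JSON text and a packed binary encoding of the same message; it uses Python's stdlib `struct` module, not real Protocol Buffers, so treat it as an illustration of the idea rather than actual protobuf behavior.

```python
# Rough illustration of why a binary wire format (as Protocol Buffers uses)
# is smaller than JSON text. This packs fields with the stdlib `struct`
# module -- it is NOT real protobuf, just a size-comparison sketch.
import json
import struct

message = {"id": 42, "replicas": 3, "healthy": True}

json_bytes = json.dumps(message).encode("utf-8")

# Binary: one unsigned 32-bit int, one unsigned 16-bit int, one bool byte.
binary_bytes = struct.pack("<IH?", message["id"], message["replicas"],
                           message["healthy"])

print(len(json_bytes), len(binary_bytes))  # binary is several times smaller
assert len(binary_bytes) < len(json_bytes)
```

Real protobuf adds field tags and varint encoding, but the principle is the same: no field names or punctuation on the wire.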

Kubernetes Architecture Overview

Kubernetes employs a control-plane/worker architecture, comprising the Control Plane (Master Node) and the Data Plane (Worker Nodes). This design ensures that the cluster's state is maintained and managed effectively, allowing seamless orchestration of containerized applications.

Master Node (Control Plane)

The Master Node is the brain of the Kubernetes cluster. It manages the cluster's lifecycle, schedules workloads, and monitors the cluster's health. The key components of the Master Node include:

  • API Server

  • etcd

  • Scheduler

  • Controller Manager

  • Cloud Controller Manager

Worker Nodes (Data Plane)

Worker Nodes are responsible for running the actual applications. They host the pods, which contain the application containers. Each Worker Node consists of:

  • Kubelet

  • Kube-proxy

  • Pods

Master Node Components

The Master Node orchestrates the cluster, ensuring the desired state of the applications. Let's explore each component in detail.

API Server

The API Server is the front end of the Kubernetes control plane. It exposes the Kubernetes API and serves as the central management entity.

1. Role and Importance

  • Central Component: The Kubernetes API Server is a crucial element of the Kubernetes cluster. It acts as the gateway for all API requests, handling communication between users and the cluster.

  • Application Impact: If the API Server fails, it does not impact the running applications on the worker nodes. However, it halts any updates, configuration changes, or interactions with the cluster until it is back online.

2. Authentication Methods

  • Client Certificates:

    • Mechanism: Each user or service uses a unique certificate issued by a Certificate Authority (CA).

    • Verification: The API Server verifies these certificates against its trusted CA to authenticate users or services.

  • Bearer Tokens:

    • Mechanism: Tokens are included in the HTTP headers of API requests.

    • Verification: The API Server checks these tokens against a list of known tokens or verifies them through a token authentication webhook.

  • Basic Authentication:

    • Mechanism: Uses a static username and password for authentication.

    • Usage: Historically used for simple setups or testing; it lacks robustness compared to other methods, and static password authentication was removed in Kubernetes 1.19.

  • OAuth2 Tokens:

    • Mechanism: Tokens issued by an OAuth2 provider are used for authentication.

    • Verification: The API Server verifies these tokens with the OAuth2 provider.

  • OpenID Connect (OIDC):

    • Mechanism: Integrates with identity providers supporting OIDC. Users authenticate with the OIDC provider, which issues a token.

    • Verification: The API Server verifies the token with the OIDC provider to ensure its validity.

3. Authorization Methods

  • Role-Based Access Control (RBAC):

    • Mechanism: Manages permissions by defining roles and their associated permissions within specific namespaces.

    • Usage: RBAC is the most common method for managing access in Kubernetes, allowing fine-grained control over what actions users can perform.

  • Attribute-Based Access Control (ABAC):

    • Mechanism: Permissions are based on user and resource attributes, with policies defined in an external policy file.

    • Evaluation: The API Server evaluates requests against this policy file to determine if the action is permitted.

  • Webhook Authorization:

    • Mechanism: Uses an external service to make authorization decisions.

    • Usage: Useful for custom or complex authorization requirements.

  • AlwaysAllow and AlwaysDeny:

    • Mechanism: Simple policies used primarily for testing purposes.

    • Usage: AlwaysAllow grants access to all requests, while AlwaysDeny blocks all requests.
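The RBAC model above can be sketched as a small lookup: roles grant (verb, resource) pairs in a namespace, and bindings attach roles to users. The object shapes below are simplified from the real Role/RoleBinding API objects, and the names are invented for illustration.

```python
# Toy RBAC check, simplified from the real Role/RoleBinding API objects:
# a role lists allowed (verb, resource) pairs in a namespace, and a
# binding attaches that role to users.
roles = {
    ("dev", "pod-reader"): {("get", "pods"), ("list", "pods")},
}
bindings = [
    {"namespace": "dev", "role": "pod-reader", "users": {"alice"}},
]

def is_allowed(user, verb, resource, namespace):
    """Return True if any binding grants the user this action."""
    for b in bindings:
        if b["namespace"] == namespace and user in b["users"]:
            if (verb, resource) in roles[(namespace, b["role"])]:
                return True
    return False

print(is_allowed("alice", "list", "pods", "dev"))    # True
print(is_allowed("alice", "delete", "pods", "dev"))  # False
print(is_allowed("bob", "get", "pods", "dev"))       # False
```

The real authorizer also handles cluster-scoped roles, groups, and wildcard verbs, but the core question is the same: does any binding grant this subject this verb on this resource?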

4. Mutating Admission Controllers

  • Purpose: These controllers can modify incoming requests before they are persisted in the cluster. They enforce organizational policies, add default values, or make additional configurations.

  • Key Functions:

    • Automatic Labeling: Adds labels to resources based on criteria, such as adding environment labels to pods.

    • Injecting Sidecars: Adds sidecar containers (e.g., monitoring agents) to pod specifications, often used with service meshes like Istio.

    • Setting Default Values: Provides default values for fields that users may omit, such as setting resource limits for pods.

  • Process:

    • Intercepting Request: The API Server passes the request to the mutating admission controller before it is saved.

    • Processing Mutation: The controller evaluates the request against predefined policies and applies necessary modifications.

    • Modifying Request: Alters the request object to comply with organizational policies.

    • Forwarding Request: Sends the modified request to the next stage for further processing or persistence.

5. Validating Admission Controllers

  • Purpose: After mutation, these controllers ensure that the request complies with various rules and constraints.

  • Functions: Validate that resource quotas are not exceeded, required labels are present, and other compliance checks are met.
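The mutate-then-validate flow can be sketched as two functions applied in order: the mutating step fills in defaults and labels, and the validating step accepts or rejects the result. The field names below mirror pod specs, but the specific policies (an `env` label, a default CPU limit) are invented for illustration.

```python
# Sketch of the admission flow: a mutating step fills in defaults, then a
# validating step accepts or rejects. The policy details here (env label,
# default CPU limit) are invented for illustration.
def mutate(pod):
    pod.setdefault("labels", {})
    pod["labels"].setdefault("env", "dev")  # automatic labeling
    for c in pod["containers"]:
        c.setdefault("resources", {"limits": {"cpu": "500m"}})  # defaults
    return pod

def validate(pod):
    if "env" not in pod.get("labels", {}):
        raise ValueError("required label 'env' is missing")
    if len(pod["containers"]) == 0:
        raise ValueError("pod must have at least one container")
    return pod

request = {"containers": [{"name": "app", "image": "nginx"}]}
admitted = validate(mutate(request))
print(admitted["labels"]["env"])                                # dev
print(admitted["containers"][0]["resources"]["limits"]["cpu"])  # 500m
```

Note the ordering matters: because mutation runs first, the validator sees the defaulted object, which is why the request above passes even though the user omitted the label.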

Understand Admission Controllers in depth: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/

6. Execution and Persistence

  • Process:

    • Execution: If the request passes all admission controllers, it is executed according to the specified changes.

    • Persistence: The request's changes are saved in etcd, the distributed key-value store that holds the cluster's state.

  • Post-Persistence:

    • Scheduler and Controllers: If the request involves creating or updating resources like pods, the Kubernetes scheduler schedules the pod on an appropriate node, and controllers manage its lifecycle.

An excellent blog on the Kubernetes API architecture: https://medium.com/@danielepolencic/the-kubernetes-api-architecture-81da0ede0e34

etcd

1. Role and Functionality

  • Database Type: etcd is a distributed key-value store.

  • Consistency: Ensures strong consistency, meaning any read request returns the most recent write.

  • Primary Data Store: Acts as the main data store for all cluster data in Kubernetes.

  • State Preservation: Maintains the cluster's state and allows for restoration after failures through regular snapshots and backups.

  • Consensus Algorithm: Uses the Raft algorithm for leader election and data consistency.

  • Performance: Designed for high-performance read and write operations with low latency.

2. Raft Consensus Algorithm in etcd

  • Purpose: Manages a replicated log to ensure all distributed nodes agree on a consistent state.

  • Leader Election:

    • Mechanism: Elects a single leader to handle all client requests that modify the state, ensuring a single source of truth.

    • Process: Based on terms and votes; nodes vote for candidates, and a majority is required to elect a leader.

  • Roles in Raft:

    • Leader: Manages client interactions, logs entries, and replicates them to followers.

    • Follower: Receives and applies log entries from the leader.

    • Candidate: Initiates elections when it doesn't hear from the leader.

  • Log Replication:

    • Append Entries: The leader appends log entries and replicates them to followers.

    • Acknowledgment: Followers acknowledge receipt, and the leader commits entries after a majority acknowledgment.

    • Commitment: Committed entries are applied to the state machine.

  • Safety and Consistency:

    • Log Matching: Ensures identical log entries for the same index and term across all nodes.

    • Leader Completeness: Guarantees that committed entries appear in all future leaders' logs.

    • State Machine Safety: Ensures that only committed entries are executed in order.

  • Fault Tolerance:

    • Majority Agreement: Requires majority agreement for decisions, tolerating minority node failures.

    • Leader Detection: Uses a timeout mechanism for leader election upon leader failure.

  • Client Interaction:

    • Read Requests: Handled by the leader to reflect the latest state.

    • Write Requests: Processed and replicated by the leader before being committed.

  • Efficiency:

    • Heartbeat Mechanism: Periodic heartbeats from the leader to maintain authority and detect failures.

    • Efficient Elections: Randomized election timeouts to reduce split votes.
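The election and commitment rules above both reduce to the same majority test. The sketch below shows that rule in isolation; real Raft also tracks terms, log indices, and randomized timeouts, all omitted here.

```python
# Minimal sketch of Raft-style majority agreement: a candidate needs votes
# from a majority to become leader, and the leader commits a log entry only
# after a majority of nodes have appended it. Real Raft also tracks terms,
# log indices, and randomized timeouts, all omitted here.
def majority(n):
    return n // 2 + 1

def wins_election(votes_received, cluster_size):
    return votes_received >= majority(cluster_size)

def can_commit(acks_received, cluster_size):
    # The leader counts itself; an entry is committed once a majority
    # (leader + followers) has appended it.
    return acks_received >= majority(cluster_size)

print(wins_election(2, 3))  # True: 2 of 3 is a majority
print(wins_election(2, 5))  # False: need 3 of 5
print(can_commit(3, 5))     # True
```

Because both decisions require overlapping majorities, any two majorities share at least one node, which is what prevents two leaders or two divergent committed logs from existing at once.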

3. Data Security

  • Encryption: etcd supports data encryption to ensure the security and confidentiality of stored data.

Beautiful Visual Representation of Raft Consensus Algorithm: https://thesecretlivesofdata.com/raft

Q) Why should we always use an odd number of control plane (etcd) nodes in production scenarios?

Ans) Using an odd number of etcd nodes gives the best trade-off between fault tolerance and resource efficiency: it maximizes the system's availability and resilience while minimizing cost and complexity.

i) Majority Quorum:

  • Definition: A majority quorum is the minimum number of nodes that must agree on a proposal to make a decision.

  • Requirement: For a cluster of N nodes, a majority is ⌊N/2⌋ + 1. More than half of the nodes must agree for the system to make decisions and maintain consistency.

ii) Fault Tolerance:

  • With Odd Nodes: An odd number of nodes ensures that the system can tolerate up to ⌊N/2⌋ node failures without losing quorum. For example, in a 3-node cluster, 1 node can fail and the cluster still functions because 2 nodes (the majority) remain up.

  • With Even Nodes: An even number of nodes provides no additional fault tolerance. In a 4-node cluster, the majority is 3, so the system can still only tolerate 1 node failure, the same as a 3-node cluster, but with an extra node to manage and pay for. If 2 nodes fail, quorum is lost.

iii) Decision-Making:

  • Easier Leader Election: In consensus algorithms like Raft, leader elections and decision-making are more straightforward with an odd number of nodes, reducing the risk of split votes and ensuring smoother operation.
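The quorum arithmetic behind this recommendation is easy to check directly: a cluster of N nodes needs ⌊N/2⌋ + 1 nodes up, so it tolerates N minus that quorum in failures.

```python
# Quorum arithmetic behind the odd-node recommendation: a cluster of N
# nodes needs floor(N/2) + 1 nodes up, so it tolerates the remainder in
# failures. Note that the even-sized clusters buy no extra tolerance.
def quorum(n):
    return n // 2 + 1

def tolerated_failures(n):
    return n - quorum(n)

for n in (3, 4, 5, 6, 7):
    print(n, quorum(n), tolerated_failures(n))
# N=3: quorum 2, tolerates 1
# N=4: quorum 3, tolerates 1  (same tolerance as 3 nodes, one more machine)
# N=5: quorum 3, tolerates 2
# N=6: quorum 4, tolerates 2  (same tolerance as 5 nodes)
# N=7: quorum 4, tolerates 3
```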

Scheduler

The Kubernetes scheduler is a control plane component responsible for placing pods onto nodes within the cluster, ensuring that workloads are balanced and that specified resource requirements and constraints are met.

Scheduling Process:

  • Pod Queue: When a pod is created but not yet assigned to a node, it enters a scheduling queue.

  • Filtering: The scheduler filters out nodes that do not meet the pod's requirements (e.g., insufficient resources, taints, or node affinity).

  • Scoring: The remaining nodes are scored based on various criteria (e.g., resource utilization, affinity/anti-affinity rules). The node with the highest score is selected.

  • Binding: The scheduler assigns the pod to the chosen node by creating a binding object.
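The filter → score → bind cycle above can be sketched in a few lines. The node capacities and the least-utilization scoring rule below are illustrative assumptions; the real scheduler combines many scoring plugins with configurable weights.

```python
# Sketch of the filter -> score -> bind cycle. Node capacities and the
# scoring rule are illustrative; the real scheduler combines many plugins.
nodes = {
    "node-a": {"cpu_free": 2.0, "mem_free": 4.0},
    "node-b": {"cpu_free": 0.5, "mem_free": 8.0},
    "node-c": {"cpu_free": 4.0, "mem_free": 16.0},
}
pod = {"name": "web-1", "cpu": 1.0, "mem": 2.0}

# Filtering: drop nodes that cannot fit the pod's resource requests.
feasible = {n: r for n, r in nodes.items()
            if r["cpu_free"] >= pod["cpu"] and r["mem_free"] >= pod["mem"]}

# Scoring: prefer the node with the most free CPU left after placement.
def score(res):
    return res["cpu_free"] - pod["cpu"]

best = max(feasible, key=lambda n: score(feasible[n]))

# Binding: record the assignment (the real scheduler creates a Binding
# object through the API server).
binding = {"pod": pod["name"], "node": best}
print(sorted(feasible))  # node-b filtered out (0.5 CPU < 1.0 requested)
print(binding)           # {'pod': 'web-1', 'node': 'node-c'}
```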

Scheduling Policies:

  • Predicates: Rules that determine whether a pod can be scheduled on a node, such as checking for available resources, node taints, and affinity rules.

  • Priorities: Rules that rank nodes meeting the predicate criteria to find the most suitable one, including balancing resource usage and adhering to affinity rules.

Configuration and Extensibility:

  • Custom Policies: Users can customize scheduling behavior; the legacy Scheduler Policy API has been superseded by scheduler configuration (KubeSchedulerConfiguration) and scheduling profiles.

  • Scheduling Profiles: Multiple scheduling profiles can be used to handle different types of workloads with specific scheduling needs.

  • Scheduler Extenders: Allow external processes to influence scheduling decisions, useful for custom constraints or external resource management.

Plugins and Frameworks:

  • Scheduling Plugins: Kubernetes uses plugins for different stages of the scheduling process (filtering, scoring, pre-binding, etc.).

  • Scheduling Framework: Introduced to allow easier extension and customization of the scheduler’s behavior using plugins.

Common Scheduling Scenarios:

  • Resource Requests: Ensuring pods are placed on nodes with sufficient CPU and memory.

  • Taints and Tolerations: Allowing or preventing pods from being scheduled on certain nodes based on taints and corresponding tolerations.

  • Node Affinity/Anti-Affinity: Scheduling pods based on labels that indicate preferred or required node characteristics.

  • Pod Affinity/Anti-Affinity: Scheduling pods relative to other pods based on their labels and defined rules.

Controller Manager

The Kubernetes Controller Manager is a crucial control plane component that ensures the desired state defined for cluster objects (nodes, pods, deployments, and so on) matches their current state. It automates various cluster operations, such as node management, replication, and service account management.

Responsibilities:

  • Ensures the desired state matches the current state for various cluster components.

  • Automates cluster operations, including node management, replication, and service account management.

Types of Controllers:

  • Node Controller: Manages node events like adding or removing nodes and ensures the node status is updated.

  • Replication Controller: Ensures the specified number of pod replicas are running at all times.

  • Endpoints Controller: Manages endpoint objects, representing the IP addresses and ports comprising a service.

  • Service Account and Token Controllers: Manage service accounts and their associated authentication tokens.

  • Job Controller: Ensures that specified jobs (one-off tasks) are completed.

  • DaemonSet Controller: Ensures that a daemon pod runs on all or specific nodes.

  • StatefulSet Controller: Manages the deployment and scaling of a set of pods with unique identities and stable, persistent storage.

  • Deployment Controller: Manages application updates by rolling out new replicas incrementally.

  • ReplicaSet Controller: Ensures a stable set of pod replicas.

  • Garbage Collector Controller: Cleans up resources that are no longer needed.

Features:

  • Efficiency: Runs multiple controllers in a single process, reducing complexity and resource overhead.

  • Customization: Users can create custom controllers to manage specific resources or custom resources (CRDs) within the cluster.
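Every controller listed above follows the same reconcile pattern: compare desired state with current state and take whatever action closes the gap. The sketch below does this for replica counts; real controllers watch the API server and react to events rather than polling an in-memory list, and the pod-naming scheme here is invented.

```python
# Sketch of a controller's reconcile loop for replicas: compare desired
# state with current state and act to close the gap. Real controllers
# watch the API server and react to events; the pod names are invented.
def reconcile(desired_replicas, current_pods):
    diff = desired_replicas - len(current_pods)
    if diff > 0:
        for _ in range(diff):
            current_pods.append(f"pod-{len(current_pods)}")  # create missing
    elif diff < 0:
        del current_pods[diff:]                              # delete extras
    return current_pods

pods = ["pod-0"]
reconcile(3, pods)
print(pods)  # ['pod-0', 'pod-1', 'pod-2']
reconcile(1, pods)
print(pods)  # ['pod-0']
```

Running the loop repeatedly is harmless: once desired and current state match, `diff` is zero and reconcile does nothing, which is what makes the pattern self-healing.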

The Kubernetes Controller Manager ensures that the system is continuously reconciled, maintaining high availability through leader election and enabling customization with custom controllers. By automating key cluster operations, it plays a critical role in maintaining the desired state of the Kubernetes cluster.

Worker Node Components

Worker Nodes run the application containers and include the following components:

Kube-proxy

kube-proxy is a network component in Kubernetes responsible for managing network rules on nodes. It ensures that network traffic reaches the correct pod, facilitating communication within the cluster.

Responsibilities:

  • Service Abstraction: Implements service abstraction by maintaining network rules and handling traffic forwarding between services and pods.

  • Traffic Forwarding: Uses service definitions to direct traffic to one of the available pods backing the service, providing basic load balancing.

  • Network Rules: Creates and manages IP tables or IPVS rules to direct traffic coming to a service IP to the appropriate backend pod. IP tables rules direct traffic from a service IP to the appropriate pod IPs. IPVS creates virtual servers and manages traffic distribution to backend pods using various scheduling algorithms.
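The core of this job is mapping a service IP to one of its backend pod IPs. In iptables mode, kube-proxy installs rules that pick a backend with random probability; the sketch below imitates that with a seeded random choice. The IP addresses are made up for illustration.

```python
# Sketch of kube-proxy's job: map a service IP to one of the backend pod
# IPs. iptables mode picks a backend with random probability; this sketch
# uses a seeded random choice for the same effect. IPs are made up.
import random

endpoints = {
    "10.96.0.10": ["10.244.1.5", "10.244.2.7", "10.244.3.2"],  # service -> pods
}

def route(service_ip, rng):
    pods = endpoints[service_ip]
    return rng.choice(pods)  # iptables-style random backend selection

rng = random.Random(0)  # seeded so the sketch is reproducible
picks = [route("10.96.0.10", rng) for _ in range(6)]
print(picks)
assert all(p in endpoints["10.96.0.10"] for p in picks)
```

IPVS mode replaces the random choice with proper scheduling algorithms (round-robin, least-connections, and others), but the mapping from service IP to backend set is the same.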

Traffic Management:

  • Cluster Traffic: Handles traffic within the cluster, ensuring requests to a service IP are directed to one of the pods providing that service.

  • External Traffic: Manages traffic from outside the cluster to services, ensuring proper routing and load balancing.

Service Types:

  • ClusterIP: Default service type that exposes the service on a cluster-internal IP, only reachable from within the cluster.

  • NodePort: Exposes the service on a static port on each node’s IP, allowing external traffic to access the service.

  • LoadBalancer: Uses the cloud provider’s load balancer to expose the service externally, providing a single IP address for access.

  • ExternalName: Maps a service to a DNS name, allowing external services to be accessed through a Kubernetes service.

Deployment:

  • DaemonSet: kube-proxy runs as a DaemonSet, ensuring there is an instance of kube-proxy running on each node in the cluster.

Kubelet

The kubelet is a critical component of a Kubernetes node responsible for managing and running containers on that node. It ensures that containers are running as expected and communicates the node's status back to the Kubernetes control plane.

Responsibilities:

  • Agent Role: Acts as an agent on each node, maintaining the desired state of containers as defined by the Kubernetes API.

  • Container Management: Starts, stops, and maintains containers based on pod specifications, ensuring they are running and healthy.

  • Health Checks: Performs periodic health checks on containers and pods, restarting them if they fail or become unhealthy.

  • Container Runtime Interaction: Works with container runtimes (e.g., Docker, containerd, CRI-O) to manage container lifecycle tasks such as pulling images and starting and stopping containers.

  • Volume Management: Manages and mounts volumes specified in pod specs, ensuring containers have access to required storage resources.

  • Secure Communication: Communicates with the Kubernetes API server using TLS certificates for secure authentication and encryption.

  • Metrics Collection: Collects and reports resource usage metrics (CPU, memory, etc.) from containers to the API server for monitoring and scaling purposes.
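The health-check responsibility can be sketched as one pass of a sync loop: probe each container and restart the ones that report unhealthy. The probe results below are canned; a real kubelet executes liveness probes via HTTP, TCP, or exec, and applies restart policies and back-off.

```python
# Sketch of the kubelet's restart behaviour: probe each container and
# restart the unhealthy ones. Probe results here are canned; a real
# kubelet runs HTTP/TCP/exec probes and applies restart back-off.
containers = {
    "web":     {"healthy": True,  "restarts": 0},
    "sidecar": {"healthy": False, "restarts": 0},
}

def sync_loop_once(containers):
    for name, state in containers.items():
        if not state["healthy"]:
            state["restarts"] += 1   # kill and recreate the container
            state["healthy"] = True  # assume the restart succeeds
    return containers

sync_loop_once(containers)
print(containers["sidecar"])  # {'healthy': True, 'restarts': 1}
print(containers["web"])      # {'healthy': True, 'restarts': 0}
```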

The kubelet is essential for ensuring that the containers on a node are running according to their specifications, performing health checks, and reporting the node's status back to the Kubernetes control plane. By interacting with the container runtime and managing resources, kubelet helps maintain the efficient and reliable operation of the Kubernetes cluster.

Pod

A pod is the smallest and simplest unit of deployment in Kubernetes. It represents a single instance of a running process within the cluster.

Key Features:

  • Encapsulation: Pods encapsulate one or more containers that share the same network namespace, storage, and other resources.

  • Container Communication: Containers within a pod share the same IP address and port space, allowing them to communicate easily with each other.

  • Volume Sharing: Pods can include one or more volumes for persistent storage, which are shared among all containers in the pod.

  • Unique IP Address: Each pod gets a unique IP address, enabling containers within the pod to communicate over localhost.

Pod Lifecycle:

  • Creation: Pods are created based on specifications in a PodSpec, which includes container images, resource requests/limits, volumes, and network settings.

  • Running: Once created, pods are scheduled onto nodes where kubelet starts and manages their containers.

  • Termination: Pods can be terminated manually or by the system (e.g., during deployment updates). Containers are stopped and cleaned up.

Common Uses:

  • Microservices: Deploying microservices, with each service running in a separate pod.

  • Batch Jobs: Running batch jobs or cron jobs that perform periodic tasks.

A Kubernetes pod is fundamental for deploying and scaling applications within a Kubernetes cluster. It provides a shared environment for containers, ensuring efficient and reliable operation of containerized applications.

What is CNI (Container Network Interface)?

CNI (Container Network Interface) is a framework used in Kubernetes to manage networking for containers. It ensures that containers can communicate over the network, get assigned IP addresses, and connect to other services.

Key Points:

  • Network Setup: CNI handles the network configuration for containers in a pod, including assigning IP addresses and setting up communication between containers and external services.

  • Pod Setup: When Kubernetes creates a new pod, it uses CNI to configure the network for that pod. The CNI plugin takes care of network-related tasks such as IP address assignment and routing.

  • Plugins: CNI plugins are modular tools that perform various network functions. Popular plugins include Calico, Flannel, and Weave, each offering different features like network traffic management or security.

  • Integration with kubelet: The kubelet, which runs on each node, works with the CNI to ensure that network setup is correctly applied for each pod.

What is the difference between kube-proxy and CNI?

kube-proxy and CNI (Container Network Interface) serve different roles in Kubernetes networking. kube-proxy operates at the service level, managing how network traffic is routed between services and their corresponding pods within the cluster. It ensures that requests to a service are properly distributed to the available pod instances using methods like IP tables or IPVS for load balancing and routing. On the other hand, CNI is responsible for setting up the network configuration for individual containers within a pod. It handles assigning IP addresses to pods, managing their network connectivity, and configuring communication paths between containers and external services. Essentially, kube-proxy deals with traffic management at the service level, while CNI focuses on network setup and connectivity for containers.

What is CRI (Container Runtime Interface)?

CRI (Container Runtime Interface) in Kubernetes is a standardized interface that allows Kubernetes to interact with different container runtimes, such as Docker, containerd, and CRI-O.

  • Communication: The kubelet on each node uses CRI to send instructions to the container runtime for operations like starting, stopping, and monitoring containers.

  • Container Management: CRI provides a uniform method for container lifecycle management, ensuring that Kubernetes can handle containers consistently across different runtimes.

  • Importance: CRI enables flexibility by allowing Kubernetes to work with various container runtimes while maintaining consistent and efficient container management.
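The value of a standardized interface is that the kubelet codes against one contract and any conforming runtime plugs in. The sketch below illustrates that idea with a Python abstract base class; the two runtimes are stand-ins, not real containerd or CRI-O clients, and the actual CRI is a gRPC protocol, not a Python class.

```python
# Sketch of why a runtime *interface* matters: the kubelet codes against
# one contract, and any runtime that implements it plugs in. The runtimes
# below are stand-ins, not real containerd/CRI-O clients; the actual CRI
# is a gRPC protocol rather than a Python class.
from abc import ABC, abstractmethod

class ContainerRuntime(ABC):  # the CRI-like contract
    @abstractmethod
    def run(self, image: str) -> str: ...

class FakeContainerd(ContainerRuntime):
    def run(self, image):
        return f"containerd started {image}"

class FakeCrio(ContainerRuntime):
    def run(self, image):
        return f"cri-o started {image}"

def kubelet_start_pod(runtime: ContainerRuntime, image: str) -> str:
    # The kubelet does not care which runtime it is talking to.
    return runtime.run(image)

print(kubelet_start_pod(FakeContainerd(), "nginx:1.27"))
print(kubelet_start_pod(FakeCrio(), "nginx:1.27"))
```

Swapping runtimes changes nothing in the calling code, which is exactly how Kubernetes moved from Docker to containerd without changing the kubelet's container-management logic.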

Conclusion

Kubernetes architecture is a sophisticated yet robust system designed to manage containerized applications efficiently. Understanding its components and their interactions is essential for leveraging the full potential of Kubernetes. By mastering the architecture, you can deploy, scale, and manage applications with confidence, ensuring high availability and optimal performance. Happy Kuberneting :)