Scaling applications effectively is one of the primary challenges in modern cloud native environments. Kubernetes addresses this with the Horizontal Pod Autoscaler (HPA), a built-in controller that automatically adjusts the number of pods in a Deployment, ReplicaSet, or StatefulSet based on real-time resource usage.
Rather than over-provisioning or risking performance issues during peak traffic, HPA helps maintain a balance between resource consumption and application responsiveness. So, whether you are dealing with fluctuating Kubernetes workloads or planning for growth, understanding how HPA works is essential.
In this guide, we explore how the Horizontal Pod Autoscaler works in Kubernetes and how to configure it.
How HPA Kubernetes Works: Core Concepts
The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically scales the number of pod replicas in a Deployment, StatefulSet, or other scalable workload.
Here’s how it works:
- Metrics Collection: HPA fetches resource usage metrics (such as CPU or memory) from the Kubernetes Metrics Server, or custom metrics from a metrics adapter.
- Evaluation Loop: At regular intervals (every 15 seconds by default), HPA compares the observed metrics against the target thresholds defined in your configuration.
- Scaling Decision: Based on the ratio of observed to target metrics, HPA calculates the desired number of replicas, rounding up:
desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)
- Update Replica Count: If the desired count differs from the current one, the number of pods in the deployment is adjusted up or down accordingly.
HPA operates at the deployment level—it does not change pod specs like CPU limits or memory (that’s the role of the Vertical Pod Autoscaler).
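As a worked example with made-up numbers: if 4 replicas are averaging 90% CPU against a 60% target, HPA scales up to ceil(4 × 90 / 60) = 6 replicas. The rounding-up can be sketched with integer arithmetic:

```shell
# Hypothetical numbers: 4 current replicas, 90% observed CPU, 60% target.
# desiredReplicas = ceil(4 * 90 / 60) = 6
# Integer ceiling division: (a + b - 1) / b
echo $(( (4 * 90 + 60 - 1) / 60 ))   # prints 6
```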
When Should You Use HPA Kubernetes?
HPA is the right tool when your application experiences fluctuating workloads or inconsistent traffic. Some common scenarios include:
- Web applications with variable traffic: increase pods during peak hours and scale back during quiet periods.
- Data processing jobs: scale automatically based on CPU/memory usage as workloads spike.
- Microservices architecture: scale individual services independently based on their own resource demands.
- APIs or backend services: maintain consistent performance without constant manual intervention.
Use HPA when:
- You need to optimize resource usage.
- Your app's performance is directly related to CPU/memory load.
- You need to scale horizontally based on demand.
Avoid HPA when:
- The application state is not easily replicated across pods.
- You require vertical scaling (more resources per pod)—in that case, consider VPA.
Important Metrics Used by HPA Kubernetes
The effectiveness of the Horizontal Pod Autoscaler depends entirely on the metrics it uses for scaling decisions. HPA supports multiple metric types to determine when to scale pods up or down.
CPU Utilization
CPU utilization is one of the most widely used metrics with HPA. It is the percentage of CPU actually used by a container relative to the CPU requests defined in the pod spec (not the limits).
- Default behavior: HPA uses this if no other metric is specified.
- Example: If your HPA is set to maintain 50% CPU usage and usage rises to 100%, it will trigger a scale-up.
CPU utilization is a good default because it is simple, effective, and supported out of the box by the Kubernetes Metrics Server.
Memory Utilization
Memory utilization tracks the amount of memory consumed by pods relative to the memory requests defined in the pod spec.
- Memory-based scaling requires proper memory requests to be defined in your pod specs.
- HPA doesn’t support memory metrics by default—you’ll need to explicitly configure this using the resource metric type.
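As a sketch of what a memory-based HPA might look like under the autoscaling/v2 API (the resource names here are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-memory-hpa   # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70   # scale when average memory use exceeds 70% of requests
```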
Custom Metrics
For more advanced scaling needs, HPA Kubernetes can use custom metrics, such as:
- Request rates (e.g., HTTP requests per second)
- Queue length (e.g., messages in a RabbitMQ or Kafka topic)
- Business metrics (e.g., active users or transactions)
Custom metrics are defined by using the Custom Metrics API or External Metrics API, which are often exposed using the Prometheus Adapter or Stackdriver.
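For illustration, a Pods-type metric that scales on request rate might look like the fragment below, assuming a metric named http_requests_per_second is already exposed through the Custom Metrics API (e.g., by the Prometheus Adapter):

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # assumes your adapter exposes this metric
    target:
      type: AverageValue
      averageValue: "100"              # target 100 requests/sec per pod
```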
Setting Up HPA Kubernetes – Step-by-Step Guide
Setting up HPA in Kubernetes is straightforward if you have already defined resource requests in your deployments. Here is a step-by-step guide.

Step 1: Ensure the Metrics Server is Installed
HPA relies on the Kubernetes Metrics Server to gather CPU and memory metrics.
kubectl get deployment metrics-server -n kube-system
If it’s not installed, you can install it using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
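Once it is running, you can confirm that metrics are actually flowing with kubectl top, which queries the same Metrics API that HPA relies on:

```shell
kubectl top nodes
kubectl top pods -n kube-system
```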
Step 2: Define Resource Requests in Your Deployment
HPA works only if your pods define resource requests for CPU (and optionally memory).
resources:
  requests:
    cpu: "100m"
  limits:
    cpu: "200m"
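In context, the resources block sits under each container in the Deployment spec. A minimal sketch (the name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.25     # illustrative image
        resources:
          requests:
            cpu: "100m"       # HPA computes utilization against this value
          limits:
            cpu: "200m"
```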
Step 3: Create an HPA Resource
You can use kubectl autoscale or define it via YAML.
Option 1: CLI
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
Option 2: YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Apply it:
kubectl apply -f hpa.yaml
Step 4: Verify HPA Behavior
Monitor the HPA:
kubectl get hpa
kubectl describe hpa my-app-hpa
You can also simulate CPU load with a stress tool or use kubectl exec to run CPU-intensive commands.
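A common way to generate load, assuming your deployment is exposed through a Service named my-app on port 80 (adjust the URL for your setup), is a throwaway busybox pod that polls it in a loop:

```shell
# Hypothetical service name; press Ctrl+C to stop and delete the pod
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://my-app; done"

# In another terminal, watch the HPA react:
kubectl get hpa my-app-hpa --watch
```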
Advanced HPA Configurations
If you need more control, the autoscaling/v2 API supports multiple metrics and configurable scaling behavior.
- Scaling Based on Multiple Metrics
You can combine CPU, memory, and custom metrics:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
- Custom Scaling Policies
Fine-tune how aggressively HPA scales:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 30
    policies:
    - type: Percent
      value: 100
      periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 60
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60
Troubleshooting Common HPA Issues in Kubernetes
Issue | Likely Cause | Solution |
HPA not scaling pods | Metrics Server not installed or not running | Install or restart the Metrics Server |
HPA remains at minimum replicas | Low CPU/memory usage | Test with a load generator or verify actual resource usage |
Error: unknown metric source type | Invalid metric type in HPA YAML | Use valid types: Resource, Pods, Object, or External |
No CPU/memory data | Resource requests/limits not defined in pod specs | Define resources.requests in your deployment configuration |
Overreacting to short spikes | No scaling stabilization configured | Add behavior settings to control how fast HPA reacts |
Custom metrics not working | Prometheus Adapter not set up | Install and configure the Prometheus Adapter |
HPA stuck in Pending state | API server or Metrics Server issues | Check logs, ensure API access, and verify RBAC permissions |
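When debugging the issues above, a few first checks usually narrow things down quickly (the HPA name is illustrative):

```shell
kubectl describe hpa my-app-hpa              # check Conditions and Events for errors
kubectl top pods                             # confirm the Metrics API is returning data
kubectl get apiservices | grep metrics       # confirm metrics API services are registered
kubectl -n kube-system logs deploy/metrics-server   # inspect Metrics Server logs
```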
HPA vs VPA vs Cluster Autoscaler – Comparison Table
Feature | HPA (Horizontal Pod Autoscaler) | VPA (Vertical Pod Autoscaler) | Cluster Autoscaler |
What it scales | Number of pod replicas | Resource requests (CPU/memory) per pod | Number of nodes in the cluster |
When it triggers | Based on metrics like CPU, memory, or custom | When pods are under/over-provisioned | When pods are pending due to lack of resources |
Best for | Apps with varying load or traffic | Stateful apps or those with changing resource needs | Automatically increasing/decreasing cluster size |
Requires Metrics Server? | ✅ Yes | ✅ Yes | ❌ No |
Can they be used together? | ✅ Yes | ✅ Yes, but not with HPA on the same deployment | ✅ Yes |
Granularity | Per deployment/pod level | Per pod level | Cluster/node level |
Configuration complexity | Medium | Medium | High (with cloud provider-specific setup) |
Conclusion – HPA In Kubernetes
The Horizontal Pod Autoscaler is an essential part of building efficient Kubernetes workloads. When implemented correctly, HPA can improve application performance, increase scalability, and reduce downtime.
FAQs
What metrics does HPA use to scale pods?
HPA typically uses CPU and memory utilization but can also work with custom or external metrics through tools like Prometheus Adapter.
Can HPA and VPA be used together?
Yes, but with limitations. You should avoid using HPA and VPA on the same resource simultaneously if both are targeting CPU or memory to prevent conflicting behavior.
Why is my HPA not scaling?
Common reasons include missing metrics server, incorrect resource requests, low actual usage, or misconfigured HPA settings.