Scaling applications effectively is one of the primary challenges in modern cloud native environments. Kubernetes addresses this with the Horizontal Pod Autoscaler (HPA), a built-in controller that automatically adjusts the number of pods in a Deployment, ReplicaSet, or StatefulSet based on real-time resource usage.
Rather than over-provisioning or risking performance issues during peak traffic, HPA helps maintain a balance between resource consumption and application responsiveness. So, whether you are dealing with fluctuating Kubernetes workloads or planning for growth, understanding how HPA works is essential.
In this guide, we explore how the Horizontal Pod Autoscaler works in Kubernetes and how to configure it.
How HPA Kubernetes Works: Core Concepts
The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically scales the number of pod replicas in a Deployment, StatefulSet, or other scalable workload.
Here’s how it works:
- Metrics Collection: HPA fetches resource usage metrics (such as CPU or memory) from the Kubernetes Metrics Server, or custom metrics from a metrics adapter.
- Evaluation Loop: At regular intervals (every 15 seconds by default), HPA compares the observed metrics against the target thresholds defined in your configuration.
- Scaling Decision: Based on the ratio of observed to target metrics, HPA calculates the desired number of replicas, rounding up:
desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)
- Update Replica Count: If the desired count differs from the current one, the number of pods in the deployment is adjusted up or down accordingly.
HPA operates at the deployment level—it does not change pod specs like CPU limits or memory (that’s the role of the Vertical Pod Autoscaler).
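As a worked example with made-up numbers: if 4 replicas are averaging 90% CPU against a 60% target, HPA scales up to ceil(4 × 90 / 60) = 6 replicas. The rounding-up can be sketched with integer arithmetic:

```shell
# Hypothetical numbers: 4 current replicas, 90% observed CPU, 60% target.
# desiredReplicas = ceil(4 * 90 / 60) = 6
# Integer ceiling division: (a + b - 1) / b
echo $(( (4 * 90 + 60 - 1) / 60 ))   # prints 6
```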
When Should You Use HPA Kubernetes?
HPA is the right tool when your application experiences fluctuating workloads or inconsistent traffic. Some common scenarios include:
- Web applications with variable traffic: increase pods during peak hours and scale back during quiet periods.
- Data processing jobs: scale automatically based on CPU/memory usage as workloads spike.
- Microservices architecture: scale individual services independently based on their own resource demands.
- APIs or backend services: maintain consistent performance without constant manual intervention.
Use HPA when:
- You need to optimize resource usage.
- Your app's performance is directly related to CPU/memory load.
- You need to scale horizontally based on demand.
Avoid HPA when:
- The application state is not easily replicated across pods.
- You require vertical scaling (more resources per pod)—in that case, consider VPA.
Important Metrics Used by HPA Kubernetes
The effectiveness of the Horizontal Pod Autoscaler depends entirely on the metrics it uses for scaling decisions. HPA supports multiple metric types to determine when to scale pods up or down.
CPU Utilization
CPU utilization is one of the most widely used metrics with HPA. It is the percentage of CPU actually used by a container relative to the CPU requests defined in the pod spec (not the limits).
- Default behavior: HPA uses this if no other metric is specified.
- Example: If your HPA is set to maintain 50% CPU usage and usage rises to 100%, it will trigger a scale-up.
CPU utilization is a good default because it is simple, effective, and supported out of the box by the Kubernetes Metrics Server.
Memory Utilization
Memory utilization tracks the amount of memory consumed by pods relative to the memory requests defined in the pod spec.
- Memory-based scaling requires proper memory requests to be defined in your pod specs.
- HPA doesn’t support memory metrics by default—you’ll need to explicitly configure this using the resource metric type.
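As a sketch of what a memory-based HPA might look like under the autoscaling/v2 API (the resource names here are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-memory-hpa   # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70   # scale when average memory use exceeds 70% of requests
```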
Custom Metrics
For more advanced scaling needs, HPA Kubernetes can use custom metrics, such as:
- Request rates (e.g., HTTP requests per second)
- Queue length (e.g., messages in a RabbitMQ or Kafka topic)
- Business metrics (e.g., active users or transactions)
Custom metrics are defined by using the Custom Metrics API or External Metrics API, which are often exposed using the Prometheus Adapter or Stackdriver.
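For illustration, a Pods-type metric that scales on request rate might look like the fragment below, assuming a metric named http_requests_per_second is already exposed through the Custom Metrics API (e.g., by the Prometheus Adapter):

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # assumes your adapter exposes this metric
    target:
      type: AverageValue
      averageValue: "100"              # target 100 requests/sec per pod
```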
Setting Up HPA Kubernetes – Step-by-Step Guide
Setting up HPA in Kubernetes is straightforward if you have already defined resource requests in your deployments. Here is a step-by-step guide.

Step 1: Ensure the Metrics Server is Installed
HPA relies on the Kubernetes Metrics Server to gather CPU and memory metrics.
kubectl get deployment metrics-server -n kube-system
If it’s not installed, you can install it using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
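Once it is running, you can confirm that metrics are actually flowing with kubectl top, which queries the same Metrics API that HPA relies on:

```shell
kubectl top nodes
kubectl top pods -n kube-system
```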
Step 2: Define Resource Requests in Your Deployment
HPA works only if your pods define resource requests for CPU (and optionally memory).
resources:
  requests:
    cpu: "100m"
  limits:
    cpu: "200m"
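In context, the resources block sits under each container in the Deployment spec. A minimal sketch (the name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.25     # illustrative image
        resources:
          requests:
            cpu: "100m"       # HPA computes utilization against this value
          limits:
            cpu: "200m"
```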
Step 3: Create an HPA Resource
You can use kubectl autoscale or define it via YAML.
Option 1: CLI
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
Option 2: YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Apply it:
kubectl apply -f hpa.yaml
Step 4: Verify HPA Behavior
Monitor the HPA:
kubectl get hpa
kubectl describe hpa my-app-hpa
You can also simulate CPU load with a stress tool or use kubectl exec to run CPU-intensive commands.
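A common way to generate load, assuming your deployment is exposed through a Service named my-app on port 80 (adjust the URL for your setup), is a throwaway busybox pod that polls it in a loop:

```shell
# Hypothetical service name; press Ctrl+C to stop and delete the pod
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://my-app; done"

# In another terminal, watch the HPA react:
kubectl get hpa my-app-hpa --watch
```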
Advanced HPA Configurations
If you need more control, the autoscaling/v2 API supports multiple metrics and configurable scaling behavior.
- Scaling Based on Multiple Metrics
You can combine CPU, memory, and custom metrics:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
- Custom Scaling Policies
Fine-tune how aggressively HPA scales:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 30
    policies:
    - type: Percent
      value: 100
      periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 60
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60
Troubleshooting Common HPA Issues in Kubernetes
Issue | Likely Cause | Solution |
HPA not scaling pods | Metrics Server not installed or not running | Install or restart the Metrics Server |
HPA remains at minimum replicas | Low CPU/memory usage | Test with a load generator or verify actual resource usage |
Error: unknown metric source type | Invalid metric type in HPA YAML | Use valid types: Resource, Pods, Object, or External |
No CPU/memory data | Resource requests/limits not defined in pod specs | Define resources.requests in your deployment configuration |
Overreacting to short spikes | No scaling stabilization configured | Add behavior settings to control how fast HPA reacts |
Custom metrics not working | Prometheus Adapter not set up | Install and configure the Prometheus Adapter |
HPA stuck in Pending state | API server or Metrics Server issues | Check logs, ensure API access, and verify RBAC permissions |
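When debugging the issues above, a few first checks usually narrow things down quickly (the HPA name is illustrative):

```shell
kubectl describe hpa my-app-hpa              # check Conditions and Events for errors
kubectl top pods                             # confirm the Metrics API is returning data
kubectl get apiservices | grep metrics       # confirm metrics API services are registered
kubectl -n kube-system logs deploy/metrics-server   # inspect Metrics Server logs
```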
HPA vs VPA vs Cluster Autoscaler – Comparison Table
Feature | HPA (Horizontal Pod Autoscaler) | VPA (Vertical Pod Autoscaler) | Cluster Autoscaler |
What it scales | Number of pod replicas | Resource requests (CPU/memory) per pod | Number of nodes in the cluster |
When it triggers | Based on metrics like CPU, memory, or custom | When pods are under/over-provisioned | When pods are pending due to lack of resources |
Best for | Apps with varying load or traffic | Stateful apps or those with changing resource needs | Automatically increasing/decreasing cluster size |
Requires Metrics Server? | ✅ Yes | ✅ Yes | ❌ No |
Can they be used together? | ✅ Yes | ✅ Yes, but not with HPA on the same deployment | ✅ Yes |
Granularity | Per deployment/pod level | Per pod level | Cluster/node level |
Configuration complexity | Medium | Medium | High (with cloud provider-specific setup) |
Conclusion – HPA In Kubernetes
The Horizontal Pod Autoscaler is an essential part of building efficient Kubernetes workloads. When implemented correctly, HPA can improve application performance, increase scalability, and reduce downtime.
FAQs
What metrics does HPA use to scale pods?
HPA typically uses CPU and memory utilization but can also work with custom or external metrics through tools like Prometheus Adapter.
Can HPA and VPA be used together?
Yes, but with limitations. You should avoid using HPA and VPA on the same resource simultaneously if both are targeting CPU or memory to prevent conflicting behavior.
Why is my HPA not scaling?
Common reasons include missing metrics server, incorrect resource requests, low actual usage, or misconfigured HPA settings.