# Alfresco Transform Service auto scaling
This document describes the auto scaling principles implemented in this Helm chart.
This document does not explain how to set up Kubernetes worker node auto-scaling. That is a completely different topic which can be addressed in different ways and is up to the Kubernetes administrator.
## Horizontal Pod Auto scaling
For general concepts about HPA, please refer to the official Kubernetes documentation.
Tengine pods auto scaling is disabled by default. If you want to enable it you need to use the additional value:

```yaml
tengine:
  autoscaling:
    enabled: true
```
Where `tengine` is one of `imagemagick`, `libreoffice`, `transformmisc`, `tika` or `pdfrenderer`.
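For example, enabling auto scaling for the ImageMagick tengine only would look like:

```yaml
imagemagick:
  autoscaling:
    enabled: true
```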
The default configuration implemented in this chart aims to cope with peak load and to spare resources at low levels of utilization, while also minimizing the number of scaling events, because changing the cluster topology means additional computation.
### Default behaviour
Without any further configuration, scaling would happen as follows:

- The cluster would spin up new pods every minute if the pods' CPU usage remains above 75% on average for 30 seconds.
- The cluster would spin up no more than 2 pods, or 50% of the existing replicas (whichever is bigger), per minute.
- There would never be more than 3 replicas in the ReplicaSet.
- There would never be fewer than 1 replica in the ReplicaSet.
- The cluster would remove at most one pod if the CPU load gets consistently below 75% on average for 5 minutes.
- The cluster would remove pods only one after the other, at most one per minute.
CPU utilization/load is calculated with regard to the CPU resource request setting (`.resources.requests.cpu`), which is now set to 1 CPU by default.
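Expressed as chart values, the defaults described above roughly correspond to the sketch below. The exact key names (in particular `targetCPUUtilizationPercentage` and the layout of the `behavior` block) are assumptions based on the standard Kubernetes `autoscaling/v2` HPA fields; check the chart's `values.yaml` for the authoritative defaults.

```yaml
imagemagick:                             # same structure for the other tengines
  autoscaling:
    enabled: false                       # disabled by default
    minReplicas: 1
    maxReplicas: 3
    targetCPUUtilizationPercentage: 75   # assumed key name
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 30   # load must stay over target for 30s
        policies:
          - type: Pods
            value: 2                     # add at most 2 pods...
            periodSeconds: 60            # ...per minute
          - type: Percent
            value: 50                    # or 50% of current replicas, whichever is bigger
            periodSeconds: 60
      scaleDown:
        stabilizationWindowSeconds: 300  # load must stay under target for 5 minutes
        policies:
          - type: Pods
            value: 1                     # remove at most one pod per minute
            periodSeconds: 60
```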
### Customizing auto scaling
The values and behaviour described above are defaults we think are sensible to start with. Of course, they will not fit every single deployment/installation of Alfresco on Kubernetes.
Below are ways to tweak the auto scaling behaviour for your own setup.
#### Setting the CPU resources correctly
The CPU resources request is the basis of calculation for the cluster to trigger scaling events. It is therefore very important to make sure it is set appropriately before enabling auto scaling. Imagine you have a production Kubernetes cluster with large worker nodes (say 3 * 32 CPU nodes). Having CPU requests set to 1 would most likely make the cluster spin up new pods very quickly. Instead, it would be better to ensure your tengine pods have a sensible `.resources.requests.cpu` value set, say 4. Also note that the very same `.resources.requests.cpu` value is used by the Kubernetes scheduler, so setting it too high is not a good idea either: it should be set to a value which allows pods to be scheduled on worker nodes alongside other pods.

The default `.resources.limits.cpu` is set to 4 CPUs, so you will also want to increase this value to something like 12.

Just by setting a sensible `.resources.requests.cpu`, the auto scaling behaviour would already make much more sense given the worker nodes' size.
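Sticking with the 32 CPU worker node example, the override could look like the sketch below. The nesting of the `resources` block under each tengine is an assumption about the chart's values layout; adjust it to match your chart version.

```yaml
imagemagick:
  resources:
    requests:
      cpu: "4"    # basis for the HPA utilization calculation and for scheduling
    limits:
      cpu: "12"   # raised from the default of 4
  autoscaling:
    enabled: true
```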
#### Configuring the auto scaling behaviour
The `autoscaling.*` values below can be fine tuned. There are a few things to take into account when changing them, as explained below:
- `.minReplicas`: This parameter is used to limit the lowest number of replicas.
- `.maxReplicas`: This parameter is used to limit the highest number of replicas.
- `.behavior.scaleUp.stabilizationWindowSeconds`: This parameter is used to avoid flapping ReplicaSets. A very short peak load is not worth a scale up, so you surely want to avoid scaling without making sure the load has increased consistently and a scale up needs to happen. This parameter tells for how long the load should be over target before the HPA controller decides to increase the number of replicas. The shorter this value, the more likely you are to spin up pods for a short peak workload, so the pod could even be ready only after the load is actually back to normal. Setting this too low when the CPU resources request is set to 1 is counter productive, as it's pretty easy to make 1 CPU busy.
- `.behavior.scaleDown.stabilizationWindowSeconds`: This parameter is used in the same manner as for scale up events. One notable difference though is that scaling down has an immediate effect on the way the application handles the workload, while scaling up is more expensive as pods need to start up before they can actually start handling requests. For that reason we think one should always be more cautious when bringing the number of replicas down. You probably want to avoid taking pods down too quickly if your workload is not very consistent.
- `.behavior.scaleUp.policies[]`:
  - `.periodSeconds`: The faster an individual tengine pod is to start up (complete Tomcat startup), the lower this parameter can be. 60s appeared to be a good value for startup times around 90s. The lower this parameter, the faster new pods can be spun up and the faster peak load can be handled.
  - `.type` & `.value`: For exact details on setting this part of the auto scaler policy, check the Kubernetes documentation. Policies let you define the way you want to act upon scaling events. For scale up events, if you know your load peaks are steep (but consistent), then you will want to scale the ReplicaSet by more pods than if your load is growing more slowly (see the sketch below).
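As an illustration of that last point, a tengine facing steep but sustained peaks could scale up aggressively and scale down conservatively with values along these lines. The field names under `behavior` follow the Kubernetes `autoscaling/v2` HPA API, which the chart is assumed to pass through; the tengine name and figures are only examples.

```yaml
imagemagick:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 8
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 30   # react quickly to a sustained load increase
        policies:
          - type: Percent
            value: 100                   # double the replicas...
            periodSeconds: 60            # ...at most once per minute
      scaleDown:
        stabilizationWindowSeconds: 600  # wait 10 minutes of low load before shrinking
        policies:
          - type: Pods
            value: 1                     # then remove one pod at a time
            periodSeconds: 120
```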