Fleets and autoscaling

Hathora automatically scales your game servers across global regions, balancing performance and cost. This page explains how Hathora's fleets work and how to optimize them for your game's needs.

note

This page applies to Pro and Enterprise customers only. On the Explore tier, your processes are scheduled on shared hardware.

Fleets

For Hathora Pro and Enterprise customers, all processes run on a dedicated single-tenant fleet. A fleet is a pool of compute resources that spans multiple regions and dynamically scales up or down based on demand.

Monitor fleet performance

Hathora monitors two live fleet metrics to dynamically scale bare metal and cloud capacity in each region:

Provisioned: Total number of vCPUs running in a region
Utilization: Percentage of vCPUs in use. Since this value is based on requested processes, it may exceed 100%.
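
To make the utilization metric concrete, here is a minimal sketch. It assumes utilization is the vCPUs requested by processes divided by the vCPUs provisioned in a region; the exact formula Hathora uses is not stated on this page, and the function name is illustrative only.

```python
def utilization_pct(requested_vcpus: float, provisioned_vcpus: float) -> float:
    """Return utilization as a percentage of provisioned vCPUs.

    Assumption: utilization = requested / provisioned. Because the numerator
    counts what processes have requested, the result can go above 100.
    """
    if provisioned_vcpus <= 0:
        raise ValueError("provisioned_vcpus must be positive")
    return 100.0 * requested_vcpus / provisioned_vcpus


print(utilization_pct(48, 64))  # 75.0
print(utilization_pct(72, 64))  # 112.5 (more vCPUs requested than provisioned)
```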

Monitor these metrics up to 7 days back for each region in the Fleet tab.

Set a cloud minimum

For faster start-up times, you can configure a cloud minimum capacity in each region. Setting a cloud minimum is helpful in regions without bare metal capacity or ahead of a significant event (like playtests or launch day).

tip

In regions with no capacity, we will scale from 0. It takes ~2 minutes to scale up cloud capacity. We recommend setting a cloud minimum for live, production use cases.

To change cloud minimums programmatically, for example from a cron job or a script, use the Fleets API.
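
As a rough illustration of a scheduled update, the sketch below picks a cloud minimum from the time of day and passes it to an updater function. Everything named here is hypothetical: the peak-hour window, the vCPU values, the region name, and the `set_cloud_minimum` callable, which stands in for whatever Fleets API request you use. It also assumes the minimum is expressed in vCPUs. Consult the Fleets API reference for the real endpoint and parameters.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical schedule: hold more cloud capacity during peak hours (UTC).
PEAK_HOURS_UTC = range(17, 23)  # 17:00 to 22:59 UTC
PEAK_MIN_VCPUS = 32
OFF_PEAK_MIN_VCPUS = 4


def desired_cloud_minimum(now: datetime) -> int:
    """Return the cloud minimum (in vCPUs) this schedule wants at `now`."""
    hour = now.astimezone(timezone.utc).hour
    return PEAK_MIN_VCPUS if hour in PEAK_HOURS_UTC else OFF_PEAK_MIN_VCPUS


def apply_cloud_minimum(
    region: str,
    now: datetime,
    set_cloud_minimum: Callable[[str, int], None],
) -> int:
    """Compute the scheduled minimum and hand it to the caller-supplied updater."""
    minimum = desired_cloud_minimum(now)
    set_cloud_minimum(region, minimum)  # placeholder for the real Fleets API call
    return minimum


if __name__ == "__main__":
    # Stand-in updater that only prints; replace it with a real API request.
    apply_cloud_minimum(
        "region-a",
        datetime.now(timezone.utc),
        lambda region, vcpus: print(f"{region}: cloud minimum -> {vcpus} vCPUs"),
    )
```

Running a script like this from cron a few minutes before the peak window starts leaves room for the roughly 2 minutes that cloud capacity takes to scale up.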

Intelligent autoscaling

Hathora’s autoscaler ensures that game servers are allocated within 5 seconds. It also balances resources across bare metal and cloud infrastructure. Bare metal capacity is always used first when provisioning processes to ensure your committed capacity is used efficiently.

The autoscaler monitors both provisioned capacity and utilization in each region. By default, it scales to a target utilization between 75% and 85% in a region, ensuring at least 15% of capacity remains available as a buffer. When the average utilization:

Exceeds 85%: the autoscaler adds as much cloud capacity as needed to bring utilization below 85%, even during sudden bursts of usage. It takes approximately 2 minutes to add new cloud capacity.
Drops below 75%: once your processes stop, the autoscaler automatically removes cloud capacity.
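
The two rules above can be sketched as a small decision function. This is a simplified model, not Hathora's implementation: it assumes utilization is requested vCPUs over provisioned vCPUs, and it sizes a scale-up as the smallest amount of extra capacity that brings utilization back to the scale-up threshold or lower. The function name and return shape are illustrative.

```python
import math
from typing import Tuple


def scaling_decision(
    requested_vcpus: float,
    provisioned_vcpus: float,
    scale_up_pct: float = 85.0,
    scale_down_pct: float = 75.0,
) -> Tuple[str, int]:
    """Return (action, vcpus_to_add) for one region.

    action is "scale_up", "scale_down", or "hold". vcpus_to_add is only
    non-zero for "scale_up".
    """
    if provisioned_vcpus < 0 or requested_vcpus < 0:
        raise ValueError("vCPU counts must be non-negative")

    # Smallest capacity at which requested / capacity <= scale_up_pct.
    target = math.ceil(requested_vcpus * 100.0 / scale_up_pct)

    if provisioned_vcpus == 0:
        # Region with no capacity: scale from 0 if anything is requested.
        return ("scale_up", target) if requested_vcpus > 0 else ("hold", 0)

    utilization = 100.0 * requested_vcpus / provisioned_vcpus
    if utilization > scale_up_pct:
        return ("scale_up", target - int(provisioned_vcpus))
    if utilization < scale_down_pct:
        # Capacity is removed only as processes stop; see "Scaling down".
        return ("scale_down", 0)
    return ("hold", 0)


print(scaling_decision(90, 100))  # ('scale_up', 6)
print(scaling_decision(80, 100))  # ('hold', 0)
print(scaling_decision(50, 100))  # ('scale_down', 0)
```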

Customize scale-up behavior

You can now configure the scale-up threshold to control how aggressively the autoscaler adds capacity. By default, the threshold is 85%, but you can adjust it to fit your game’s tolerance for latency and buffer requirements:

Set it higher (up to 100%): to defer scaling until capacity is fully saturated.
Set it lower (down to 40%): to scale up earlier and minimize the risk of allocation delays.

Scaling down

Hathora aggressively drains underutilized nodes in regions where utilization falls below 75%. The autoscaler safely and cost-efficiently removes cloud capacity by terminating the least utilized backend cloud instance after all game processes on it have completed.
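
A minimal sketch of that selection logic follows. The `CloudInstance` type and its fields are hypothetical, and the real autoscaler may weigh other factors; the sketch only shows the two stated ideas: pick the least utilized cloud instance, and terminate it only once no game processes remain on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CloudInstance:
    # Hypothetical model of one backend cloud instance.
    name: str
    total_vcpus: float
    used_vcpus: float
    running_processes: int


def pick_drain_candidate(instances: List[CloudInstance]) -> Optional[CloudInstance]:
    """Return the least utilized instance, or None if there are no instances."""
    if not instances:
        return None
    return min(instances, key=lambda i: i.used_vcpus / i.total_vcpus)


def can_terminate(instance: CloudInstance) -> bool:
    """An instance is safe to remove only after all its processes have finished."""
    return instance.running_processes == 0


fleet = [
    CloudInstance("a", total_vcpus=16, used_vcpus=12, running_processes=6),
    CloudInstance("b", total_vcpus=16, used_vcpus=2, running_processes=1),
    CloudInstance("c", total_vcpus=16, used_vcpus=8, running_processes=4),
]
candidate = pick_drain_candidate(fleet)
print(candidate.name, can_terminate(candidate))  # b False
```

In this example instance `b` is chosen for draining, but it is not terminated until its one remaining process completes.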

note

When you customize the autoscaler, the scale-down threshold is automatically set to 10 percentage points below the scale-up threshold (for example, 85% and 75% by default), ensuring capacity is added and removed predictably.
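
The relationship between the two thresholds can be sketched as follows. The helper name is illustrative, and it reads the rule as a fixed 10-point offset, which matches the default pair of 85% and 75%. The allowed range of 40% to 100% comes from the scale-up settings described above.

```python
from typing import Tuple

MIN_SCALE_UP_PCT = 40
MAX_SCALE_UP_PCT = 100
SCALE_DOWN_OFFSET = 10  # assumption: percentage points, not a relative 10%


def autoscaler_thresholds(scale_up_pct: int = 85) -> Tuple[int, int]:
    """Return (scale_up_pct, scale_down_pct) for a chosen scale-up threshold."""
    if not MIN_SCALE_UP_PCT <= scale_up_pct <= MAX_SCALE_UP_PCT:
        raise ValueError(
            f"scale_up_pct must be between {MIN_SCALE_UP_PCT} and {MAX_SCALE_UP_PCT}"
        )
    return scale_up_pct, scale_up_pct - SCALE_DOWN_OFFSET


print(autoscaler_thresholds())     # (85, 75)
print(autoscaler_thresholds(60))   # (60, 50)
print(autoscaler_thresholds(100))  # (100, 90)
```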
