How does scoring happens in Borg

In Borg, scoring is a critical part of task scheduling, where the system evaluates and ranks available machines (nodes) to determine the best fit for a task.

Jan 06, 2025

Scoring ensures that tasks are placed efficiently to maximize resource utilization while maintaining system performance, availability, and fairness.

Here’s an overview of how scoring works in Borg:

1. Resource Matching

The first step in scoring is to filter out machines that don’t meet the task's resource requirements. Borg considers:

CPU, memory, and disk space required by the task.
Specialized hardware needs, such as GPUs or SSDs.
Affinity/anti-affinity rules, ensuring tasks are placed according to predefined constraints (e.g., tasks that must or must not run on the same machine).

Only machines that satisfy these requirements proceed to the scoring phase.

2. Scoring Function

Borg uses a scoring function to rank machines based on multiple factors. The goal is to select the "best" machine for a task while balancing cluster-wide goals.

Key factors in the scoring function include:

a. Fit Score (Resource Efficiency):

Measures how well the task’s resource needs match the machine’s available resources.
A machine that leaves fewer "stranded" resources (resources that can’t be used by other tasks) gets a better score.
Example:
- If a task requires 2 CPUs and 4 GB RAM, Borg prefers a machine with just enough resources to accommodate it, rather than a machine with significantly more resources.

b. Load Balancing:

Scores machines to ensure even distribution of tasks across the cluster.
Avoids overloading specific machines or racks while leaving others underutilized.
Helps prevent hotspots that could affect performance and availability.

c. Task Priority and Preemption:

Borg considers the priority of the task:
- High-priority tasks (e.g., production jobs) may preempt lower-priority tasks (e.g., batch jobs).
Scoring accounts for whether preemption is required and the associated cost (e.g., moving a lower-priority task to another machine).

d. Machine Locality (Data Proximity):

Prefers machines that are closer to the data needed by the task to reduce network latency and improve performance.
For example, tasks that frequently access data in a specific rack or zone may get a higher score for machines in that location.

e. Machine Stability:

Borg may prefer machines with a history of stable performance.
Machines with frequent failures or under maintenance may receive a lower score.

f. Anti-Affinity Rules:

Ensures that tasks requiring diversity in failure domains (e.g., across racks, zones) are scored accordingly.
This is critical for fault-tolerant systems where redundant tasks must not reside on the same machine or rack.

3. Ranking Machines

Once all candidate machines are scored, Borg ranks them from the highest to the lowest score. The machine with the best score is selected for the task.

4. Preemption (If Necessary)

If no suitable machine is available for a high-priority task, Borg looks for machines running lower-priority tasks that can be preempted:

The scoring system factors in the cost of preemption, such as the impact on the preempted task.
Preemption is used as a last resort to ensure high-priority tasks are scheduled.

5. Dynamic Scoring Adjustments

Borg dynamically adjusts scoring based on the current cluster state, including:

Real-time utilization metrics.
Task queue length and backlog.
Changes in machine availability or health.

6. Fairness and Efficiency

Borg's scoring system balances fairness (e.g., ensuring all teams get a share of resources) with efficiency (e.g., maximizing overall cluster utilization). This is achieved through:

Quotas and resource guarantees for different teams or applications.
Adaptive scoring algorithms that optimize for both immediate needs and long-term cluster health.

Example: Scoring in Action

Imagine a task that requires:

4 CPUs
8 GB RAM

Available machines:

Machine A: 4 CPUs, 8 GB RAM (perfect fit)
Machine B: 8 CPUs, 16 GB RAM (oversized)
Machine C: 4 CPUs, 6 GB RAM (insufficient)

Scoring:

Machine A gets the highest score because it meets the task's requirements with minimal resource wastage.
Machine B gets a lower score due to resource inefficiency (more stranded resources).
Machine C is filtered out because it doesn’t meet the resource requirements.

The task is placed on Machine A.

Summary

Scoring in Borg is a sophisticated process that ensures:

Efficient resource utilization by minimizing wastage.
Load balancing across the cluster to prevent bottlenecks.
Fairness while adhering to quotas and priorities.
Reliability by considering machine stability and anti-affinity rules.

This scoring mechanism, combined with preemption and dynamic adjustments, helps Borg effectively manage massive, diverse workloads at Google’s scale.

Shashank’s Substack

Discussion about this post