Blue Team Scoring Strategies

In cyber defense competitions like Inherit and Defend, scoring isn’t just about preventing breaches—it’s about measuring how well teams operate under pressure, manage their assets, and adapt to dynamic challenges. In this post, we’ll walk through the core components of Blue Team scoring: Service Scoring, Blue Team Flags, Injects, and Bi-Directional Red/Blue Scoring. Whether you’re a cybersecurity professional or an educator, understanding these strategies can help sharpen your defensive game and enrich your teaching curriculum.


Inherit and Defend

Inherit and Defend competitions are structured as Red versus Blue exercises in which one or more Blue Teams defend pre-built networks and servers against the Red Team. This format is common in events like the Mid-Atlantic CCDC, or in any setting where Blue Team members are prohibited from counter-hacking the Red Team. It scales well, allowing multiple Blue Teams to compete simultaneously against a common Red Team.

While the event is termed “Red v Blue,” the scoring system ranks Blue teams solely against one another, focusing on defensive skills—networking, system administration, forensics, and more. Similarly, Red teams are scored against their peers, with the objective being to excel in your respective role rather than directly “beat” the opposing side.


Points for Stuff and Things

For the purposes of these discussions, a Flag is defined as any question that requires a player to perform a specific action in order to answer it. This framework is versatile, allowing flags to be used across roles on both Red and Blue teams. At a high level, the point breakdown is as follows:

BLUE TEAM

  • Flags
  • Network Service Functionality (Service Scoring)
  • Injects
  • Keeping Red out (as reflected in Bi-Directional Scoring)

RED TEAM

  • Flags
  • Network Service Corruption (a component of Bi-Directional Scoring)
  • Target Compromise/Ownership (via Phone Home mechanisms)
  • Injects
  • Service Scoring

Service Scoring

Service Scoring evaluates a team’s ability to maintain service uptime and functionality throughout the competition. Blue teams receive an asset list—typically including IP addresses and the specific services they must run. The scoring engine then tests these services either continuously (similar to a Nagios setup) or at scheduled intervals, creating distinct scoring rounds.

How It Works

Each service is typically evaluated through a series of dependency-based steps:

  • Ping: The scoring engine verifies that the server is reachable.
  • Port Connectivity: A TCP three-way handshake confirms that the designated service port is active.
  • Service Functionality: For instance, a web server might be queried for a file (e.g., flag.html), with the expectation of receiving a 200 status code.
  • Flag Integrity: The file is then validated using an MD5 checksum compared against a baseline established during a zero round.

If a web server is valued at 100 points, each step is worth 25 points, and failure at any stage stops further checks. Scoring can also be weighted: different services may carry different point values based on their importance or complexity. Enhanced logic can factor in network interdependencies as well; for example, if the scoring engine reaches a server by its Fully Qualified Domain Name (FQDN) rather than its raw IP address, a successful check also demonstrates working DNS and can earn additional points.
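
As a rough illustration, here is a minimal Python sketch of how a scoring engine might walk these dependency-based steps for a single web service. The host, URL, baseline hash, and 25-points-per-step split mirror the 100-point example above; the function and parameter names are invented for illustration, and the ping flags assume a Linux scorebot.

    import hashlib
    import socket
    import subprocess
    import urllib.request

    def check_web_service(host, url, baseline_md5, points_per_step=25):
        """Walk the dependency chain for one web service; stop at the first failure."""
        score = 0

        # Step 1: Ping - is the server reachable at all? (Linux ping flags assumed.)
        if subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          capture_output=True).returncode != 0:
            return score
        score += points_per_step

        # Step 2: Port connectivity - can we complete a TCP handshake on port 80?
        try:
            with socket.create_connection((host, 80), timeout=3):
                pass
        except OSError:
            return score
        score += points_per_step

        # Step 3: Service functionality - does the server return flag.html with a 200?
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status != 200:
                    return score
                body = resp.read()
        except Exception:
            return score
        score += points_per_step

        # Step 4: Flag integrity - does the file still match the zero-round baseline?
        if hashlib.md5(body).hexdigest() == baseline_md5:
            score += points_per_step

        return score

Calling check_web_service("10.0.1.5", "http://10.0.1.5/flag.html", baseline_md5) once per round returns 0, 25, 50, 75, or 100 points, matching the point split described above.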

Round Versus Continuous Service Scoring

There are two general methods for scoring service functionality:

  • Round-Based Scoring:
    In this method, both Red and Blue scores are updated at set intervals (typically every 12–18 minutes). During each round, the scoring engine performs:
    • A service check for each asset on every team.
    • A check for service corruption.
    • Execution of Phone Home scripts.
    • Tallying of flags, which are continuously updated on the scoreboard.
    The round score is calculated as:
    Round Score = Service Score − Red Team Activity (a minimal sketch of one such round tally follows this list). Round-based scoring minimizes network traffic and compartmentalizes activity into manageable time blocks. However, it may be more easily gamed (time-based ACLs, for example, can reduce a team's network exposure), and maintaining a growing library of testing scripts can become a challenge.
  • Continuous Scoring:
    Alternatively, a Nagios-like service can perform ongoing network health checks. This method converts Blue Team scores into percentages (e.g., 92% uptime), which may be more meaningful to both teams and spectators than raw point totals. The challenge here lies in developing custom reports and visualizations that often require interfacing with third-party databases or APIs.

Scoring Visualization Considerations

Let’s break down the math behind the total possible service score to understand how these metrics can impact visual scoreboards:

  • Assets and Services:
    A team with 10 assets running 2 services each yields:
    10 × 2 = 20 services.
  • Points per Service per Round:
    Each service is worth up to 100 points per scoring round.
  • Scoring Rounds:
    With scoring every 15 minutes, there are 4 rounds per hour. For 8 hours of gameplay per day:
    • Rounds per day = 8 × 4 = 32 rounds
    • Over two days, total rounds = 32 × 2 = 64 rounds.
  • Total Possible Service Score:
    Each round offers 20 services × 100 points = 2,000 points.
    Over 64 rounds, the maximum score is:
    64 × 2,000 = 128,000 points.

While these numbers validate the scoring system mathematically, such large totals can pose challenges for visualization. For example, if one Blue Team scores 10,000 points and another scores 90,000 points, plotting these on a single scoreboard can be visually overwhelming, potentially obscuring performance trends.

Visualization Strategies:

  • Normalization: Convert raw points into percentages or normalized scores for easier comparison.
  • Alternative Scales: Employ logarithmic scales or segmented graphs to highlight differences without overwhelming the viewer.
  • Dashboard Design: Focus on trends and key performance indicators rather than raw totals, ensuring that the scoreboard remains accessible and informative.

Careful planning in designing scoring schemas is essential not only for fair competition but also for creating effective visual reports.
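
As a quick sanity check on the arithmetic above, and to show how normalization keeps the scoreboard readable, the following Python lines reproduce the worked example; the 10,000 and 90,000 figures are the hypothetical team scores mentioned earlier.

    assets, services_per_asset = 10, 2
    points_per_service = 100
    rounds = 4 * 8 * 2                    # 4 rounds/hour x 8 hours/day x 2 days = 64

    max_score = assets * services_per_asset * points_per_service * rounds
    print(max_score)                      # 128000

    def normalize(raw_score, maximum=max_score):
        """Convert a raw service score into a scoreboard-friendly percentage."""
        return round(100 * raw_score / maximum, 1)

    print(normalize(10_000))              # 7.8
    print(normalize(90_000))              # 70.3

Plotted as 7.8% versus 70.3% of the possible service score, the gap between the two teams stays legible without a six-figure axis.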


Blue Team Flags

Blue Team Flags are interactive checkpoints designed to guide teams through their inherited network environment. They serve several purposes:

  • Orientation: Flags help players become familiar with the network’s scope and design, directing them to key components like configuration files, log directories, and database credentials.
  • Engagement: By posing questions that require action, flags encourage teams to actively explore and understand their infrastructure, building a strong foundation for effective defense strategies.
  • Scoring Mechanism: Flags contribute to the overall score, rewarding teams for both technical acumen and their ability to navigate complex network architectures.
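
To make that concrete, a Blue Team flag entry might look something like the sketch below; the fields, wording, and point value are purely illustrative and not taken from any particular scoring platform.

    blue_flag = {
        "question": "What port is the inherited MySQL service configured to listen on?",
        "answer": "3306",
        "points": 50,
        "purpose": "Forces the team to locate and read the database configuration file.",
    }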

Injects

Injects are tasks assigned to the Blue Team that require human evaluation, simulating realistic scenarios and operational challenges.

What Are Injects?

  • Cybersecurity Tasks: These include tasks like completing an incident response report or executing specific defensive procedures under simulated attack conditions.
  • Real-World Scenarios: Injects can mimic everyday workplace challenges such as managing a sudden team member absence, onboarding a new employee, or handling other operational disruptions.
  • Human Evaluation: Since injects are manually graded, they capture qualitative aspects like decision-making and adaptability, testing both technical and operational readiness.

Injects ensure that Blue Teams are prepared not only for technical attacks but also for the multifaceted challenges of modern cybersecurity operations.


Bi-Directional Red/Blue Scoring

Bi-Directional Scoring creates a dynamic interaction between the Red and Blue teams, where every successful offensive move by the Red Team directly impacts the Blue Team’s score.

How It Works

  • Web Service Integrity Example:
    Consider a web service where the final check involves an MD5 hash of the flag.html file. If the MD5 check fails, the system scans the file for a Red Team player’s handle. If found, Blue loses 25 points for the compromised file while that Red Team player gains 25 points.
  • Red Team Phone Home Scripts:
    Red Team members execute specially generated scripts on compromised Blue Team assets. These scripts “phone home” to the scoring engine, reporting details such as the asset’s IP address, the Red Team player’s handle, and the privilege level (user or root) at which the script was executed. Each successful execution transfers points: Blue loses points, and Red gains credit for the compromise. Red Teams are further encouraged to schedule these scripts (via cron or at) to maintain ongoing proof of compromise, reinforcing sustained control over targeted assets; a minimal sketch of such a script follows this list.
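
A Phone Home script could be as simple as the Python sketch below, which assumes a Unix-like target; the scoring-engine URL, handle, and field names are invented for illustration, and a real engine would generate a unique script per Red Team player.

    import json
    import os
    import socket
    import urllib.request

    SCORING_ENGINE = "http://scoring.example/phone-home"   # hypothetical endpoint
    RED_HANDLE = "cr0w"                                     # baked in when the script is generated

    def phone_home():
        report = {
            "handle": RED_HANDLE,
            # Best-effort local IP; the engine can also log the source address itself.
            "asset_ip": socket.gethostbyname(socket.gethostname()),
            "privilege": "root" if os.geteuid() == 0 else "user",
        }
        req = urllib.request.Request(
            SCORING_ENGINE,
            data=json.dumps(report).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=5)

    if __name__ == "__main__":
        phone_home()    # e.g. cron: */15 * * * * /usr/bin/python3 /tmp/phone_home.py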

This bi-directional approach mirrors real-world adversarial dynamics, ensuring that every tactical move has a corresponding strategic impact.


Conclusion

Blue Team scoring in competitions like Inherit and Defend is multifaceted, demanding a balance of technical proficiency, operational awareness, and strategic response. From keeping services operational through intricate checks and dynamic network interdependencies, to using flags for guided discovery, handling realistic injects, and engaging in the constant tug-of-war of bi-directional scoring—each element plays a critical role in overall performance.

Understanding these strategies is vital for cybersecurity professionals and educators alike. It not only prepares teams for the challenges of modern cyber defense but also provides insights into designing scoring systems and visualizations that capture the complexities of defending digital infrastructures.

AI USE

This article was written in conjunction with ChatGPT using the o3-mini-high model based on my experience.

