Week 9 Update: Refinement & Design Tradeoffs

Overview

This week we continued implementing our user-configurable scheduler architecture. Much of the foundation laid in previous weeks is now taking shape in code, but we’re also grappling with an important design question: how to trade off usability against flexibility.

Progress on Scheduler Architecture

We’ve made steady progress implementing the virtual scheduler interface and system state management:

  • Base Class Integration:
    • The manage_images and schedule_invocation methods are now structurally integrated into the simulation loop.
    • Custom implementations of these methods can now influence the actual runtime behavior of the system.
  • State Class Usage:
    • The compute services now use the centralized state class to update resource usage and invocation queues.
    • The scheduler now has read-only access to the same state, providing consistent and up-to-date context for decision-making.
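
To make the shape of this integration concrete, here is a minimal Python sketch. It is not the project's actual code: only the method names manage_images and schedule_invocation come from the work described above, while SystemState, its fields, the method signatures, LeastLoadedScheduler, and simulation_step are hypothetical names chosen for illustration. The sketch assumes the state tracks a per-host load count and a queue of pending function names, and that the scheduler receives a read-only view of that state.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional, Tuple


@dataclass
class SystemState:
    """Hypothetical centralized state, written only by the compute services."""
    host_load: Dict[str, int] = field(default_factory=dict)  # host -> running invocations
    queue: List[str] = field(default_factory=list)           # pending function names

    def read_only_view(self) -> Mapping[str, int]:
        # MappingProxyType reflects live updates to host_load but rejects writes.
        return MappingProxyType(self.host_load)


class Scheduler(ABC):
    """Assumed shape of the virtual scheduler interface."""

    @abstractmethod
    def manage_images(self, host_load: Mapping[str, int]) -> None:
        """Decide which images to load or evict (a no-op is allowed)."""

    @abstractmethod
    def schedule_invocation(
        self, function_name: str, host_load: Mapping[str, int]
    ) -> Optional[str]:
        """Return the host that should run the invocation, or None."""


class LeastLoadedScheduler(Scheduler):
    """Example custom implementation: pick the host with the fewest invocations."""

    def manage_images(self, host_load: Mapping[str, int]) -> None:
        return None  # this example has no image policy

    def schedule_invocation(self, function_name, host_load):
        if not host_load:
            return None
        return min(host_load, key=host_load.get)


def simulation_step(state: SystemState, scheduler: Scheduler) -> List[Tuple[str, str]]:
    """One pass of a simplified simulation loop that consults the scheduler."""
    view = state.read_only_view()
    scheduler.manage_images(view)
    placements = []
    while state.queue:
        function_name = state.queue.pop(0)
        host = scheduler.schedule_invocation(function_name, view)
        if host is not None:
            state.host_load[host] += 1  # only the loop mutates the state
            placements.append((function_name, host))
        # In this sketch an invocation with no host is simply dropped.
    return placements
```

Because the scheduler only ever sees the proxy, a custom implementation can influence placement decisions but cannot corrupt resource accounting by accident.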

Ongoing Design Challenges

As we move deeper into implementation, we’re revisiting some fundamental questions about the role of user-defined scheduling:

  • How Configurable Should It Be?
    • On one hand, giving users full control over both manage_images and schedule_invocation enables powerful and custom scheduling strategies.
    • On the other hand, this level of flexibility may be overwhelming for users who just want to apply basic policies without digging into internals.
  • Balancing Complexity and Usability:
    • We’re exploring whether to offer two tiers of scheduling:
      • Advanced Mode: Full access to the scheduler interface for highly customized behavior.
      • Simplified Mode: A limited set of pre-defined strategies or simple configuration options (e.g., round-robin, least-loaded) with minimal setup.
    • This design tension is still under discussion, and we’re evaluating what best supports real-world usage without sacrificing power or accessibility.
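
One way a Simplified Mode could sit alongside the full interface is a small registry of named strategies selected through configuration. The following Python sketch is an assumption-laden illustration, not a settled design: the "strategy" config key, build_strategy, STRATEGY_FACTORIES, and the function signature taking a host-to-load mapping are all hypothetical. Only the strategy names round-robin and least-loaded come from the discussion above.

```python
import itertools
from typing import Callable, Dict, Mapping, Optional

# A strategy maps (function name, host -> load) to a host name, or None.
Strategy = Callable[[str, Mapping[str, int]], Optional[str]]


def make_round_robin() -> Strategy:
    counter = itertools.count()

    def pick(function_name: str, host_load: Mapping[str, int]) -> Optional[str]:
        hosts = sorted(host_load)  # sorted for a stable rotation order
        if not hosts:
            return None
        return hosts[next(counter) % len(hosts)]

    return pick


def make_least_loaded() -> Strategy:
    def pick(function_name: str, host_load: Mapping[str, int]) -> Optional[str]:
        if not host_load:
            return None
        # Ties are broken alphabetically because min keeps the first minimum.
        return min(sorted(host_load), key=host_load.get)

    return pick


# Hypothetical registry of the pre-defined strategies.
STRATEGY_FACTORIES: Dict[str, Callable[[], Strategy]] = {
    "round-robin": make_round_robin,
    "least-loaded": make_least_loaded,
}


def build_strategy(config: Mapping[str, str]) -> Strategy:
    """Turn a simple config mapping into a ready-to-use strategy."""
    name = config.get("strategy", "round-robin")
    try:
        factory = STRATEGY_FACTORIES[name]
    except KeyError:
        raise ValueError(
            f"unknown strategy {name!r}; expected one of {sorted(STRATEGY_FACTORIES)}"
        ) from None
    return factory()
```

With this layout, a Simplified Mode user would only write something like build_strategy({"strategy": "least-loaded"}), while an Advanced Mode user could still supply their own callable or scheduler class.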

Next Steps

  • Finalizing Design Decisions:
    • Decide whether to support multiple scheduling modes or stick with a single, unified scheduler API.
    • Revisit the interface to ensure it’s intuitive regardless of the chosen approach.
  • Full Integration:
    • Complete wiring of the scheduler into the entire function lifecycle, from registration through execution to completion.
  • Error Handling and Fallbacks:
    • Begin implementing fallback behaviors for misbehaving or incomplete scheduler implementations (e.g., what happens if no host is returned).
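
As one possible shape for such a fallback, the Python sketch below wraps a user-supplied scheduling function and substitutes a default choice when it raises, returns no host, or returns a host the system does not know. Everything here is hypothetical: schedule_with_fallback, first_available_host, the logger name, and the assumption that hosts are identified by strings in a host-to-load mapping. The actual fallback policy has not been decided.

```python
import logging
from typing import Callable, Mapping, Optional

logger = logging.getLogger("scheduler.fallback")  # hypothetical logger name

ScheduleFn = Callable[[str, Mapping[str, int]], Optional[str]]


def first_available_host(host_load: Mapping[str, int]) -> Optional[str]:
    """Default fallback: the alphabetically first host, or None if there are none."""
    return min(host_load, default=None)


def schedule_with_fallback(
    schedule: ScheduleFn,
    function_name: str,
    host_load: Mapping[str, int],
    fallback: Callable[[Mapping[str, int]], Optional[str]] = first_available_host,
) -> Optional[str]:
    """Call a user scheduler defensively and recover from bad results."""
    try:
        host = schedule(function_name, host_load)
    except Exception:
        # A misbehaving scheduler should not take down the simulation.
        logger.exception("scheduler raised while placing %s", function_name)
        host = None

    if not isinstance(host, str) or host not in host_load:
        if host is not None:
            logger.warning("scheduler returned unknown host %r", host)
        host = fallback(host_load)
    return host
```

Returning None when no hosts exist at all leaves the final decision (queue, retry, or fail the invocation) to the caller, which keeps that policy question separate from the wrapper.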

While the core framework is coming together, this week has highlighted the importance of thoughtful design choices when exposing complex control to users. As we continue refining the architecture, our goal is to strike the right balance between power and simplicity.

This post is licensed under CC BY 4.0 by the author.