AgentSkillOS: An Operating System for Agent Skills

The agent skill ecosystem is exploding: more than 200,000 skills are now publicly available. But with so many options, how do you find the right skills for your task? And when one skill isn’t enough, how do you compose and orchestrate multiple skills into a working pipeline?

AgentSkillOS is the operating system for agent skills, helping you discover, compose, and run skill pipelines end-to-end.

AgentSkillOS Architecture

Understanding the Architecture Diagram

The architecture diagram above illustrates the complete AgentSkillOS system, designed as a layered operating system for managing AI agent skills at scale. Let’s break down each component and understand how they work together.

Entry Points Layer

At the top of the architecture, we find three distinct entry points that make AgentSkillOS accessible to different user types and use cases:

Web UI: A browser-based graphical interface that provides visual workflow management, real-time execution monitoring, and human-in-the-loop intervention capabilities. This is ideal for developers who want visual feedback and control over skill orchestration.
Batch CLI: A command-line interface designed for automated, headless execution of multiple tasks. Perfect for CI/CD pipelines, scheduled jobs, or bulk processing scenarios where manual intervention isn’t needed.
Python API: A programmatic interface for developers who want to integrate AgentSkillOS directly into their applications. This enables custom workflows, embedded skill orchestration, and programmatic control over all system features.

Manager Layer: The Brain of Skill Discovery

The Manager Layer is responsible for discovering and selecting relevant skills from the vast skill pool. It implements two complementary approaches:

Tree-based Manager: This innovative approach organizes skills into a hierarchical capability tree. Instead of relying solely on semantic similarity, it navigates through skill categories and subcategories, enabling discovery of non-obvious but functionally relevant skills. For example, when searching for “image processing,” it might discover skills in “data visualization” or “document generation” that could enhance the workflow.
Vector-based Manager: A traditional semantic search approach using embedding models. Skills are converted to vector representations, and similarity search finds the closest matches. This is fast and effective for known skill patterns but may miss creative combinations.

Orchestrator Layer: Coordinating Complex Workflows

The Orchestrator Layer takes the selected skills and coordinates their execution. It offers three execution engines:

DAG Engine: The most sophisticated engine. It builds a directed acyclic graph to manage complex dependencies, automatically determines execution order, runs independent skills in parallel where possible, and manages data flow between skills.
Direct Engine: A simpler approach for straightforward tasks where skills execute sequentially without complex dependency management.
Freestyle Engine: Inspired by Claude Code’s execution model, this engine provides more flexible, conversational-style skill invocation for dynamic scenarios.

Runtime Layer: Where Skills Execute

At the bottom of the architecture sits the Runtime Layer, which handles the actual execution of skills. It supports multiple LLM backends including Claude Code and other providers through the cc-switch utility. This abstraction allows developers to use their preferred AI models while maintaining consistent skill execution interfaces.

Data Flow Through the System

When a user submits a task through any entry point, the request flows downward through the layers. The Manager Layer discovers relevant skills, the Orchestrator Layer plans and coordinates execution, and the Runtime Layer executes each skill. Results flow back upward, with logging and state management at each level ensuring observability and debugging capabilities.
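The downward flow through the layers can be sketched as three chained functions that share a trace object. This is only an illustration: every name here (`RunTrace`, `discover_skills`, `plan`, `execute`, `run_task`) is hypothetical and not part of the AgentSkillOS API, and the word-overlap matching is a toy stand-in for the real Manager Layer.

```python
from dataclasses import dataclass, field


@dataclass
class RunTrace:
    """Collects one log entry per layer so a run can be inspected afterwards."""
    events: list[str] = field(default_factory=list)

    def log(self, layer: str, message: str) -> None:
        self.events.append(f"{layer}: {message}")


def discover_skills(task: str, pool: dict[str, str], trace: RunTrace) -> list[str]:
    """Manager stand-in: keep skills whose description shares a word with the task."""
    words = set(task.lower().split())
    found = [name for name, desc in pool.items() if words & set(desc.lower().split())]
    trace.log("manager", f"selected {found}")
    return found


def plan(skills: list[str], trace: RunTrace) -> list[str]:
    """Orchestrator stand-in: a sequential plan, sorted so the order is deterministic."""
    ordered = sorted(skills)
    trace.log("orchestrator", f"plan {ordered}")
    return ordered


def execute(steps: list[str], trace: RunTrace) -> dict[str, str]:
    """Runtime stand-in: pretend to run each skill and record its result."""
    results = {}
    for step in steps:
        results[step] = f"{step}: done"
        trace.log("runtime", f"ran {step}")
    return results


def run_task(task: str, pool: dict[str, str]) -> tuple[dict[str, str], RunTrace]:
    """Request flows down through the layers; results and the trace flow back up."""
    trace = RunTrace()
    results = execute(plan(discover_skills(task, pool, trace), trace), trace)
    return results, trace
```

Calling `run_task("process images", {"resize": "resize images", "pdf": "build pdf report"})` would select only the hypothetical `resize` skill and leave one trace entry per layer.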

Introduction

AgentSkillOS addresses a fundamental challenge in the AI agent ecosystem: skill discovery and orchestration at scale. With hundreds of thousands of skills available across platforms like GitHub, npm, and PyPI, finding the right combination of tools for complex tasks has become increasingly difficult.

Traditional approaches rely on semantic search, which often misses skills that look unrelated in embedding space but are crucial for solving tasks. AgentSkillOS introduces a novel capability tree structure that organizes skills hierarchically, enabling more creative and effective skill discovery.

Why AgentSkillOS Matters

Scale: Manages 200,000+ skills efficiently
Discovery: Finds non-obvious but functionally relevant skills
Orchestration: Composes multiple skills into coordinated workflows
Control: Provides human-in-the-loop GUI for intervention

Key Features

Feature	Description
Skill Search & Discovery	Creatively discover task-relevant skills with a skill tree that organizes skills into a hierarchy based on their capabilities
Skill Orchestration	Compose and orchestrate multiple skills into a single workflow with a directed acyclic graph, automatically managing execution order, dependencies, and data flow
GUI (Human-in-the-Loop)	A built-in GUI enables human intervention at every step, making workflows controllable, auditable, and easy to steer
High-Quality Skill Pool	A curated collection of high-quality skills, selected based on Claude’s implementation, GitHub stars, and download volume
Observability & Debugging	Trace each step with logs and metadata to debug faster and iterate on workflows with confidence
Extensible Skill Registry	Easily plug in new skills, bring your own skills via a flexible registry
Benchmark	30 multi-format creative tasks across 5 categories, evaluated with pairwise comparison and Bradley-Terry aggregation

Architecture Overview

AgentSkillOS follows a modular architecture with pluggable retrieval and orchestration components.

Skill Retrieval

Understanding the Skill Retrieval Flow

The skill retrieval diagram above demonstrates how AgentSkillOS discovers relevant skills from its vast pool of 200,000+ available skills. This process is critical because finding the right skills determines the quality and efficiency of the entire workflow execution.

The Challenge of Skill Discovery at Scale

With over 200,000 skills available across platforms like GitHub, npm, and PyPI, traditional keyword search falls short. A simple query like “process images” might return hundreds of results, many of which are tangentially related but not truly useful for the specific task at hand. AgentSkillOS addresses this through two complementary retrieval mechanisms.

Tree-Based Retrieval: Navigating the Capability Hierarchy

The tree-based approach organizes skills into a hierarchical capability structure. Imagine a tree where:

Root nodes represent broad capability categories like “Data Processing,” “Content Generation,” or “System Operations”
Branch nodes represent sub-categories like “Image Manipulation” under “Data Processing”
Leaf nodes contain the actual skills with their descriptions and metadata

When a user submits a task, the LLM doesn’t just search for keywords; it navigates this tree. For example, a task to “create a marketing video from product images” might traverse:

Content Generation → Video Production → skills for video editing
Data Processing → Image Manipulation → skills for image preparation
Content Generation → Marketing → skills for promotional content

This traversal surfaces skills that semantic search might miss: skills that are functionally relevant even if their descriptions don’t match the query textually.
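A minimal sketch of such a traversal, assuming the tree is a plain nested structure and the LLM's branch decision is abstracted as a `keep` callback. The `Node` class, `collect_skills`, and all skill names are hypothetical; in the example the "LLM" is simulated by a fixed set of approved branch names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)  # only leaves carry skills here


def collect_skills(
    node: Node,
    task: str,
    keep: Callable[[str, str], bool],
    path: tuple[str, ...] = (),
) -> list[tuple[str, str]]:
    """Depth-first walk that only descends into branches the judge keeps.

    Returns (path, skill) pairs so each hit shows how it was reached.
    """
    here = path + (node.name,)
    hits = [(" → ".join(here), skill) for skill in node.skills]
    for child in node.children:
        if keep(child.name, task):
            hits.extend(collect_skills(child, task, keep, here))
    return hits


tree = Node("Skills", children=[
    Node("Content Generation", children=[
        Node("Video Production", skills=["video-editor"]),
        Node("Marketing", skills=["promo-writer"]),
    ]),
    Node("Data Processing", children=[
        Node("Image Manipulation", skills=["image-prep"]),
        Node("Log Analysis", skills=["log-parser"]),
    ]),
    Node("System Operations", children=[Node("Backups", skills=["backup-runner"])]),
])

# Stand-in for the LLM: the branches it would judge relevant to this task.
approved = {"Content Generation", "Video Production", "Marketing",
            "Data Processing", "Image Manipulation"}
hits = collect_skills(
    tree,
    "create a marketing video from product images",
    keep=lambda name, task: name in approved,
)
```

Because pruning happens per branch, the walk never touches the "System Operations" subtree, while "Image Manipulation" is reached even though the task text never mentions data processing.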

Vector-Based Retrieval: Semantic Similarity Search

The vector-based approach converts skill descriptions into dense vector embeddings using models like OpenAI’s text-embedding-3-large. When a query comes in:

The query text is converted to a vector representation
Similarity search (typically cosine similarity) finds the closest skill vectors
Top-k results are returned as candidate skills

This approach excels at finding skills with similar meanings even when using different terminology. However, it can miss skills that are functionally complementary but semantically distant.
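The three steps above can be sketched in plain Python. To stay self-contained, the sketch swaps the embedding model for a toy bag-of-words vector, so unlike a real embedding it cannot match synonyms; the function names and skill names are illustrative assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, skills: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Embed the query, score every skill description, return the k best matches."""
    q = embed(query)
    scored = [(name, cosine(q, embed(desc))) for name, desc in skills.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


skills = {
    "image-resize": "resize and crop images",
    "pdf-report": "compile a pdf report",
    "video-cut": "cut and merge video clips",
}
best = top_k("crop product images", skills, k=1)
```

With a real embedding model only `embed` changes; the similarity and top-k steps stay the same.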

Hybrid Retrieval: Best of Both Worlds

AgentSkillOS can combine both approaches. The tree-based method contributes creative, non-obvious skill suggestions, while vector-based search helps catch relevant skills that would otherwise be missed because of vocabulary gaps.
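The article does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF), shown here purely as an assumed technique and not as AgentSkillOS's actual method; the skill names are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) to a skill's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, skill in enumerate(ranking, start=1):
            scores[skill] = scores.get(skill, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


tree_hits = ["video-editor", "image-prep", "promo-writer"]
vector_hits = ["image-prep", "thumbnail-maker"]
fused = reciprocal_rank_fusion([tree_hits, vector_hits])
```

A skill found by both retrievers (`image-prep` here) rises to the top, while skills found by only one retriever are still kept in the merged list.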

Practical Implications for Developers

For developers building AI agent workflows, this retrieval system means:

Less manual skill hunting: The system automatically surfaces relevant skills
More creative solutions: Non-obvious skill combinations lead to innovative workflows
Better task coverage: Complex tasks get decomposed into appropriate skill sequences
Scalable architecture: The system is designed to handle skill pools ranging from about 50 to more than 200,000 skills

Core Components

Entry Points: The system provides multiple interfaces including Web UI, Batch CLI, and Python API for different use cases.

Manager Layer: Handles skill discovery through two approaches:

Tree-based: Uses capability trees for hierarchical skill organization
Vector-based: Uses semantic embeddings for similarity search

Orchestrator Layer: Manages skill execution:

DAG Engine: Plans and executes directed acyclic graphs
Direct Engine: Simple sequential execution
Freestyle Engine: Claude Code-style execution

Runtime: Executes skills using Claude Code or other LLM providers.

How Skill Retrieval Works

Why Skill Tree?

Pure semantic retrieval prioritizes textual similarity, often missing skills that look unrelated in embedding space but are crucial for actually solving the task. This leads to narrow, myopic skill usage.

AgentSkillOS uses LLM + Skill Tree to navigate the capability hierarchy, surfacing non-obvious but functionally relevant skills. This enables broader, more creative, and more effective skill composition.

Tree-based vs Vector-based Search

Approach	Strengths	Best For
Tree-based	Hierarchical organization, creative discovery	Complex tasks requiring diverse skills
Vector-based	Semantic similarity, fast lookup	Known skill patterns, straightforward tasks

Pre-built Skill Trees

Tree	Skills	Description
`skill_seeds`	~50	Curated skill set (default)
`skill_200`	200	200 skills
`skill_1000`	~1,000	1,000 skills
`skill_10000`	~10,000	10,000 active + layered dormant skills

DAG Orchestration

Understanding DAG Orchestration

The DAG (Directed Acyclic Graph) orchestration diagram above illustrates how AgentSkillOS coordinates multiple skills into coherent, executable workflows. This stage transforms a list of relevant skills into a coordinated execution plan.

What is DAG Orchestration?

A Directed Acyclic Graph is a mathematical structure where:

Nodes represent individual skills or operations
Edges represent dependencies between skills
Directed means edges have a direction (A → B means B depends on A)
Acyclic means there are no circular dependencies (no infinite loops)

This structure is perfect for skill orchestration because it naturally captures the dependencies between tasks while enabling parallel execution where possible.
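These properties can be checked with Python's standard `graphlib` module (available since Python 3.9). The skill names below are illustrative; the dictionary maps each skill to the set of skills it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Each key lists the skills it depends on (its predecessors).
deps = {
    "localize_bug": {"parse_logs"},
    "suggest_fix": {"localize_bug"},
    "compile_report": {"suggest_fix", "parse_logs"},
}

# A valid execution order: every skill appears after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())


def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Return True if the dependency graph contains a circular dependency."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False
```

Here `order` starts with `parse_logs` and ends with `compile_report`, and `has_cycle({"a": {"b"}, "b": {"a"}})` reports the circular dependency that a DAG rules out.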

The Orchestration Pipeline

The orchestration process follows a four-phase pipeline:

1. Task Analysis Phase

When a user submits a task like “Create a bug diagnosis report for a mobile app,” the orchestrator first analyzes the requirements. It breaks down the high-level goal into sub-tasks:

Bug reproduction and localization
Error log analysis
Fix suggestion generation
Visual documentation creation
Report compilation

2. Skill Selection Phase

From the retrieved skills, the orchestrator selects the most appropriate ones for each sub-task. This selection considers:

Skill capabilities and specializations
Input/output compatibility between skills
Historical performance metrics
Resource requirements

3. Dependency Resolution Phase

The orchestrator determines which skills must run before others. For example:

Bug localization must complete before fix suggestion
Visual documentation requires both bug evidence and fix results
Report compilation depends on all previous outputs

4. Plan Generation Phase

The final DAG structure is generated, optimizing for:

Parallelism: Independent skills run simultaneously
Resource efficiency: Minimize redundant computations
Fault tolerance: Isolate failures to prevent cascade effects
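The parallelism and fault-isolation goals can be illustrated with a small wave-by-wave scheduler. This is a simplified sketch and not the AgentSkillOS engine: `run_dag` and its arguments are hypothetical, and it waits for a whole wave to finish before starting the next one, which a production scheduler would likely avoid.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_dag(
    deps: dict[str, set[str]],
    actions: dict[str, Callable[[], object]],
    max_workers: int = 4,
) -> dict[str, str]:
    """Run actions respecting deps; return node -> "ok" | "failed" | "skipped"."""
    status: dict[str, str] = {}
    pending = set(actions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Fault isolation: anything downstream of a failure is skipped, not run.
            blocked = {
                n for n in pending
                if any(status.get(d) in ("failed", "skipped") for d in deps.get(n, ()))
            }
            for node in blocked:
                status[node] = "skipped"
            pending -= blocked

            # Parallelism: every node whose dependencies all succeeded runs in this wave.
            ready = [n for n in pending if all(status.get(d) == "ok" for d in deps.get(n, ()))]
            if not ready:
                if blocked or not pending:
                    continue
                raise ValueError(f"cycle or unknown dependency among: {sorted(pending)}")

            futures = {node: pool.submit(actions[node]) for node in ready}
            for node, future in futures.items():
                # exception() waits for completion and returns None on success.
                status[node] = "failed" if future.exception() else "ok"
            pending -= set(ready)
    return status
```

If one skill raises, only its descendants are marked `skipped`; independent branches still complete.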

Execution Strategies: Quality vs. Speed vs. Simplicity

AgentSkillOS offers three distinct orchestration strategies:

Quality-First Strategy

This strategy builds deep, multi-stage pipelines with extensive validation and refinement steps. Each skill’s output is verified before passing to the next stage. Ideal for:

Production deployments requiring high accuracy
Complex tasks with significant consequences for errors
Scenarios where iteration and refinement add value

Efficiency-First Strategy

This strategy maximizes parallel execution, running as many skills simultaneously as possible. Dependencies are minimized to reduce wait times. Ideal for:

Time-sensitive tasks
Batch processing scenarios
When approximate results are acceptable

Simplicity-First Strategy

This strategy uses only essential skills, avoiding complex pipelines. It’s the “minimum viable workflow” approach. Ideal for:

Simple, well-defined tasks
Quick prototyping and testing
When complexity overhead isn’t justified

Real-World Example: Bug Diagnosis Workflow

Consider a bug diagnosis task. The DAG might look like:

[Mobile Bug Report]
       ↓
[Parse Stack Trace] ←→ [Extract Error Logs]
       ↓                      ↓
[Localize Bug] ←─────── [Analyze Patterns]
       ↓
[Generate Fix Suggestion]
       ↓
[Create Visual Evidence] ←→ [Validate Fix]
       ↓
[Compile Report]

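This DAG shows parallel execution opportunities (Parse Stack Trace and Extract Error Logs can run simultaneously) while maintaining necessary dependencies (Localize Bug needs both inputs). Here “←→” marks steps that can run side by side; it is not a two-way dependency, which would create a cycle.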
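To make the parallel groups explicit, the diagram can be encoded as a dependency map and split into "waves" of steps that may run together. The edge list below is one reading of the diagram (in particular, that the report waits for both the visual evidence and the fix validation) and the node names are hypothetical identifiers.

```python
from graphlib import TopologicalSorter

# node -> the nodes it depends on (an assumed reading of the diagram)
deps = {
    "parse_stack_trace": {"bug_report"},
    "extract_error_logs": {"bug_report"},
    "analyze_patterns": {"extract_error_logs"},
    "localize_bug": {"parse_stack_trace", "analyze_patterns"},
    "generate_fix": {"localize_bug"},
    "create_visual_evidence": {"generate_fix"},
    "validate_fix": {"generate_fix"},
    "compile_report": {"create_visual_evidence", "validate_fix"},
}


def waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into successive waves; nodes in one wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        result.append(ready)
        sorter.done(*ready)                 # mark the wave finished to unlock the next
    return result


plan = waves(deps)
```

Under this reading the plan has seven waves, two of which contain a pair of steps that can run in parallel.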

Human-in-the-Loop Control

A key feature of AgentSkillOS orchestration is human oversight. At each stage:

Users can review the generated plan before execution
Intermediate results can be inspected and approved
Manual intervention can redirect or modify the workflow
Execution can be paused, resumed, or terminated

This control is essential for production systems where AI autonomy must be balanced with human judgment.
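A review gate of this kind can be sketched as a loop that asks a reviewer callback what to do after each step. The function name, the callback signature, and the three decision strings are assumptions made for illustration; the real GUI is not described at this level of detail.

```python
from typing import Callable, Iterable

Decision = str  # "approve", "skip", or "stop"


def run_with_review(
    steps: Iterable[str],
    run_step: Callable[[str], str],
    review: Callable[[str, str], Decision],
) -> dict[str, str]:
    """Run steps in order, letting a reviewer approve, discard, or halt after each one."""
    accepted: dict[str, str] = {}
    for step in steps:
        result = run_step(step)
        decision = review(step, result)
        if decision == "stop":
            break                    # terminate, keeping results approved so far
        if decision == "approve":
            accepted[step] = result
        # any other decision (e.g. "skip") discards this result and moves on
    return accepted
```

In an interactive setting `review` could prompt the user; in tests it can be a plain dictionary lookup.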

Plan Generation

The DAG orchestrator analyzes task requirements and generates execution plans:

Task Analysis: Understands user requirements
Skill Selection: Chooses relevant skills from the pool
Dependency Resolution: Determines execution order
Plan Generation: Creates the DAG structure

Parallel Execution

The orchestrator supports three strategies:

Strategy	Description	Use Case
Quality-First	Deep, multi-stage pipelines	High-quality outputs
Efficiency-First	Wide, parallel execution	Speed optimization
Simplicity-First	Essential steps only	Simple tasks

Execution Flow

User Request -> Skill Discovery -> Plan Generation -> DAG Execution -> Output
                       |                  |                 |
                       v                  v                 v
                  Skill Tree      Dependency Graph   Parallel Tasks

Installation

Prerequisites

Python 3.10+
Claude Code (must be installed and available in PATH)
cc-switch (optional, for switching to other LLM providers)

Install and Run

# Clone the repository
git clone https://github.com/ynulihao/AgentSkillOS.git
cd AgentSkillOS

# Install in development mode
pip install -e .

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Start the web interface
python run.py --port 8765

Configuration

# .env configuration
LLM_MODEL=openai/anthropic/claude-opus-4.5
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-key

EMBEDDING_MODEL=openai/text-embedding-3-large
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=your-key

Usage Examples

Web UI

The Web UI provides a visual workflow overview in the browser:

Navigate to http://localhost:8765
Enter your task description
Select skill discovery mode (tree or vector)
Choose orchestration strategy
Review and approve the generated plan
Monitor execution in real-time

Batch CLI

Run multiple tasks in parallel without the Web UI:

# Run a batch configuration
python run.py cli --task config/batch.yaml

# Override parallel task count
python run.py cli -T config/batch.yaml --parallel 4

# Resume interrupted runs
python run.py cli -T config/batch.yaml --resume ./runs/my_batch_20260306_120000

# Dry run to preview tasks
python run.py cli -T config/batch.yaml --dry-run

Batch Configuration (YAML)

batch_id: my_batch

defaults:
  skill_mode: auto          # "auto" (discover) or "specified"
  skill_group: skill_200    # Which skill pool to use
  output_dir: ./runs
  continue_on_error: true

execution:
  parallel: 2               # Max concurrent tasks
  retry_failed: 0

tasks:
  - file: path/to/task1.json
  - file: path/to/task2.json
  - dir: path/to/tasks/     # Scan directory
    pattern: "*.json"
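To show how the `tasks` entries might expand into concrete files, here is a sketch that takes the already-parsed list (for example from a YAML loader) and resolves `file` and `dir`/`pattern` entries. The `expand_tasks` helper, the path resolution relative to a base directory, and the default `*.json` pattern are assumptions; the actual loader may behave differently.

```python
from pathlib import Path


def expand_tasks(entries: list[dict], base: Path) -> list[Path]:
    """Turn `file:` and `dir:`/`pattern:` entries into a flat list of task files."""
    found: list[Path] = []
    for entry in entries:
        if "file" in entry:
            found.append(base / entry["file"])
        elif "dir" in entry:
            pattern = entry.get("pattern", "*.json")  # assumed default
            # Sort so directory scans give a stable order across platforms.
            found.extend(sorted((base / entry["dir"]).glob(pattern)))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))
```

A file listed explicitly and also matched by a directory scan is returned only once.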

CLI Flags

Flag	Description
`--task PATH`, `-T`	Path to batch YAML config (required)
`--parallel N`, `-p`	Override parallel task count
`--resume PATH`, `-R`	Resume an interrupted batch run
`--output-dir PATH`, `-o`	Override output directory
`--dry-run`	Preview tasks without execution
`--verbose`, `-v`	Show detailed logs
`--manager PLUGIN`, `-m`	Override skill manager (e.g., `tree`, `vector`)
`--orchestrator PLUGIN`	Override orchestrator (e.g., `dag`, `free-style`)
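For readers who want to see how these flags fit together, the table can be mirrored with `argparse`. This is a reconstruction from the table above for illustration only, not the project's actual parser; defaults and validation in the real CLI may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser that mirrors the documented flags of `run.py cli`."""
    parser = argparse.ArgumentParser(prog="run.py cli")
    parser.add_argument("--task", "-T", required=True, help="path to batch YAML config")
    parser.add_argument("--parallel", "-p", type=int, help="override parallel task count")
    parser.add_argument("--resume", "-R", help="resume an interrupted batch run")
    parser.add_argument("--output-dir", "-o", help="override output directory")
    parser.add_argument("--dry-run", action="store_true", help="preview tasks without execution")
    parser.add_argument("--verbose", "-v", action="store_true", help="show detailed logs")
    parser.add_argument("--manager", "-m", help="override skill manager, e.g. tree or vector")
    parser.add_argument("--orchestrator", help="override orchestrator, e.g. dag or free-style")
    return parser


args = build_parser().parse_args(["-T", "config/batch.yaml", "--parallel", "4", "--dry-run"])
```

Parsing the example arguments yields `args.task == "config/batch.yaml"`, `args.parallel == 4`, and `args.dry_run is True`, with the remaining options left unset.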

Custom Skill Groups

Create your own skill collections:

Create data/my_skills/skill-name/SKILL.md
Register in src/config.py -> SKILL_GROUPS
Build: python run.py build -g my_skills -v
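The first step can be scripted. The sketch below creates the directory and a starter `SKILL.md`; it assumes the file uses the common Agent Skills layout with `name` and `description` front matter, which the article does not spell out, and `scaffold_skill` is a hypothetical helper. Registering the group in `src/config.py` and running the build command remain manual steps.

```python
from pathlib import Path


def scaffold_skill(root: Path, group: str, name: str, description: str) -> Path:
    """Create data/<group>/<name>/SKILL.md under root and return its path."""
    skill_dir = root / "data" / group / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {name}\n\n"
        "Describe when and how to use this skill.\n",
        encoding="utf-8",
    )
    return skill_file
```

For example, `scaffold_skill(Path("."), "my_skills", "skill-name", "One-line summary")` writes `data/my_skills/skill-name/SKILL.md` relative to the repository root.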

Benchmark Results

AgentSkillOS includes a benchmark of 30 multi-format creative tasks spanning 5 categories, evaluated via pairwise comparison with Bradley-Terry aggregation.
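Bradley-Terry aggregation turns pairwise win counts into one strength score per system, modelling the probability that system i beats system j as p_i / (p_i + p_j). The sketch below fits the strengths with the standard minorization-maximization update; it is a generic illustration with made-up win counts, not the benchmark's evaluation code, and it assumes every system wins at least once.

```python
def bradley_terry(wins: dict[tuple[str, str], int], iterations: int = 200) -> dict[str, float]:
    """wins[(a, b)] = number of times a beat b. Returns strengths that sum to 1."""
    players = sorted({p for pair in wins for p in pair})
    strength = {p: 1.0 / len(players) for p in players}
    for _ in range(iterations):
        updated = {}
        for i in players:
            total_wins = sum(wins.get((i, j), 0) for j in players if j != i)
            # Each opponent contributes (games played against j) / (p_i + p_j).
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (strength[i] + strength[j])
                for j in players if j != i
            )
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {p: v / norm for p, v in updated.items()}
    return strength


# Hypothetical comparison counts between three systems.
scores = bradley_terry({
    ("A", "B"): 8, ("B", "A"): 2,
    ("A", "C"): 9, ("C", "A"): 1,
    ("B", "C"): 7, ("C", "B"): 3,
})
```

With these counts the fitted strengths rank A above B above C. In the two-system case the result reduces to the plain win rate, for example 0.75 for a system that wins 3 of 4 comparisons.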

Key Findings

Substantial Gains over Baselines: All three AgentSkillOS variants achieve the highest Bradley-Terry scores across 200 / 1K / 200K ecosystems
Both Retrieval and Orchestration Are Essential: Removing components reveals clear degradation
Strategy Choice Shapes Execution Structure: Each orchestration strategy faithfully translates its design intent into a distinct DAG topology

Example Use Cases

Example	Description
Bug Diagnosis Report	Mobile bug localization, fix validation, and visual bug report generation with before/after evidence
UI Design Research	Design-language research, report generation, and multi-direction concept mockups for knowledge software
Paper Promotion	Transforms academic papers into social slides, scientific pages, and platform-specific promotion content
Meme Video	Green-screen compositing, subtitle timing, and viral short-video production with multi-version outputs

Academic Reference

AgentSkillOS is backed by academic research. If you find it useful, consider citing the paper:

@article{li2026organizing,
  title={Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale},
  author={Li, Hao and Mu, Chunjiang and Chen, Jianhao and Ren, Siyue and Cui, Zhiyao and Zhang, Yiqun and Bai, Lei and Hu, Shuyue},
  journal={arXiv preprint arXiv:2603.02176},
  year={2026}
}

Paper Link: arXiv:2603.02176

Dataset: Hugging Face - agentskillos-benchmark

Future Roadmap

AgentSkillOS is actively developed with planned features:

Conclusion

AgentSkillOS represents a significant advancement in managing AI agent skills at ecosystem scale. By combining hierarchical skill trees with DAG-based orchestration, it enables developers to:

Discover relevant skills from 200,000+ options
Compose multiple skills into working pipelines
Execute complex tasks with human oversight
Debug and iterate with full observability

The modular architecture allows plugging in different retrieval methods and orchestration strategies, making it adaptable to various use cases from simple automation to complex multi-stage workflows.
