---
name: zhipuai-glm-cli-and-agents
description: CLI invocation, direct API usage (curl/Python), non-interactive Claude Code with GLM backend, sub-agent model configuration, tool calling, streaming, and multi-agent orchestration patterns on the Z.ai GLM Coding Plan.
type: reference
---
Z.ai GLM — CLI Invocation & Agent Communication
API Endpoints
| Format | Endpoint |
|---|---|
| Native Z.ai | https://api.z.ai/api/paas/v4/chat/completions |
| OpenAI-compatible | https://api.z.ai/v1/chat/completions |
| Anthropic-compatible | https://api.z.ai/api/anthropic |
Authentication: Authorization: Bearer <your-z.ai-api-key> on all endpoints.
Direct curl Invocation
Basic completion (OpenAI-compatible endpoint)
curl -s https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-4.7",
    "messages": [
      {"role": "system", "content": "You are a senior software engineer."},
      {"role": "user", "content": "Write a Python function to parse ISO 8601 dates."}
    ],
    "temperature": 1.0,
    "max_tokens": 4096
  }' | jq '.choices[0].message.content'
Streaming (SSE)
curl -sN https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-4.7",
    "messages": [{"role": "user", "content": "Implement a binary search in Go."}],
    "stream": true,
    "temperature": 1.0
  }'
# -N disables curl buffering — required for SSE streams
# Parse: each line starting with "data:" contains a JSON chunk; stream ends at data: [DONE]
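For programmatic consumption, here is a minimal Python sketch of that SSE parsing, assuming the requests library and the standard OpenAI-compatible chunk shape (any streaming HTTP client works the same way):
import json
import os
import requests

resp = requests.post(
    "https://api.z.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['ZAI_API_KEY']}"},
    json={
        "model": "GLM-4.7",
        "messages": [{"role": "user", "content": "Implement a binary search in Go."}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    # SSE frames arrive as "data: {json}" lines; blank lines separate events
    if not line or not line.startswith(b"data:"):
        continue
    payload = line[len(b"data:"):].strip()
    if payload == b"[DONE]":
        break
    data = json.loads(payload)
    if not data["choices"]:
        continue
    delta = data["choices"][0]["delta"]
    print(delta.get("content") or "", end="", flush=True)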
With thinking/reasoning mode
curl -s https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1",
    "messages": [{"role": "user", "content": "Design a distributed rate limiter."}],
    "thinking": {"type": "enabled"},
    "temperature": 1.0,
    "max_tokens": 8192
  }'
Official Python SDK (zai)
pip install zai-sdk  # PyPI package is zai-sdk; the import name is zai
Basic usage
from zai import ZaiClient

client = ZaiClient(api_key="your-z.ai-api-key")
response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Refactor this function for clarity."}],
    temperature=1.0,
    max_tokens=4096
)
print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Implement a Fibonacci generator."}],
    stream=True,
    temperature=1.0
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Thinking/reasoning mode
response = client.chat.completions.create(
    model="GLM-5.1",
    messages=[{"role": "user", "content": "Debug this race condition."}],
    thinking={"type": "enabled"},
    temperature=1.0,
    max_tokens=8192
)
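To read the thinking trace alongside the final answer, a defensive sketch; the reasoning_content field name is an assumption based on other GLM thinking-mode responses, so check your SDK's actual response object:
msg = response.choices[0].message
# NOTE (assumption): thinking output is commonly surfaced as reasoning_content;
# fall back gracefully if the field is absent.
reasoning = getattr(msg, "reasoning_content", None)
if reasoning:
    print("--- reasoning ---")
    print(reasoning)
print(msg.content)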
Via OpenAI SDK (drop-in)
from openai import OpenAI

client = OpenAI(
    api_key="your-z.ai-api-key",
    base_url="https://api.z.ai/v1"
)
response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Write unit tests for this module."}],
    temperature=1.0
)
Tool Calling (Function Calling)
Standard OpenAI-compatible schema. Supported on GLM-4.6, GLM-4.7, GLM-5.
from openai import OpenAI

client = OpenAI(api_key="your-key", base_url="https://api.z.ai/v1")

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Execute the test suite and return results",
            "parameters": {
                "type": "object",
                "properties": {
                    "test_path": {"type": "string", "description": "Path to test file or directory"},
                    "verbose": {"type": "boolean", "default": False}
                },
                "required": ["test_path"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Run the auth tests and summarize failures."}],
    tools=tools,
    tool_choice="auto"
)

if response.choices[0].message.tool_calls:
    for call in response.choices[0].message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")
Streaming tool calls
response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Check the weather in Beijing."}],
    tools=tools,
    stream=True,
    extra_body={"tool_stream": True}  # Z.ai-specific flag; the OpenAI SDK rejects unknown kwargs, so pass it via extra_body
)

calls = {}  # index -> accumulated {"name", "arguments"}
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    if delta.tool_calls:
        for tc in delta.tool_calls:
            # tool-call arguments arrive as JSON fragments; accumulate per index
            entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function.name:
                entry["name"] = tc.function.name
            if tc.function.arguments:
                entry["arguments"] += tc.function.arguments
tool_stream reduces latency by streaming tool-argument JSON as it is generated rather than buffering the full call; because it is a Z.ai extension, the OpenAI SDK needs it routed through extra_body.
Non-Interactive Claude Code (CLI Automation)
Claude Code supports a non-interactive mode via the -p flag, useful for scripts, CI, and agent-to-agent pipelines.
One-shot task
# Using native claude with GLM env vars pre-set
ANTHROPIC_AUTH_TOKEN=$ZAI_API_KEY \
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic \
API_TIMEOUT_MS=3000000 \
claude -p "Add docstrings to all public functions in src/api.py"
Using the z/glm community wrapper
Two community projects provide z and glm as drop-in wrappers:
geoh/z.ai-powered-claude-code — installs z and glm commands:
bash scripts/install.sh # Linux/macOS
# Windows: .\scripts\install.ps1
z "Implement the pagination feature described in JIRA-123"
glm --model opus "Refactor the auth module for testability"
z -p "Run the test suite and fix all failures" # non-interactive
ankurkakroo2/claude-code-glm-setup — adds glm shell function:
glm -p "what is 2+2" # non-interactive single prompt
glm --debug api # enables API request logging
Both wrappers use --settings ~/.claude/glm-settings.json to inject Z.ai credentials without touching the default Claude config.
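The exact contents of that settings file ship with each wrapper, but a plausible shape (an assumption here, based on Claude Code settings files supporting an env map of the same variables used throughout this document) is:
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-z.ai-api-key",
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "API_TIMEOUT_MS": "3000000"
  }
}
Check the wrapper's installer for the authoritative contents.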
Sub-Agent Model Configuration
When Claude Code spawns sub-agents (background tasks, parallel tool execution), you can control which GLM model they use independently of the main model.
.zai.json config (geoh wrapper)
{
  "apiKey": "your-z.ai-key",
  "opusModel": "GLM-5.1",
  "sonnetModel": "GLM-4.7",
  "haikuModel": "GLM-4.5-Air",
  "subagentModel": "GLM-4.7",
  "defaultModel": "opus",
  "enableThinking": "true",
  "reasoningEffort": "high"
}
subagentModel — the model used for sub-agent tasks. Recommended: GLM-4.7 (1× quota cost) to prevent sub-agents from burning through advanced model budget while the main agent uses GLM-5.1.
Per-project config
Place .zai.json in a project root to override global settings for that project:
{
  "defaultModel": "sonnet",
  "subagentModel": "GLM-4.5-Air",
  "reasoningEffort": "medium"
}
Config search order (first found wins): ./.zai.json → $ZAI_CONFIG_PATH → ~/.config/zai/config.json → ~/.zai.json
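Expressed as a resolution function, a sketch of the documented precedence (illustrative only, not the wrapper's actual code):
import os
from pathlib import Path

def find_zai_config():
    """Return the first config file found, per the documented search order."""
    env_path = os.environ.get("ZAI_CONFIG_PATH")
    candidates = [
        Path(".zai.json"),                                # project-local
        Path(env_path) if env_path else None,             # explicit override
        Path.home() / ".config" / "zai" / "config.json",  # XDG location
        Path.home() / ".zai.json",                        # global fallback
    ]
    for path in candidates:
        if path is not None and path.is_file():
            return path
    return None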
Env var override (highest priority)
export ZAI_API_KEY="your-key" # overrides apiKey in any config file
Agent-to-Agent Patterns
Pattern: Orchestrator + Worker agents
The plan's 1-concurrent-request limit means true parallel fan-out won't work. Use a serial dispatch queue:
import time

from openai import OpenAI

client = OpenAI(api_key="your-key", base_url="https://api.z.ai/v1")

def glm_call(prompt, model="GLM-4.7"):
    """Single agent call; serialize these, don't parallelize."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0
    )
    return response.choices[0].message.content

def parse_subtasks(plan_text):
    # Naive parsing: treat each non-empty line of the plan as one subtask
    return [line.strip() for line in plan_text.splitlines() if line.strip()]

# Orchestrator plans, then dispatches workers serially
plan = glm_call("Break this task into 3 subtasks: implement OAuth login", model="GLM-5.1")
results = []
for subtask in parse_subtasks(plan):
    result = glm_call(subtask, model="GLM-4.7")  # workers use cheaper model
    results.append(result)
    time.sleep(0.5)  # optional: be gentle on rate limits
Pattern: Shell script agent loop
#!/usr/bin/env bash
# Non-interactive agentic loop: each step feeds the next
set -e

CONTEXT="Fix all failing tests in ./tests/"

# Step 1: analyze
ANALYSIS=$(ANTHROPIC_AUTH_TOKEN=$ZAI_API_KEY \
  ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic \
  API_TIMEOUT_MS=3000000 \
  claude -p "Analyze: $CONTEXT. List specific failures.")

# Step 2: fix based on the analysis
ANTHROPIC_AUTH_TOKEN=$ZAI_API_KEY \
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic \
API_TIMEOUT_MS=3000000 \
claude -p "Given this analysis: $ANALYSIS. Fix the failures."
Pattern: Claude Code as sub-agent caller
Within a Claude Code session backed by GLM, you can spawn Claude Code subprocesses pointed at the same Z.ai backend:
import os
import subprocess

result = subprocess.run(
    ["claude", "-p", "Implement the data validation layer"],
    env={
        **os.environ,
        "ANTHROPIC_AUTH_TOKEN": os.environ["ZAI_API_KEY"],  # fail loudly if unset
        "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
        "API_TIMEOUT_MS": "3000000"
    },
    capture_output=True, text=True, timeout=300
)
print(result.stdout)
LiteLLM Integration (Multi-Provider Orchestration)
LiteLLM provides a unified interface for routing across providers. Prefix Z.ai models with zai/:
import os

import litellm

# Route to Z.ai GLM
response = litellm.completion(
    model="zai/GLM-4.7",
    messages=[{"role": "user", "content": "Implement error handling."}],
    api_key=os.getenv("ZAI_API_KEY")
)

# Fallback: try GLM-5.1 first, fall back to GLM-4.7
response = litellm.completion(
    model="zai/GLM-5.1",
    messages=[{"role": "user", "content": "Solve this hard bug."}],
    api_key=os.getenv("ZAI_API_KEY"),
    fallbacks=["zai/GLM-4.7"]
)
Reasoning Effort Levels (geoh wrapper)
The reasoningEffort field in .zai.json maps to model thinking depth:
| Value | Use Case |
|---|---|
| auto | Let the model decide |
| low | Simple tasks, fast response |
| medium | Balanced (default) |
| high | Complex reasoning, architecture |
| max | Maximum depth; most quota-intensive |
Windows Notes
| Shell | Wrapper | Limitations |
|---|---|---|
| Git Bash | z / glm | Full functionality, status line works |
| PowerShell | z.ps1 / glm.ps1 | Core features, no status line |
| CMD | z.cmd / glm.cmd | Core features, no status line |
The status line requires bash and stdin piping; on Windows, use Git Bash for the full experience.
The wrapper scripts require jq installed and on PATH for JSON processing.