name: austn-tools description: Generate content using austn.net AI services (TTS, images, etc.) user_invocable: true
Austn Tools Skill
Purpose
Access Austin's local GPU-powered AI services at austn.net for content generation:
- Text-to-Speech (Chatterbox TTS)
- Image Generation (ComfyUI)
- Background Removal
- Vector Tracing
- Audio Stem Separation
- And more
Available Services
1. Text-to-Speech (/tts)
URL: https://austn.net/tts/new Backend: Chatterbox TTS on local GPU
⚠️ CRITICAL CONSTRAINT: 40-second maximum duration
- Audio caps at 40 seconds regardless of text length
- For longer content: split into multiple clips with separate share links
- Estimate: ~100-120 words = ~40 seconds
Parameters:
| Field | Description | Default |
|---|---|---|
| text | Text to speak (keep under ~120 words) | Required |
| voice | Voice selection | "Default voice" |
| exaggeration | Emotional intensity (0-1) | 0.5 |
| cfg_weight | Voice adherence (0-1) | 1.0 |
Expression Tags (add inline to text):
[laughter]- Laughing[giggle]- Giggling[sigh]- Sighing[gasp]- Gasping[whisper]- Whispering[cough]- Coughing[clear_throat]- Throat clearing[groan]- Groaning[humming]- Humming[UH],[UM]- Filler sounds
Example Text:
Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?
2. Image Generation (/images)
URL: https://austn.net/images/ai_generate Backend: ComfyUI on local GPU
Parameters:
| Field | Description | Default |
|---|---|---|
| prompt | Image description | Required |
| negative_prompt | What to avoid | "blurry, low quality, distorted" |
| seed | Reproducibility seed | Random |
| size | Image dimensions | 512x512 |
| batch_size | Number of images | 1 |
| publish | Show in gallery 10min | false |
3. Background Removal (/rembg)
URL: https://austn.net/rembg Remove backgrounds from images.
4. Vector Tracing (/vtracer)
URL: https://austn.net/vtracer Convert raster images to SVG vectors.
5. Audio Stems (/stems)
URL: https://austn.net/stems Separate audio into vocal/instrument tracks.
6. 3D Tools (/3d)
URL: https://austn.net/3d 3D content generation.
7. MIDI Generation (/midi)
URL: https://austn.net/midi Generate MIDI sequences.
Usage via Browser Automation
Since these are web UIs, use browser automation to interact:
TTS Generation
# 1. Navigate to TTS
navigate("https://austn.net/tts/new")
# 2. Click text field and enter text
click(text_field)
type("Hello world! [laughter] This is a test.")
# 3. Optionally expand advanced options
click(advanced_options_checkbox)
# Adjust sliders if needed
# 4. Click Generate Speech
click(generate_button)
# 5. Wait for audio, then download
Image Generation
# 1. Navigate to image generator
navigate("https://austn.net/images/ai_generate")
# 2. Enter prompt
click(prompt_field)
type("A robot writing code in a cozy office, digital art")
# 3. Optionally set advanced options
click(advanced_options_checkbox)
# Set negative prompt, seed, size, batch
# 4. Click Generate Image
click(generate_button)
# 5. Wait for result, download
Browser Automation Tips
Field Locations (approximate)
TTS Page (/tts/new):
- Text input: Center of page, large textarea
- Voice dropdown: Below text input
- Advanced options checkbox: Below voice dropdown
- Exaggeration slider: After checkbox expanded
- CFG Weight slider: Below exaggeration
- Generate button: Green button at bottom
Image Page (/images/ai_generate):
- Prompt textarea: Top of form
- Advanced options checkbox: Below prompt
- Negative prompt: First advanced field
- Seed input: Below negative prompt
- Size dropdown: Below seed
- Batch size dropdown: Below size
- Generate button: Green button at bottom
Downloading Results
- TTS: Audio player appears, right-click to save or use download button
- Images: Image appears in result area, right-click to save
Integration with Video Pipeline
These tools combine well for autonomous video creation:
- Script → Write narration text
- TTS → Generate voiceover audio
- Images → Generate visuals/thumbnails
- Combine → Use ffmpeg or video editor
Example Workflow
1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video
Output Locations
Save generated content to:
- Audio:
content/audio/ - Images:
content/images/ - Videos:
content/videos/
Service Status & Dependencies
| Service | Backend | Requires Local GPU |
|---|---|---|
| TTS | Chatterbox TTS | Yes (but often available) |
| Images | ComfyUI | Yes - needs server running |
| Rembg | Python | Likely |
| VTracer | Rust | Likely |
| Stems | Demucs | Yes |
| 3D | Unknown | Yes |
| MIDI | Unknown | Yes |
Connection Details
- Services route to local GPU via Tailscale
- Image generation connects to
100.68.94.33:8188(ComfyUI) - If generation fails with "TCP connection" error, the backend server isn't running
Verified Working (2026-02-02)
- ✅ TTS - Generated 8.4s audio in 6.9s
- ❌ Images - Failed (ComfyUI server not running)
Notes
- Services depend on Austin's local GPU being online
- No API keys needed - it's Austin's own infrastructure
- TTS has "Share Link" that lasts 7 days
- Gallery publish is optional and temporary (10 min)
- Large batches may take time depending on GPU load