# Prompt & Skill Development Workflow
This document describes the workflow for developing and testing prompts and skills using local files, AI assistance, and the `cli-langfuse-sync` tool.

## Overview

When running the development server with `just dev development`, the system uses local prompt and skill files instead of fetching from Langfuse. This enables a powerful AI-assisted development workflow where you can:
- Edit prompts and skills locally in your IDE
- Use AI (like Claude) to modify both prompts/skills AND code together
- Test changes immediately with unit tests
- Push verified changes to Langfuse when ready
## Local Development Setup

### Start the Development Server

With `IX_ENVIRONMENT=development`, the system reads prompts and skills from local files:
```text
backend/apps/shared_data/prompts/website-agent/
├── main.md                      # Main system prompt
├── skills/
│   ├── response_handling/
│   │   └── pricing/
│   │       └── SKILL.md
│   └── clients/
│       └── abtasty.com/
│           └── skills/
│               └── pricing/
│                   └── SKILL.md
└── ...
```
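The commands for starting the server in this mode (taken from the complete workflow example later in this document) are:

```shell
cd backend
just dev development
```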
## AI-Assisted Development
Because prompts and skills are plain Markdown files in your codebase, you can use AI assistants (like Claude Code) to:
- Modify prompts and code together - Make coherent changes across the entire system
- Refactor skill definitions - Update SKILL.md files while adjusting related Python code
- Add new skills - Create new skill files with proper structure and metadata
- Test changes - Run unit tests to verify behavior before pushing
Example workflow with AI:
```text
User: "Update the pricing skill to handle enterprise pricing differently"

AI:
1. Reads the current SKILL.md
2. Modifies the skill definition
3. Updates any related Python code if needed
4. Runs unit tests to verify
5. Reports ready for push
```
## Testing with Spy Tests

Use `pytest` to test prompts and skills locally before pushing to Langfuse.
### Run Skill Tests
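The unit-test command used in the complete workflow example applies here:

```shell
poetry run pytest -m unit packages/ixskills/
```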
### What Spy Tests Verify
- Skill selection logic matches expected intents
- Prompt rendering produces valid output
- Skill metadata is correctly parsed
- Integration between skills and the router works correctly
### Example Test Pattern
```python
@pytest.mark.unit
async def test_pricing_skill_selected_for_pricing_query(mocker):
    """Verify pricing skill is selected for pricing-related queries."""
    # Test uses local SKILL.md files
    result = await skill_selector.select_skill(
        query="How much does it cost?",
        context=mock_context,
    )
    assert result.skill_name == "pricing"
```
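The pattern above relies on project fixtures (`skill_selector`, `mock_context`) that are not shown here. As a self-contained illustration of the spy idea itself, here is a minimal sketch using a toy selector and `unittest.mock`; the real selector's API may differ, and `ToySkillSelector` is purely illustrative:

```python
import asyncio
from unittest.mock import patch


class ToySkillSelector:
    """Illustrative stand-in for the project's skill selector."""

    async def select_skill(self, query: str) -> str:
        # Naive keyword routing, just for the sketch.
        return "pricing" if "cost" in query.lower() else "general"


async def run_spy_check() -> str:
    selector = ToySkillSelector()
    # Spy: wrap the real method so calls are recorded while behaviour is
    # preserved -- this is the same idea as pytest-mock's `mocker.spy`.
    with patch.object(selector, "select_skill",
                      wraps=selector.select_skill) as spy:
        skill = await selector.select_skill("How much does it cost?")
    # The spy recorded the call and its arguments.
    assert spy.call_count == 1
    assert spy.call_args.args == ("How much does it cost?",)
    return skill


print(asyncio.run(run_spy_check()))  # pricing
```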
## Syncing with Langfuse

Once your local changes are tested and ready, use `cli-langfuse-sync` to push them to Langfuse.
### Step 1: Check Status
See which files have local modifications:
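The subcommand, as used in the complete workflow example later in this document, is:

```shell
cli-langfuse-sync status
```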
Output shows sync status for each file:

| Status | Meaning |
|---|---|
| `SYNCED` | Local matches Langfuse |
| `LOCAL_MODIFIED` | Local has unpushed changes |
| `NEW_LOCAL` | New file, not yet in Langfuse |
### Step 2: Review Changes
See the actual diff before pushing:
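As in the complete workflow example later in this document:

```shell
cli-langfuse-sync diff
```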
This shows a unified diff comparing your local changes against what's in Langfuse.
### Step 3: Push Changes
Upload your changes to Langfuse:
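Using the subcommand from the complete workflow example (which adds `--yes`, presumably to skip an interactive confirmation):

```shell
cli-langfuse-sync push
```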
This creates new versions in Langfuse with the `latest` label.
### Step 4: Bump to Label

Set Langfuse to use your new versions with a specific label:
```shell
# Apply 'development' label (default)
cli-langfuse-sync bump

# Apply 'production' label for production deployment
cli-langfuse-sync bump --label production
```
The `bump` command applies a label to the latest versions without creating new versions. This is how you control which version the system uses in each environment.
## Complete Workflow Example
```shell
# 1. Start development server
cd backend
just dev development

# 2. Edit prompts/skills (or ask AI to help)
# ... make changes to SKILL.md files ...

# 3. Run tests
poetry run pytest -m unit packages/ixskills/

# 4. Check what needs syncing
cli-langfuse-sync status

# 5. Review the diff
cli-langfuse-sync diff

# 6. Push to Langfuse
cli-langfuse-sync push --yes

# 7. Set development label for testing
cli-langfuse-sync bump --label development

# 8. After validation, promote to production
cli-langfuse-sync bump --label production
```
## Environment Labels

| Label | Purpose | When to Use |
|---|---|---|
| `development` | Active development | Default after push, for dev/staging testing |
| `production` | Production-ready | After validating in `development` |
| `latest` | Most recent version | For pulling/comparing (read-only) |
## Best Practices

### 1. Test Before Pushing

Always run unit tests before pushing changes:
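Using the test command from the complete workflow example:

```shell
poetry run pytest -m unit packages/ixskills/
```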
### 2. Use Meaningful Commits

Commit prompt/skill changes with the code that depends on them:

```shell
git add backend/apps/shared_data/prompts/
git add backend/packages/ixskills/
git commit -m "feat: add enterprise pricing handling to pricing skill"
```
### 3. Validate in Development First

Always bump to `development` first, test in staging, then bump to `production`:

```shell
cli-langfuse-sync bump --label development
# ... test in staging environment ...
cli-langfuse-sync bump --label production
```
### 4. Coordinate with Team

Use `cli-langfuse-sync status` before making changes to avoid conflicts:

```shell
cli-langfuse-sync status
# Check if anyone else has pushed changes
cli-langfuse-sync pull  # If needed
```
## Troubleshooting

### Local Changes Not Taking Effect

Ensure you're running with `IX_ENVIRONMENT=development`:
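For example, restart the server with the commands from the complete workflow example:

```shell
cd backend
just dev development
```

You can also confirm the variable in your shell with `echo $IX_ENVIRONMENT`.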
### Tests Failing After Skill Changes
- Check skill metadata is valid YAML
- Verify skill category matches expected values
- Run specific test with verbose output:
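A sketch using pytest's standard `-v` (verbose) and `-k` (name filter) flags, with the test name from the example pattern above:

```shell
poetry run pytest -m unit packages/ixskills/ -v -k test_pricing_skill_selected_for_pricing_query
```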
### Push Rejected

If push fails due to version mismatch:
```shell
# Pull latest changes first
cli-langfuse-sync pull

# Resolve any conflicts

# Then push again
cli-langfuse-sync push
```
## Related Documentation

- CLI Langfuse Sync Reference - Full `cli-langfuse-sync` command reference
- IXChat Package - How skills integrate with the chatbot
- Development Workflow - General development workflow