AI Service Architecture for Self-Help System
Overview
The Help Deskβs AI-powered self-help system is designed to provide instant, intelligent IT support while minimizing costs through a sophisticated multi-tier response strategy. This document explains the technical architecture, implementation details, and design decisions behind the AI service.
Architecture Components
1. Core AI Service (aiService.ts)
The main AI orchestrator that manages the entire self-help workflow:
class AIService { private geminiModel: GenerativeModel; private contextCache: Map<string, CachedContext>; private knowledgeBase: Map<string, KnowledgeArticle>;}Key Responsibilities:
- Query categorization and routing
- Context management for conversations
- Integration with Google Gemini AI
- Dynamic content personalization
- Token usage optimization
2. FAQ Service (faqService.ts)
A high-performance FAQ matching system that serves as the first line of response:
class FAQService { private fuse: Fuse<FAQ>; // Fuzzy search engine private exactMatchCache: Map<string, FAQ>; private faqs: FAQ[]; // Loaded from Firestore}Key Features:
- Firestore Integration: Dynamic FAQ management with real-time updates
- Multi-tier Matching:
- Exact match (100% confidence)
- Keyword match (80-95% confidence)
- Fuzzy match (60-80% confidence)
- Usage Analytics: Tracks FAQ effectiveness
- Category Organization: IT-specific categorization
3. Cache Service (cacheService.ts)
Intelligent caching system for AI context optimization:
class CacheService { private cache: Map<string, CachedEntry>; private readonly maxSize = 100; private readonly ttl = 3600000; // 1 hour}Benefits:
- Reduces redundant AI API calls
- Maintains conversation context
- Implements LRU eviction strategy
- Saves ~70% on repeat queries
Query Processing Flow
Step 1: Initial Query Analysis
graph TD A[User Query] --> B{FAQ Match?} B -->|Yes| C[Return FAQ Answer] B -->|No| D[Categorize Query] D --> E[Load Context] E --> F[Generate AI Response]- Normalize Query: Clean and standardize user input
- FAQ Search: Check against Firestore-backed FAQ database
- Confidence Check: If match confidence > 70%, return FAQ
- Fallback to AI: Process with Gemini for complex queries
Step 2: AI Processing Pipeline
async processWithAI(query: string, config: Config): Promise<AIResponse> { // 1. Categorize the query const category = this.categorizeQuery(query);
// 2. Load relevant knowledge base const context = await this.loadContextForCategory(category);
// 3. Check cache for similar queries const cached = this.cacheService.get(query); if (cached) return cached;
// 4. Generate AI response const response = await this.generateResponse(query, context);
// 5. Cache and return this.cacheService.set(query, response); return response;}Step 3: Response Enhancement
All responses undergo post-processing:
- Phone Number Replacement:
{SUPPORT_PHONE}β actual support number - Markdown Formatting: Convert to user-friendly format
- Safety Checks: Filter potentially harmful content
- Context Preservation: Store for follow-up questions
Cost Optimization Strategies
1. FAQ-First Approach
- Impact: 70-80% queries handled without AI
- Cost: $0 (no API calls)
- Speed: < 10ms response time
2. Gemini 1.5 Flash Model
- Cost: $0.075 per 1M input tokens
- Speed: Fast inference times
- Quality: Excellent for IT support tasks
3. Smart Caching
- Cache Hit Rate: ~30% for AI queries
- Savings: Reduces API calls significantly
- TTL: 1-hour expiration for freshness
4. Context Pruning
private pruneContext(context: string, maxTokens: number = 4000): string { // Remove redundant information // Prioritize most relevant content // Maintain coherence}Knowledge Base Structure
Categories and Context
const KNOWLEDGE_BASE = { 'password': { context: 'Password policies, reset procedures...', commonIssues: ['locked accounts', 'expired passwords'], solutions: [...] }, 'email': { context: 'Email setup, troubleshooting...', commonIssues: ['sync issues', 'quota exceeded'], solutions: [...] } // ... more categories}Dynamic Knowledge Loading
The system intelligently loads only relevant knowledge based on query categorization:
- Primary Category: Full context (1000-2000 tokens)
- Related Categories: Summary only (200-500 tokens)
- General Policies: Always included (500 tokens)
Integration with Ticket System
Seamless Escalation
When users need human support:
interface TicketContext { originalQuestion: string; aiSuggestions: string[]; conversationHistory: Message[]; category: string; attemptedSolutions: string[];}This context is automatically included in new tickets, giving support staff complete background.
Real-time FAQ Management
Firestore Integration
// FAQs are now stored in Firestore/faqs/{faqId} - category: string - questions: string[] - answer: string - keywords: string[] - priority: number - usage_count: numberAdmin Features
- CRUD Operations: Full FAQ management UI
- Usage Analytics: Track effectiveness
- A/B Testing: Compare different answers
- Bulk Import: Migrate legacy FAQs
Security and Privacy
Data Handling
- No PII Storage: Conversations are ephemeral
- Sanitization: User inputs are cleaned
- Access Control: Firestore rules enforce permissions
- Audit Trail: Admin actions are logged
Safety Measures
private sanitizeInput(input: string): string { // Remove potential XSS // Filter profanity // Validate length // Check for patterns}Performance Metrics
Response Times
- FAQ Match: < 10ms
- Cached AI: < 50ms
- Fresh AI Query: 1-3 seconds
Accuracy Metrics
- FAQ Hit Rate: 70-80%
- AI Satisfaction: 85%+
- Escalation Rate: < 15%
Configuration
Environment Variables
VITE_GEMINI_API_KEY=your_api_keyVITE_ENABLE_AI_SELF_HELP=trueVITE_ENABLE_TOKEN_CACHE=trueVITE_MAX_CONTEXT_SIZE=4000Dynamic Settings
- Support phone number
- Company-specific knowledge
- Response templates
- Escalation thresholds
Monitoring and Analytics
Key Metrics Tracked
- Query Volume: Requests per hour/day
- Response Types: FAQ vs AI vs Escalation
- Token Usage: Cost monitoring
- User Satisfaction: Implicit feedback
- Category Distribution: Common problem areas
Dashboard Integration
interface AIAnalytics { totalQueries: number; faqHitRate: number; aiUsageRate: number; escalationRate: number; tokenCost: number; categoryCounts: Record<string, number>;}Future Enhancements
Planned Features
- Multi-language Support: Serve global teams
- Voice Integration: Speech-to-text queries
- Proactive Suggestions: Predict issues
- Learning System: Improve from interactions
- Integration APIs: Connect with other tools
Machine Learning Pipeline
- Analyze successful resolutions
- Identify new FAQ candidates
- Improve categorization accuracy
- Personalize responses per user
Best Practices
For Administrators
- Regular FAQ Updates: Keep content current
- Monitor Analytics: Identify gaps
- Test Responses: Ensure accuracy
- Gather Feedback: Improve continuously
For Developers
- Token Efficiency: Minimize context size
- Error Handling: Graceful fallbacks
- Performance: Cache aggressively
- Security: Validate all inputs
Troubleshooting
Common Issues
-
High Token Usage
- Check context size
- Review caching effectiveness
- Analyze query patterns
-
Slow Responses
- Verify Gemini API status
- Check network latency
- Review context loading
-
Poor Match Quality
- Update FAQ keywords
- Adjust confidence thresholds
- Enhance categorization
Conclusion
The AI-powered self-help system represents a sophisticated approach to IT support automation. By combining intelligent FAQ matching, advanced AI capabilities, and seamless escalation paths, it provides users with instant, accurate help while maintaining cost efficiency and system performance.
The architecture is designed to scale with organizational needs while maintaining flexibility for future enhancements and integrations.