Data Science on Agentic Systems

This session focuses on managing agentic AI systems with established data science methods rather than treating them as unique technical mysteries. We show how techniques such as funnel analysis, queueing theory, and statistical process control apply directly to monitoring and evaluating these systems.

We map classic analytics problems onto AI-specific challenges, including classifying agent failure modes, optimizing retrieval quality, and detecting performance degradation. Along the way, familiar analogies such as clickstream data and call center operations translate into concrete practice: instrumenting agent execution logs and designing controlled experiments.

Figure 1: Manage Agentic AI with Traditional Analytics

Lecture Notes

Mapping Classic Data Science Techniques to Agentic AI Management

Classic Analytics Problem | Agentic AI Equivalent | Key Technique | Business Analogy | Primary Metrics
--- | --- | --- | --- | ---
Conversion funnel analysis | Task completion path analysis | Sankey diagrams, sequence analysis | Clickstream / Customer Journey Analytics | Task completion rate, mean steps-to-completion, tool call frequency, path diversity
Concept drift in forecasting | Agent performance degradation | Change-point detection, statistical process control (SPC) charts | KPI Trend Monitoring / Demand Forecasting / SLA Tracking | Answer correctness, task success rate, latency, token consumption
Call center queueing / workforce management | Multi-agent coordination & load balancing | Little’s Law, utilization analysis, bottleneck identification | Hospital Patient Flow / Supply Chain Optimization | Utilization rate, throughput, end-to-end latency ($W$), tasks in flight ($L$)
Survey instrument validation | LLM-as-judge calibration | Cohen’s kappa, inter-rater reliability | Social Science Survey Validation | Krippendorff’s alpha, Cohen’s kappa, bias/calibration metrics
Search relevance optimization | Retrieval quality in RAG pipelines | NDCG, Precision@k, Recall@k | E-commerce Search / Recommendation Systems | Precision@k, Recall@k, NDCG, similarity scores
Customer complaint classification | Agent failure taxonomy | Multi-label classification, cost-sensitive learning | Fraud Detection / Ticket Routing | Hamming loss, subset accuracy, per-label F1, precision/recall
A/B testing marketing campaigns | Agent configuration experiments | t-test, Mann-Whitney U, effect size estimation | Marketing Campaign Experiments / Clinical Trials | p-value, Cohen’s d (effect size), statistical significance
Demand forecasting | Query volume & resource planning | Time-series decomposition, capacity models | Retail Demand Forecasting | Trend, seasonality, residuals, GPU/resource utilization
Customer segmentation | Agent behavioral clustering | K-means, HDBSCAN, UMAP | CRM Behavioral Cohort Analysis | Cluster membership, silhouette scores, behavioral feature vectors
Fraud / anomaly detection | Unusual agent behavior detection | Isolation Forest, Local Outlier Factor (LOF), DBSCAN | Network Intrusion Detection | Outlier scores, noise points
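To make the first row of the mapping concrete, here is a minimal Python sketch of task completion path analysis. It assumes a hypothetical log format in which each agent run is recorded as an ordered list of step names ending in a `DONE` or `FAIL` marker; `funnel_metrics` and `transition_counts` are illustrative names, not part of any library.

```python
from collections import Counter
from statistics import mean

# Hypothetical log schema (an assumption, not from the source): each trace is the
# ordered list of step names from one agent run, ending in "DONE" or "FAIL".
TERMINALS = ("DONE", "FAIL")


def funnel_metrics(traces: list[list[str]]) -> dict:
    """Clickstream-style funnel KPIs computed over agent execution traces."""
    if not traces:
        raise ValueError("need at least one trace")
    completed = [t for t in traces if t and t[-1] == "DONE"]
    return {
        # Task completion rate: share of runs that reach the success marker.
        "completion_rate": len(completed) / len(traces),
        # Mean steps-to-completion: non-terminal steps in successful runs only.
        "mean_steps_to_completion": (
            mean(len(t) - 1 for t in completed) if completed else float("nan")
        ),
        # Tool call frequency: occurrences of each non-terminal step across all runs.
        "tool_call_frequency": Counter(
            step for t in traces for step in t if step not in TERMINALS
        ),
        # Path diversity: distinct paths / runs (1.0 means every run was unique).
        "path_diversity": len({tuple(t) for t in traces}) / len(traces),
    }


def transition_counts(traces: list[list[str]]) -> Counter:
    """Step -> next-step counts: the link weights of a Sankey diagram of agent paths."""
    return Counter((a, b) for t in traces for a, b in zip(t, t[1:]))
```

Passing the `transition_counts` output to any Sankey plotting tool gives the agent-path counterpart of a customer-journey diagram, with the terminal markers acting as the funnel's exits.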
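For the performance-degradation row, a p-chart is one standard SPC chart for a success rate. The sketch below assumes daily counts of successful and total tasks, estimates the center line from a baseline period believed to be stable, and flags later days that fall outside 3-sigma limits. The helper names are hypothetical.

```python
import math


def p_chart_center(baseline_successes: list[int], baseline_trials: list[int]) -> float:
    """Center line p-bar: pooled success rate over a baseline (Phase I) period."""
    return sum(baseline_successes) / sum(baseline_trials)


def p_chart_monitor(p_bar: float, successes: list[int], trials: list[int],
                    sigma: float = 3.0) -> list[tuple[float, float, float, bool]]:
    """For each monitored period return (rate, LCL, UCL, out_of_control).

    Limits depend on each period's sample size n:
    p_bar +/- sigma * sqrt(p_bar * (1 - p_bar) / n), clipped to [0, 1].
    """
    rows = []
    for x, n in zip(successes, trials):
        rate = x / n
        se = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - sigma * se)
        ucl = min(1.0, p_bar + sigma * se)
        rows.append((rate, lcl, ucl, rate < lcl or rate > ucl))
    return rows
```

This applies only the basic 3-sigma rule; run rules or a dedicated change-point method can be layered on to catch slower drifts, and continuous metrics such as latency would call for an individuals or X-bar chart instead.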
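The queueing row rests on Little's Law, $L = \lambda W$: the average number of tasks in flight equals the arrival rate times the mean end-to-end latency. The sketch below assumes a hypothetical per-task log with arrival and completion timestamps, checks the identity over an observation window, and adds a simple capacity helper based on utilization $\rho = \lambda S / c$, where $S$ is the mean service time and $c$ the number of parallel workers. Both function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical per-task log entry; timestamps in seconds."""
    arrival: float
    completion: float


def littles_law_check(tasks: list[TaskRecord], window: float) -> dict:
    """Estimate lambda, W and L from a log covering [0, window].

    Assumes every task arrives and completes inside the window; then the
    time-averaged number in flight equals lambda * W exactly.
    """
    durations = [t.completion - t.arrival for t in tasks]
    lam = len(tasks) / window          # arrival rate (throughput), tasks per second
    w = sum(durations) / len(tasks)    # mean end-to-end latency W, seconds
    l_avg = sum(durations) / window    # time-averaged tasks in flight L
    return {"lambda": lam, "W": w, "L": l_avg, "lambda_times_W": lam * w}


def workers_needed(arrival_rate: float, mean_service_time: float,
                   max_utilization: float = 0.8) -> int:
    """Smallest worker count c with utilization lambda * S / c <= max_utilization."""
    return math.ceil(arrival_rate * mean_service_time / max_utilization)
```

For example, 2 tasks per second with a 3-second mean service time and a 75% utilization cap needs 8 workers. Note that $W$ includes time spent waiting in a queue, whereas $S$ counts only processing time.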
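For LLM-as-judge calibration, Cohen's kappa compares the judge's labels with human labels on the same items and discounts the agreement expected by chance: $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch assumes two raters and categorical labels such as `pass`/`fail`; it is a plain re-implementation for illustration, not a library call.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement p_e: chance overlap given each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in labels) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label: kappa is undefined
    return (p_o - p_e) / (1 - p_e)
```

With 8 of 10 labels matching and both raters marking 70% of items as `pass`, chance agreement is 0.58 and kappa is about 0.52, noticeably lower than the raw 80% agreement. Krippendorff's alpha generalizes the same idea to more than two raters and to missing labels.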
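The retrieval row's metrics can be computed directly from a ranked list of document IDs. This sketch assumes binary relevance judgments for Precision@k and Recall@k, and graded judgments (a hypothetical `gains` mapping from document ID to relevance grade) for NDCG@k, using the linear-gain form of DCG.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)


def ndcg_at_k(retrieved: list[str], gains: dict[str, float], k: int) -> float:
    """NDCG@k with linear gains: DCG = sum of rel_r / log2(r + 1) over ranks r = 1..k."""
    def dcg(rels):
        # enumerate() is 0-based (i = r - 1), so the discount log2(r + 1) is log2(i + 2).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

    actual = dcg([gains.get(doc, 0.0) for doc in retrieved[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

Another common DCG variant uses $2^{rel} - 1$ as the gain, which rewards highly relevant documents more strongly; NDCG scores are only comparable when computed with the same variant.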
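Because one agent run can fail in several ways at once, the failure-taxonomy row treats failure classification as a multi-label problem. The sketch below assumes each run's failure modes are given as a set of label strings (for example hypothetical labels such as `tool_error` or `timeout`) and computes Hamming loss, subset accuracy, and per-label F1.

```python
def multilabel_metrics(y_true: list[set[str]], y_pred: list[set[str]],
                       labels: list[str]) -> dict:
    """Hamming loss, subset accuracy and per-label F1 for set-valued predictions.

    Assumes every predicted label belongs to `labels`.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally sized, non-empty label lists")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    # Hamming loss: wrong label slots (symmetric difference) over all n * |labels| slots.
    hamming = sum(len(t ^ p) for t, p in pairs) / (n * len(labels))
    # Subset accuracy: runs whose full predicted label set matches exactly.
    subset_acc = sum(t == p for t, p in pairs) / n
    per_label_f1 = {}
    for lab in labels:
        tp = sum(lab in t and lab in p for t, p in pairs)
        fp = sum(lab not in t and lab in p for t, p in pairs)
        fn = sum(lab in t and lab not in p for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); report 0.0 when the label never appears at all.
        per_label_f1[lab] = 2 * tp / denom if denom else 0.0
    return {"hamming_loss": hamming, "subset_accuracy": subset_acc,
            "per_label_f1": per_label_f1}
```

Subset accuracy is the strictest of the three: a run counts only if every label is right, so it can never exceed 1 minus the Hamming loss.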
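For agent configuration experiments, the rank-based Mann-Whitney U test avoids assuming normally distributed outcomes, which suits skewed metrics such as latency or token counts, and Cohen's d reports how large the difference is. To stay within the standard library, this sketch computes the p-value with the normal approximation (tie-corrected, no continuity correction), so treat it as illustrative and reasonable only for moderate-to-large samples.

```python
import math
from collections import Counter
from statistics import mean, stdev


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        i = j + 1
    return ranks


def mann_whitney_u(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample a, z statistic, two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: repeated values shrink the variance of U.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:
        return u_a, 0.0, 1.0  # every value identical: no evidence of a difference
    z = (u_a - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return u_a, z, p


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)
```

For small samples an exact test is preferable; SciPy's `scipy.stats.mannwhitneyu`, for instance, accepts `method="exact"`. Reporting both numbers keeps the two questions separate: the p-value says whether a difference is likely to be chance, and the effect size says whether it is large enough to matter.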
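For query-volume planning, classical additive decomposition splits a series into trend, seasonal, and residual components. The sketch assumes a regularly sampled series (for example, hypothetical daily query counts with a weekly period of 7) and, to stay short, supports only odd periods, where a simple centered moving average gives the trend.

```python
def additive_decompose(y: list[float], period: int):
    """Classical additive decomposition y = trend + seasonal + residual.

    Sketch limits: odd period only and at least two full cycles; trend and
    residual are None at the edges where the centered window does not fit.
    """
    if period % 2 == 0 or len(y) < 2 * period:
        raise ValueError("sketch needs an odd period and at least two full cycles")
    n, half = len(y), period // 2
    trend: list = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(y[t - half:t + half + 1]) / period  # centered moving average
    # Seasonal index: mean detrended value at each phase, re-centered to sum to zero.
    by_phase = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_phase[t % period].append(y[t] - trend[t])
    raw = [sum(v) / len(v) for v in by_phase]
    offset = sum(raw) / period
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(n)]
    resid = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid
```

Libraries such as statsmodels (`statsmodels.tsa.seasonal.seasonal_decompose`) also handle even periods and multiplicative models. The trend and seasonal components can then feed a capacity model, while unusually large residuals mark days worth investigating.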
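For behavioral clustering, each agent run is first summarized as a numeric feature vector (for example, hypothetical features such as tool calls per run and tokens used) and the vectors are then grouped. This sketch uses plain k-means, the simplest method in the table's list, because HDBSCAN and UMAP need dedicated libraries, and scores the result with the mean silhouette coefficient.

```python
import math
import random


def kmeans(points: list[tuple[float, ...]], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means; initial centroids are k distinct data points chosen at random."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break  # converged: the next assignment step would not change anything
        centroids = updated
    return labels, centroids


def silhouette(points: list[tuple[float, ...]], labels: list[int]) -> float:
    """Mean silhouette coefficient s = (b - a) / max(a, b); singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(d) / len(d) if d else None

        a = mean_dist(labels[i])  # cohesion: mean distance within the point's own cluster
        if a is None:
            scores.append(0.0)
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # nearest other cluster
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)
```

In practice the features should be standardized first: k-means uses Euclidean distance, so a large-scale feature such as token count would otherwise dominate the clustering.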
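For unusual-behavior detection, the Local Outlier Factor (LOF) compares each run's local density with that of its nearest neighbors: scores near 1 are typical and scores well above 1 suggest outliers. This is a from-scratch sketch for small batches of hypothetical per-run feature vectors, with one stated simplification (exactly k neighbors per point); Isolation Forest and DBSCAN from the same table row are not shown.

```python
import math


def lof_scores(points: list[tuple[float, ...]], k: int = 5) -> list[float]:
    """Local Outlier Factor per point: about 1.0 = as dense as its neighbors, >> 1 = outlier.

    Simplifications: each point uses exactly its k nearest neighbors (distance ties are
    broken by input order), and exact duplicate points can produce inf/nan scores.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbors of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, o) = max(k_distance[o], dist[i][o]).
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_distance[o], dist[i][o]) for o in neighbors[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)
    # LOF: mean ratio of each neighbor's density to the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]
```

For production-sized data, scikit-learn provides `LocalOutlierFactor`, `IsolationForest`, and `DBSCAN`, covering all three techniques in this row.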