Arize AI, a leader in AI observability and LLM evaluation, has introduced new capabilities to help AI developers evaluate and debug LLM systems. The announcements were made at the Arize conference, featuring notable speakers from OpenAI, Lowe’s, Mistral, Microsoft, NATO, and others, who shared advancements in research, engineering best practices, and open-source frameworks.
Arize Copilot, the industry’s first AI assistant designed for troubleshooting AI systems, is a groundbreaking tool within the Arize platform. It surfaces relevant information and suggests actions, automating complex tasks to save time and enhance app performance for AI engineers. Key functionalities include providing model insights, optimizing prompts, building custom evaluations, and conducting AI searches.
“Using AI to troubleshoot complex AI systems is a logical next step in the evolution of building generative AI applications, and we are proud to offer Arize Copilot to teams that want to improve the development and performance of LLM systems,” said Aparna Dhinakaran, Chief Product Officer and Co-Founder of Arize.

New Workflows for Enhanced LLM App Management
New workflows in the Arize platform also enable engineers to identify and resolve issues in deployed LLM apps. For example, the AI search functionality lets teams select an example span and surface all similar issues, such as every data point where a customer expresses frustration. These data points can then be saved into a curated dataset for annotation, evaluation experiments, or fine-tuning workflows.
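To make the idea concrete, here is a minimal sketch of that kind of "find similar spans" search, written with generic Python tooling rather than Arize's actual API; the span texts, the selected example, and the 0.3 threshold are all hypothetical stand-ins.

```python
# Sketch: rank stored spans by semantic similarity to a selected example,
# then keep the closest matches as a curated dataset. Not Arize's API.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical span texts pulled from an LLM app's traces.
spans = [
    "I've asked three times and still no refund. This is unacceptable.",
    "Thanks, that answered my question perfectly.",
    "Why is this so hard? I'm done waiting on support.",
]

# Embed the example span the engineer selected plus every candidate span.
example = "The customer sounds frustrated about repeated delays."
vectors = model.encode([example] + spans, normalize_embeddings=True)

# On normalized vectors, cosine similarity reduces to a dot product.
scores = vectors[1:] @ vectors[0]

# Spans above an (illustrative) threshold become the curated dataset
# for annotation, evaluation experiments, or fine-tuning.
curated = [s for s, score in zip(spans, scores) if score > 0.3]
print(curated)
```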
These updates position Arize as a comprehensive tool for both experimentation and production observability. Engineers can make adjustments, such as editing a prompt template or swapping the underlying LLM, and assess the performance impact across test datasets on metrics like latency, retrieval accuracy, and hallucination rate, before safely deploying changes to production.
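As a rough illustration of that experiment loop, the sketch below compares two prompt templates over a tiny test dataset, measuring latency and a toy groundedness proxy. The run_llm() function, the templates, and the dataset are hypothetical placeholders, and a real hallucination evaluation would use an LLM judge rather than substring matching.

```python
# Sketch: evaluate a prompt-template edit against a test dataset before
# deploying it to production. All names and data here are illustrative.
import time
import statistics

test_dataset = [
    {"question": "What is the return policy?", "reference": "30 days"},
    {"question": "Do you ship overseas?", "reference": "yes, to 40 countries"},
]

def run_llm(prompt: str) -> str:
    """Hypothetical stand-in for the deployed model call."""
    return "30 days"  # stubbed response so the sketch runs offline

def evaluate(template: str) -> dict:
    latencies, hits = [], 0
    for row in test_dataset:
        prompt = template.format(question=row["question"])
        start = time.perf_counter()
        answer = run_llm(prompt)
        latencies.append(time.perf_counter() - start)
        # Toy groundedness proxy: does the answer contain the reference?
        hits += row["reference"] in answer
    return {
        "p50_latency_s": statistics.median(latencies),
        "grounded_rate": hits / len(test_dataset),
    }

# Compare the current template against a candidate edit side by side.
baseline = evaluate("Answer concisely: {question}")
candidate = evaluate("You are a support agent. Cite policy. {question}")
print(baseline, candidate)
```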