Foundation Model Evolution

Keeping pace with innovation periodically requires upgrading or changing the foundation model, the vector embedding model, or both. These changes are managed much like a major upgrade of a traditional software application. The goal is to keep systems secure, efficient, and equipped to provide the best possible results.

Reasons for model updates

  • A model or embedding service provider no longer supports the current version.
  • A new model with equal or better results is introduced.
  • Model prices change.
  • Better scalability, reliability, and security are offered via a model upgrade.

Process for model adoption

Benchmark testing

All models are evaluated against industry benchmarks before they are considered for use. Generally, the model under consideration must meet or exceed the benchmarks. Because these benchmarks evolve, the goal is to compare the same benchmarks across any model-to-model change.

Examples of common industry benchmarks

| Feature | Model Example A | Model Example B |
| --- | --- | --- |
| Input context window | 300,000 tokens | 100,000 tokens |
| Maximum output tokens | 5,000 tokens | Unknown |
| Supported modalities | Text, image, and video processing | Text only |
| Release date | December 2, 2024 | August 9, 2023 |
| Knowledge cut-off date | Purposefully not disclosed | January 2023 |
| MMLU Benchmark (Massive Multitask Language Understanding) | 85.9% (Chain-of-Thought) | 73.4% (5-shot scenario) |
| HumanEval (code generation) | 89% pass@1 | Not available |
| MATH Benchmark | 76.6% (Chain-of-Thought) | Not available |
| GPQA Benchmark (PhD-level knowledge) | 46.9% | Not available |
| IFEval (Instruction following) | 92.1% | Not available |
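
The "meet or exceed" rule can be illustrated with a minimal sketch that uses the example figures from the table above. It assumes, purely for illustration, that Model Example A is the candidate and Model Example B is the model in use. The function names are hypothetical and do not describe an actual NICE tool. Benchmarks that are "Not available" for either model are treated as not comparable rather than as failures.

```python
# Scores from the example table (percentages). None means "Not available".
MODEL_A = {"MMLU": 85.9, "HumanEval": 89.0, "MATH": 76.6, "GPQA": 46.9, "IFEval": 92.1}
MODEL_B = {"MMLU": 73.4, "HumanEval": None, "MATH": None, "GPQA": None, "IFEval": None}


def compare_benchmarks(candidate, incumbent):
    """Map each benchmark to True/False (candidate >= incumbent) or None if not comparable."""
    results = {}
    for name in sorted(set(candidate) | set(incumbent)):
        cand, inc = candidate.get(name), incumbent.get(name)
        # A benchmark can only be compared when both models report it.
        results[name] = None if cand is None or inc is None else cand >= inc
    return results


def meets_or_exceeds(candidate, incumbent):
    """True if at least one benchmark is comparable and none of them regress."""
    comparable = [ok for ok in compare_benchmarks(candidate, incumbent).values() if ok is not None]
    return bool(comparable) and all(comparable)


if __name__ == "__main__":
    for name, ok in compare_benchmarks(MODEL_A, MODEL_B).items():
        print(f"{name}: {'not comparable' if ok is None else ok}")
    print("Candidate meets or exceeds:", meets_or_exceeds(MODEL_A, MODEL_B))
```

In this example only MMLU is reported for both models, and the two MMLU figures were measured under different settings (Chain-of-Thought versus 5-shot), so a real comparison would first need results produced under the same conditions.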

Baseline (RAGAS) testing

Before implementation, all models undergo extensive baseline testing by the CXone Mpower Expert team and follow structured release and upgrade processes.

  1. The baseline testing consists of a set of questions whose results are scored with RAGAS (Retrieval Augmented Generation Assessment) metrics.
  2. The scores of the existing model and the new model are compared.
  3. New models are adopted when these scores are at baseline or better, or when reasoned evidence can explain why a result is not at baseline or higher.
  4. The new model is exercised on Expert Help (this site) before broad deployment or general availability.

Comparison of Models using RAGAS

| Model | Average of context precision | Average of context recall | Average of answer relevancy | Average of faithfulness |
| --- | --- | --- | --- | --- |
| Baseline Model A | 0.43 | 0.83 | 0.88 | 0.86 |
| Baseline Model B | 0.43 | 0.83 | 0.89 | 0.81 |
| Baseline Model C | 0.43 | 0.83 | 0.87 | 0.86 |
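
The "at baseline or better" comparison can be sketched as follows, using the averages from the table above. The sketch assumes, for illustration only, that Baseline Model A is the existing model and the other rows are candidates. The `shortfalls` helper is hypothetical. It only flags metrics that fall below the existing model, so that each one can be given the reasoned explanation the process calls for.

```python
METRICS = ("context_precision", "context_recall", "answer_relevancy", "faithfulness")

# Average RAGAS scores from the comparison table, in the order given by METRICS.
SCORES = {
    "Baseline Model A": (0.43, 0.83, 0.88, 0.86),
    "Baseline Model B": (0.43, 0.83, 0.89, 0.81),
    "Baseline Model C": (0.43, 0.83, 0.87, 0.86),
}


def shortfalls(existing, candidate):
    """Return {metric: (existing_score, candidate_score)} where the candidate is below baseline."""
    return {m: (e, c) for m, e, c in zip(METRICS, existing, candidate) if c < e}


if __name__ == "__main__":
    existing = SCORES["Baseline Model A"]  # assumption: A is the model currently in use
    for name, scores in SCORES.items():
        if name == "Baseline Model A":
            continue
        gaps = shortfalls(existing, scores)
        status = "at baseline or better" if not gaps else f"needs explanation for {sorted(gaps)}"
        print(f"{name}: {status}")
```

Under this assumption, Model B is lower on faithfulness and Model C is lower on answer relevancy, so each would need supporting evidence before adoption.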

CXone Mpower Expert model usage

Product features that use LLMs (foundation models), embeddings, or both include:

  • Generations (Completions): GenSearch
  • Content Assessment: Customers can upload questions and determine if the content in Expert will answer those questions.
  • Answer Relevance, Answer Faithfulness
    • Relevance Reporting: Evaluates how relevant the generated answer is to the question submitted.
    • Faithfulness: Computes how accurately the generated answer reflects the information in the retrieved context.

Customer deployment of model changes

  • For Generations (Completions), customers receive a model adoption period to test new models before implementation. In some cases, such as when a model provider discontinues a model or its support, customers may have to adopt changes immediately.
    • The model adoption period is determined by NICE.
    • The intent is to provide a minimum of 6 weeks for customers to evaluate new models.
    • The model adoption period starts after the benchmarking and baseline testing described above have concluded.
    • Customers can evaluate the new models if they wish, but no evaluation or action is required from the customer.
    • If questions arise during customer testing, a support ticket can be created and the Product Success Manager can be engaged.
  • For Content Assessment, Answer Relevance, Answer Faithfulness, and Content Editor (creation, update, and other functions), customers receive advance notice of model changes, and a migration schedule is provided.
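
For planning purposes, the intended minimum adoption period for Generations (Completions) can be worked out with simple date arithmetic. This is a sketch only: the function name and the example date are hypothetical, and the actual period is determined by NICE and can be shorter when a provider discontinues a model.

```python
from datetime import date, timedelta

# Intended minimum from the text above; the real period is set by NICE.
MIN_ADOPTION_WEEKS = 6


def earliest_adoption_end(testing_concluded: date, weeks: int = MIN_ADOPTION_WEEKS) -> date:
    """Earliest end of the adoption period, counted from the end of benchmarking and baseline testing."""
    return testing_concluded + timedelta(weeks=weeks)


if __name__ == "__main__":
    # Hypothetical date on which benchmark and baseline testing concluded.
    print(earliest_adoption_end(date(2025, 1, 6)))
```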

 
