Skip to main content
NICE CXone Expert
Expert Success Center

Generative Responses FAQ

This FAQ addresses common inquiries about the large language model (LLM) behind Expert Kernels.

Prepare for AI and content management

How can you prepare to use AI?
  1. Articulate what your goals and expectations for GenAI are.
  2. Determine what problems you have that you think AI can help solve.
  3. Test out public models like Chat GPT to get familiar with LLMs and prompt engineering.
    Examples: Summarize video and meeting transcripts, create a table from a paragraph, create a page summary for existing content
  4. Familiarize yourself with the terminology associated with LLM and generative AI, and the practices of others in your industry.
  5. Understand RAG (Retrieval Augmented Generation) and its use in AI.
How can you prepare your content for AI?

Authoring content following Expert best practices, including guided content framework, is a good start.

When a word is used across your site in different ways, it can confuse the LLM. Understanding where GenAI might have difficulty with your content will empower you to write clearer persona prompts, to reduce the misunderstandings between your customers and the LLM. The first place to look when you do not get the answers a customer expects is in the content.

Take Google for example; Google defines all their products as Google [Product]: Google Maps, Google Search, Google Drive, and so on. When a consumer asks a question like "What is Google?" and the desired answer they are looking for is not specifically called out in the content, they are less likely to obtain that answer in a GenResponse. 

Assume they expect the following answer: "Google is a large company that powers the largest percentage of search traffic on the web" but this answer was nowhere on their Expert site. The result they obtain could be something about all Google products / services. For example: "Google is a company that has Maps, Drive, Search, and various other products." That answer is not wrong, but it is not desirable in this example.

Can the model relearn if you give it updated content, or will it get confused?

We do not train the models; we use base models with Retrieval Techniques and prompt engineering on top of that. This way, only the most recent and relevant Expert content gets used in the generative responses.  When you make an update to content the update is reflected in near real-time much like native search is updated today.

How does Kernels handle content in an Archive folder?

Like the search experience, Kernels does not distinguish between archived and unarchived content.

Will GenAI know how to interpret content where there is version blending on a page?

No.  Content is semantically related from all pages for GenResponses, therefore content on the same page about different versions of a product or service will not be known to the models as a separate version.  The expected questions that users will ask of the content should be tested as context will determine what level of content refactoring may be required.

What kinds of content can Kernels work with?

Kernels supports text content, whether public or private, PDF attachments, and tables within pages. Authentication determines what content users can view. Text content that's rendered by dekiscript is available within Kernels, text content that's created by javascript is not available within Kernels.

PDFs can be added to a Kernel, and complex PDFs can return unexpected Kernel results.  For example, when a PDF has steps to complete a task.  Consider if step one asks the reader a question. If the answer is yes, it refers them to step 5, skipping 2-4.  This may provide unexpected results.

Do we use page summaries?

On pages that include the page overview DekiScript template, the summary is included in Kernels. The page overview template is added to page content by default. If it has been removed, add the Page Overview block to the page to index the page summary.

Can you exclude pages from the Kernels endpoint?

This is on the roadmap and will be delivered in a future release.

How we select and validate an LLM

How do we select a model?

Selecting an LLM is a complex decision matrix where trade-offs will often need to be made. The ideal model balances these criteria effectively while aligning closely with the specific goals and values of the project. Continuous monitoring and evaluation are necessary as models evolve and new ones emerge. There are several factors we use when selecting an LLM model:

  1. Performance: This refers to both the qualitative and quantitative aspects of an LLM's output:
    • Qualitatively, the model should generate coherent, contextually appropriate, and nuanced text.
    • Quantitatively, it should have low latency and high throughput to handle the required scale of operations.
  2. Cost: The economic aspect of using an LLM can be significant, especially when scaled up. Cost considerations include the direct expense of using or training the model, as well as the computational resources required for operation and maintenance. We conduct cost-benefit analysis to ensure the model's value justifies its expense.
  3. Security: Given the sensitive nature of data that LLMs might process, security is paramount. This includes data encryption at rest, data encryption in transit, and access controls. The model should also have robust mechanisms to prevent data leakage and ensure user privacy, and it must not collect data from usage.
  4. Model security: The model should resist attacks that could cause it to generate incorrect or harmful text.
  5. Scalability: The ability of an LLM to scale efficiently is crucial for handling growing data volumes and user requests without a significant reduction in performance or speed. Scalability also refers to the model's capacity to incorporate new data and adapt to different contexts without extensive retraining.
  6. Fairness and biases: It is essential to assess models for fairness and bias. We evaluate LLMs for their tendency to generate biased outputs, which could perpetuate stereotypes or discriminate against certain groups.
  7. Interoperability: The selected LLM should play well with other systems and technologies in the workflow and ecosystem. This includes easy integration with existing databases, software, and APIs; as well as compatibility with various data formats.
  8. Ethical Considerations: The model should adhere to ethical guidelines for AI use, ensuring that its deployment does not cause harm or adverse societal impacts.
  9. Robustness: The model should be robust to minor input variations. We test model sensitivity against factors like capitalization, punctuation, typos, and noisy neighbors (resource allocation between tenants of various sizes).
Do we use models that have been certified by a standards body?

When we started our efforts, few standards existed in the industry. We continue to monitor this and will adopt standards when they become available and where they make sense.

How do we control for prompt injection?

For our completions endpoint we use prompt engineering, which is applied to each query when the completions endpoint is called. We also ensure the LLM does not remember the conversation, so each query is independent of any other query. Further, we are planning to employ an ethical hacking service for an external perspective. Finally, we periodically test many of the common prompt injection hacks to assess and address potential vulnerabilities.

How do we manage or reduce hallucinations?
  • Prompts and persona engineering: GenSearch can better understand the context you are working with and the expectations visitors to your site will have when asking questions.
  • Using your Expert content as the primary source for the GenResponses enables the highest likelihood the base models will not invoke any data that is not from your Expert content.
  • Using RAG: RAG (Retrieval Augmented Generation) is used industry-wide as a strategy for more desirable responses.
  • No retrieval, no answer: Without relevant kernels, GenSearch will not generate a response. 

Performance, optimization, and methodology

What are our response times and how are they impacted?
  • Our average response time for Kernels (text chunking) is 500ms.
  • Our average response time for Completions (natural language output) is in the process of being tested.
  • The response length is the biggest factor.
How can you ensure generative searches are good?
  • Follow our GenSearch content best practices.
  • Use persona prompting to establish a personality for your search.
  • Without relevant kernels, GenSearch will not generate a response. This prevents it from relying on previous training to provide an irrelevant or incorrect answer.
Do we use RAG?

Yes, Expert Generative Search is a Retrieval Augmented Generation (RAG) system.

How are we better than ChatGPT or other off-the-shelf LLM tools?

Expert uses your site content and customized persona information. It also respects content permissions so responses are tailored to your customer base and business needs, unlike publicly available LLM and AI tools. This enables you to control the information site visitors can find, and guide their experiences to lead to an optimal user experience.

Permissions and privacy

How are IDP's setup and managed?

All Copilot customers MUST deploy CXone as the IDP for Expert. If customers have non-agent users, an IDP such as Azure, OKTA, etc., can also be set up to integrate with Expert.

Customers who are Expert only customers can follow the existing paradigm for deploying IDP integrations.

An existing Expert customer that adopts Copilot after being live on Expert first and using another IDP (Azure, OKTA etc.) can migrate the users from an existing IDP to CXone as the IDP.  The Expert Support team can facilitate this migration.

How do we manage page permissions?

Permissions follow the same rules that the existing Expert search functionality uses. Users can only see content (Kernels) they have permission to view.

How do you protect customers' privacy?

Expert security standards and practices are applied to all aspects of the platform, including generative search.

How do you ensure customer data is secure and not shared or commingled with other customer data?

Expert security standards and practices are applied to all aspects of the platform, including generative search.

What do we log, audit, or otherwise retain about usage?

An event log is maintained which contains:

  • The date / time a request was made
  • Who made the request
  • The request query.

The responses will be saved and made available via an API for auditing purposes (further version).

Functionality

Are there advanced options for generative search?

While it was considered for Kernels and generative search, the technology behind semantic and lucene searches are fundamentally different, so advanced search will not be available at this time.

Will Expert have extractive search and keyword suggestions?

We are considering those and other features following the launch of GenSearch.

Can you make your own LLM using your Expert content?

Yes, integrating your preferred chat experience to the Kernels API endpoint enables you to serve up generative responses to users.

What are consumers, customers, and guests?

It is important to be mindful of the distinction between these groups in the context of generative responses and terminology in your content.  Generally, this is how the Expert Product Team describes our constituents.

  • Customers: Expert customers
  • Consumers: The customers of Expert customers
  • Guests: An example of how a customer might refer to their consumers
Do you index content from other non-Expert systems?

Currently, we do not.  We are working on a future feature to import content more easily which would enable you to bring content into Expert faster.  Beyond that, we have concluded that if we index other content, ownership is implied, and other sources present a huge complication for updates and troubleshooting.  This is akin to the Federated Search set of problems.

 

  • Was this article helpful?