What is indexed for GenSearch?
What is a Kernel?
A Kernel, also known as a chunk or text chunk, is a semantically meaningful section of an article that is processed by an LLM. A Kernel can be a word, a sentence, or a paragraph. Kernels enable your site to be broken down into smaller chunks of semantically related content that are used to form a generative response.
When using traditional search, content is partitioned by page; with GenSearch, kernels of related content are pulled together and sent to the LLM to form a response.
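As a rough illustration of the difference, the sketch below splits pages into paragraph-level chunks and gathers the chunks most related to a question, regardless of which page they came from. It is a simplified stand-in, not how GenSearch works internally: the names `split_into_kernels` and `related_kernels` are hypothetical, paragraphs are assumed to be the chunk boundary, and plain word overlap is used in place of true semantic similarity.

```python
import re


def split_into_kernels(page_text: str) -> list[str]:
    """Split page text into paragraph-level chunks (hypothetical stand-in for Kernels)."""
    # Assumption: blank lines mark chunk boundaries; empty pieces are dropped.
    paragraphs = [p.strip() for p in page_text.split("\n\n")]
    return [p for p in paragraphs if p]


def words(text: str) -> set[str]:
    """Lowercase word set used for the naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def related_kernels(question: str, pages: dict[str, str], top_n: int = 2) -> list[tuple[str, str]]:
    """Return (page title, chunk) pairs that share the most words with the question."""
    question_words = words(question)
    scored = []
    for title, text in pages.items():
        for kernel in split_into_kernels(text):
            overlap = len(question_words & words(kernel))
            if overlap:
                scored.append((overlap, title, kernel))
    # Highest overlap first; chunks from different pages can sit side by side.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(title, kernel) for _, title, kernel in scored[:top_n]]


pages = {
    "Passwords": "Reset your password from the login page.\n\nPasswords expire every 90 days.",
    "Support": "Contact support if the password reset email does not arrive.",
}
matches = related_kernels("How do I reset my password?", pages)
```

Here `matches` contains one chunk from each page, which is the point of the comparison: the unit handed to the LLM is the related chunk, not the whole page.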
What does it mean to index a site for GenSearch?
The process of converting a site to Kernels is called "indexing." This includes updating or incorporating new content into the LLM, which allows the tool to generate responses based on the most relevant and recently updated information.
What kind of content is indexed?
Kernels support public or private text content, PDF attachments (up to 5 MB), and tables within pages. Authentication determines what content users can view. Text content rendered by DekiScript is available within Kernels; text content created by JavaScript is not.
PDFs can be added to a Kernel, but complex PDFs can return unexpected Kernel results. Examples include PDFs with steps to complete a task, or PDFs in which a step asks the user a question and directs them to a different step or section depending on the answer.
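The rules above can be summarized as a small eligibility check. This is only a sketch for reasoning about your own content: `ContentItem`, `is_indexable`, and the field names are hypothetical, and the byte value used for the 5 MB limit is an assumption because the exact threshold is not stated.

```python
from dataclasses import dataclass

# Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes; the exact limit is not documented here.
MAX_PDF_BYTES = 5 * 1024 * 1024


@dataclass
class ContentItem:
    kind: str                     # "text", "table", or "pdf" (hypothetical labels)
    rendered_by: str = "static"   # "static", "dekiscript", or "javascript"
    size_bytes: int = 0           # only meaningful for PDFs


def is_indexable(item: ContentItem) -> bool:
    """Apply the stated rules: PDF size limit, and no JavaScript-generated text."""
    if item.kind == "pdf":
        return item.size_bytes <= MAX_PDF_BYTES
    if item.kind == "text":
        return item.rendered_by != "javascript"
    if item.kind == "table":
        return True
    # Anything not listed as supported is treated as not indexed in this sketch.
    return False


checks = [
    is_indexable(ContentItem("text", rendered_by="dekiscript")),
    is_indexable(ContentItem("text", rendered_by="javascript")),
    is_indexable(ContentItem("pdf", size_bytes=6 * 1024 * 1024)),
]
```

A check like this does not catch the complex-PDF case described above; a branching PDF may pass the size limit and still produce unexpected results.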
Are page titles indexed?
Yes, page titles are indexed by default.
Are page summaries indexed?
On pages that include the page overview DekiScript template, the summary is included in Kernels. The page overview template is added to page content by default. If it has been removed, add the Page Overview block to the page to index the page summary.
Is Content Reuse indexed?
Yes. For each unique instance of Content Reuse, distinct Kernels are created for each page, including the Source page and any pages that reuse the content.
We have a full breakdown of how indexing is handled for Content Reuse.
When is reused content re-indexed?
- When the source of Content Reuse is updated, the associated Kernels for that page will be updated.
- For instances of Content Reuse, the associated Kernels will not be updated until there is an update to the page where the content is reused.
- Use the Content Reuse audit report to identify pages that need to be updated once source content has been modified.
- We have a full breakdown of how indexing is handled for Content Reuse.
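The update rules in this list imply that a reusing page can hold outdated Kernels after its source changes. A minimal sketch of that logic follows, assuming each page exposes a last-updated timestamp; the `Page` class and `pages_with_stale_reuse` function are hypothetical and do not represent how the Content Reuse audit report is implemented.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    last_updated: int  # any increasing timestamp, e.g. Unix seconds (assumed field)


def pages_with_stale_reuse(source: Page, reusing_pages: list[Page]) -> list[str]:
    """List reusing pages that have not been updated since the source last changed."""
    # A reusing page's Kernels refresh only when that page itself is updated,
    # so any page older than the source still reflects the previous source content.
    return [p.title for p in reusing_pages if p.last_updated < source.last_updated]


source = Page("Install steps (source)", last_updated=200)
reusing = [
    Page("Quick start", last_updated=150),  # updated before the source changed
    Page("Admin guide", last_updated=250),  # updated after the source changed
]
stale = pages_with_stale_reuse(source, reusing)
```

In this example only "Quick start" is flagged, so it is the page that would need an update for its Kernels to pick up the new source content.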
Will GenSearch know how to interpret content where there is version blending on a page?
No. For GenResponses, semantically related content is pulled from all pages, so content on the same page about different versions of a product or service will not be recognized by the LLM as belonging to separate versions. Test the questions you expect users to ask, as context will determine what level of content editing may be required.
Is content in accordions or expandable lists indexed?
Yes, content nested within expandable lists is indexed.
Is alt text for images indexed?
No, alt text is not indexed.
Is content in an Archive folder indexed?
Like the search experience, Kernels do not distinguish between archived and unarchived content. To remove content from GenSearch responses, add the llm-no-index tag to the page, or use the Page Classification Manager to apply the tag within a hierarchy.
Can pages be excluded from being indexed?
Yes. Add the llm-no-index tag to the page, or use the Page Classification Manager to apply the tag within the hierarchy.
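If you keep an export of pages and their tags, the effect of the tag can be previewed with a simple filter. This is a sketch under assumptions: the `pages` mapping and the `indexable_pages` function are hypothetical, and only the tag name llm-no-index comes from this article.

```python
NO_INDEX_TAG = "llm-no-index"  # tag name as given in this article


def indexable_pages(pages: dict[str, set[str]]) -> list[str]:
    """Return titles of pages that do not carry the exclusion tag."""
    return [title for title, tags in pages.items() if NO_INDEX_TAG not in tags]


# Hypothetical export: page title -> set of tags on that page.
pages = {
    "Getting started": {"article:topic"},
    "Retired product FAQ": {"article:topic", "llm-no-index"},
    "Release notes": set(),
}
remaining = indexable_pages(pages)
```

Pages tagged through the Page Classification Manager would appear in such an export the same way as pages tagged individually, assuming the export lists every applied tag.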
How often is content re-indexed?
When you update content, the update is reflected in near real time, just as it is in traditional search.
Do you index content from other non-Expert systems?
No. However, the HTML Content Importer enables you to import non-Expert content faster.