LLM terminology
ChatGPT: A conversational AI assistant created by OpenAI and built on its large language models (LLMs). ChatGPT is trained to engage in open-ended conversations and provide responsive answers while adhering to guidelines around being helpful, harmless, and honest. Its training data and model architecture allow it to draw insights, explain complex topics, and even engage in creative writing tasks.
Chunk (or text chunk): A semantically meaningful section of a content page that can be processed separately by an LLM (for example, a sentence or paragraph). Breaking down large inputs and outputs into easily manageable pieces helps models understand and maintain context / coherence over longer texts.
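As a simple illustration, here is a minimal chunking sketch in Python that splits on paragraph breaks and caps each chunk at an arbitrary character limit (production pipelines typically split on token counts and semantic boundaries instead):

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks on paragraph breaks, capping each at max_chars."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Start a new chunk if adding this paragraph would exceed the cap.
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

article = "First paragraph...\n\nSecond paragraph...\n\nThird paragraph..."
print(chunk_text(article, max_chars=40))
```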
Expert Kernels: An LLM-ready knowledge delivery solution based on content within your Expert site.
Completion: The output an LLM generates in response to a prompt.
Completions endpoint: The API endpoint that takes a prompt and returns a completion; this endpoint powers Expert generative search.
Content gaps: Areas where the LLM lacks the necessary information or context to provide a satisfactory response. Identifying and filling these gaps is an important part of improving performance.
Embedding: The process of converting text chunks from a content page into numerical vectors that represent the meaning of the text. LLMs map this data in a way that preserves semantic relationships, so that similar words / sentences are close together in the vector space. This is why context is more helpful than keywords when readying your content for AI.
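For illustration, here is how text chunks might be embedded and compared using the open-source sentence-transformers library (the model name is a common general-purpose choice, assumed here for the example; note that the first pair of sentences shares meaning but almost no keywords):

```python
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is a small general-purpose embedding model, chosen
# arbitrarily for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "Steps for recovering account credentials",
    "Our office is closed on public holidays",
]
vectors = model.encode(sentences)  # each sentence becomes a numerical vector

# Semantically similar sentences score close to 1.0; unrelated ones score lower.
print(util.cos_sim(vectors[0], vectors[1]))  # high: same meaning, few shared keywords
print(util.cos_sim(vectors[0], vectors[2]))  # low: unrelated topics
```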
Fine-tuning: Taking a pre-trained language model and training it further on a specific task or domain-specific dataset. The LLM adapts its knowledge and improves performance for particular use cases.
For Expert Kernels, instead of having to fine-tune the LLM further, we use prompt and persona engineering to achieve similar outcomes in much less time. This also keeps Kernels flexible for our customers: the solution is not bound to a particular use case or customer need and can be tailored to each customer individually.
Generative AI (GenAI): The broader field of AI systems that can create new data, including text, images, and video. Large language models (LLMs) are a type of generative AI.
GPT tokens: A unit that measures the input and output of an LLM. Longer prompts and inputs consume more tokens, which can slow down processing time and increase the operational costs of LLM-based features.
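For example, OpenAI's open-source tiktoken library counts the tokens a prompt will consume (cl100k_base is the encoding used by several recent GPT models, assumed here for illustration):

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

prompt = "How do I configure single sign-on for my site?"
tokens = encoding.encode(prompt)
# Longer prompts consume more tokens, which increases latency and cost.
print(len(tokens), "tokens")
```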
Indexing: Updating or incorporating new content into the knowledge base the LLM draws from, allowing it to generate responses based on the updated information.
Large Language Model (LLM): A type of artificial intelligence model that is trained on a large amount of text data to understand and generate human-like language.
Language generation: Using language models to generate human-like text, often for tasks like translation or summarization.
Model: A statistical representation learned by the system during training. For LLMs, the model is a massive neural network trained on an expansive amount of text data to understand and generate human-like language.
Persona: The personality or character traits assigned to an LLM to make it more human-like and engaging. Creating a persona document is recommended to outline aspects like tone, voice, and how the bot should interact with users.
Prompt: Specific input text that instructs or guides a GenAI model in its output generation process.
Prompt engineering: Crafting prompts or instructions for an LLM to guide its behavior and responses (see the sketch after this list). This includes:
- Providing context
- Defining the tone / personality
- Restricting the model to specific content or knowledge domains
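A minimal sketch combining all three techniques into a single system prompt, assuming OpenAI's Python client (any chat-style API accepts a similar structure; the model name, product name, and article text are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_prompt = (
    # Context: ground the model in a specific role and content source.
    "You are a support assistant for the ProWidget knowledge base.\n"
    # Tone / personality: define how the model should sound.
    "Answer in a friendly, concise, professional tone.\n"
    # Restriction: keep the model inside the approved knowledge domain.
    "Only answer using the article excerpts provided; if they do not "
    "contain the answer, say so rather than guessing."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Article excerpts: ...\n\nQuestion: How do I reset my password?"},
    ],
)
print(response.choices[0].message.content)
```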
RAG (Retrieval-Augmented Generation): A technique used in LLMs that combines retrieval from a knowledge base with generative language modeling. It allows the LLM to retrieve relevant information from a knowledge base and then generate text based on that information. This is the technique Expert uses.
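A simplified sketch of the retrieve-then-generate flow, reusing sentence-transformers for the retrieval step and printing the augmented prompt rather than calling a model (the knowledge base entries are invented):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A toy knowledge base of pre-chunked content, invented for illustration.
chunks = [
    "To reset your password, open Settings > Security and choose Reset.",
    "Billing statements are emailed on the first of each month.",
    "Two-factor authentication can be enabled under Settings > Security.",
]
chunk_vectors = model.encode(chunks)

question = "How do I change my password?"
scores = util.cos_sim(model.encode(question), chunk_vectors)[0]

# Retrieve: take the most relevant chunk from the knowledge base.
best_chunk = chunks[int(scores.argmax())]

# Augment: ground the generation step in the retrieved content.
prompt = f"Answer using only this content:\n{best_chunk}\n\nQuestion: {question}"
print(prompt)  # this augmented prompt would then be sent to the LLM
```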
Search queries: Questions or phrases people use to search for information. Analyzing these queries can help us understand how users phrase their questions, in order to improve content and the LLM's ability to interpret and respond to customer queries.
Temperature: Controls the randomness and perceived "creativity" of the LLM's outputs. A lower temperature (0.5-0.6) generates more focused and predictable outputs, while a higher temperature (0.7-0.9) allows for more variation and what we humans consider ad-libbing by the AI. More freedom and variation increase the likelihood that the LLM will draw on its pre-trained knowledge to generate an answer, which can lead to more hallucinations. A temperature between 0.7 and 0.9 balances stilted language against referencing content that you cannot control, such as information from the Web or competitor sites.
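Mechanically, temperature rescales the model's scores for candidate next tokens before one is sampled. A minimal sketch with invented scores:

```python
import numpy as np

def sampling_probs(logits, temperature):
    """Convert raw token scores into sampling probabilities at a given temperature."""
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
print(sampling_probs(logits, 0.5).round(3))  # peaked: the top token dominates
print(sampling_probs(logits, 0.9).round(3))  # flatter: more output variation
```

At 0.5 the top token is chosen far more often, producing focused output; at 0.9 the distribution flattens and less likely tokens surface more frequently.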
Threshold: The relevancy distance cutoff between kernels in the vector database; only kernels within the threshold are treated as relevant. The more closely related two kernels are, the smaller the distance between them.
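For instance, a retrieval step might keep only the kernels whose cosine distance to the query falls under the threshold (the vectors and cutoff below are invented):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: near 0 for closely related vectors, larger otherwise."""
    a, b = np.asarray(a), np.asarray(b)
    return 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = [0.9, 0.1, 0.2]  # invented query embedding
kernels = {
    "reset-password": [0.8, 0.2, 0.3],  # closely related: small distance
    "holiday-hours": [0.1, 0.9, 0.4],   # unrelated: large distance
}

THRESHOLD = 0.2  # arbitrary cutoff for illustration
for name, vec in kernels.items():
    d = cosine_distance(query, vec)
    print(name, round(float(d), 3), "kept" if d < THRESHOLD else "dropped")
```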
Token: A basic unit that language models work with, representing words, parts of words, or characters. Tokens quantify the input length and output length. "GPT tokens" refers to these fundamental units that the model operates on.