
Capturing the scope and behavior of LLMs

Emily Uematsu Banzhaf

A hot topic among conversation designers and others is how to capture the scope and behavior of LLMs. While there is a need to create a design language and to constantly test and document prompts in tools like Voiceflow, there’s another layer of documentation that needs to occur before we can start defining patterns and creating standards and best practices.

Until we become more familiar with the intricacies of the models, capturing the scope and behavior of LLMs is done through trial and error, rigorous testing and documentation, and understanding the variables that make up a successful prompt and prompt output. It all goes back to Maaike Groenewege’s point at Unparsed 2023 that the best output comes down to content optimization.

Prompt documentation and tracking

I’ve seen a few tools like Vidura that track prompts, but what I’m not seeing is a way to track the output behavior. When you change one variable in a prompt, the outcome of the entire experience can change, so it’s really important to document every single detail. This way, it’s easier to track what you’ve tried before, pinpoint exactly how to fix a prompt that might not respond the way you want it to, and mix and match the best parts of each prompt.
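
To make that concrete, here's a minimal sketch of running the same prompt twice with a single variable changed (temperature). It uses the OpenAI Python client purely as an example provider; the prompt and model name are placeholders, and any model API would illustrate the same point.

```python
from openai import OpenAI  # example provider; any LLM API would work

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = "Summarize our refund policy in two sentences."  # placeholder prompt

# Run the identical prompt with one variable changed (temperature)
# to see how much the output behavior can shift between runs.
for temperature in (0.0, 1.0):
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    print(f"temperature={temperature}:")
    print(response.choices[0].message.content)
```

Without a record of which variable changed between those two runs, it's nearly impossible to say why the output behaved differently.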

Prompt variables

Here are the types of variables to test and document each time you change or add a prompt (these will vary depending on the tool you use); a sketch of how they might be captured as a structured record follows the list:

  • Goal/Task
  • Prompt
  • System prompt (if applicable)
  • Data source (e.g. AI model or knowledge base)
  • Model type (e.g. GPT-3, ChatGPT, GPT-4, Claude 2, Llama 2)
  • Knowledge base (e.g. text, documents, URLs)
  • Temperature
  • Tokens
  • Prompt response type (if applicable) (e.g. prompt, memory, memory and prompt)
  • Expected behavior or output
  • Examples of expected behavior or output
  • Behavior and output considerations (e.g. hallucinations, sensitive data)
  • What worked well about the prompt output
  • What could be improved
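
As a rough illustration of what one documented prompt run could look like as a structured record, the variables above map onto a simple Python dataclass. The field names are mine, not a standard, and would change with the tool you use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PromptRecord:
    """One documented prompt run; field names are illustrative."""

    goal: str                                # Goal/Task
    prompt: str                              # The prompt text itself
    system_prompt: Optional[str] = None      # System prompt, if applicable
    data_source: str = ""                    # e.g. "AI model" or "knowledge base"
    model_type: str = ""                     # e.g. "GPT-4", "Claude 2", "Llama 2"
    knowledge_base: str = ""                 # e.g. text, documents, URLs
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None         # token limit for the response
    response_type: str = ""                  # e.g. "prompt", "memory", "memory and prompt"
    expected_behavior: str = ""              # what the output should do
    example_outputs: list[str] = field(default_factory=list)
    considerations: str = ""                 # e.g. hallucinations, sensitive data
    what_worked: str = ""                    # what worked well about the output
    what_to_improve: str = ""                # what could be improved
```

Each time a prompt or one of its variables changes, a new record gets filled in and kept alongside the previous ones, so runs can be compared side by side.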

What we ideally need is a way to track these variables within a tool like Voiceflow, or with a tool like Airtable that lets you filter, color-code, search, compare, and track changes. This will help with scalability and cross-functional collaboration. But until we have dedicated tools for prompt management and documentation, FigJam's been working for me so far. I started with a spreadsheet, but FigJam was a lot faster and easier.
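
Until a dedicated tool exists, the records can live in any format that supports filtering and comparison. As a rough sketch building on the PromptRecord example above (the file name and helper function are hypothetical), each run could be appended to a CSV file that Airtable or a spreadsheet can import:

```python
import csv
import os
from dataclasses import asdict, fields


def append_record(record: PromptRecord, path: str = "prompt_log.csv") -> None:
    """Append one prompt record as a row in a CSV file that spreadsheet
    tools (e.g. Airtable, Google Sheets) can import, filter, and compare."""
    row = asdict(record)
    # Flatten the list of example outputs into a single cell.
    row["example_outputs"] = "; ".join(row["example_outputs"])
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[f.name for f in fields(record)])
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```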

This is very detailed and intensive work, but it has to be done until we understand how to create consistent behavior and output.

Prompt variable spreadsheet template

I created an initial spreadsheet template to demonstrate what I’m envisioning. This is not scalable and only accounts for individual prompts, but it’s a starting point to build on. If you want to use it, please feel free to make a copy and modify it based on your specific needs!

AI Prompt Management Template

Working document of prompt variables and definitions

For those of you who may not be familiar with some of the terms mentioned above, I've created an initial working document that defines all the prompt variables listed. These definitions are based on my research and understanding, so if anything is inaccurate or missing, please let me know and I'll adjust it.

Prompt Variables and Definitions

I hope this is helpful! Please reach out on LinkedIn if you have any questions.
