It’s been a week since the launch of ChatGPT, OpenAI’s latest large language model (LLM), which has taken the AI industry by storm. Many have posited that it could replace Google search, even kill Google. Some have asked whether it could replace customer service agents. A few have questioned whether it will change the role of conversation designers, or even replace them. While some think it’s the future, others think it’s just monkeys hitting keys or a grand illusion.
One thing we have to remember is that when something receives this much attention (ChatGPT reached 1 million users in less than a week) and generates this much excitement, we tend to run away with ourselves. We tend to believe that the shiny new object is the future of everything and has limitless potential.
I’m not saying that ChatGPT doesn’t have potential, nor am I saying that I’m not impressed. It most certainly does, and I most certainly am. But there are two sides to every coin. In some areas, it’s amazing. In others, it might always fall short.
So here, I wanted to share a few pieces that stood out to me as I’ve been exploring ChatGPT this week. I’ll share more of my thoughts and experiments later.
How does it fare at having actual conversations?
The intention of ChatGPT is to converse with a user in a natural way. Maaike Groenewege has brilliantly covered its ability to hold a conversation and, within that conversation, to:
- Understand pronouns
- Remain consistent with the fact that it’s not a human
- Remember and manage context (I have a great story to share about its top 5 rappers, which I’ll get to in another post)
- Ground the conversation, always bringing it back to ‘how can I help you’
- Use logic
- Apologise for misleading users
Read Maaike’s full post here.
Is it Alexa 2.0?
Alex Cohen put together this Twitter thread of him using ChatGPT to help him figure out how much weight he’d need to lose per week to reach his target weight. It was then able to create a meal plan and subsequent shopping list, as well as an (admittedly ropey) workout plan. This is ChatGPT acting as a personal assistant. Try asking Alexa the same sequence of questions and see how it does.
Bret Kinsella compared ChatGPT to digital assistants and also to Google LaMDA, concluding that the more apt comparison is between ChatGPT and digital assistants. Both are there to accomplish tasks. While digital assistants excel at providing answers to intents that are pre-specified, ChatGPT excels at providing answers that are not documented specifically or are more conceptual in nature.
Could this technology lead to the next generation of personal assistants?
Is ChatGPT better at SEO than a professional with 1-2 years’ experience? Maybe. Zain Kahn had it produce an SEO strategy for a website, then create an audit report for a website. It was then able to shine some light on the best ways to get backlinks and generate a target keyword list. Finally, it created a content plan and generated ideas for article titles. Zain believes the work to be of a standard achievable by an SEO professional with 2 years’ experience and a salary of $50k.
Whether you’d execute that SEO strategy remains to be seen, but there’s no doubt that ChatGPT is creative. It can come up with some pretty comical stuff. Check out this example of it giving you instructions on how to remove a peanut butter sandwich from a VCR in the style of the King James Bible. It’s a trend that many others seem to be trying.
Or check out my example of it writing a freestyle rap in the style of Eminem. It’s not perfect, but it produced some rhymes and had the right number of syllables on each line.
So is it just monkeys hitting typewriters? Or is there more to it?
Perhaps my favourite piece is this one from Gary Marcus, shining a light on how ChatGPT can sound so brilliant one minute and so dumb the next. Gary gets into whether ChatGPT is simply a bunch of trained monkeys hammering away at the keys, with the human brain being the thing that assigns meaning to the responses, or whether there’s more to it.
The TL;DR is that, technically, it’s monkeys behind typewriters because it doesn’t understand what it’s saying. But it’s also more than that, because all of its content is fundamentally based on language written by actual humans.
In another post, Bret Kinsella lists the things you should and shouldn’t expect from ChatGPT, highlighting some of the things that make it unique compared to GPT-3, as well as some of its limitations, including inaccuracies, potentially offensive content, and no source information.
Even though the OpenAI team has tried to make it better than previous GPT models in this regard, it can still be coerced into breaking its own rules on not entertaining inappropriate requests. Rupert Tombs found a way of having it produce content on the health benefits of crushed glass, for example.
Now, it’s not that anyone would believe something as blatantly inaccurate as this, but it does show where potential cracks start to emerge. These inaccuracies have already led to it being banned from Stack Overflow, where some people were feeding questions to ChatGPT and posting the answers in the forum. In some cases, ChatGPT can actually debug code, but it’s not quite reliable enough yet.
What about enterprise usage?
Now, you might be wondering whether this, and other LLMs, have potential to be used in your business. Perhaps to make your digital assistant perform better. Peter Voss highlights the fundamental limitations in ChatGPT’s ability to work as a conversational assistant for enterprise use cases here.
Namely, it’s too unpredictable and can’t operate with real-time data or custom knowledge. Nor can it complete complex and controlled actions, such as updating multiple APIs, or be fully audited. So we may have a long way to go before it has some real business use.
Developing our relationship with AI
The deeper part of all of this is what this technology does to our relationship with AI and where that can lead us in future.
Alberto Romero shares an interesting take on how ChatGPT (and general AI applications, for that matter) could obfuscate some of its dumbness in future by using “well-designed guardrails, filters, and intrinsic conservativeness”.
The general idea is that LLMs like this can produce nonsense. Once you discover that a system can produce nonsense, you stop believing it to be reliable. Alberto asks: what if these systems, in areas where they produce nonsense, instead used guardrails and conservative responses to prevent them from showing us that nonsense?
In this case, prompts that would typically produce nonsense would instead generate responses that are more general and noncommittal. That would mean the edges and limitations of the system would be hidden and trust would eventually grow.
In essence, we’d be trusting a system that is still fundamentally flawed, but that’s good at hiding it. The impact of mass adoption of such technology remains to be seen.
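To make the idea a little more concrete, here’s a minimal sketch of what such a guardrail could look like. It’s purely illustrative: the `Candidate` type, its `confidence` score, the `FALLBACK` text and the `guarded_reply` function are all hypothetical names I’ve made up, and this is not how ChatGPT or any real system is known to work.

```python
from dataclasses import dataclass

# Hypothetical noncommittal response; not taken from any real system.
FALLBACK = "I'm not sure about that. Could you tell me a bit more?"


@dataclass
class Candidate:
    """A model answer paired with an assumed confidence score in [0, 1]."""
    text: str
    confidence: float  # illustrative only; real LLMs may not expose this


def guarded_reply(candidate: Candidate, threshold: float = 0.7) -> str:
    """Show the answer only if the assumed confidence clears the threshold.

    Otherwise return a general, noncommittal fallback, so the user never
    sees the low-confidence (and possibly nonsensical) answer.
    """
    if candidate.confidence >= threshold:
        return candidate.text
    return FALLBACK
```

With something like this in place, a shaky answer never reaches the user. They only ever see the fallback, which is exactly why the edges of the system would be hard to spot.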
There’s no doubt that LLMs will have a big impact on our world, from helping us spark creative ideas and produce creative output, to helping businesses be more efficient and deliver more effective experiences. They’ll certainly influence how we interact with technology and the world around us. And, while the future looks exciting and promising, let’s not forget that it’s very early days with these things. They’re not ready for prime time yet, and there are some fundamental societal and ethical considerations to be weighed up before we do pull the trigger.