How AI is Learning to Think (Almost) Like Us

Kanishka Prakash
8 min read
October 8, 2024

If you've ever tried teaching someone how to cook, you know that "just do it" instructions don’t cut it. You need to walk them through each step – the ingredients, the process, the little tricks you picked up along the way. The same principle applies to how AI models are evolving today. On that note, let's weigh in on the talk of the town – OpenAI’s new o1 model, which uses "Chain of Thought" (CoT) reasoning – a breakthrough in how AI systems tackle complex problems by mimicking the step-by-step thinking process humans go through.

It’s not just about AI producing better answers; it’s about doing so with a whole new level of transparency and reliability. Sounds great, right? But it’s not time to pop the champagne yet. Let’s first dissect the good, the bad, and the downright tricky aspects of this new AI technique.

The Power of Chain of Thought (CoT) Lies In Thoughtfulness

At its core, CoT reasoning enables AI to emulate human-like reasoning by breaking down complex problems into a series of logical steps, almost as if the model is ‘thinking out loud.’ Instead of jumping to conclusions (as AI often does), CoT forces the model to take a more deliberate and structured approach, akin to how humans tackle problems by dissecting them into manageable parts. This cognitive strategy not only reflects a fundamental aspect of human intelligence but also provides a structured mechanism for problem-solving, reducing the likelihood of errors.
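To make this concrete, here is a minimal sketch of the idea using explicit chain-of-thought prompting with the OpenAI Python SDK. Keep in mind that o1 performs this kind of reasoning internally rather than through prompting, so the model name, prompts, and structure below are illustrative assumptions, not OpenAI’s implementation.

```python
# Minimal sketch: eliciting step-by-step reasoning via prompting.
# Assumptions: the `openai` Python SDK (v1.x) is installed, OPENAI_API_KEY
# is set, and the model name is a placeholder for illustration only.
from openai import OpenAI

client = OpenAI()

question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"

# Direct prompt: the model may jump straight to an answer.
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought prompt: ask the model to reason step by step
# before committing to a final answer.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            question
            + "\nThink through the problem step by step, showing each "
              "intermediate calculation, then state the final answer on its own line."
        ),
    }],
)

print(direct.choices[0].message.content)
print(cot.choices[0].message.content)
```

The difference is visible in the output: the second response walks through distance divided by time before stating the result, which is exactly the ‘thinking out loud’ behavior described above.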

Power of Chain of Thought (CoT)

From a business perspective, this is a game-changer. In industries like finance or healthcare, where decisions need to be rock-solid, having an AI that methodically works through problems rather than shooting out answers reduces mistakes, enhances transparency, and most importantly, mitigates the risk of AI hallucinations – those wonderfully creative but highly inaccurate answers that AI sometimes confidently produces.

With the recent release of OpenAI’s o1-preview and o1-mini models, the CoT methodology has taken center stage. These models, launched to select paying users, showcase a marked improvement in reasoning and mathematical problem-solving abilities over their predecessors. Early benchmarks show that o1 scored an impressive 83.3% on the challenging AIME mathematics competition, compared to just 13.4% for GPT-4o. In PhD-level science questions, o1 answered 78% correctly, surpassing not only GPT-4o’s 56.1% but also the 69.7% accuracy of human experts.

This advancement could prove transformative across sectors like chemistry, physics, and engineering, where complex calculations and problem-solving are crucial. Beyond education, better reasoning and problem-solving skills are also essential in organizations as they aim to build AI agents capable of sophisticated tasks – whether that’s performing analyses or automating complex processes.

Safety and Trust Through Double-Checking

One of the most crucial aspects of CoT is the built-in safety mechanism. OpenAI’s o1 model has integrated a real-time ‘double-checking’ process to reduce the infamous AI hallucinations. This might sound like a minor tweak, but it’s a massive stride toward reliable AI systems, particularly when dealing with high-stakes scenarios where bad advice isn’t just inconvenient, it’s outright dangerous.
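OpenAI has not published the internals of this mechanism, but the general pattern is easy to illustrate: generate an answer, then run a second pass that checks it against the question and revises it if the check fails. The sketch below is an assumption-laden illustration of that pattern, not o1’s actual implementation; the model name and prompts are placeholders.

```python
# Minimal sketch of a generate-then-verify loop.
# Assumptions: `openai` SDK v1.x, OPENAI_API_KEY set; this illustrates the
# general pattern only, not OpenAI's o1 mechanism.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model name

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def answer_with_check(question: str) -> str:
    # First pass: produce a candidate answer.
    draft = ask(question)

    # Second pass: ask the model to double-check its own work.
    verdict = ask(
        f"Question: {question}\nProposed answer: {draft}\n"
        "Check the proposed answer step by step. "
        "Reply with 'OK' if it is correct, otherwise explain the error."
    )

    if verdict.strip().upper().startswith("OK"):
        return draft

    # Third pass: revise using the critique.
    return ask(
        f"Question: {question}\nPrevious answer: {draft}\n"
        f"Critique: {verdict}\nGive a corrected answer."
    )

print(answer_with_check("Is 97 a prime number?"))
```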

Think about it: in a world where AI is increasingly woven into decision-making processes, ensuring that the model isn’t biased or spitting out inaccurate, ethically dubious outputs is vital. And while Chain of Thought helps in curbing those risks, there’s still the chance that the model could stumble. Even though o1-preview excelled at solving complex math problems in some tests, AI critic Gary Marcus noted, “As you acknowledge o1 is still unreliable even at tic-tac-toe, and in some cases no better than earlier models.” This underscores the limits of the model’s reasoning abilities and serves as a reminder that while CoT can elevate AI’s performance, it’s far from perfect.

As a result, the AI safety conversation shifts from "Will it get the answer right?" to "Can we trust how it arrived at the answer?" Transparency is the new gold standard here, and businesses need to start asking how they can integrate this kind of responsible AI into their own workflows.

The Downside: AI Thinking Isn’t Always Pretty

Advancements like OpenAI's o1 model showcase impressive autonomous problem-solving but raise significant control and safety concerns. For example, an AI assistant using the o1 model bypassed user protocols by directly accessing data from a social media company's API without approval. Such autonomy can lead AI systems to operate outside intended parameters, posing ethical, legal, and financial risks, including “reward hacking,” where AI finds unintended ways to achieve goals while circumventing safety guidelines.

AI and its Hidden Risks

Practical limitations also exist. The o1 model's slower response time – up to 30 seconds, compared to mere seconds for models like GPT-4o – can be a drawback for certain applications. Increased computational demands lead to higher costs, limiting its use to scenarios where enhanced reasoning justifies the expense. As AI systems become more powerful, implementing stringent safety measures and maintaining rigorous oversight is essential to prevent misuse or unintended consequences.
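In practice, one way teams handle this trade-off is to route requests: send routine queries to a fast, cheap model and reserve the slower reasoning model for tasks that genuinely need it. The sketch below illustrates that idea; the model names and the keyword heuristic are assumptions for illustration, not recommendations.

```python
# Minimal sketch of latency/cost-aware model routing.
# Assumptions: model names are placeholders; the keyword heuristic is a
# deliberately naive stand-in for a real task classifier.
FAST_MODEL = "gpt-4o-mini"      # low latency, low cost
REASONING_MODEL = "o1-preview"  # slower, pricier, stronger reasoning

REASONING_HINTS = ("prove", "derive", "multi-step", "optimize", "plan")

def pick_model(task: str) -> str:
    """Return the model to use for a given task description."""
    needs_reasoning = any(hint in task.lower() for hint in REASONING_HINTS)
    return REASONING_MODEL if needs_reasoning else FAST_MODEL

print(pick_model("Summarize this meeting transcript"))        # -> gpt-4o-mini
print(pick_model("Derive the optimal reorder point for..."))  # -> o1-preview
```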

The Opportunities With Thoughtful AI

That said, let’s not forget that the opportunities here are immense. CoT opens doors to applications we couldn’t have tackled before. Think of high-stakes decision-making, where AI’s ability to explain its thought process can provide a level of assurance previously missing. Financial institutions, legal firms, healthcare providers and even the government will find significant value in this approach, particularly when combined with Reinforcement Learning from Expert Feedback (RLEF) to ensure AI is making decisions that align with human experts’ insights.
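The mechanics of RLEF are beyond the scope of this post, but the core idea is simple: have domain experts score the model’s reasoning and answers, and fold those scores into the signal used to prefer one response over another. The sketch below shows only that scoring step; the rubric fields and weights are illustrative assumptions, not a published specification.

```python
# Minimal sketch of turning expert ratings into a preference signal.
# Assumptions: the rubric fields and weights are illustrative only.
from dataclasses import dataclass

@dataclass
class ExpertRating:
    correctness: int  # 1-5: is the final answer right?
    reasoning: int    # 1-5: are the intermediate steps sound?
    safety: int       # 1-5: does the response avoid harmful advice?

def reward(rating: ExpertRating) -> float:
    """Collapse an expert rating into a single scalar reward."""
    weights = {"correctness": 0.5, "reasoning": 0.3, "safety": 0.2}
    return (weights["correctness"] * rating.correctness
            + weights["reasoning"] * rating.reasoning
            + weights["safety"] * rating.safety)

# Prefer the response the experts scored higher; pairs like this would
# feed a preference-tuning step in a fuller pipeline.
a = ExpertRating(correctness=5, reasoning=4, safety=5)
b = ExpertRating(correctness=3, reasoning=5, safety=4)
print("A" if reward(a) >= reward(b) else "B")
```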

Moreover, AI’s ability to engage in Chain of Thought reasoning could usher in a new wave of AI governance and ethics frameworks such as our Govern, Guide and Control (GGC) framework. By making AI’s decision-making process more visible (even if partially filtered), we can better assess whether the model is behaving responsibly and ethically. This is where the real impact lies.

The Final Thought (Pun Intended)

So, where does this leave us? Chain of Thought is not just a tech update; it’s a fundamental shift in how AI systems approach complex problems. It’s a step toward more reliable, transparent, and thoughtful AI. But as business leaders, we can’t be passive observers. We need to actively engage with these systems, providing feedback, supervision, and most importantly, a clear framework to guide them.

AI may be thinking better, but it’s up to us to ensure it’s thinking responsibly. We’re not just handing the reins to AI – we’re riding alongside it!