LAZYLM - A FRAMEWORK FOR FOUNDATIONAL MODELS TO “LAZILY” EVALUATE REASONING TRACE

Gianni Crivello, Customer Engineer, AI/ML
6 min read
September 5, 2024

In the dynamic field of AI, large language models (LLMs) have become crucial for a variety of applications, from content creation to problem-solving, and they have been widely adopted as “assistants” in domains such as healthcare and education. In educational contexts, however, these models often fall short of providing an optimal learning experience: they tend to generate complete solutions upfront, robbing students of the opportunity to engage in the step-by-step reasoning that is crucial for deep understanding.

This raises an important question – Can we get these systems to evaluate in a ‘Socratic’ way?

In other words, can we develop systems that encourage students to think step-by-step, rather than generating complete solutions upfront?

[Figure: guided step-by-step learning]

While there has been significant work on making LLMs reason step-by-step (e.g., chain-of-thought prompting), to our knowledge, there isn’t an existing framework to build systems that allow users to think through problems step-by-step while having the LLM assist in a pedagogical way. This is where the concept of “lazy evaluation” in language models comes into play, offering a more Socratic approach to AI-assisted tasks.

This work demonstrates an approach to building a framework for forcing models to evaluate in a lazy way, drawing inspiration from functional programming concepts.

Observations That Led To Implementing Lazy Evaluation In Language Models

Pedagogical Effectiveness:

Traditional tutoring methods often involve guiding students through problems step-by-step, allowing them to think critically and make connections on their own. AI tutors should aim to replicate this approach rather than simply providing answers.

Resource Efficiency:

Generating complete solutions upfront is computationally expensive, especially for complex problems. A lazy evaluation approach can significantly reduce resource usage by generating only the necessary information on demand.

Adaptability:

Students have varying levels of understanding and may require different amounts of guidance. A lazy evaluation system can adapt to each student's needs, providing more or less detail as required.

Engagement:

By revealing information gradually, we can maintain student engagement and encourage active participation in the problem-solving process.

Real-world Problem Solving:

In many real-world scenarios, solutions are not immediately apparent and must be approached incrementally. Training students to think in this way prepares them for challenges beyond the classroom.

Installing LazyLM

pip install lazy_lm

LazyLM

from dotenv import load_dotenv
import os
from anthropic import AnthropicVertex
from lazy_lm.core import lazy

# Load PROJECT_ID and PROJECT_LOCATION from a .env file
load_dotenv()
project_id = os.getenv("PROJECT_ID")
location = os.getenv("PROJECT_LOCATION")

# Initialize the Anthropic client
client = AnthropicVertex(project_id=project_id, region=location)
lazy_lm = client.lazy("What is the derivative of `2x^3 + x^2 + 2x + 1`? Give me the solution step-by-step")

# Get the current step 
print(lazy_lm.get_current_step()) 
"""
What is the derivative of `2x^3 + x^2 + 2x + 1`? Give me the solution step-by-step 
"""

# Get the next step 
print(lazy_lm.get_next_step()) 
"""
To find the derivative of the given function, we’ll use the power rule and the constant rule of differentiation. Let’s start with the first term: 
Step 1: Find the derivative of 2x^3 The power rule states that for a term ax^n, the derivative is nax^(n-1). 
For 2x^3, we have: 
  a = 2, n = 3 
  So, the derivative of 2x^ 
"""

# Query the current step 
print(lazy_lm.ask_question("I don't understand this step"))
"""
I apologize for any confusion. I’d be happy to explain this step in more detail without advancing to the next step. 

In this step, we’re focusing on finding the derivative of the first term in the given expression, which is 2x^3. 

To do this, we're using the power rule of differentiation. The power rule states that for a term in the form ax^n (where 'a' is a constant and 'n' is the power) 
"""

# Get the next step 
print(lazy_lm.get_next_step()) 
"""
Step 2: Complete the derivative of 2x^3 

Continuing from the previous step, we apply the power rule to 2x^3: 
  
The derivative of 2x^3 is: 
3 · 2x^(3-1) = 3 · 2x^2 = 6x^2
"""
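
The calls above lend themselves to a simple driver loop. The sketch below is an illustration only: `run_session` and `FakeSession` are hypothetical names that are not part of lazy_lm, and the fake object merely mimics the two methods shown above (`get_next_step` and `ask_question`) so that the loop can run without credentials or a model.

```python
class FakeSession:
    """Hypothetical stand-in that mimics the two methods used above."""

    def __init__(self, steps):
        self._steps = list(steps)
        self._index = -1  # no step revealed yet

    def get_next_step(self):
        # Reveal exactly one more step; never run past the last one.
        self._index = min(self._index + 1, len(self._steps) - 1)
        return self._steps[self._index]

    def ask_question(self, question):
        # Answer about the current step without advancing.
        current = self._steps[self._index] if self._index >= 0 else "(no step yet)"
        return f"About '{current}': {question}"


def run_session(session, commands):
    """Treat "next" as a request for the next step, anything else as a question."""
    transcript = []
    for command in commands:
        if command == "next":
            transcript.append(session.get_next_step())
        else:
            transcript.append(session.ask_question(command))
    return transcript


session = FakeSession(["Step 1: apply the power rule to 2x^3", "Step 2: 6x^2"])
transcript = run_session(session, ["next", "Why the power rule?", "next"])
```

In a real application, the same loop could presumably be handed the `lazy_lm` object created earlier, with the commands coming from a student instead of a list.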

A Note On Lazy Evaluation In Programming

The concept of lazy evaluation is well-established in functional programming languages: an expression is evaluated only when its value is actually needed. This is also known as call-by-need. The opposite strategy is called "eager" (or strict) evaluation, which evaluates all of the subexpressions of an expression regardless of whether their values are used.

For example, the function:

func :: Int -> Int -> Int
func a b = a

always returns its first argument and ignores the second:

>> func (2+2) 100 
4

In an eager (or strict) language like Python, evaluating the same kind of call with an expensive second argument would look something like this:

func((2+2), (100**1000000))
func(4, (100**1000000))
func(4, 10000000000000000000000000000.....)
4

That is a lot of wasted computation just for a function that returns the first argument!

Programming languages that support lazy evaluation have an evaluation strategy that looks more like this:

func((2+2), (100**1000000))
(2+2)
4

Our language "knows" that we don't need the second argument, so why bother evaluating it?
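
Python is eager, but the lazy behaviour can be approximated by passing zero-argument callables (thunks) instead of values. The following is a minimal sketch of that idea, not a full call-by-need implementation (it delays evaluation but does not cache forced values); `expensive`, `func_eager`, `func_lazy`, and the `evaluated` list are names invented for this illustration.

```python
evaluated = []  # records which arguments were actually computed


def expensive():
    """Hypothetical stand-in for a costly computation."""
    evaluated.append("expensive")
    return sum(range(1_000_000))


def func_eager(a, b):
    # Python evaluates both arguments before the call, even though b is unused.
    return a


def func_lazy(a, b):
    # a and b are thunks (zero-argument callables); only a is ever forced.
    return a()


eager_result = func_eager(2 + 2, expensive())
calls_after_eager = len(evaluated)  # expensive() ran once

lazy_result = func_lazy(lambda: 2 + 2, expensive)
calls_after_lazy = len(evaluated)  # still one: the second thunk was never forced
```

Both calls return 4, but only the eager one pays for the unused argument.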

Lazy Evaluation In The Context Of Language Models

Language models are inherently eager: given a prompt, they will continue to generate tokens until reaching an end-of-sequence token. However, this behavior is not always desirable, especially in educational contexts.

For example, given a prompt such as:

"What is the derivative of 2x^3 + x^2 + 2x + 1? Give me the solution step-by-step"

A language model will generate the entire sequence of steps in one go. Something like this:

Given function: f(x) = 2x^3 + x^2 + 2x + 1

Step 1: Differentiate each term separately using the power rule and constant rule.

The power rule states that the derivative of x^n is nx^(n-1).

The constant rule states that the derivative of a constant is 0.

a) Differentiate 2x^3:

d/dx (2x ^ 3) = 2 * 3x ^ (3 - 1) = 6x ^ 2

b) Differentiate x^2:

d/dx (x ^ 2) = 2x ^ (2 - 1) = 2x

c) Differentiate 2x:

d/dx (2x) = 2

d) Differentiate 1:

d/dx (1) = 0

Step 2: Combine the results from each term.

f'(x) = 6x ^ 2 + 2x + 2 + 0

Step 3: Simplify the expression.

f'(x) = 6x ^ 2 + 2x + 2

Therefore, the derivative of 2x ^ 3 + x ^ 2 + 2x + 1 is 6x ^ 2 + 2x + 2

In the context of creating an application that can assist students with learning this content, a full trace will likely be suboptimal in facilitating a productive learning environment.

More abstractly, we can frame a problem as an initialization, a sequence of steps, and a final solution.

sp_1 -> sp_2 -> sp_3 -> ... -> sp_n, where sp_i is the sub problem at step i for problem p

With these definitions in place, we can frame the desired evaluation strategy like this:

past(sp_1) -> curr(sp_2) -> future(sp_3 -> ... -> sp_n)

where past holds the steps that have already been evaluated, curr is the step currently being shown to the user, and future holds the steps that have not been evaluated yet.
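
The past / curr / future split can be sketched as a small state object that asks for one more step only when the user requests it. This is an illustrative sketch, not LazyLM's actual implementation: `LazyTrace`, `advance`, and `scripted_model` are hypothetical names, and the scripted function stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A step function receives the problem and the steps so far, and returns
# the next step, or None when the problem is solved.
StepFn = Callable[[str, List[str]], Optional[str]]


@dataclass
class LazyTrace:
    problem: str
    next_step_fn: StepFn
    past: List[str] = field(default_factory=list)
    curr: Optional[str] = None

    def advance(self) -> Optional[str]:
        """Evaluate exactly one more sub-problem; future steps stay unevaluated."""
        history = self.past + ([self.curr] if self.curr is not None else [])
        step = self.next_step_fn(self.problem, history)
        if step is None:
            return None  # nothing left to evaluate
        if self.curr is not None:
            self.past.append(self.curr)
        self.curr = step
        return step


# Hypothetical stand-in for the model: replays a fixed list of steps.
SCRIPT = ["d/dx(2x^3) = 6x^2", "d/dx(x^2) = 2x", "d/dx(2x) = 2", "d/dx(1) = 0"]


def scripted_model(problem: str, history: List[str]) -> Optional[str]:
    return SCRIPT[len(history)] if len(history) < len(SCRIPT) else None


trace = LazyTrace("Differentiate 2x^3 + x^2 + 2x + 1", scripted_model)
trace.advance()
trace.advance()
# Now trace.past holds sp_1, trace.curr is sp_2, and sp_3 onward is untouched.
```

The future never exists as data here; it is only whatever the step function would produce if `advance` were called again.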

To a language model, this is all just token sequences.

TokenSequence_1 -> TokenSequence_2 -> TokenSequence_3 ->  ...  -> TokenSequence_n
|    sp_1     |    |    sp_2     |    |    sp_3     |     ...     |    sp_n     |

By leveraging the model's KV cache to memoize the computation already done on earlier TokenSequences, we can make the evaluation strategy look much more "lazy":

memoized (TokenSequence_1 -> TokenSequence_2) -> TokenSequence_3 -> future (TokenSequence_4 -> ... -> TokenSequence_n)  

This gives roughly the desired evaluation strategy: the LLM computes only the next token sequence it samples, treated as a single sub-problem, and nothing more.
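
The memoization idea can be illustrated with a toy prefix cache. This is only an analogy for how a KV cache avoids recomputing earlier token sequences, not a real KV cache and not how LazyLM is implemented; `PrefixMemo`, `state_for`, and the `encode` stand-in (here simply `len`) are assumptions made for the example.

```python
class PrefixMemo:
    """Toy analogy: cache the per-sequence work for every prefix seen so far."""

    def __init__(self, encode):
        self.encode = encode       # stand-in for the costly per-sequence compute
        self.cache = {(): ()}      # maps a prefix of sequences to its state
        self.encode_calls = 0      # counts how much work was actually done

    def state_for(self, sequences):
        prefix = tuple(sequences)
        # Find the longest prefix whose state is already memoized.
        n = len(prefix)
        while prefix[:n] not in self.cache:
            n -= 1
        state = self.cache[prefix[:n]]
        # Only the sequences beyond that prefix need fresh computation.
        for i in range(n, len(prefix)):
            state = state + (self.encode(prefix[i]),)
            self.encode_calls += 1
            self.cache[prefix[: i + 1]] = state
        return state


memo = PrefixMemo(encode=len)
memo.state_for(["sp_1", "sp_2"])
calls_for_two = memo.encode_calls    # two sequences encoded

state = memo.state_for(["sp_1", "sp_2", "sp_3!"])
calls_for_three = memo.encode_calls  # only one more: sp_1 and sp_2 were reused
```

Adding a third sub-problem costs one extra unit of work, which mirrors the claim that past steps are memoized while only the current step is computed.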

Check out the repo to try it out yourself.