Contextual Precision

The contextual precision metric measures your RAG pipeline's retriever by evaluating whether nodes in your retrieval_context that are relevant to the given input are ranked higher than irrelevant ones. deepeval's contextual precision metric is a self-explaining LLM-Eval, meaning it outputs a reason for its metric score.

Required Arguments

To use the ContextualPrecisionMetric, you'll have to provide the following arguments when creating an LLMTestCase:

input
actual_output
expected_output
retrieval_context

Example

from deepeval import evaluate
from deepeval.metrics import ContextualPrecisionMetric
from deepeval.test_case import LLMTestCase

# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."

# Replace this with the expected output from your RAG generator
expected_output = "You are eligible for a 30 day full refund at no extra cost."

# Replace this with the actual retrieved context from your RAG pipeline
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]

metric = ContextualPrecisionMetric(
    threshold=0.7,
    model="gpt-4",
    include_reason=True
)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output=actual_output,
    expected_output=expected_output,
    retrieval_context=retrieval_context
)

metric.measure(test_case)
print(metric.score)
print(metric.reason)

# or evaluate test cases in bulk
evaluate([test_case], [metric])

There are five optional parameters when creating a ContextualPrecisionMetric:

[Optional] threshold: a float representing the minimum passing threshold, defaulted to 0.5.
[Optional] model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-4-0125-preview'.
[Optional] include_reason: a boolean which when set to True, will include a reason for its evaluation score. Defaulted to True.
[Optional] strict_mode: a boolean which when set to True, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to False.
[Optional] async_mode: a boolean which when set to True, enables concurrent execution within the measure() method. Defaulted to True.

How Is It Calculated?

The ContextualPrecisionMetric score is calculated according to the following equation:

\text{Contextual Precision} = \frac{1}{\text{Number of Relevant Nodes}} \sum_{k=1}^{n} \left( \frac{\text{Number of Relevant Nodes Up to Position } k}{k} \times r_{k} \right)

info

k is the (i+1)^th node in the retrieval_context
n is the length of the retrieval_context
r_k is the binary relevance for the k^th node in the retrieval_context. r_k = 1 for nodes that are relevant, 0 if not.

The ContextualPrecisionMetric first uses an LLM to determine for each node in the retrieval_context whether it is relevant to the input based on information in the expected_output, before calculating the weighted cumulative precision as the contextual precision score. The weighted cumulative precision (WCP) is used because it:

Emphasizes on Top Results: WCP places a stronger emphasis on the relevance of top-ranked results. This emphasis is important because LLMs tend to give more attention to earlier nodes in the retrieval_context (which may cause downstream hallucination if nodes are ranked incorrectly).
Rewards Relevant Ordering: WCP can handle varying degrees of relevance (e.g., "highly relevant", "somewhat relevant", "not relevant"). This is in contrast to metrics like precision, which treats all retrieved nodes as equally important.

A higher contextual precision score represents a greater ability of the retrieval system to correctly rank relevant nodes higher in the retrieval_context.

Required Arguments​

Example​

How Is It Calculated?​

Required Arguments

Example

How Is It Calculated?