RAGAS
The RAGAS metric is the average of four distinct metrics:
RAGASAnswerRelevancyMetric
RAGASFaithfulnessMetric
RAGASContextualPrecisionMetric
RAGASContextualRecallMetric
It provides a score to holistically evaluate of your RAG pipeline's generator and retriever.
note
The RAGASMetric
, although similar to deepeval
's default RAG metrics, is not capable of generating a reason.
Required Arguments
To use the RagasMetric
, you'll have to provide the following arguments when creating an LLMTestCase
:
input
actual_output
expected_output
retrieval_context
Example
from deepeval import evaluate
from deepeval.metrics.ragas import RagasMetric
from deepeval.test_case import LLMTestCase
# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."
# Replace this with the expected output from your RAG generator
expected_output = "You are eligible for a 30 day full refund at no extra cost."
# Replace this with the actual retrieved context from your RAG pipeline
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]
metric = RagasMetric(threshold=0.5, model="gpt-3.5-turbo")
test_case = LLMTestCase(
input="What if these shoes don't fit?",
actual_output=actual_output,
expected_output=expected_output,
retrieval_context=retrieval_context
)
metric.measure(test_case)
print(metric.score)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
There are three optional parameters when creating a RagasMetric
:
- [Optional]
threshold
: a float representing the minimum passing threshold, defaulted to 0.5. - [Optional]
model
: a string specifying which of OpenAI's GPT models to use, OR any one of langchain's chat models of typeBaseChatModel
. Defaulted to 'gpt-3.5-turbo'. - [Optional]
embeddings
: any one of langchain's embedding models of typeEmbeddings
. Customembeddings
provided to theRagasMetric
will only be used in theRAGASAnswerRelevancyMetric
, since it is the only metric that requires embeddings for calculating cosine similarity.
info
You can also choose to import and execute each metric individually:
from deepeval.metrics.ragas import RAGASAnswerRelevancyMetric
from deepeval.metrics.ragas import RAGASFaithfulnessMetric
from deepeval.metrics.ragas import RAGASContextualRecallMetric
from deepeval.metrics.ragas import RAGASContextualPrecisionMetric
These metrics accept the same arguments as the RagasMetric
.