RAGAS
The RAGAS metric is the average of four distinct metrics:
- RAGASAnswerRelevancyMetric
- RAGASFaithfulnessMetric
- RAGASContextualPrecisionMetric
- RAGASContextualRecallMetric
It provides a single score to holistically evaluate your RAG pipeline's generator and retriever.
note
The RagasMetric, although similar to deepeval's default RAG metrics, is not capable of generating a reason.
Required Arguments
To use the RagasMetric, you'll have to provide the following arguments when creating an LLMTestCase:
- input
- actual_output
- expected_output
- retrieval_context
Example
from deepeval import evaluate
from deepeval.metrics.ragas import RagasMetric
from deepeval.test_case import LLMTestCase
# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."
# Replace this with the expected output from your RAG generator
expected_output = "You are eligible for a 30 day full refund at no extra cost."
# Replace this with the actual retrieved context from your RAG pipeline
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]
metric = RagasMetric(threshold=0.5, model="gpt-3.5-turbo")
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output=actual_output,
    expected_output=expected_output,
    retrieval_context=retrieval_context
)
metric.measure(test_case)
print(metric.score)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
There are three optional parameters when creating a RagasMetric:
- [Optional] threshold: a float representing the minimum passing threshold, defaulted to 0.5.
- [Optional] model: a string specifying which of OpenAI's GPT models to use, OR any one of langchain's chat models of type BaseChatModel. Defaulted to 'gpt-3.5-turbo'.
- [Optional] embeddings: any one of langchain's embedding models of type Embeddings. Custom embeddings provided to the RagasMetric will only be used in the RAGASAnswerRelevancyMetric, since it is the only metric that requires embeddings for calculating cosine similarity. (A sketch of passing a custom model and embeddings follows this list.)
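For example, here is a minimal sketch of passing a custom langchain chat model and embedding model to the RagasMetric. It assumes the langchain-openai package is installed; the specific classes used are illustrative choices, not requirements.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from deepeval.metrics.ragas import RagasMetric

# Any langchain chat model of type BaseChatModel can be passed as `model`
custom_model = ChatOpenAI(model="gpt-3.5-turbo")
# The embeddings are only used by the RAGASAnswerRelevancyMetric
custom_embeddings = OpenAIEmbeddings()

metric = RagasMetric(
    threshold=0.5,
    model=custom_model,
    embeddings=custom_embeddings
)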
info
You can also choose to import and execute each metric individually:
from deepeval.metrics.ragas import RAGASAnswerRelevancyMetric
from deepeval.metrics.ragas import RAGASFaithfulnessMetric
from deepeval.metrics.ragas import RAGASContextualRecallMetric
from deepeval.metrics.ragas import RAGASContextualPrecisionMetric
These metrics accept the same arguments as the RagasMetric.
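For instance, here is a sketch of measuring faithfulness on its own, reusing the test case from the example above; the same pattern applies to the other three metrics.
from deepeval.metrics.ragas import RAGASFaithfulnessMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    expected_output="You are eligible for a 30 day full refund at no extra cost.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."]
)

metric = RAGASFaithfulnessMetric(threshold=0.5, model="gpt-3.5-turbo")
metric.measure(test_case)
print(metric.score)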