Evaluating Responses
Overview
ChatGPT's training data has a cutoff date, meaning it knows information only up to that point. For anything that happened after that date, ChatGPT may not provide accurate answers.
To manage this, you can prompt ChatGPT to state explicitly when it doesn't know an answer because of this limitation. While not foolproof, this approach is a useful reminder to cross-reference its answers against other sources.
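If you work through the API rather than the chat window, the same instruction can live in a system message. Below is a minimal sketch using the official openai Python package; the model name and the exact wording of the instruction are illustrative choices, not requirements:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {
            # Instruct the model to admit knowledge-cutoff gaps instead of
            # guessing; this wording is one reasonable phrasing, not canonical.
            "role": "system",
            "content": (
                "If a question concerns events after your training cutoff, "
                "or you are unsure of the answer, say so explicitly rather "
                "than guessing."
            ),
        },
        {"role": "user", "content": "Who won the most recent World Cup?"},
    ],
)
print(response.choices[0].message.content)
```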
Four Cornerstones of Evaluation
Use the acronym LARF to assess ChatGPT's responses:
- Logical consistency
- Accuracy
- Relevance
- Factual correctness
Together, these four cornerstones give you a framework for critically evaluating ChatGPT's outputs.
Logical Consistency
Responses should be internally coherent.
- Example: Listing "minimal maintenance" as both a benefit and a drawback of solar energy shows inconsistency.
- Ensure each response makes logical sense as a whole (a simple automated check is sketched below).
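Part of this check can be automated: flagging points that appear on both sides of a pros-and-cons answer. A minimal sketch, assuming the response has already been parsed into two lists; the exact-match normalization is a deliberate simplification:

```python
def find_contradictions(benefits: list[str], drawbacks: list[str]) -> set[str]:
    """Return points listed as both a benefit and a drawback.

    Naive exact matching after lowercasing; real responses would need
    fuzzier comparison (stemming or embeddings, for instance).
    """
    return {b.strip().lower() for b in benefits} & {d.strip().lower() for d in drawbacks}

# The solar-energy example from above:
benefits = ["minimal maintenance", "renewable source"]
drawbacks = ["minimal maintenance", "high upfront cost"]
print(find_contradictions(benefits, drawbacks))  # {'minimal maintenance'}
```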
Accuracy and Hallucination Tendency
ChatGPT can deliver confident but incorrect answers, a behavior commonly called hallucination.
- Verify facts independently to catch inaccuracies (one way to script a spot-check is sketched below).
- Example: If asked, "Who was the first person to walk on the moon?" and it answers "Buzz Aldrin," it's wrong. The correct answer is Neil Armstrong.
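For facts that can be pinned down, a scripted spot-check against a trusted reference catches this kind of error. A sketch under the assumption that you keep, or fetch, a small table of ground-truth answers; the known_facts dictionary below is purely illustrative:

```python
# Hypothetical ground-truth table; in practice this might come from an
# encyclopedia API or a curated dataset.
known_facts = {
    "first person to walk on the moon": "Neil Armstrong",
}

def spot_check(question_key: str, model_answer: str) -> bool:
    """Return True if the model's answer contains the trusted reference answer."""
    expected = known_facts.get(question_key)
    if expected is None:
        return False  # no reference available; flag for manual review
    return expected.lower() in model_answer.lower()

print(spot_check("first person to walk on the moon", "Buzz Aldrin"))     # False
print(spot_check("first person to walk on the moon", "Neil Armstrong"))  # True
```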
Relevance
Responses should align with the context and intent of the prompt.
- Example: If asked about tourist attractions in Paris and ChatGPT includes the Great Wall of China, that item is irrelevant.
- Ensure responses stay pertinent to the question (a way to score relevance automatically is sketched below).
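Relevance can also be estimated programmatically by embedding the prompt and the response and comparing them. A sketch using OpenAI's embeddings endpoint; the embedding model is an illustrative choice, and what counts as a "low" score is a judgment call for your use case:

```python
from openai import OpenAI

client = OpenAI()

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def relevance_score(prompt: str, answer: str) -> float:
    """Embed both texts and return their cosine similarity (higher = more relevant)."""
    result = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative model choice
        input=[prompt, answer],
    )
    return cosine_similarity(result.data[0].embedding, result.data[1].embedding)

score = relevance_score(
    "What are the top tourist attractions in Paris?",
    "The Great Wall of China stretches thousands of kilometers.",
)
print(f"relevance: {score:.2f}")  # a low score suggests an off-topic answer
```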
Factual Correctness
Encourage factual correctness by asking for references from reliable sources.
- Use ChatGPT Plus's browsing capability to get up-to-date information on events after the cutoff.
- Always verify answers, especially for recent developments (one way to phrase the request for sources is sketched below).
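The request for references can be built into the prompt itself. A minimal sketch; the wording below is one reasonable phrasing, not a canonical formula:

```python
question = "What were the key outcomes of the most recent UN climate summit?"

# Append an explicit request for verifiable sources and a cutoff caveat.
prompt = (
    f"{question}\n\n"
    "Please cite reliable sources (publication names and dates) for each "
    "claim, and state clearly if your information may be outdated relative "
    "to your training cutoff."
)
print(prompt)
```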