Evaluating Responses
Overview
ChatGPT's training data has a cutoff date, meaning it knows information only up to that point. For anything that happened after that date, ChatGPT may not provide accurate answers.
To manage this, you can prompt ChatGPT to state explicitly when it doesn't know an answer because of this limitation. While not foolproof, this approach is a useful reminder to cross-reference its answers against other sources.
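If you work through the API rather than the chat window, the same instruction can live in a system message. Below is a minimal sketch using the official openai Python package; the model name and the exact wording of the instruction are illustrative choices, not requirements:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {
            # Instruct the model to admit knowledge-cutoff gaps instead of
            # guessing; this wording is one reasonable phrasing, not canonical.
            "role": "system",
            "content": (
                "If a question concerns events after your training cutoff, "
                "or you are unsure of the answer, say so explicitly rather "
                "than guessing."
            ),
        },
        {"role": "user", "content": "Who won the most recent World Cup?"},
    ],
)
print(response.choices[0].message.content)
```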
Four Cornerstones of Evaluation
Use the acronym LARF to assess ChatGPT's responses:
- Logical consistency
- Accuracy
- Relevance
- Factual correctness
Together, these four cornerstones give you a framework for critically evaluating ChatGPT's outputs.
Logical Consistency
Responses should be internally coherent.
- Example: Listing "minimal maintenance" as both a benefit and a drawback of solar energy shows inconsistency.
- Ensure each response makes logical sense as a whole (a simple automated check is sketched below).
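Part of this check can be automated: flagging points that appear on both sides of a pros-and-cons answer. A minimal sketch, assuming the response has already been parsed into two lists; the exact-match normalization is a deliberate simplification:

```python
def find_contradictions(benefits: list[str], drawbacks: list[str]) -> set[str]:
    """Return points listed as both a benefit and a drawback.

    Naive exact matching after lowercasing; real responses would need
    fuzzier comparison (stemming or embeddings, for instance).
    """
    return {b.strip().lower() for b in benefits} & {d.strip().lower() for d in drawbacks}

# The solar-energy example from above:
benefits = ["minimal maintenance", "renewable source"]
drawbacks = ["minimal maintenance", "high upfront cost"]
print(find_contradictions(benefits, drawbacks))  # {'minimal maintenance'}
```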
Accuracy and Hallucination Tendency
ChatGPT can deliver confident but incorrect answers, a behavior commonly called hallucination.
- Verify facts independently to catch inaccuracies (one way to script a spot-check is sketched below).
- Example: If asked, "Who was the first person to walk on the moon?" and it answers "Buzz Aldrin," it's wrong. The correct answer is Neil Armstrong.
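For facts that can be pinned down, a scripted spot-check against a trusted reference catches this kind of error. A sketch under the assumption that you keep, or fetch, a small table of ground-truth answers; the known_facts dictionary below is purely illustrative:

```python
# Hypothetical ground-truth table; in practice this might come from an
# encyclopedia API or a curated dataset.
known_facts = {
    "first person to walk on the moon": "Neil Armstrong",
}

def spot_check(question_key: str, model_answer: str) -> bool:
    """Return True if the model's answer contains the trusted reference answer."""
    expected = known_facts.get(question_key)
    if expected is None:
        return False  # no reference available; flag for manual review
    return expected.lower() in model_answer.lower()

print(spot_check("first person to walk on the moon", "Buzz Aldrin"))     # False
print(spot_check("first person to walk on the moon", "Neil Armstrong"))  # True
```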
Relevance
Responses should align with the context and intent of the prompt.
- Example: If asked about tourist attractions in Paris and ChatGPT includes the Great Wall of China, that item is irrelevant.
- Ensure responses stay pertinent to the question (a way to score relevance automatically is sketched below).
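Relevance can also be estimated programmatically by embedding the prompt and the response and comparing them. A sketch using OpenAI's embeddings endpoint; the embedding model is an illustrative choice, and what counts as a "low" score is a judgment call for your use case:

```python
from openai import OpenAI

client = OpenAI()

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def relevance_score(prompt: str, answer: str) -> float:
    """Embed both texts and return their cosine similarity (higher = more relevant)."""
    result = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative model choice
        input=[prompt, answer],
    )
    return cosine_similarity(result.data[0].embedding, result.data[1].embedding)

score = relevance_score(
    "What are the top tourist attractions in Paris?",
    "The Great Wall of China stretches thousands of kilometers.",
)
print(f"relevance: {score:.2f}")  # a low score suggests an off-topic answer
```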
Factual Correctness
Encourage factual correctness by asking for references from reliable sources.
- Use ChatGPT Plus's browsing capability to get up-to-date information on events after the cutoff.
- Always verify answers, especially for recent developments (one way to phrase the request for sources is sketched below).
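The request for references can be built into the prompt itself. A minimal sketch; the wording below is one reasonable phrasing, not a canonical formula:

```python
question = "What were the key outcomes of the most recent UN climate summit?"

# Append an explicit request for verifiable sources and a cutoff caveat.
prompt = (
    f"{question}\n\n"
    "Please cite reliable sources (publication names and dates) for each "
    "claim, and state clearly if your information may be outdated relative "
    "to your training cutoff."
)
print(prompt)
```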