반응형
This post heavily relies on this lecture:
Two types of LLMs
- Base LLM
- Predicts next word based on text training data
- Instruction Tuned LLM
- Tries to follow instructions
- Fine-tune on instructions and good attempts at following those instructions
- Often further refined using RLHF technique
- RLHF: Reinforcement Learning with Human
- Trained to be Helpful, Honest, and Harmless
- Thus, likely to be less toxic than Base LLM
- Recommended to be used for practical usages
Guidelines for Prompting
- First Principle: Write clear and specific instructions
- Clear prompt doesn't mean short prompt
- Detailed tactics:
- Use delimiters to clearly indicate distinct parts of the input
- Delimiters can be anything like: ```, """, < >, <tag> </tag>, ---, etc
- Delimiters can also avoid prompt injections
- prompt injection: if a user is allowed to add some input into your prompt, they might give kind of conflicting instructions to the model that might make it follow the user's instructions rather than doing what you wanted it to do
- In other words, model can successfully distinguish the input part and the instruction part to avoid confusions
- Ask for structured output
- for example: JSON, HTML
- Ask the model to check whether conditions are satisfied
- Example: If the text does not contain a sequence of instructions, then simply write "No steps provided".
- "Few-shot" prompting
- Use delimiters to clearly indicate distinct parts of the input
- Second Principle: Give the model time to THINK
- Detailed tactics:
- Specify the steps required to complete a task
- Instruct the model to work out its own solution before rushing to a conclusion
- When we simply provide sample answer and ask if that is correct, the model might skim read it and simply say that it is correct without fully thinking about the answer
- Therefore, we should first ask to draw its own solution first and then make the model compare its own and the sample answer
- Detailed tactics:
Model limitations: Hallucinations
- Even though large language models are exposed to a vast amount of knowledge during its training process, it has not perfectly memorized the information it have seen and so it doesn't know the boundary of its knowledge very well.
- Thus, those models are likely to make statements that sound plausible but are not true.
- How to reduce hallucinations:
- First ask the model to find any relevant quotes from the text and then ask it to use those quotes to answer the question
If you need inspiration, don't do it.
-Elon Musk-
반응형
'NLP' 카테고리의 다른 글
[NLP] 3. How does Transformer Work? (1) | 2025.01.03 |
---|---|
[Prompt Engineering] 2. Zer0-shot Prompting (0) | 2025.01.02 |
[Prompt Engineering] 1. Few-shot Prompting (0) | 2025.01.02 |
[NLP] 2. NLP Arrangement of Terms (용어 정리) (0) | 2024.11.21 |
[NLP] 1. Natural Language Processing Basics (0) | 2024.11.18 |