GitHub: lucywang720/model-surgery
This repository contains the code and data for the paper on model surgery. In the paper, the authors observe that, surprisingly, directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs, such as detoxification and resistance to jailbreaking, with only inference-level computational resources.

Inspired by this finding, they propose a new approach called model surgery, which adjusts large language models by directly editing specific parameters instead of retraining the entire model: the hidden layers of the LLM are manipulated to shift away from the direction associated with a specific behavior (i.e., the direction indicated by a trained probe) when the LLM generates output. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. The repository demonstrates the modification of LLaMA-2 7B and other models to reduce toxicity, increase resistance to jailbreaking prompts, and shift sentiment expression.

The strength of the intervention is controlled by a coefficient α: as α grows above 0, detoxification becomes more significant; conversely, when α is less than 0 and decreases, the model surgery exerts the opposite effect, generating more toxic outputs. Contribute to lucywang720/model-surgery development by creating an account on GitHub.
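The core operation described above can be sketched as subtracting the probe's behavior direction from a model's hidden states, scaled by α. This is a minimal illustrative sketch, not the repository's actual implementation; the function name `apply_surgery` and the use of plain numpy arrays in place of real transformer activations are assumptions for clarity.

```python
import numpy as np

def apply_surgery(hidden: np.ndarray, probe_w: np.ndarray, alpha: float) -> np.ndarray:
    """Shift hidden states away from the behavior direction of a linear probe.

    hidden:  (batch, dim) array standing in for transformer hidden states.
    probe_w: (dim,) weight vector of a trained linear probe (hypothetical).
    alpha:   strength coefficient; alpha > 0 suppresses the probed behavior,
             alpha < 0 amplifies it, matching the sign convention in the text.
    """
    # Normalize the probe weights to a unit behavior direction.
    direction = probe_w / np.linalg.norm(probe_w)
    # Project each hidden state onto the behavior direction.
    proj = hidden @ direction  # shape (batch,)
    # Remove alpha times that component from every hidden state.
    return hidden - alpha * np.outer(proj, direction)
```

With α = 1 the component along the probe direction is removed entirely, while components orthogonal to it are untouched; smaller positive α gives a gentler shift, and negative α pushes the states toward the probed behavior instead.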