
convert : fix duplicate key DeepSeek-R1 conversion error #14103


Merged

CISC merged 2 commits into master from cisc/fix-deepseek-duplicate-key on Jun 10, 2025


@CISC (Collaborator) commented Jun 10, 2025

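Since DeepSeekV3 support was merged into transformers, AutoConfig has been returning an incorrect head_dim value, which makes the converter write deepseek2.attention.key_length twice and fail with a duplicate key error.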

Fixes #14093
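To make the failure mode concrete, here is a minimal simulation, assuming the converter writes the key/value lengths once from a generic head_dim path and once more from a DeepSeek-specific path. ToyGGUFWriter and convert_unguarded are hypothetical stand-ins, not gguf-py or convert_hf_to_gguf.py code, and the hyperparameter values (head_dim 64 mirroring qk_rope_head_dim 64, qk_nope_head_dim 128, v_head_dim 128) are illustrative assumptions.

```python
# Hypothetical simulation of the duplicate-key failure; not gguf-py code.


class ToyGGUFWriter:
    """Stores metadata key/value pairs and rejects duplicate keys."""

    def __init__(self, arch: str) -> None:
        self.arch = arch
        self.kv: dict[str, int] = {}

    def _add(self, key: str, value: int) -> None:
        # Reject a key that was already written, mirroring the ValueError in #14093.
        if key in self.kv:
            raise ValueError(f"Duplicated key name {key!r}")
        self.kv[key] = value

    def add_key_length(self, length: int) -> None:
        self._add(f"{self.arch}.attention.key_length", length)

    def add_value_length(self, length: int) -> None:
        self._add(f"{self.arch}.attention.value_length", length)


def convert_unguarded(hparams: dict[str, int]) -> ToyGGUFWriter:
    writer = ToyGGUFWriter("deepseek2")
    # Generic path: trusts whatever head_dim the config reports.
    if (head_dim := hparams.get("head_dim")) is not None:
        writer.add_key_length(head_dim)
        writer.add_value_length(head_dim)
    # DeepSeek-specific path (illustrative values): writes the same keys again.
    writer.add_key_length(hparams["qk_nope_head_dim"] + hparams["qk_rope_head_dim"])
    writer.add_value_length(hparams["v_head_dim"])
    return writer


if __name__ == "__main__":
    # Assumption: head_dim == qk_rope_head_dim mimics the value AutoConfig reports.
    hparams = {"head_dim": 64, "qk_nope_head_dim": 128, "qk_rope_head_dim": 64, "v_head_dim": 128}
    try:
        convert_unguarded(hparams)
    except ValueError as err:
        print(err)  # Duplicated key name 'deepseek2.attention.key_length'
```

Without a head_dim entry, only the DeepSeek-specific path runs and each key is written exactly once.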

@github-actions bot added the python (python script changes) label Jun 10, 2025
@ericcurtin requested a review from Copilot June 10, 2025 12:16

Copilot AI left a comment


Pull Request Overview

This PR adds a guard to avoid writing duplicate key/value lengths when head_dim from AutoConfig is incorrect for DeepSeekV3.

  • Adds a conditional check comparing qk_rope_head_dim to head_dim before calling add_key_length and add_value_length
  • Documents the workaround with a comment pointing to the relevant Transformers config file
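The guard summarized above can be sketched as follows. This is one plausible reading of "comparing qk_rope_head_dim to head_dim", assuming the Transformers DeepSeekV3 config reports head_dim equal to qk_rope_head_dim; the merged commit may phrase the condition differently. generic_head_dim and collect_lengths are hypothetical helpers, and the DeepSeek-specific values are illustrative.

```python
from typing import Optional


def generic_head_dim(hparams: dict[str, int]) -> Optional[int]:
    """Return the head_dim the generic path should write, or None to skip it.

    Assumption: a head_dim equal to qk_rope_head_dim is the bogus AutoConfig
    value for DeepSeekV3, so the generic path must not write it.
    """
    head_dim = hparams.get("head_dim")
    if head_dim is None:
        return None
    if head_dim == hparams.get("qk_rope_head_dim"):
        # Skip: the DeepSeek-specific code writes key/value lengths itself.
        return None
    return head_dim


def collect_lengths(hparams: dict[str, int], arch: str = "deepseek2") -> dict[str, int]:
    kv: dict[str, int] = {}

    def add(key: str, value: int) -> None:
        # Same duplicate check as the error reported in the linked issue.
        if key in kv:
            raise ValueError(f"Duplicated key name {key!r}")
        kv[key] = value

    # Generic path, now guarded.
    if (head_dim := generic_head_dim(hparams)) is not None:
        add(f"{arch}.attention.key_length", head_dim)
        add(f"{arch}.attention.value_length", head_dim)
    # Illustrative DeepSeek-specific path, only for configs that have MLA fields.
    if "qk_nope_head_dim" in hparams:
        add(f"{arch}.attention.key_length", hparams["qk_nope_head_dim"] + hparams["qk_rope_head_dim"])
        add(f"{arch}.attention.value_length", hparams["v_head_dim"])
    return kv
```

Models without qk_rope_head_dim compare head_dim against None, so their head_dim is still written as before; only the DeepSeekV3-shaped config is affected.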
Comments suppressed due to low confidence (1)

convert_hf_to_gguf.py:558

    if (head_dim := self.hparams.get("head_dim")) is not None:

  • Introduce a dedicated test for the DeepSeekV3 conversion path to cover this workaround and ensure future changes to head_dim handling don’t regress.
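A regression test along the lines Copilot suggests might look like this sketch. generic_head_dim is a hypothetical stand-in for the guarded check around the head_dim line quoted above, assuming the guard skips a head_dim equal to qk_rope_head_dim; a real test would exercise the converter's own code instead. Run it with python -m unittest.

```python
import unittest
from typing import Optional


def generic_head_dim(hparams: dict) -> Optional[int]:
    # Hypothetical stand-in for the guarded generic head_dim check.
    head_dim = hparams.get("head_dim")
    if head_dim is None or head_dim == hparams.get("qk_rope_head_dim"):
        return None
    return head_dim


class HeadDimGuardTest(unittest.TestCase):
    def test_deepseek_v3_autoconfig_head_dim_is_skipped(self):
        # AutoConfig-style values (assumed): head_dim mirrors qk_rope_head_dim.
        hparams = {"head_dim": 64, "qk_rope_head_dim": 64, "qk_nope_head_dim": 128}
        self.assertIsNone(generic_head_dim(hparams))

    def test_regular_model_head_dim_is_kept(self):
        # A config without MLA fields keeps its head_dim.
        self.assertEqual(generic_head_dim({"head_dim": 128}), 128)

    def test_missing_head_dim_is_skipped(self):
        self.assertIsNone(generic_head_dim({"hidden_size": 4096}))
```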

@CISC merged commit 55f6b9f into master Jun 10, 2025
7 checks passed
@CISC deleted the cisc/fix-deepseek-duplicate-key branch June 10, 2025 21:29
@compilade mentioned this pull request Jun 12, 2025
Development

Successfully merging this pull request may close these issues.

Misc. bug: ValueError: Duplicated key name 'deepseek2.attention.key_length'
2 participants