noise when decoding hevc yuv420p10le videos with device="cuda" #598

Open
hanchchch opened this issue Mar 26, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@hanchchch

🐛 Describe the bug

When I try to decode a video with pixel format yuv420p10le, like the one below,

# ffprobe result
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt709/bt709/unknown), 1920x1080, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn (default)
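# decoding script
import cv2
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("./test.mp4", device="cuda")
frames = decoder.get_frames_in_range(0, 100)

# frames.data is (N, C, H, W); permute to (N, H, W, C) and save the first frame
cv2.imwrite("frame.png", frames.data.permute(0, 2, 3, 1).cpu().numpy()[0])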

the decoded frames come out full of noise, like this:

Image

But with h264 yuv420p videos, or if I use CPU decoding, it works fine.

  Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x816 [SAR 1:1 DAR 40:17], 23.98 fps, 23.98 tbr, 1k tbn (default)

Why is this happening?

Versions

0.2.1

@haixuanTao

haixuanTao commented Apr 12, 2025

I'm not very familiar with torchcodec, but it seems that the bit depth is hardcoded as uint8 (#585), so decoding yuv420p10le (10-bit) video should result in either each 16-bit sample being read as two 8-bit pixels, or truncation.
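
To illustrate the first possibility, here is a minimal pure-Python sketch of what happens when 10-bit samples stored in 16-bit little-endian words are read back as if they were 8-bit samples. It only models the byte layout that the "p10le" suffix implies; the helper names are hypothetical, and it is an assumption (not confirmed in this thread) that torchcodec's CUDA path misreads its buffers this way.

```python
import struct


def pack_10bit_le(samples):
    """Store each 10-bit sample in one 16-bit little-endian word."""
    for s in samples:
        if not 0 <= s <= 1023:
            raise ValueError(f"not a 10-bit value: {s}")
    return struct.pack(f"<{len(samples)}H", *samples)


def misread_as_uint8(buf, count):
    """Interpret the first `count` bytes of the buffer as 8-bit samples."""
    return list(buf[:count])


# A smooth luma ramp: four neighbouring pixels with nearly equal brightness.
gradient = [510, 511, 512, 513]
raw = pack_10bit_le(gradient)

# Reading four "pixels" as uint8 consumes only the first two real samples,
# alternating between the low byte and the high byte of each word.
misread = misread_as_uint8(raw, len(gradient))
print(misread)
```

The low bytes swing across the full 0 to 255 range while the high bytes stay near 1 or 2, so a smooth gradient turns into a high-contrast alternating pattern, which would look like noise in the decoded image.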

@NicolasHug NicolasHug added the bug Something isn't working label Apr 14, 2025
@NicolasHug
Member

Thank you for the report @hanchchch, we should investigate what's going on. This is probably related to these lines:

if (avFrame->colorspace == AVColorSpace::AVCOL_SPC_BT709) {
  status = nppiNV12ToRGB_709CSC_8u_P2C3R(
      input,
      avFrame->linesize[0],
      static_cast<Npp8u*>(dst.data_ptr()),
      dst.stride(0),
      oSizeROI);
} else {
  status = nppiNV12ToRGB_8u_P2C3R(
      input,
      avFrame->linesize[0],
      static_cast<Npp8u*>(dst.data_ptr()),
      dst.stride(0),
      oSizeROI);
}

For consistency with CPU outputs we should be converting to uint8 by default, but maybe in the future we can explore the possibility of returning a bigger dtype to preserve the 10-bit precision.
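
As a rough sketch of what "converting to uint8 by default" could mean per sample, the snippet below rounds a 10-bit value to 8 bits by dropping the two least significant bits. The function name `to_uint8` is hypothetical, and this is not torchcodec's implementation; a real fix would also have to handle the layout of the GPU decoder's 10-bit output and the YUV-to-RGB conversion.

```python
def to_uint8(sample_10bit):
    """Round a 10-bit sample (0..1023) to the nearest 8-bit value (0..255)."""
    if not 0 <= sample_10bit <= 1023:
        raise ValueError(f"not a 10-bit value: {sample_10bit}")
    # Add half of the discarded range (2) before shifting so the result is
    # rounded rather than truncated, then clamp because 1023 rounds up to 256.
    return min(255, (sample_10bit + 2) >> 2)


converted = [to_uint8(s) for s in (0, 509, 510, 512, 1023)]
print(converted)
```

Returning a wider dtype instead would mean skipping this step and keeping the 16-bit samples, at the cost of outputs that no longer match the CPU path's dtype.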
