
[Release] v0.20.0 Release Candidate Notes #17860

ysh329 (Contributor) opened this issue Apr 19, 2025

Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!

The main tags are listed below; the areas with the most progress are Relax (especially the PyTorch frontend) and CUDA.

Please visit the full listing of commits for a complete view: v0.20.dev0...v0.20.0.rc0.

Community

None.

RFCs

None.

Adreno

  • #17608 - [WINDOWS] Windows build dependencies for Adreno target

BugFix

  • #17761 - [FIX][RELAX] fix fusion of transpose + matmul with a constant weight
  • #17762 - [Fix] Fix OpenCL header in attention utils
  • #17711 - [Fix][dlight] add an explicit reduction loop check in Reduce
  • #17697 - [Fix] Include <chrono> for std::chrono
  • #17677 - Declare build backend for python package
  • #17598 - [TIR][FIX] update FlopEstimator to include missing nodes
  • #17601 - [Flashinfer][Fix] fix missing args in flashinfer test
  • #17607 - [FIX][TVMC] Fix the mixed precision conversion pipeline

CI

  • #17687 - Update images to 20250226-223225-63bc315f
  • #17680 - update images to 20250225-035137-aeadc31c
  • #17675 - [skip ci] Update github tvmbot
  • #17635 - Cleanup legacy files
  • #17634 - [skip ci] Improve build time
  • #17629 - [skip ci] Robustify CI for SPOT failure
  • #17620 - Unpin pytest-profiling
  • #17621 - [skip ci] Remove legacy CI runners protection
  • #17619 - [Refactor] Remove legacy frontend tests

Dlight

  • #17754 - Fix general reduction rule to support non-last reduction axis
  • #17663 - [CPU] Add CPU Backend Support for GEMV Optimization (see the application sketch after this list)
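
Context for the CPU GEMV entry above: dlight rules are applied to a TIR module as a module pass. The sketch below shows the established pattern using the long-standing GPU GEMV rule; the exact entry point of the new CPU rule from #17663 is not asserted here, only the pattern of use.

```python
# Minimal sketch of applying a dlight schedule rule to an IRModule.
# The GPU GEMV rule shown here predates this release; the CPU rule added
# in #17663 presumably plugs into the same pass (exact name not asserted).
import tvm
from tvm import dlight as dl

def schedule_gemv(mod: tvm.IRModule) -> tvm.IRModule:
    target = tvm.target.Target("cuda")
    with target:
        # ApplyDefaultSchedule tries each rule on every matching PrimFunc;
        # Fallback catches functions the GEMV rule does not match.
        return dl.ApplyDefaultSchedule(dl.gpu.GEMV(), dl.gpu.Fallback())(mod)
```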

Docker

  • #17691 - Fix ml_dtypes downgrade issue introduced by TensorFlow
  • #17686 - Update ml_dtypes to 0.5.1+
  • #17676 - Use Torch GPU on gpu device
  • #17648 - Tensorflow (aka TFLite) upgrade to 2.18.0
  • #17643 - Update ml_dtypes version
  • #17638 - [skip ci] Update ml_dtypes version
  • #17617 - Tensorflow upgrade to 2.18.0

Docs

  • #17650 - Update README
  • #17611 - Download 3rd party embeds to local files
  • #17604 - Update README

MetaSchedule

  • #17104 - Adding post optimization in MetaSchedule to Improve Scheduling

OpenCL & CLML

  • #17571 - [OPENCL][TEXTURE] Improved texture memory planning

Relax

  • #17814 - [PyTorch] Add stack.default and sum.default to exported programs translator (an ExportedProgram import sketch follows this list)
  • #17820 - [PyTorch] Add support for broadcast_to, narrow ops
  • #17822 - [PyTorch] Cleanup tests for ExportedProgram frontend
  • #17806 - [PyTorch] Add Softplus Op Support for Exported Program and FX graph
  • #17817 - [PyTorch] Support dynamic shapes in ExportedProgram frontend
  • #17813 - [PyTorch] Improve ExportedProgram frontend by supporting unflatten.int, hardtanh_.default, dropout_.default, silu_.default, add_.Tensor and relu_.default
  • #17812 - [PyTorch] Support argsort, topk ops for ExportedProgram importer
  • #17810 - [PyTorch] Add support for argsort, sort, topk ops
  • #17809 - [PyTorch] Delete duplicate converter function _to
  • #17807 - [PyTorch] Fix torch 2.6 compatibility issues
  • #17797 - [Pytorch] Update SELU Implementation Using Decomposed Core-Level Ops
  • #17802 - [Pytorch] support for arange in exported programs translator
  • #17801 - [PyTorch] Support where, cumprod and reciprocal ops for ExportedProgram importer
  • #17790 - [PyTorch] Add support for index_select
  • #17786 - [PyTorch] Support softshrink op for ExportedProgram
  • #17788 - [PyTorch] Add support for where, cumprod and reciprocal ops
  • #17785 - [PyTorch] Support prod, std and var ops for ExportedProgram importer
  • #17778 - [PyTorch] Support log2, log10 and log1p ops for ExportedProgram importer
  • #17772 - [PyTorch] Add support for prod, std and var ops
  • #17766 - [PyTorch] Add support for log2, log10 and log1p ops
  • #17760 - [PyTorch] Add support for lerp, select and clone ops
  • #17751 - [PyTorch] Support one_hot, empty_like ops for ExportedProgram importer
  • #17747 - [PyTorch] Support flip, gather, take ops for ExportedProgram importer
  • #17738 - [PyTorch] Support elu, celu, selu ops for ExportedProgram importer
  • #17726 - [PyTorch] Add support for numel, empty_like and one_hot ops
  • #17707 - [PyTorch] Add support for gather, flip and take ops
  • #17702 - [PyTorch] Add support for celu, selu, is_floating_point ops
  • #17694 - [PyTorch] Add support for elu, hardtanh ops
  • #17689 - [PyTorch] Support several binary ops for ExportedProgram importer
  • #17672 - [PyTorch] Refactor binary ops tests
  • #17679 - [PyTorch] Support several unary ops for ExportedProgram importer
  • #17668 - [PyTorch] Add support for and_, lshift, min, or_, rshift, xor ops
  • #17664 - [PyTorch] Add support for ge, gt, le, mod, ne ops
  • #17659 - [PyTorch] Add support for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square ops
  • #17622 - [PyTorch] Add support for abs, ceil, erf, floor, log ops and refactor unary tests
  • #17566 - [ONNX] Add prim expression support to Neg converter and update Arange converter to use relax.op.arange
  • #17642 - [ONNX] Replace topi.split with relax.op.split in the ONNX frontend
  • #17674 - [KVCache] PagedKVCache refactor, FlashInfer JIT and MLA integration
  • #17618 - [KVCache] TIR attention kernel support for MLA
  • #17615 - [KVCache] Add KV Cache for CPU Runtime
  • #17616 - [Runtime][KVCache] Initial interface setup for MLA
  • #17782 - [Frontend] Support max/min in frontend op interface
  • #17758 - Allow ingesting tensor.chunk() from exported torch program
  • #17781 - Enable bfloat16 for softmax struct-info inference
  • #17752 - Batch norm correctness on eval mode
  • #17774 - check for tensor_meta in exported_program_translator
  • #17757 - Tensor.split with uneven tensors
  • #17749 - Move TIR backend to gpu_generic
  • #17725 - Ingest Tensor.clamp from torch export
  • #17724 - Add support to ingest Tensor.expand_as()
  • #17723 - Add torch exported program ingestion capability for Tensor.detach(), Tensor.copy_, and aten.lift_fresh_copy
  • #17721 - Allow ingesting Upsample module from torch.export either using Size or Scale Factor argument
  • #17722 - Allow ingesting vector_norm from torch.export
  • #17728 - ingest Tensor.contiguous from torch export
  • #17700 - Fix tree attention for Qwen2-1.5 models
  • #17682 - Add support for func attr inheritance in SplitLayoutRewritePreproc
  • #17654 - [BYOC] OpenCLML offload support for Relax
  • #17633 - Pipeline file reorganization
  • #17626 - Initial setup of relax backend pipeline
  • #17568 - [PASS] Convert layout pass and ops enhanced to support sub indexing
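
Most of the PyTorch entries above extend the Relax ExportedProgram importer. For orientation, a minimal end-to-end import looks like the sketch below; it assumes torch >= 2.1 (for torch.export) and uses an illustrative model, so treat it as a usage pattern rather than a verbatim excerpt from any of these PRs.

```python
# Minimal sketch: torch.export -> Relax IRModule via the ExportedProgram importer.
import torch
from torch.export import export

from tvm.relax.frontend.torch import from_exported_program


class MLP(torch.nn.Module):  # illustrative model, not from the release notes
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x):
        return torch.nn.functional.relu(self.linear(x))


# torch.export traces the model into an ExportedProgram ...
exported = export(MLP(), (torch.randn(4, 16),))
# ... which the importer converts into a Relax IRModule.
mod = from_exported_program(exported)
mod.show()
```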

Runtime

  • #17614 - [CLML] Profiling options enabled for CLML
  • #17570 - [OPENCL] Bugfix

TIR

  • #17799 - Fix reduce buffer allocation position
  • #17783 - [REFACTOR] Remove legacy tir::any
  • #17706 - Minor fix for default GPU schedule
  • #17579 - [SoftwarePipeline] Ensure pipeline epilogue and prologue do not overlap
  • #17584 - [LoopPartition] enforcement on loop partition control

TVMC

None.

cuda & cutlass & tensorrt

  • #17789 - [CUTLASS] Add blockwise scale gemm/bmm kernels
  • #17741 - [Codegen][CUDA] Fix codegen of cast among vector bfloat16, fp8 and fp4
  • #17708 - [CUDA] FP4 cast and reinterpret support
  • #17639 - [CUDA] Remove htanh from unsupported math ops for CUDA 12.8
  • #16950 - [Codegen, CUDA] Add FP8 Tensor Core Codegen

web

  • #17695 - [WASM] Update wasm include in accordance to kv cache revamp

Misc

  • #17796 - [Cublas] Added support for bfloat16 while dispatching to cublas kernels
  • #17763 - [Flashinfer] Added jit flow for sampling kernel
  • #17811 - [NFC] Fix explict typo
  • #17780 - [3rdparty] Enable bfloat16 for custom allreduce kernel
  • #17784 - [REFACTOR] Phase out StackVM
  • #17750 - BugFix: Relax comment
  • #17748 - [Codegen] Support codegen for vectorized tir.ShuffleNode
  • #17743 - Fix: Change variable i to x in split operation in cross_compilation_and_rpc.py
  • #17730 - [Attention] Added caching for flashinfer binaries during JIT
  • #17733 - [Refactor] Clean up Relay references in the codebase
  • #17739 - [BF16] Support ndarray.asnumpy() to bfloat16 tensor natively using ml_dtypes
  • #17734 - Remove Google Analytics
  • #17731 - [IR] Compact Functor vtable
  • #17736 - Fix typos in comments and strings
  • #17670 - [DataType] BF16 Support
  • #17727 - [FFI] Fix dynamic FFI index to ensure compatibility
  • #17718 - [Refactor] Migrate build API to tvm.compile
  • #17714 - [FFI] Phase out ctypes fallback in favor of cython
  • #17716 - Fix the get_target_compute_version for sm >= 100
  • #17710 - [Refactor] Introduce base Executable class and tvm.compile interface (see the sketch after this list)
  • #17713 - [REFACTOR] Cleanup legacy relay runtime data structures
  • #17712 - [DataType] Rename FP8 dtypes to standard names
  • #17703 - Fix typos in multiple files
  • #17693 - updated the assert in BindParams to allow tvm.relax.Constant
  • #17701 - [Refactor] Remove legacy TE schedule tag
  • #17683 - [MSC] Remove relay
  • #17688 - Fix relax.ccl.scatter_from_worker0 assert
  • #17630 - [Codegen] FP4 support
  • #17685 - [REFACTOR] Cleanup legacy TE-based passes
  • #17681 - [REFACTOR] Followup cleanup of relay phase out
  • #17678 - Bump 3rdparty/cutlass_fpA_intB_gemm
  • #17669 - [REFACTOR] Allow target dependent default tir pipeline dispatch in tir.build()
  • #17665 - [REFACTOR] move build flow from C++ to Python
  • #17624 - Added support for normal MLA kernel
  • #17641 - Pick up vector length from 'zvlXXXb' (RVV) mattr for riscv
  • #17666 - [Refactor] Improve TargetHasSVE function with optional target handling
  • #17661 - [Refactor] Phase out python dependency decorator
  • #17662 - [REFACTOR] Phase out te.Schedule c++ components
  • #17660 - [REFACTOR] Phase out relay c++ components
  • #17655 - Upgrading onnx and onnxrt versions
  • #17657 - Update argument order for relax.op.pad to make it round-trippable
  • #17658 - [REFACTOR] Phase out te.schedule python components
  • #17653 - Update images to 20250214-034537-bd1411f8
  • #17656 - [REFACTOR] Phase out relay python components
  • #17649 - [Refactor] Phase out python dependency attrs
  • #17644 - Bump rollup from 2.79.1 to 2.79.2 in /web
  • #17637 - [PYTHON] Build cython by default
  • #17631 - Handle vector width (VLEN) for RISCV arches
  • #17613 - Bug Fix: Removed unused code
  • #17585 - [Relay] Disable InferType if it was done and no changes after previous pass
  • #17605 - [Refactor] Phase out legacy example apps
  • #17603 - [Refactor] Phase out legacy docs
  • #17513 - [GRAPH RT] Additional API support
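
Several refactors above (#17710, #17714, #17718) converge on a single tvm.compile entry point that subsumes the older relax.build/tvm.build split. Below is a minimal sketch of the new flow, assuming the default Relax pipeline and an LLVM target; the tiny module is illustrative, not from these PRs.

```python
# Minimal sketch of the unified tvm.compile flow (assumptions noted above).
import numpy as np
import tvm
from tvm import relax
from tvm.script import ir as I, relax as R


@I.ir_module
class Mod:  # stands in for whatever a frontend produced
    @R.function
    def main(x: R.Tensor((4, 16), "float32")) -> R.Tensor((4, 16), "float32"):
        y = R.nn.relu(x)
        return y


ex = tvm.compile(Mod, tvm.target.Target("llvm"))  # unified entry point (#17710)
vm = relax.VirtualMachine(ex, tvm.cpu())
out = vm["main"](tvm.nd.array(np.zeros((4, 16), "float32")))
```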