Docs | Tutorial | DiffTaichi | Examples | Contribute | Forum
| Documentations | Chat | taichi-nightly | taichi-nightly-cuda-10-0 | taichi-nightly-cuda-10-1 |
|---|---|---|---|---|
```bash
# Python 3.6/3.7 needed

# CPU only. No GPU/CUDA needed. (Linux, OS X and Windows)
python3 -m pip install taichi-nightly

# With GPU (CUDA 10.0) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-0

# With GPU (CUDA 10.1) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-1
```

| | Linux (CUDA) | OS X (10.14+) | Windows |
|---|---|---|---|
| Build | | | |
| PyPI | | | |
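
The choice between the three nightly packages listed in the install commands can be sketched as a small lookup. `nightly_package` is a hypothetical helper written for this illustration, not part of Taichi or pip; it only encodes the package names and the "Linux only" restriction stated above.

```python
def nightly_package(cuda_version=None, platform_name="linux"):
    """Return the pip package name for a given setup.

    Hypothetical helper: it only mirrors the install notes in this README.
    cuda_version is None for CPU-only, or the string "10.0" / "10.1".
    """
    if cuda_version is None:
        # CPU-only build works on Linux, OS X and Windows.
        return "taichi-nightly"
    if platform_name != "linux":
        raise ValueError("CUDA builds are listed as Linux only")
    packages = {
        "10.0": "taichi-nightly-cuda-10-0",
        "10.1": "taichi-nightly-cuda-10-1",
    }
    if cuda_version not in packages:
        raise ValueError("no nightly package listed for CUDA " + cuda_version)
    return packages[cuda_version]


if __name__ == "__main__":
    print("python3 -m pip install " + nightly_package("10.1"))
```

The returned name is meant to be passed to `python3 -m pip install`, as in the commands above.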
- (Done) Fully implement the LLVM backend to replace the legacy source-to-source C++/CUDA backends (By Dec 2019)
  - The only missing features compared to the old source-to-source backends:
    - Vectorization on CPUs. Given most users who want performance are using GPUs (CUDA), this is given low priority.
    - Automatic shared memory utilization. Postponed until Feb/March 2020.
- (Done) Redesign & reimplement (GPU) memory allocator (by the end of Jan 2020)
- (WIP) Tune the performance of the LLVM backend to match that of the legacy source-to-source backends (Hopefully by mid Feb, 2020. Current progress: setting up/tuning for final benchmarks)
- (Feb 20, 2020) `v0.5.2` released
  - Gradients for `ti.pow` now supported (by Yubin Peng [archibate])
  - Multi-threaded unit testing (by Yubin Peng [archibate])
  - Fixed Taichi crashing when starting multiple instances simultaneously (by Yubin Peng [archibate])
  - Metal backend now supports `ti.pow` (by Ye Kuang [k-ye])
  - Better algebraic simplification (by Mingkuan Xu [xumingkuan])
  - `ti.normalized` now optionally takes an argument `eps` to prevent division by zero in differentiable programming
  - Improved random number generation by decorrelating PRNG streams on CUDA
  - Set environment variable `TI_LOG_LEVEL` to `trace`, `debug`, `info`, `warn`, or `error` to filter out/increase verbosity. Default=`info`
  - [bug fix] fixed a loud failure on differentiable programming code generation due to a new optimization pass
  - Added `ti.GUI.triangle` example
  - Doc update: added `ti.cross` for 3D cross products
  - Use environment variable `TI_TEST_THREADS` to override testing threads
  - [For Taichi developers, bug fix] `ti.init(print_processed=True)` renamed to `ti.init(print_preprocessed=True)`
  - Various development infrastructure improvements by Yubin Peng [archibate]
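
To illustrate why an `eps` argument helps when normalizing vectors, here is a minimal plain-Python sketch. It is not Taichi's implementation: the notes above do not say where `ti.normalized` adds `eps`, so this sketch assumes it is added under the square root. The function name `normalized` here is a stand-in defined only for this example.

```python
import math


def normalized(v, eps=0.0):
    """Scale v to unit length.

    Assumption for this sketch: eps is added to the squared norm before
    taking the square root, so the divisor is never exactly zero when eps > 0.
    """
    norm = math.sqrt(sum(x * x for x in v) + eps)
    return [x / norm for x in v]


if __name__ == "__main__":
    print(normalized([3.0, 4.0]))

    # A zero vector has zero length, so eps=0 divides by zero.
    try:
        normalized([0.0, 0.0])
    except ZeroDivisionError:
        print("zero vector without eps: division by zero")

    # With a small eps the result stays finite.
    print(normalized([0.0, 0.0], eps=1e-12))
```

In plain Python the failure shows up as an exception; in a differentiable simulation the same situation would typically surface as non-finite values propagating through the gradients, which is what a small `eps` is meant to avoid.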
- (Feb 16, 2020) `v0.5.1` released
  - Keyboard and mouse events supported in the GUI system. Check out mpm128.py for an interactive demo! (by Yubin Peng [archibate] and Ye Kuang [k-ye])
  - Basic algebraic simplification passes (by Mingkuan Xu [xumingkuan])
  - (For developers) `ti` (`ti.exe`) command supported on Windows after setting `%PATH%` correctly (by Mingkuan Xu [xumingkuan])
  - General power operator `x ** y` now supported in Taichi kernels (by Yubin Peng [archibate])
  - `.dense(...).pointer()` now abbreviated as `.pointer(...)`. `pointer` now stands for a dense pointer array. This leads to cleaner code and better performance. (by Kenneth Lozes [KLozes])
  - (Advanced struct-fors only) `for i in X` now iterates all child instances of `X` instead of `X` itself. Skip this if you only use `X=leaf node` such as `ti.f32/i32/Vector/Matrix`.
  - Fixed CUDA random number generator race conditions
- (Feb 14, 2020) `v0.5.0` released with a new Apple Metal GPU backend for Mac OS X users! (by Ye Kuang [k-ye])
  - Just initialize your program with `ti.init(..., arch=ti.metal)` and run Taichi on your Mac GPUs!
  - A few takeaways if you do want to use the Metal backend:
    - For now, the Metal backend only supports `dense` SNodes and 32-bit data types. It doesn't support `ti.random()` or `print()`.
    - Pre-2015 models may encounter some undefined behaviors under certain conditions (e.g. read-after-write). According to our tests, it seems like the memory order on a single GPU thread could go inconsistent on these models.
    - The `[]` operator in Python is slow in the current implementation. If you need to do a large number of reads, consider dumping all the data to a `numpy` array via `to_numpy()` as a workaround. For writes, consider first generating the data into a `numpy` array, then copying that to the Taichi variables as a whole.
    - Do NOT expect a performance boost yet, and we are still profiling and tuning the new backend. (So far we only saw a big performance improvement on a 2015 MBP 13-inch model.)
- (Feb 12, 2020) `v0.4.6` released.
  - (For compiler developers) An error will be raised when `TAICHI_REPO_DIR` is not a valid path (by Yubin Peng [archibate])
  - Fixed a CUDA backend deadlock bug
  - Added test selectors `ti.require()` and `ti.archs_excluding()` (by Ye Kuang [k-ye])
  - `ti.init(**kwargs)` now takes a parameter `debug=True/False`, which turns on debug mode if true
  - ... or use `TI_DEBUG=1` to turn on debug mode non-intrusively
  - Fixed `ti.profiler_clear`
  - Added `GUI.line(begin, end, color, radius)` and `ti.rgb_to_hex`
  - Renamed `ti.trace` (Matrix trace) to `ti.tr`. `ti.trace` is now for logging with `ti.TRACE` level
  - Fixed return value of `ti test_cpp` (thanks to Ye Kuang [k-ye])
  - Raised the default logging level to `ti.INFO` instead of trace to make the world quieter
  - General performance/compatibility improvements
  - Doc updated
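
The environment variables mentioned in these notes (`TI_DEBUG` here, `TI_LOG_LEVEL` and `TI_TEST_THREADS` in the v0.5.2 notes) can be collected in one place before launching a Taichi program. The sketch below only builds an environment dictionary; `taichi_env` is a hypothetical helper, not a Taichi API, and the script name in the usage note is a placeholder.

```python
import os

# Log levels as listed in the v0.5.2 notes.
LOG_LEVELS = ("trace", "debug", "info", "warn", "error")


def taichi_env(log_level="info", debug=False, test_threads=None, base=None):
    """Return a copy of `base` (default: os.environ) with Taichi variables set.

    Hypothetical helper for illustration; it does not import or call Taichi.
    """
    if log_level not in LOG_LEVELS:
        raise ValueError("unknown log level: %r" % (log_level,))
    env = dict(os.environ if base is None else base)
    env["TI_LOG_LEVEL"] = log_level
    if debug:
        # Turns on debug mode without editing the program itself.
        env["TI_DEBUG"] = "1"
    if test_threads is not None:
        env["TI_TEST_THREADS"] = str(int(test_threads))
    return env


if __name__ == "__main__":
    print(taichi_env(log_level="debug", debug=True, test_threads=4, base={}))
```

The resulting dictionary can be handed to a child process, for example `subprocess.run(["python3", "my_script.py"], env=taichi_env(debug=True))`, so the variables are in place before Taichi starts.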
- (Feb 6, 2020) `v0.4.5` released.
  - `ti.init(arch=..., print_ir=..., default_fp=..., default_ip=...)` now supported. `ti.cfg.xxx` is deprecated
  - Immediate data layout specification supported after `ti.init`. No need to wrap data layout definition with `@ti.layout` anymore (unless you intend to do so)
  - `ti.is_active`, `ti.deactivate`, `SNode.deactivate_all` supported in the new LLVM x64/CUDA backend. Example
  - Experimental Windows non-UTF-8 path fix (by Yubin Peng [archibate])
  - `ti.global_var` (which duplicates `ti.var`) is removed
  - `ti.Matrix.rotation2d(angle)` added
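
For readers unfamiliar with 2D rotation matrices, the following plain-Python sketch shows the standard construction. It assumes the usual counter-clockwise convention with the angle in radians; the notes above do not state which convention `ti.Matrix.rotation2d` uses, and the functions here are stand-ins defined for this example only.

```python
import math


def rotation2d(angle):
    """Return the 2x2 counter-clockwise rotation matrix for `angle` in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s],
            [s, c]]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


if __name__ == "__main__":
    # Rotating the x unit vector by 90 degrees gives (approximately) the y unit vector.
    print(matvec(rotation2d(math.pi / 2), [1.0, 0.0]))
```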
- (Feb 5, 2020) `v0.4.4` released.
  - For developers: ffi-navigator support [doc]. (by masahi)
  - Fixed `f64` precision support of `sin` and `cos` on CUDA backends (by Kenneth Lozes [KLozes])
  - Make Profiler print the arch name in its title (by Ye Kuang [k-ye])
  - Tons of invisible contributions by Ye Kuang [k-ye], for the WIP Metal backend
  - `Profiler` working on CPU devices. To enable, `ti.cfg.enable_profiler = True`. Call `ti.profiler_print()` to print kernel running times
  - General performance improvements
- (Feb 3, 2020) `v0.4.3` released.
  - `GUI.circles` 2.4x faster
  - General performance improvements
- (Feb 2, 2020) `v0.4.2` released.
  - GUI framerates are now more stable
  - Optimized OffloadedRangeFor with const bounds. Light computation programs such as `mpm88.py` are 30% faster on CUDA due to reduced kernel launches
  - Optimized CPU parallel range for performance
- (Jan 31, 2020) `v0.4.1` released.
  - Fixed an autodiff bug introduced in v0.3.24. Please update if you are using Taichi differentiable programming.
  - Updated `Dockerfile` (by Shenghang Tsai [jackalcooper])
  - `pbf2d.py` visualization performance boosted (by Ye Kuang [k-ye])
  - Fixed `GlobalTemporaryStmt` codegen
- (Jan 30, 2020) `v0.4.0` released.
  - Memory allocator redesigned
  - Struct-fors with pure dense data structures will be demoted into a range-for, which is faster since no element list generation is needed
  - Python 3.5 support is dropped. Please use Python 3.6(pip)/3.7(pip)/3.8(Windows: pip; OS X & Linux: build from source) (by Chujie Zeng [Psycho7])
  - `ti.deactivate` now supported on sparse data structures
  - `GUI.circles` (batched circle drawing) performance improved by 30x
  - Minor bug fixes (by Yubin Peng [archibate], Ye Kuang [k-ye])
  - Doc updated
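
The struct-for demotion mentioned in the v0.4.0 notes can be pictured with a conceptual model in plain Python. This is not Taichi's compiler code: both functions are hypothetical and only show why a purely dense structure needs no element list, since every index is active.

```python
def struct_for_indices(active):
    """Conceptual struct-for: build the list of active elements, then visit it.

    `active` is a list of booleans, one per cell (an assumption of this sketch).
    """
    # Extra pass over the structure to generate the element list.
    element_list = [i for i, is_active in enumerate(active) if is_active]
    return [i for i in element_list]


def range_for_indices(n):
    """Conceptual range-for: visit 0..n-1 directly, with no element list."""
    return list(range(n))


if __name__ == "__main__":
    dense = [True] * 8
    # For a fully dense structure both loops visit the same indices,
    # so the element-list pass can be skipped.
    print(struct_for_indices(dense) == range_for_indices(len(dense)))
```

For a sparse structure the two differ, which is why the demotion applies only to pure dense data structures.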
- (SIGGRAPH Asia 2019) High-Performance Computation on Sparse Data Structures [Video] [BibTex]
- by Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand
- (ICLR 2020) Differentiable Programming for Physical Simulation [Video] [BibTex] [Code]
- by Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand
