The platform module can cause crashes in Windows due to slow WMI calls #125315


Closed

runn opened this issue Oct 11, 2024 · 8 comments
Assignees
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes OS-windows type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@runn

runn commented Oct 11, 2024

Crash report

What happened?

When running on a virtual machine where WMI calls have highly variable performance, the WMI C module can cause Python to crash.

If you have (or can simulate) slow WMI calls, then simple Python code should (non-deterministically) reproduce the problem. It is non-deterministic because it is a thread race over a shared resource on the stack: the WMI thread in CPython can end up with a pointer to a now-invalid stack frame.

I can cause the problem by repeatedly calling platform.machine() and platform.win32_ver() in a loop of about 100 iterations on a machine with slow WMI calls.

import platform

for i in range(100):
    platform.win32_ver()
    platform.machine()

On the affected machines this will sometimes cause the whole process to die with an error indicating the stack has been trashed, such as 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN), raised when the stack canary has been overwritten.

From a crash dump (that I cannot share) I debugged this issue by taking the WMI module and running it on its own. I noticed in the code that there is a timeout that seems to have been added because the WMI calls themselves can be quite slow, especially in the case of permission problems, where WMI's own timeout is quite long.

https://github.com/python/cpython/blob/main/PC/_wmimodule.cpp#L282

The problem is that this timeout can cause the function that the platform module uses to return before the thread running the WMI code finishes. This is a problem because the thread is using a pointer to a struct allocated on a stack frame that is about to go away.

https://github.com/python/cpython/blob/main/PC/_wmimodule.cpp#L241

That struct holds handles to a bunch of things the WMI thread wants to use or clean up, including both ends of a pipe on which WriteFile is called.

In some situations Python hangs, sometimes Windows terminates it after detecting a stack overflow, sometimes it works, and sometimes the timeout is fine; it all depends on where the thread doing the WMI work was at the moment the calling function returned.
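The failure mode can be sketched in pure Python as an analogy (the real race is in C++ with a stack-allocated struct; the names and timings here are purely illustrative): the caller waits on the worker with a short timeout, gives up, and tears down state the worker is still using.

```python
import threading
import time

# Analogy for the race described above (illustrative, not the C++ code):
# the caller waits with a short timeout, then tears down shared state
# while the worker thread may still be using it. In the real bug, the
# "teardown" is the caller's stack frame going away.
observed = []

def worker(shared):
    time.sleep(0.2)                    # simulate a slow WMI call
    observed.append(shared["handle"])  # may observe torn-down state

def caller():
    shared = {"handle": "open"}
    t = threading.Thread(target=worker, args=(shared,))
    t.start()
    t.join(timeout=0.05)          # like WaitForSingleObject with a timeout
    shared["handle"] = "closed"   # caller "frees" the shared resource early
    return t

t = caller()
t.join()
print(observed)  # the worker saw the state after teardown
```

In the Python analogy the worker merely reads a stale value; in the C++ module it dereferences a pointer into a dead stack frame, which is why the symptom is a hard crash rather than a wrong answer.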

I can stop this problem by monkey-patching the WMI calls in the platform module (it has alternative code paths that work fine). I can also stop it by removing the simulated timeout in the WMI module.
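A minimal sketch of that monkey-patch workaround, assuming the internal helper is named platform._wmi_query (a CPython 3.12 implementation detail that may differ between versions): forcing it to fail immediately pushes platform onto its non-WMI fallback paths.

```python
import platform

# Hedged sketch: make platform's WMI helper (a CPython 3.12 internal;
# the name may differ between versions) fail immediately, so
# platform.machine()/platform.win32_ver() use their non-WMI fallbacks.
def _no_wmi(*args, **kwargs):
    raise OSError("WMI disabled by workaround")

if hasattr(platform, "_wmi_query"):
    platform._wmi_query = _no_wmi

# On Windows these now avoid the WMI worker thread entirely; on other
# platforms they behave as usual.
machine = platform.machine()
print(machine)
```

This only needs to run once, before anything else queries platform, because the module caches most results.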

The problem is that lots of tools use the platform module. I first discovered this while using Poetry, when poetry install would just terminate, but it can affect anything that uses the platform module to make WMI calls on a machine with slow WMI.

There is no reasonable workaround on the virtual machines I use because they are managed by an organisation (as is the Python install on those machines).

CPython versions tested on:

3.12

Operating systems tested on:

Windows

Output from running 'python -VV' on the command line:

No response

Linked PRs

@runn runn added the type-crash A hard crash of the interpreter, possibly with a core dump label Oct 11, 2024
@runn
Author

runn commented Oct 11, 2024

I believe this is the commit that adds the timeout that exposes the problem with the stack-allocated struct, though I think there might be a few code paths in here where problems will happen if _wmi_exec_query_impl exits before the thread has finished.

5a0137c

Related to issue #112658

@runn runn changed the title Slow WMI calls in Windows can cause crashes that include stack overflow The platform module can cause crashes in Windows due to slow WMI calls Oct 11, 2024
@zooba
Member

zooba commented Oct 18, 2024

Acknowledged. We should fix this; I haven't had a chance to look into it yet. At a guess, we need to copy something from that shared struct into a local in the thread before we start work.

@runn
Author

runn commented Oct 18, 2024

Thanks!

I think the query string pointer could be dealt with (roughly) by allocating the BSTR before COM init happens. At least then there's a thread-local copy quickly enough that it will likely beat the race.

If you copy the handle refs local to the thread, there's a chance the calling thread will close them first, and then CloseHandle might raise an invalid-handle error. Maybe not a massive issue outside of structured exception handling? DuplicateHandle is a pain to use here.

In some ways I worry about using WMI. I know it's the canonical location for things like Windows version information, but WMI has caused me all sorts of trouble over the years, so I'm a little cautious. Occasionally some file gets corrupted and WMI stops working. Occasionally it's very slow. It's permission-dependent, and its own timeout is long (and not configurable), which I think is what caused the original WaitForSingleObject calls with timeouts to be added.

WMI is just a very complex beast under the hood, I suppose: https://learn.microsoft.com/en-us/windows/win32/wmisdk/wmi-architecture

The fact that we basically stop using the WMI code in 3.13 the first time a timeout happens rather adds to the argument that we should consider avoiding it altogether.

For various reasons I cannot contribute a PR even though I'd love to, so please accept my apologies for complaining without contributing.

@zooba
Member

zooba commented Oct 21, 2024

allocating the BSTR before COM init happens

Unfortunately, I'm pretty sure this is cheating 😆. COM init is needed to initialise the memory allocator used to allocate the BSTR.

The fact that we basically stop using the WMI code in 3.13 the first time a timeout happens sort of adds to the argument that we should consider avoiding it altogether.

If you know of another way to accurately get the OS version for display (not compatibility) purposes (bearing in mind that we've tried all the other obvious and non-obvious ones), I'd love to hear about it. The next best option seems to be to run cmd.exe and parse its ver output, which breaks in similar ways, probably on more correctly configured systems than WMI breaks on. Due to how Windows is built these days, we can't read the version info from any system files anymore.
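That cmd.exe fallback can be sketched roughly as follows (Windows-only; the helper name is made up for illustration, and parsing the localized ver string further is exactly why this path is fragile):

```python
import subprocess
import sys

# Rough sketch of the "next best" fallback discussed above: run cmd.exe's
# built-in ver command and return its raw output. Windows-only; returns
# None elsewhere. Note the output format is localized, which is one of
# the ways this approach breaks.
def ver_string():
    if sys.platform != "win32":
        return None
    out = subprocess.check_output(["cmd", "/c", "ver"], text=True)
    return out.strip()

print(ver_string())
```

Like the WMI path, this spawns an external process, so it inherits a different but overlapping set of failure modes (slow process creation, restricted environments).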

There isn't supposed to be any permission restriction on reading the basic OS info for the current machine. Granted, plenty of other info requires more permissions, and I guess it's possible to forcibly override those sensible defaults. I personally haven't found a broken machine; all the timeouts were contributed by others who had, but couldn't explain why they were failing.

@runn
Author

runn commented Oct 22, 2024

I'm likely quite out of date on all of this, having not written any COM professionally for a good while, so I'm probably missing something totally obvious, but I thought you could use SysAllocString before CoInitialize because the allocator is available before it is called.

https://learn.microsoft.com/en-us/windows/win32/api/objbase/nf-objbase-coinitialize#remarks

If I write a little test and call SysAllocString without CoInitialize, then I get a proper BSTR back, with the length, the embedded wchar, and the pointer to it. SysFreeString and SysStringLen both work fine.

I always thought we needed CoInitialize for any object creation, but there's a bunch of oleaut stuff that's OK, though the docs for SysAllocString make no claims either way. Like I say, it's been a while!
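That little test can be reproduced from Python via ctypes (a sketch of the experiment only; Windows-only, returning None elsewhere, and the function name is made up here):

```python
import ctypes
import sys

# Hedged sketch of the experiment described above: allocate a BSTR with
# SysAllocString *before* any CoInitialize call and verify its length.
def bstr_len_without_coinit(text):
    if sys.platform != "win32":
        return None
    oleaut32 = ctypes.WinDLL("oleaut32")
    oleaut32.SysAllocString.restype = ctypes.c_void_p
    oleaut32.SysAllocString.argtypes = [ctypes.c_wchar_p]
    oleaut32.SysStringLen.restype = ctypes.c_uint
    oleaut32.SysStringLen.argtypes = [ctypes.c_void_p]
    oleaut32.SysFreeString.argtypes = [ctypes.c_void_p]
    bstr = oleaut32.SysAllocString(text)  # note: no CoInitialize has run
    try:
        return oleaut32.SysStringLen(bstr)
    finally:
        oleaut32.SysFreeString(bstr)

print(bstr_len_without_coinit("hello"))
```

On a Windows machine this returns the character count of the string, showing SysAllocString produced a well-formed BSTR without COM initialization.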

Anyway, I guess that's all beside the point. I just meant allocate it early, and it's probably not even an especially useful comment in the first place :)

Worse, I have no better alternatives beyond what the platform module already does when the timeout is triggered in the WMI call. Microsoft really seems to push the idea that no one should depend on the version and should check for compatibility instead, which isn't what is wanted here, I suppose.

I suspect the WMI failures happen in corporate environments where machines have all sorts of invasive software installed that intercepts Win32 or COM calls. I can only reproduce this sporadically. Love me a Heisenbug :)

@zooba
Member

zooba commented Oct 23, 2024

Doing some digging: SysAllocString uses CoTaskMemAlloc, which claims to instantiate the IMalloc interface but actually refers to a static instance of the object rather than one that needs runtime allocation. This static instance is initialized when the combase DLL is loaded, which is why CoTaskMemAlloc can work before CoInitialize.

But it's not specified as such! However, I'm pretty sure it couldn't possibly change, as it would break too many apps. So I guess it's safe enough.

@zooba zooba self-assigned this Oct 29, 2024
@zooba zooba added 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes labels Oct 29, 2024
zooba added a commit to zooba/cpython that referenced this issue Oct 29, 2024
zooba added a commit that referenced this issue Oct 30, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 30, 2024
…n some Windows machines (pythonGH-126141)

(cherry picked from commit 60c415b)

Co-authored-by: Steve Dower <[email protected]>
zooba added a commit that referenced this issue Oct 30, 2024
… Windows machines (GH-126141)

(cherry picked from commit 60c415b)

Co-authored-by: Steve Dower <[email protected]>
@picnixz
Member

picnixz commented Nov 2, 2024

@zooba Can this issue be closed or is there anything left to do?

@zooba
Member

zooba commented Nov 4, 2024

Now that the backports are merged, it's done.

@zooba zooba closed this as completed Nov 4, 2024
picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 8, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
copybara-service bot pushed a commit to openxla/xla that referenced this issue Feb 10, 2025
This should hopefully resolve Windows RBE test runs on Python3.12 flaking with
WMI query errors (python/cpython#125315).

PiperOrigin-RevId: 725214831
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Feb 10, 2025
This should hopefully resolve Windows RBE test runs on Python3.12 flaking with
WMI query errors (python/cpython#125315).

PiperOrigin-RevId: 725214831
copybara-service bot pushed a commit to google/tsl that referenced this issue Feb 25, 2025
This should hopefully resolve Windows RBE test runs on Python3.12 flaking with
WMI query errors (python/cpython#125315).

PiperOrigin-RevId: 725214831
saisindhuri91 added a commit to linux-on-ibm-z/tensorflow that referenced this issue Feb 26, 2025
commit d56c042
Author: Adrian Kuegel <[email protected]>
Date:   Tue Feb 25 22:27:49 2025 -0800

    Let FusionDeduplicationCache handle ProducerConsumer multi-output fusions.

    This will be needed when we want to allow such fusions in PriorityFusion.

    PiperOrigin-RevId: 731165217

commit 77ba208
Author: Majid Dadashi <[email protected]>
Date:   Tue Feb 25 21:08:34 2025 -0800

    Enable folding of quantized reshape with per-axis scales

    PiperOrigin-RevId: 731144237

commit 446fac2
Author: Eunjae Kim <[email protected]>
Date:   Tue Feb 25 21:05:05 2025 -0800

    Introduce `FunctionBody::Finalize()` to populate `AllocatorAttribute`s for arg/ret nodes and release unnecessary resources

    PiperOrigin-RevId: 731143677

commit 58269e0
Author: Weiyi Wang <[email protected]>
Date:   Tue Feb 25 18:27:37 2025 -0800

    Flip default of _experimental_enable_composite_direct_lowering flag to True

    PiperOrigin-RevId: 731105623

commit 7d4ce51
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 17:42:26 2025 -0800

    Move some strategy generation utilities from auto_sharding_dot_handler.cc to
    auto_sharding_strategy.h with the intention of using the utilities more broadly
    throughout the codebase.

    PiperOrigin-RevId: 731094359

commit af03154
Author: Yin Zhang <[email protected]>
Date:   Tue Feb 25 17:09:21 2025 -0800

    Reverts changelist 723349025

    PiperOrigin-RevId: 731085146

commit 2bb741a
Author: Eric Yang <[email protected]>
Date:   Tue Feb 25 17:07:40 2025 -0800

    Add HLO adapter

    PiperOrigin-RevId: 731084644

commit 745b9dd
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 16:37:04 2025 -0800

    Always set use_global_scheduler/rank_queues with priority_merge policy.

    PiperOrigin-RevId: 731074632

commit 16e6b9f
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 16:23:15 2025 -0800

    Integrate LLVM at llvm/llvm-project@9889de834b0a

    Updates LLVM usage to match
    [9889de834b0a](llvm/llvm-project@9889de834b0a)

    PiperOrigin-RevId: 731070091

commit 1e392a4
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 16:17:51 2025 -0800

    Update ops-related pbtxt files.

    PiperOrigin-RevId: 731068451

commit fcedb3c
Author: Pat Notz <[email protected]>
Date:   Tue Feb 25 16:14:40 2025 -0800

    Flag guard the option to disable embedding pipelining when summary ops are present

    PiperOrigin-RevId: 731067500

commit 8ebbd6c
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 16:08:09 2025 -0800

    Go: Update generated wrapper functions for TensorFlow ops.

    PiperOrigin-RevId: 731065650

commit 8a759e6
Author: Derek Murray <[email protected]>
Date:   Tue Feb 25 16:05:29 2025 -0800

    Introduce `TPUDummyInput` as a specialization of `Fill` for ICI weight distribution.

    The new op has a few benefits over the previous version:
    * We can generate a single op instead of three ops for each dummy input.
    * The new op is marked as `DoNotOptimize` and `TF_NoConstantFold`, so it will never be accidentally constant-folded to a large memory footprint.

    PiperOrigin-RevId: 731064699

commit 4723816
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 15:36:07 2025 -0800

    Support setting up global prioritized batching via the batch op rewriter.

    PiperOrigin-RevId: 731054770

commit ae1d10b
Author: Luke Boyer <[email protected]>
Date:   Tue Feb 25 14:29:03 2025 -0800

    Add serialization options to the public API for alignment for bytecode.

    PiperOrigin-RevId: 731030707

commit 1333586
Author: Tai Ly <[email protected]>
Date:   Tue Feb 25 16:28:32 2025 -0600

    [tosa] Fix lowering of tf/tfl expand_dims for negative dim (tensorflow#67859)

    This fixes lowering of tf/tfl expand_dims to tosa,
    for negative dim values such that dim=-1 means adding
     inner most dimension

commit 78dd108
Author: Terry Heo <[email protected]>
Date:   Tue Feb 25 14:04:09 2025 -0800

    litert: Fix broken Dispatch API tests

    Provide valid DispatchOption to LiteRtDispatchInitialize()

    PiperOrigin-RevId: 731021714

commit b120e3e
Author: Michael Hudgins <[email protected]>
Date:   Tue Feb 25 13:43:31 2025 -0800

    [XLA:OSS] Add CI connection step to the ci workflows.

    PiperOrigin-RevId: 731013692

commit 0e5ec72
Author: Reed Wanderman-Milne <[email protected]>
Date:   Tue Feb 25 13:34:28 2025 -0800

    Fix race condition in the predicate in GPU thunks.

    WhileThunk and ConditionalThunk stored CUDA host memory that would store the predicate. The thunks would transfer the predicate from device to host into the CUDA host memory. But if the thunks were called multiple times in parallel, each call would use the same memory, causing a race condition which could result in incorrect predicate values.

    Now a pool of host memory is used so different calls to the thunk get different pointers to host memory. The pool has a fixed size of 128, so if there are more parallel callers than that, an error will be raised. I think it's unlikely there will be that many parallel calls in practice.

    PiperOrigin-RevId: 731010318

commit b273bba
Author: Chenguang Wang <[email protected]>
Date:   Tue Feb 25 13:07:11 2025 -0800

    Fix Android ARM64 build for hlo_to_mhlo.

    See also commit ce2bae2.

    PiperOrigin-RevId: 731000510

commit 97d5495
Author: Andrew Zhang <[email protected]>
Date:   Tue Feb 25 12:47:12 2025 -0800

    Directly overwrite ADSP_LIBRARY_PATH if shared lib path is provided to qnn manager.

    Fix the issue where existing ADSP_LIBRARY_PATH contains other versions QNN lib files.

    PiperOrigin-RevId: 730992932

commit f4e0633
Author: Julia Guo <[email protected]>
Date:   Tue Feb 25 12:43:48 2025 -0800

    [XLA:GPU] Fix xspace.pb path

    PiperOrigin-RevId: 730991615

commit 0a6967b
Author: Oleg Shyshkov <[email protected]>
Date:   Tue Feb 25 12:33:49 2025 -0800

    [XLA:GPU] Fix thunk emitter for degenerate ops.

    The condition to get index of the output buffer wasn't always correct. It's possible to have an op with 1 operand and result with a tuple of 1 element. For example, a degenerate a2a will look like:

    ```
    a2a = (u32[2]) all-to-all(u32[2] a1), replica_groups={{0},{1}}
    ```

    It's better to check that output is a tuple.

    PiperOrigin-RevId: 730988026

commit 95e9577
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 12:02:27 2025 -0800

    Fix HLO stats table to use int types as ints (instead of strings).

    PiperOrigin-RevId: 730976625

commit 754f826
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 11:53:35 2025 -0800

    Reverts 1e0f639

    PiperOrigin-RevId: 730973217

commit 9b75a55
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 11:39:38 2025 -0800

    Cleanup: Fix includes.

    PiperOrigin-RevId: 730967918

commit 09806e6
Author: Luke Boyer <[email protected]>
Date:   Tue Feb 25 11:36:33 2025 -0800

    Add support for aligned byte code in internal model serialize API

    PiperOrigin-RevId: 730966854

commit 14aeefb
Author: Penporn Koanantakool <[email protected]>
Date:   Tue Feb 25 10:25:55 2025 -0800

    [xla:cpu:onednn] Support basic MatMul in oneDNN fusion thunk.

    PiperOrigin-RevId: 730937945

commit c47c195
Author: David Dunleavy <[email protected]>
Date:   Tue Feb 25 10:21:01 2025 -0800

    Remove TensorFlow specific configs in `tensorflow.bazelrc`

    PiperOrigin-RevId: 730935687

commit 5c4dddd
Author: Eugene Zhulenev <[email protected]>
Date:   Tue Feb 25 10:18:18 2025 -0800

    [xla:cpu] Move dot_kernel_emitter under codegen/dot

    PiperOrigin-RevId: 730934538

commit e69ca84
Author: Vladimir Belitskiy <[email protected]>
Date:   Tue Feb 25 10:06:41 2025 -0800

    Patch rules_python to point to the newest Python 3.12 patch version.

    This should hopefully resolve Windows RBE test runs on Python3.12 flaking with
    WMI query errors (python/cpython#125315).

    PiperOrigin-RevId: 730930044

commit f1dc591
Author: Won Jong Jeon <[email protected]>
Date:   Tue Feb 25 10:16:42 2025 -0800

    [mlir][tosa] Fix lit tests for resize (tensorflow#87976)

    Change-Id: I8cb88a0b6344259d57a37d6ddd2c0810bb7a61e7

    Signed-off-by: Won Jeon <[email protected]>

commit 0f1a45d
Author: Quentin Khan <[email protected]>
Date:   Tue Feb 25 09:52:32 2025 -0800

    #litert Create the NPU accelerator.

    The accelerator is not yet automatically registered to the LiteRT environment.

    PiperOrigin-RevId: 730924856

commit 57859a1
Author: Aliia Khasanova <[email protected]>
Date:   Tue Feb 25 09:46:10 2025 -0800

    Overwrite xla_dump_as_* options in raw_options only if raw_options.xla_dump_to is set. Otherwise keep debug_options settings.

    This is needed to access the flags state in PjRtStreamExecutorLoadedExecutable::Execute. Specifically, I need to access dumping options in order to dump unoptimized hlo module with arguments during execution correctly.

    PiperOrigin-RevId: 730922688

commit c9c731e
Author: Quentin Khan <[email protected]>
Date:   Tue Feb 25 09:30:03 2025 -0800

    #litert Fix `LITERT_RETURN_IF_ERROR` when checking bool return values.

    - `false` return values are errors.
    - Add `kLiteRtStatusErrorUnknown` for unknown errors.
    - When converting a boolean error to a `LiteRtStatus`/`litert::Expected`, the
      error value is `kLiteRtStatusErrorUnknown`.

    PiperOrigin-RevId: 730917169

commit 573c1ff
Author: Ilia Sergachev <[email protected]>
Date:   Tue Feb 25 09:12:55 2025 -0800

    PR tensorflow#23078: Revert "PR tensorflow#22292: [GPU] Support cuDNN explicit CUDA graph construction."

    Imported from GitHub PR openxla/xla#23078

    This reverts commit 65b4b8874b659d7f11523f7b1d6df1613cfc8984.
    Copybara import of the project:

    --
    f2cc964f5b849b149626a007045cccc32778ee27 by Ilia Sergachev <[email protected]>:

    Revert "PR tensorflow#22292: [GPU] Support cuDNN explicit CUDA graph construction."

    This reverts commit 65b4b8874b659d7f11523f7b1d6df1613cfc8984.

    Merging this change closes tensorflow#23078

    PiperOrigin-RevId: 730911296

commit 10f7fe6
Author: Ilia Sergachev <[email protected]>
Date:   Tue Feb 25 09:05:55 2025 -0800

    PR tensorflow#22898: [GPU] GEMM fusion autotuner: dump unoptimized fusions before profiling them.

    Imported from GitHub PR openxla/xla#22898

    This helps debugging failures during profiling.
    Copybara import of the project:

    --
    e63f7865126281a7eb5b410394424826275037a8 by Ilia Sergachev <[email protected]>:

    [GPU] GEMM fusion autotuner: dump unoptimized fusions before profiling them.

    This helps debugging failures during profiling.

    Merging this change closes tensorflow#22898

    PiperOrigin-RevId: 730909003

commit ca77b1a
Author: Penporn Koanantakool <[email protected]>
Date:   Tue Feb 25 08:37:29 2025 -0800

    [xla:cpu:onednn] Support elementwise Add and Mul in oneDNN fusion thunk

    PiperOrigin-RevId: 730899327

commit c42688e
Author: Ilia Sergachev <[email protected]>
Date:   Tue Feb 25 08:23:14 2025 -0800

    PR tensorflow#23068: [GPU] Fix missing cuDNN symbols.

    Imported from GitHub PR openxla/xla#23068

    This fixes JAX builds with cuDNN 9.5.0+ after openxla/xla@65b4b88.
    Copybara import of the project:

    --
    3aa286e5a849e2187ef3d44c22c54d518dd168ec by Ilia Sergachev <[email protected]>:

    [GPU] Fix missing cuDNN symbols.

    Merging this change closes tensorflow#23068

    PiperOrigin-RevId: 730895063

commit 6b098f7
Author: Benjamin Kramer <[email protected]>
Date:   Tue Feb 25 07:59:38 2025 -0800

    Integrate LLVM at llvm/llvm-project@d23da7d6300e

    Updates LLVM usage to match
    [d23da7d6300e](llvm/llvm-project@d23da7d6300e)

    PiperOrigin-RevId: 730887012

commit 847b2df
Author: Aliia Khasanova <[email protected]>
Date:   Tue Feb 25 07:55:28 2025 -0800

    [XLA:GPU] Reset `CodedInputStream` after parsing each literal in the serialization of large snapshots.

    `CodedInputStream` has an internal int32 counter for total bytes read, limiting the bytes read by a single instance to 2 GiB.
    I've changed the deserialization implementation to parse each literal with a separate `CodedInputStream`. This fix still limits the *size of each literal* to 2 GiB.

    PiperOrigin-RevId: 730885881

commit 366d129
Author: Alexander Lyashuk <[email protected]>
Date:   Tue Feb 25 07:51:52 2025 -0800

    [XLA] Preserve AUTO layout when converting from HLO to StableHLO

    In HLO, AUTO layout is encoded as missing layout in `entry_computation_layout`.

    In StableHLO, it's marked using `mhlo.layout_mode = "auto"` attribute of the main@ function argument or return value.

    PiperOrigin-RevId: 730884950

commit c8f3847
Author: Mohammed Anany <[email protected]>
Date:   Tue Feb 25 07:32:02 2025 -0800

    [XLA:GPU/TMA] Adding verification for triton_xla ops and custom type.

    PiperOrigin-RevId: 730879041

commit 50054d5
Author: Tori Baker <[email protected]>
Date:   Tue Feb 25 06:58:03 2025 -0800

    [xla:gpu:triton] Create tma_utils with functions & tests that help with emitting TMA through triton. (see child cl to see how most of these get used).

    This also helps to isolate TMA that can be used in other places.

    PiperOrigin-RevId: 730869356

commit 1b776c9
Author: Oleg Shyshkov <[email protected]>
Date:   Tue Feb 25 06:49:57 2025 -0800

    [XLA:GPU] Init output data with -1.

    Makes it easier to detect cases when we overwrite data out of the update range.

    PiperOrigin-RevId: 730867242

commit 522b1b9
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 05:47:53 2025 -0800

    Upgrade Bazel to 7.4.1

    PiperOrigin-RevId: 730848636

commit 41cc4b5
Author: Goran Flegar <[email protected]>
Date:   Tue Feb 25 05:09:33 2025 -0800

    Log which "test case" we are running in TritonAndBlasSupport... Regular2DDot

    It is not quite ideal that we have a test that in effect consists of several test-cases, since it's difficult to figure out which one failed when one of them crashes.

    I do understand the idea that we want an easy to see support matrix, and splitting it up into individual tests would prevent us from doing that.

    As a middle ground, adding some logging so it's easy to tell what failed from the log.

    PiperOrigin-RevId: 730839144

commit 450341f
Author: Henning Becker <[email protected]>
Date:   Tue Feb 25 03:26:41 2025 -0800

    [XLA] Remove the `device_util` build rule from XLA

    The header file for this build rule doesn't exist anymore.

    PiperOrigin-RevId: 730812030

commit e41890c
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 01:03:01 2025 -0800

    Update GraphDef version to 2149.

    PiperOrigin-RevId: 730773101

commit aed230f
Author: A. Unique TensorFlower <[email protected]>
Date:   Tue Feb 25 01:02:52 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-25

    PiperOrigin-RevId: 730773034

commit 581787c
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 23:58:42 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 730754505

commit b685cdc
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 23:40:42 2025 -0800

    update SafeDivide() function to reference the correct lib from tsl::profiler
    Internal change

    PiperOrigin-RevId: 730749825

commit 4902208
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 23:03:19 2025 -0800

    Fixes sub key generation for the stacked variable.

    PiperOrigin-RevId: 730740034

commit 205f198
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 22:44:09 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 730734324

commit 7a51c9f
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 22:32:07 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 730731111

commit 023d8cc
Author: Yin Zhang <[email protected]>
Date:   Mon Feb 24 22:15:51 2025 -0800

    Switch from tsl::Mutex to absl::Mutex

    PiperOrigin-RevId: 730727424

commit c349c84
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 22:12:40 2025 -0800

    Internal change for visibility

    PiperOrigin-RevId: 730726643

commit 0105096
Author: Gunhyun Park <[email protected]>
Date:   Mon Feb 24 21:40:29 2025 -0800

    Bump the priority of CHLO->MHLO ragged dot pass to highest.

    PiperOrigin-RevId: 730719320

commit 52f1cfe
Author: Ezekiel Calubaquib <[email protected]>
Date:   Mon Feb 24 21:08:58 2025 -0800

    Fix duplicate error in LiteRT by replacing tensorflow.lite with tflite.python.lite

    PiperOrigin-RevId: 730711746

commit b482271
Author: Gunhyun Park <[email protected]>
Date:   Mon Feb 24 20:52:27 2025 -0800

    Integrate StableHLO at openxla/stablehlo@5e9b356b

    PiperOrigin-RevId: 730707859

commit 3449eea
Author: Alexander Pivovarov <[email protected]>
Date:   Mon Feb 24 19:15:45 2025 -0800

    PR tensorflow#22930: Initialize num_slices_ to 0 in Heap Simulator

    Imported from GitHub PR openxla/xla#22930

    Ensure `num_slices_` class member is explicitly initialized to 0 in `SliceTimeAllPermutationIterator` and `SliceTimePreferredPermutationIterator` to prevent potential uninitialized variable issues.
    Copybara import of the project:

    --
    53a76f188330d4e72171e3b5349e79aafa68132c by Alexander Pivovarov <[email protected]>:

    Initialize num_slices_ to 0 in Heap Simulator

    Merging this change closes tensorflow#22930

    PiperOrigin-RevId: 730686675

commit dc6c496
Author: Alexander Pivovarov <[email protected]>
Date:   Mon Feb 24 19:12:40 2025 -0800

    PR tensorflow#22953: Fix const qualifier on status prevents automatic move semantics

    Imported from GitHub PR openxla/xla#22953

    reason for change - const qualifier on `status` prevents automatic move semantics in return.

    When return status; is executed, the compiler cannot invoke the move constructor of `absl::Status` because status is const.
    Copybara import of the project:

    --
    b1722312a9e697d9e55d8758eb1c083005fefcda by Alexander Pivovarov <[email protected]>:

    Fix const qualifier on status prevents automatic move semantics

    Merging this change closes tensorflow#22953

    PiperOrigin-RevId: 730686035
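
    The pitfall this change fixes can be sketched in a few lines of standalone C++. `Tracker` below is a hypothetical copy/move-counting stand-in for `absl::Status`, used only to make the effect observable:

    ```cpp
    #include <cassert>
    #include <utility>

    // Hypothetical copy/move-counting type standing in for absl::Status.
    struct Tracker {
      int copies = 0;
      int moves = 0;
      Tracker() = default;
      Tracker(const Tracker& o) : copies(o.copies + 1), moves(o.moves) {}
      Tracker(Tracker&& o) : copies(o.copies), moves(o.moves + 1) {}
    };

    int main() {
      // A const object cannot bind to Tracker&&, so moving from it
      // silently falls back to the copy constructor.
      const Tracker const_status;
      Tracker from_const = std::move(const_status);
      assert(from_const.copies == 1 && from_const.moves == 0);

      // Without the const qualifier the move constructor is selected.
      Tracker status;
      Tracker from_nonconst = std::move(status);
      assert(from_nonconst.moves == 1 && from_nonconst.copies == 0);
      return 0;
    }
    ```

    The same mechanism applies to `return status;`: a const local disqualifies the implicit move, so the (potentially expensive) copy constructor runs instead.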

commit bbbc58a
Author: Yunlong Liu <[email protected]>
Date:   Mon Feb 24 18:47:56 2025 -0800

    PR tensorflow#22956: vlog device id in while_thunk.

    Imported from GitHub PR openxla/xla#22956

    Copybara import of the project:

    --
    d4623150b29e8c3960a1839c3da2234eae71adac by Yunlong Liu <[email protected]>:

    vlog device id in while_thunk.

    Merging this change closes tensorflow#22956

    PiperOrigin-RevId: 730681273

commit 564b4a1
Author: Eugene Zhulenev <[email protected]>
Date:   Mon Feb 24 18:13:17 2025 -0800

    [xla:cpu] InProcessCommunicator: compute collective operations in parallel using all ranks

    PiperOrigin-RevId: 730672408

commit c90652e
Author: Yin Zhang <[email protected]>
Date:   Mon Feb 24 17:47:08 2025 -0800

    Migrate callers from tensorflow::profiler math_utils to tsl/profiler/utils/math_utils.h. No functional changes expected.

    PiperOrigin-RevId: 730665145

commit 8cf7713
Author: Luke Boyer <[email protected]>
Date:   Mon Feb 24 17:41:38 2025 -0800

    Add a flatbuffer util (python) function for getting the builtin options as a given type.

    PiperOrigin-RevId: 730663554

commit f19575d
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 17:00:51 2025 -0800

    Ensure that fetch window is never bigger than the full trace duration.

    PiperOrigin-RevId: 730649115

commit 6e3ec2a
Author: Luke Boyer <[email protected]>
Date:   Mon Feb 24 17:00:26 2025 -0800

    Add flag (only on flatbuffer export tool) to disable buffer sharing in flatbuffer.

    Some downstream tools won't support this.

    PiperOrigin-RevId: 730648967

commit 988ab99
Author: Luke Boyer <[email protected]>
Date:   Mon Feb 24 16:44:49 2025 -0800

    Integrate the compiler flags into the tooling

    PiperOrigin-RevId: 730644395

commit f84cf9d
Author: Tom Natan <[email protected]>
Date:   Mon Feb 24 16:23:27 2025 -0800

    Build absl::string_view(data, length) (instead of StringRef::str) explicitly since the llvm::StringRef to absl::string_view converter is not (always?) available on
    Android.

    END_PUBLIC

    PiperOrigin-RevId: 730637186

commit f0061c7
Merge: d42e2d6 a47a227
Author: TensorFlower Gardener <[email protected]>
Date:   Mon Feb 24 15:42:54 2025 -0800

    Merge pull request tensorflow#87937 from jiunkaiy:dev/chuntl/revise_log

    PiperOrigin-RevId: 730617466

commit d42e2d6
Author: Alexander Pivovarov <[email protected]>
Date:   Mon Feb 24 15:05:10 2025 -0800

    PR tensorflow#22822: Fix ambiguous constructor call in SourceTargetPairs initialization

    Imported from GitHub PR openxla/xla#22822

    ### Description
    Resolve a build failure (with GCC-11) in `collective_permute_cycle_test` caused by an ambiguous constructor call when initializing `SourceTargetPairs` with an empty list (`{{}}`).

    #### Issue
    When calling `SourceTargetPairs({{}})`, the compiler could not determine whether to use the `std::vector<std::pair<int64_t, int64_t>>` constructor or the default copy/move constructors, leading to an error:
    ```
    xla/service/collective_permute_cycle_test.cc:130:48: error: call of overloaded 'SourceTargetPairs(<brace-enclosed initializer list>)' is ambiguous
      130 |   EXPECT_EQ(GetCycleType(SourceTargetPairs({{}})), CycleType::kNone);
    ```

    #### Solution
    1. Explicitly define an `initializer_list` constructor for `SourceTargetPairs` to properly handle `{}` and `{{src, tgt}}` initializations.
    2. Update the test case to use default ctor `SourceTargetPairs()` instead of `SourceTargetPairs({{}})`, ensuring clarity and correctness.

    This fix ensures proper initialization and eliminates ambiguity.

    Tested with GCC-11
    Copybara import of the project:

    --
    f97c38d47c8373ec609fdfbaedff3856f123fc33 by Alexander Pivovarov <[email protected]>:

    Fix ambiguous constructor call in SourceTargetPairs initialization

    Merging this change closes tensorflow#22822

    PiperOrigin-RevId: 730610452
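
    The shape of the fix in step 1 can be sketched as standalone C++. This is a simplified hypothetical stand-in, not the actual XLA class; it shows how a dedicated `initializer_list` constructor makes brace-initialization unambiguous:

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <initializer_list>
    #include <utility>
    #include <vector>

    // Simplified sketch of SourceTargetPairs after the fix.
    struct SourceTargetPairs {
      std::vector<std::pair<int64_t, int64_t>> pairs;

      // Default constructor covers the empty case, replacing the
      // formerly ambiguous SourceTargetPairs({{}}) spelling.
      SourceTargetPairs() = default;

      // Explicit initializer_list constructor: {{src, tgt}, ...} now has
      // exactly one viable overload, so GCC-11 no longer reports ambiguity.
      SourceTargetPairs(std::initializer_list<std::pair<int64_t, int64_t>> il)
          : pairs(il) {}
    };

    int main() {
      SourceTargetPairs empty;                 // unambiguous empty pair list
      SourceTargetPairs ring{{0, 1}, {1, 0}};  // {{src, tgt}, ...}
      assert(empty.pairs.empty());
      assert(ring.pairs.size() == 2);
      assert(ring.pairs[0].first == 0 && ring.pairs[0].second == 1);
      return 0;
    }
    ```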

commit 64e4135
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 15:03:32 2025 -0800

    Change size_in_bytes argument type from int to size_t.

    Other uses of it are size_t, so this makes it consistent.

    PiperOrigin-RevId: 730609856

commit c0562df
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 14:53:13 2025 -0800

    Expose ExecutableBuildOptions::CompilationEnvironments::CreateFromProto to python
    Add a default TpuCompilationEnvironment to the wiz export

    PiperOrigin-RevId: 730606534

commit ce2bae2
Author: Chenguang Wang <[email protected]>
Date:   Mon Feb 24 14:50:30 2025 -0800

    Fix Android ARM64 build.

    The llvm::StringRef to absl::string_view converter is not (always?) available on
    Android, so inserting StringRef::str() calls where necessary.

    PiperOrigin-RevId: 730605410

commit dc4dbaf
Author: David Dunleavy <[email protected]>
Date:   Mon Feb 24 14:38:05 2025 -0800

    Remove `release` configs from XLA's version of the TensorFlow bazelrc except for MacOS

    PiperOrigin-RevId: 730600681

commit 015bab9
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 14:32:06 2025 -0800

    Make xla_test, etc, shuffle tests by default.

    This helps catch test order dependencies at presubmit time.

    PiperOrigin-RevId: 730598576

commit e25e378
Author: Oleg Shyshkov <[email protected]>
Date:   Mon Feb 24 14:14:13 2025 -0800

    [XLA:GPU] Give descriptive names to test case parameters.

    By default the parameterized test suites have numbers as names.

    PiperOrigin-RevId: 730592124

commit 10f8a18
Author: David Dunleavy <[email protected]>
Date:   Mon Feb 24 13:38:49 2025 -0800

    Remove iOS, Android, and `with_xla_support` configs from XLA's copy of the TensorFlow .bazelrc

    PiperOrigin-RevId: 730579046

commit d313af9
Author: Frederik Gossen <[email protected]>
Date:   Mon Feb 24 13:38:24 2025 -0800

    [XLA:GPU] Fix `HasCycle` function

    This is needed to avoid deadlocks when running maxtext with PP and FSDP.
    In this case, we see collective-permutes with multiple cycles that were falsely categorized as acyclic.
    The result is a decomposed collective-permute issuing a cyclic recv leading into a deadlock.

    PiperOrigin-RevId: 730578883

commit dfae6d7
Author: Sandeep Dasgupta <[email protected]>
Date:   Mon Feb 24 13:37:31 2025 -0800

    Fix "ops w/o operand and followed by quant accidentally matching dq-op-q pattern"

    PiperOrigin-RevId: 730578564

commit 4e2dfdb
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 12:50:50 2025 -0800

    Switch xla_test, etc to static linking within Google.

    Previously, we switched xla_test, etc to static linking to catch duplicate main() definitions at build time. We had to revert the change as it increased test binary sizes and broke Nvidia's build.

    In this second attempt, we make the change only for the Google internal build, so that external users aren't affected.

    PiperOrigin-RevId: 730561451

commit 5d8c1f9
Author: Julia Guo <[email protected]>
Date:   Mon Feb 24 12:44:01 2025 -0800

    [XLA] Use built-in environment variable to find paths

    PiperOrigin-RevId: 730558831

commit 1535c85
Author: Nitin Srinivasan <[email protected]>
Date:   Mon Feb 24 12:33:56 2025 -0800

    Move `immutabledict` install to the Dockerfile

    PiperOrigin-RevId: 730555599

commit e77316a
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 12:31:31 2025 -0800

    Reverts 52fc64b

    PiperOrigin-RevId: 730554805

commit 7587767
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 12:21:22 2025 -0800

    Use `addressable_devices_` instead of `devices_` in case of the multi-host environment.

    PiperOrigin-RevId: 730551286

commit d615e26
Author: Quentin Khan <[email protected]>
Date:   Mon Feb 24 11:49:38 2025 -0800

    #litert Add an automatically added accelerator compilation structure.

    This structure allows passing metadata that is generated during the model
    compilation onto accelerators when they alter the underlying runtime.

    PiperOrigin-RevId: 730538526

commit 3be10ae
Author: Farzin Houshmand <[email protected]>
Date:   Mon Feb 24 11:42:09 2025 -0800

    [XLA:MSA] Remove reference to internal names.

    PiperOrigin-RevId: 730535804

commit 4cbcaca
Author: Abhinav Gunjal <[email protected]>
Date:   Mon Feb 24 11:40:33 2025 -0800

    hlo/tools : move hlo tools tests to dedicated hlo/tools/tests directory.

    PiperOrigin-RevId: 730535112

commit f189fb0
Author: Luke Boyer <[email protected]>
Date:   Mon Feb 24 11:36:14 2025 -0800

    Add to/from string for compiler flags. Also move to compiler/plugin; this doesn't belong in vendor code.

    PiperOrigin-RevId: 730533416

commit 36b41f8
Author: Steeve Morin <[email protected]>
Date:   Mon Feb 24 11:34:20 2025 -0800

    Various MacOS QoL enhancements

    Part 1 of openxla/xla#16696

    PiperOrigin-RevId: 730532747

commit 0df96d2
Author: Ilia Sergachev <[email protected]>
Date:   Mon Feb 24 11:31:18 2025 -0800

    PR tensorflow#22292: [GPU] Support cuDNN explicit CUDA graph construction.

    Imported from GitHub PR openxla/xla#22292

    Some cuDNN graph engines now support explicit CUDA graph construction instead of stream capture. XLA will now switch between explicit construction and the already implemented stream capture accordingly.
    Copybara import of the project:

    --
    caf22d33e606a6b2ab00d14aa9082550515c404c by Ilia Sergachev <[email protected]>:

    [GPU] Support cuDNN explicit CUDA graph construction.

    Some cuDNN graph engines now support explicit CUDA graph construction
    instead of stream capture. XLA will now switch between explicit
    construction and the already implemented stream capture accordingly.

    --
    23bb1ea89959a10b90b7892196bec41621c9b093 by Ilia Sergachev <[email protected]>:

    Log graphs that don't support CUDA graph native API.

    --
    dd31aeab7edc21a39531817e96a6eecfb0d5b96f by Ilia Sergachev <[email protected]>:

    Skip the added test with old cuDNN versions.

    --
    eeafdbf5f61b111fa3285fb2cfcb65efc91c6b62 by Ilia Sergachev <[email protected]>:

    Address review comments.

    --
    c03beef9515c0198d6eb1518b10a483b6a1b9c41 by Ilia Sergachev <[email protected]>:

    Fix build errors.

    Merging this change closes tensorflow#22292

    PiperOrigin-RevId: 730531507

commit 386f7e6
Author: Shraiysh <[email protected]>
Date:   Mon Feb 24 11:16:39 2025 -0800

    PR tensorflow#22970: Fix bug in post order traversal of computation instructions

    Imported from GitHub PR openxla/xla#22970

    While creating post order traversal, an instruction may have a user outside the computation. This is the case when we are constructing new instructions to store in replacements for cloning the computation later. This user should be ignored. Added test for the same.

    Because of this, functions like `ToString()` and
    `GetUniqueGteInstruction()` encounter errors. They rely on post-order traversal to have all the instructions.
    Copybara import of the project:

    --
    326469b7cab50e90616094dffe36758afef815e1 by Shraiysh Vaishay <[email protected]>:

    Fix bug in post order traversal of computation instructions

    While creating post order traversal, an instruction may have a user
    outside the computation. This is the case when we are constructing
    new instructions to store in replacements for cloning the computation
    later. This user should be ignored. Added test for the same.

    Because of this, functions like `ToString()` and
    `GetUniqueGteInstruction()` encounter errors. They rely on post-order
    traversal to have all the instructions.

    Merging this change closes tensorflow#22970

    PiperOrigin-RevId: 730525630

commit 0171f72
Author: Yang Chen <[email protected]>
Date:   Mon Feb 24 10:42:20 2025 -0800

    Cleanup: Fix includes.

    PiperOrigin-RevId: 730511326

commit c70f83a
Author: Yang Chen <[email protected]>
Date:   Mon Feb 24 10:38:55 2025 -0800

    Cleanup: Fix includes.

    PiperOrigin-RevId: 730509796

commit 3fd4e66
Author: Yang Chen <[email protected]>
Date:   Mon Feb 24 10:35:22 2025 -0800

    Cleanup: Fix includes.

    PiperOrigin-RevId: 730508090

commit c536176
Author: Yang Chen <[email protected]>
Date:   Mon Feb 24 10:35:11 2025 -0800

    Cleanup: Fix includes.

    PiperOrigin-RevId: 730507999

commit c80f582
Merge: fd6bd5a e9009ce
Author: TensorFlower Gardener <[email protected]>
Date:   Mon Feb 24 10:49:01 2025 -0800

    Merge pull request tensorflow#83372 from cybersupersoap:transpose-crash-fix

    PiperOrigin-RevId: 730504254

commit fd6bd5a
Author: Michael Whittaker <[email protected]>
Date:   Mon Feb 24 10:19:16 2025 -0800

    Added incarnation to `GetTaskState` RPC in coordination service.

    PiperOrigin-RevId: 730501913

commit ea61820
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 10:17:00 2025 -0800

    Make xla_cc_test default to shuffling test cases.

    This helps catch test case order dependencies at presubmit time.

    PiperOrigin-RevId: 730500863

commit ed0d218
Author: Michael Whittaker <[email protected]>
Date:   Mon Feb 24 10:00:15 2025 -0800

    Don't run CUDA test with msan.

    PiperOrigin-RevId: 730493506

commit 9f94996
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 09:35:21 2025 -0800

    Adds visibility restriction to some XLA bzl files to prevent them from being used outside of XLA, as they are internal implementation details.

    This CL is not complete. It's the first step that establishes the mechanism. Once I get buy-in on the approach, I'll follow up with more CLs to add visibility restrictions to the other XLA bzl files.

    PiperOrigin-RevId: 730484507

commit 200be96
Author: Won Jong Jeon <[email protected]>
Date:   Mon Feb 24 09:37:37 2025 -0800

    [mlir][tosa] Update Tensorflow to match TOSA v1.0 specification (part 3) (tensorflow#87273)

    * [mlir][tosa] Change 'shape' attribute of RESHAPE operator to become an input

    including minor change from:
    Slice the input of kernel based ops to the actual used size

    Change-Id: Ifebe0d1b3459300df0fa2edc9ba24a867caec3d3

    Signed-off-by: Won Jeon <[email protected]>
    Change-Id: I938503349f38b64db5e77a01c3a7b2bb33e8f041

    * [Tosa] Refactor QuantizationAttr

    changes due to removal of quantization attr in TOSA dialect
    and due to name changes in while_loop region names

    Signed-off-by: Tai Ly <[email protected]>
    Change-Id: I09533bffcd8e2179505c7e11e1320b673266585d

    * [mlir][tosa] ClampOp attributes changes

    This patch implements changes required by Tosa ClampOp's
    new min_val/max_val attributes

    including clamp_max update code from:

    commit 04055fa510522af659aa56bac3b4796961131546
    Author: Thibaut Goetghebuer-Planchon <[email protected]>
    Date:   Thu Sep 21 17:22:14 2023 +0000

        [TOSA] During quantized ReLU legalization, limit the clamp_max attribute to the max value of the quantized type

        Change-Id: I781229be0eb86ecb3cf1a305ede98ad630e5bcfd

    Signed-off-by: Tai Ly <[email protected]>
    Change-Id: I25ba0d077fa44d4c384ab094a6070a4743383414

    * [TOSA] Calculate unknown reshape dimension when input is static

    This commit updates the reshape legalization to calculate static
    shape and output type when a static input shape is provided and
    only one dimension is unknown.

    Change-Id: I0843549b47131b0852fbf375f00846b1fcbe8bc6
    Signed-off-by: Luke Hutton <[email protected]>

    * [TOSA] Numerical mismatch on tfl.transpose_conv layer

    * Legalization now handles cases where the layer has a bias

    Author: Tom Allsop <[email protected]>
    Change-Id: Ie3ba38644d1cf8e5d6f71271e8bb6f1b5636f406

    * [mlir][tosa] Change resize attrs to inputs

    This patch implements changes required by Tosa resize op's
    scale/offset/border changing from attributes to inputs.

    Signed-off-by: Tai Ly <[email protected]>
    Change-Id: I9a4319ac53298c25568fc651e249528b9a9477fc

    * [mlir][tosa] Update LIT tests

    Combination of test file updates from the following commits:
    * [mlir][tosa] Change 'shape' attribute of RESHAPE operator to become an input
    * [mlir][tosa] Switch zero point of convolutions to input variable type
    * [Tosa] Refactor QuantizationAttr
    * [TOSA] During quantized ReLU legalization, limit the clamp_max attribute to the max value of the quantized type
    * [mlir][tosa] ClampOp attributes changes
    * [TOSA] Calculate unknown reshape dimension when input is static
    * [TOSA] Numerical mismatch on tfl.transpose_conv layer
    * [mlir][tosa] Change resize attrs to inputs

    Co-authored-by: Tai Ly <[email protected]>
    Co-authored-by: Thibaut Goetghebuer-Planchon <[email protected]>
    Co-authored-by: Luke Hutton <[email protected]>
    Co-authored-by: Tom Allsop <[email protected]>

    Signed-off-by: Won Jeon <[email protected]>
    Change-Id: Ia5731e659d262c74374e8326d49beccf6a60032e

    ---------

    Signed-off-by: Won Jeon <[email protected]>
    Signed-off-by: Tai Ly <[email protected]>
    Signed-off-by: Luke Hutton <[email protected]>

commit b3a79af
Author: Julia Guo <[email protected]>
Date:   Mon Feb 24 09:15:18 2025 -0800

    Fix cpu/gpu benchmarks github workflows to run on steps correctly.

    PiperOrigin-RevId: 730477678

commit 1fe5433
Author: Emily Fertig <[email protected]>
Date:   Mon Feb 24 08:52:51 2025 -0800

    Plumb layout through the creation of PjRtArrays.

    This is in preparation to support arrays with no local shards, so that layout may not be accessible from a buffer.

    PiperOrigin-RevId: 730469597

commit f01ad0b
Author: Bart Chrzaszcz <[email protected]>
Date:   Mon Feb 24 07:36:37 2025 -0800

    #sdy Make XLA changes to support JAX export.

    - Shardy isn't serializable yet with StableHLO, so we need to expose the `SdyRoundTripExportPipeline` to JAX to remove the dialect before serializing.
    - Pass an option to `refine_polymorphic_shapes` if shardy is enabled, as we need to undo `SdyRoundTripExportPipeline` by importing again with `SdyRoundTripImportPipeline`
    - Add `is_tile_maximal` as a nanobind python binding for `OpSharding`

    PiperOrigin-RevId: 730445364

commit fb32129
Author: Emilio Cota <[email protected]>
Date:   Mon Feb 24 07:25:48 2025 -0800

    [xla:emitters] tag XLA, XLA:CPU and XLA:GPU dialects as non-prod-compatible

    This paves the way for XLA:CPU fusion emitters.

    Note that XLA:CPU is non-prod-compatible, whereas XLA:GPU is
    not. The CPU fusion emitters will depend on the XLA, XLA:CPU
    and XLA:GPU dialects, and given that the emitters' dependents
    in XLA:CPU are non-prod-compatible, the three dialects have
    to be as well.

    XLA:CPU passes also have to be tagged. Crucially, thanks to
    the parent CLs, XLA:GPU passes are not used anymore by any of
    the above dialects nor by XLA:CPU passes, so XLA:GPU remains
    essentially untouched; we just tag the XLA:GPU dialect.

    Some common libraries in xla/codegen/emitters are also tagged.

    PiperOrigin-RevId: 730442237

commit a7aaad7
Author: Alexander Belyaev <[email protected]>
Date:   Mon Feb 24 07:06:50 2025 -0800

    [XLA:GPU][Emitters] Restrict the inliner.

    Inline only if there is more than one call to the callee in the caller.

    Background: jax-ml/jax#26162 contains an example of a MoF fusion that takes forever to compile.

    The [indexing-based partitioner](openxla/xla@44bc816) in combination with this change fixes the issue.

    PiperOrigin-RevId: 730436982

commit 3ab2013
Author: Benjamin Kramer <[email protected]>
Date:   Mon Feb 24 06:57:46 2025 -0800

    Integrate LLVM at llvm/llvm-project@c80b99d98ad0

    Updates LLVM usage to match
    [c80b99d98ad0](llvm/llvm-project@c80b99d98ad0)

    PiperOrigin-RevId: 730434387

commit 68abab9
Author: Alexander Belyaev <[email protected]>
Date:   Mon Feb 24 06:38:26 2025 -0800

    [XLA:GPU][TMA] Add an alias for TmaDescriptorAttr.

    PiperOrigin-RevId: 730429137

commit 4750f67
Author: Quentin Khan <[email protected]>
Date:   Mon Feb 24 04:14:51 2025 -0800

    Add missing newline in `accelerator.h`

    PiperOrigin-RevId: 730391255

commit a47a227
Author: chuntl <[email protected]>
Date:   Thu Feb 20 18:04:47 2025 +0800

    Qualcomm AI Engine Direct - Add log utils for core module

    Summary:
    - Implement default and android version of log utils for core module
    - Add test for log util
    - Use LogOff as default log level
    - Unify to use log util in core module

commit 5ee65d4
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 04:14:40 2025 -0800

    Avoids Segmentation fault when dispatcher library is not found

    PiperOrigin-RevId: 730391202

commit 46ed7f6
Author: Fergus Henderson <[email protected]>
Date:   Mon Feb 24 03:32:37 2025 -0800

    Some minor polishing of the release docs for 2.19.

    1. Fix indentation.  The indentation of the first three bullet points in the markdown sources did not match the indentation of the fourth and fifth bullet points, nor of the bullet points further below.

    2. Wrap some long lines in the markdown sources, in particular where there were some
    lines wrapped but others not wrapped in the same bullet point list.

    3. Use "Python API" rather than "Interpreter" as the subheading for changes
    affecting the `tf.lite.Interpreter` Python class, for consistency with the earlier
    heading "C++ API" in the same bullet point list.

    PiperOrigin-RevId: 730380377

commit c699ef3
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 03:21:38 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 730377762

commit 2c045ad
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 03:09:13 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 730374491

commit b375fd2
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 02:56:50 2025 -0800

    Adds LITERT_FATAL to logging

    PiperOrigin-RevId: 730371384

commit 474d368
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 02:51:36 2025 -0800

    [XLA] Clean up the implementation for broadcast sinking past elementwise ops and add a test.

    This is a pure refactoring - no functional changes.

    PiperOrigin-RevId: 730369875

commit 8286168
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 02:28:59 2025 -0800

    Fix invalid pointer in environment_options

    PiperOrigin-RevId: 730363601

commit 121ddb6
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 01:02:36 2025 -0800

    Update GraphDef version to 2148.

    PiperOrigin-RevId: 730338341

commit 23beb26
Author: A. Unique TensorFlower <[email protected]>
Date:   Mon Feb 24 01:02:28 2025 -0800

    compat: Update forward compatibility horizon to 2025-02-24

    PiperOrigin-RevId: 730338320

commit 8a907d6
Author: Shraiysh <[email protected]>
Date:   Mon Feb 24 00:56:14 2025 -0800

    PR tensorflow#22614: Fix hlo_opt printing of Hlo module

    Imported from GitHub PR openxla/xla#22614

    The tool `hlo-opt` was not honoring the debug options within the HloModule while printing the HloModule.

    These options should be honored by the default printing of the HloModule as they are a part of the same HloModule. Fixed the print method to do this. This should now be reflected in all the tools using these debug options.
    Copybara import of the project:

    --
    a22584a819a0fc6ee8f41b4c50f4f8d68a6a2184 by Shraiysh Vaishay <[email protected]>:

    Fix hlo_opt printing of Hlo module

    The tool `hlo-opt` was not honoring the debug options within the HloModule while printing the HloModule.

    These options should be honored by the default printing of the HloModule as they are a part of the same HloModule. Fixed the print method to do this. This should now be reflected in all the tools using these debug options.

    --
    b42178b4da3fd5f81fc2d50346cb2f9b18153ab5 by Shraiysh Vaishay <[email protected]>:

    Rebase and avoid edits to testcases.

    --
    51cdfbfa355efe34936073fd68d4e19191131bb7 by Shraiysh Vaishay <[email protected]>:

    Addressed failing test

    Merging this change closes tensorflow#22614

    PiperOrigin-RevId: 730336982

commit 5431408
Author: Eugene Zhulenev <[email protected]>
Date:   Mon Feb 24 00:32:19 2025 -0800

    [xla:cpu] Align KernelArgs to enable aligned moves on a hot path

    ```
    name                                     old cpu/op   new cpu/op   delta
    BM_SelectAndScatterF32/128/process_time   318µs ± 2%   306µs ± 2%  -3.62%  (p=0.000 n=38+38)
    BM_SelectAndScatterF32/256/process_time  1.28ms ± 1%  1.23ms ± 2%  -4.24%  (p=0.000 n=39+35)
    BM_SelectAndScatterF32/512/process_time  5.75ms ± 2%  5.57ms ± 2%  -3.06%  (p=0.000 n=35+36)

    name                                     old time/op          new time/op          delta
    BM_SelectAndScatterF32/128/process_time   318µs ± 2%           307µs ± 2%  -3.66%  (p=0.000 n=38+40)
    BM_SelectAndScatterF32/256/process_time  1.28ms ± 1%          1.23ms ± 2%  -4.19%  (p=0.000 n=39+37)
    BM_SelectAndScatterF32/512/process_time  5.39ms ± 1%          5.21ms ± 2%  -3.41%  (p=0.000 n=38+38)
    ```

    PiperOrigin-RevId: 730330680

commit 99a4f2c
Author: Zixuan Jiang <[email protected]>
Date:   Sun Feb 23 22:56:26 2025 -0800

    Move the sharding axes from dimensions that need replication to batch dimensions, such that we replace an `all-gather` with an `all-to-all`.

    Given the following input
    ```
    ENTRY entry {
      %param0 = f32[14,257] parameter(0), sharding={devices=[1,2]0,1}
      %param1 = f32[14,116] parameter(1), sharding={devices=[1,2]0,1}
      ROOT %concatenate = f32[14,373] concatenate(%param0, %param1),
        dimensions={1}, sharding={devices=[1,2]0,1}
    }
    ```

    Previously, we (1) replicate the input along the concat dimension, (2) apply concat, (3) partition the result with dynamic-slice. With this change, we (1) use all-to-all to move sharding axis from the concat dim to batch dim for operands, (2) apply concat, and then (3) use all-to-all to reshard the result.

    Reverts 81b0a48

    PiperOrigin-RevId: 730308137

commit 0f8d58f
Author: A. Unique TensorFlower <[email protected]>
Date:   Sun Feb 23 22:23:36 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 730300711

commit 65d9195
Author: A. Unique TensorFlower <[email protected]>
Date:   Sun Feb 23 21:52:10 2025 -0800

    Automated Code Change

    PiperOrigin-RevId: 730294118

commit e9009ce
Author: Assoap <[email protected]>
Date:   Thu Dec 19 23:33:18 2024 +0800

    Fix crash of transpose