On the GPU backend, we hit the following case (modified for illustration; scalar: a register shared across SIMD lanes; vector: a register with one value per SIMD lane). The MIR before phi-node-elimination:
bb.7:
%204:scalar = PHI %182:scalar, %bb.6, %203:vector, %bb.7
%203:vector = ...
(divergent-cond) goto %bb.7
bb.8:
%209:vector = PHI %182:scalar, %bb.6, %203:vector, %bb.7
The MIR after phi-node-elimination:
bb.7:
%204:scalar = COPY killed %376:scalar
%203:vector = ...
%376:scalar = COPY killed %203:vector
(divergent-cond) goto %bb.7
bb.8:
%209:vector = COPY killed %376:scalar
Both PHIs have the same incoming values, so PHIElimination's reuse optimization uses the same temporary scalar register, %376:scalar, for both of them. However, the copy in bb.8 is incorrect: %209 is a vector register, so it should be copied from a vector register that is assigned inside the loop. (This is derived from a case where a variable is uniform inside a cycle but non-uniform outside the cycle.)
The proposed solution is to add a TargetInstrInfo hook that lets the target decide whether reusing a PHI's temporary register is allowed.
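To illustrate the intended effect of such a hook, here is a minimal standalone sketch. It is a toy model, not the LLVM implementation: the types `Phi` and `Temp`, the function `assignIncomingTemps`, and the callbacks `alwaysAllowReuse` and `sameClassOnly` are all hypothetical names made up for this example, and the real hook's name and signature may differ. The model assumes that reuse is keyed on identical incoming operand lists and that a target would reject reuse when the destination register classes differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Toy model only: none of these names are LLVM API.
enum class RegClass { Scalar, Vector };

struct Phi {
  int dest;            // virtual register defined by the PHI
  RegClass destClass;  // register class of that destination
  // (source register, predecessor block number) pairs
  std::vector<std::pair<int, int>> incoming;
};

struct Temp {
  int reg;            // temporary register that receives the incoming copies
  RegClass regClass;  // class the temporary was created with
};

// Hypothetical target hook: may `candidate` share the temporary that was
// already created for `lowered`?
using ReuseHook = std::function<bool(const Phi &lowered, const Phi &candidate)>;

// Models the behavior described above: reuse is always permitted.
bool alwaysAllowReuse(const Phi &, const Phi &) { return true; }

// Assumed GPU-style policy: only share between PHIs of the same class.
bool sameClassOnly(const Phi &lowered, const Phi &candidate) {
  return lowered.destClass == candidate.destClass;
}

// Picks a temporary for each PHI. A PHI with the same incoming operands as
// an earlier one shares that PHI's temporary if the hook allows it.
std::vector<Temp> assignIncomingTemps(const std::vector<Phi> &phis,
                                      const ReuseHook &mayReuse,
                                      int firstFreeReg) {
  // Incoming operand list -> indices of earlier PHIs with that list.
  std::map<std::vector<std::pair<int, int>>, std::vector<std::size_t>> lowered;
  std::vector<Temp> temps;
  int nextReg = firstFreeReg;

  for (std::size_t i = 0; i < phis.size(); ++i) {
    assert(!phis[i].incoming.empty() && "a PHI needs an incoming value");
    std::vector<std::size_t> &sameInputs = lowered[phis[i].incoming];

    bool reused = false;
    for (std::size_t j : sameInputs) {
      if (mayReuse(phis[j], phis[i])) {
        Temp shared = temps[j];  // copy first, then append
        temps.push_back(shared);
        reused = true;
        break;
      }
    }
    if (!reused)
      temps.push_back({nextReg++, phis[i].destClass});

    sameInputs.push_back(i);
  }
  return temps;
}
```

Fed the two PHIs from the example (%204:scalar and %209:vector, both with incoming values %182 from bb.6 and %203 from bb.7), `alwaysAllowReuse` gives both PHIs one scalar temporary, which mirrors the problematic output above. With `sameClassOnly`, the second PHI gets its own vector-class temporary instead.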