Regularize support for Java types/references #432

mcimadamore · 2025-05-21T10:00:12Z

This PR regularizes the support for Java types in the code model. There are two competing constraints to balance: on the one hand, the Java types should be readable from humans when inspecting the textual form of code model; on the other hand, we don't want to bake in Java-specific type parsing in the core code model API.

Unfortunately, the current implementation scores poorly on both aspects, as (a) textual Java types can become very hard to read (e.g. .<.<NewTest, NewTest$B>, NewTest$B$C>), and (b) the parsing logic (defined in DescParser) contains several ad-hoc extensions that are only there to support Java types.

To address this, this PR introduces two different kinds of externalized type forms: inflated type forms and flattened type forms. An inflated type form closely follow the structure of the JavaType or JavaRef it models. For instance, the Java type Foo<? extends Bar> is modelled as follows:

java.type.class(Foo, java.type.primitive(void),
                  java.type.wildcard(EXTENDS,
                                     java.type.class(Bar, java.type.primitive(void))))

Inflated type forms can be flattened (using JavaTypeUtils::flatten), to derive a form that is more suitable for humans. For instance, the above inflated form is flattened as follows:

java.type:"Foo<? extends Bar>"

Conversely, flattened type forms can be inflated back (using JavaTypeUtils::inflate) to their original inflated form.

This distinction between inflated and flattened forms allow us to massively simplify DescParser -- which no longer has to worries about the syntactic vagaries of Java types. All that complexity is now pushed onto the flattened type forms support. The goal is to progressively make flattened Java type forms an "implementation detail" (more on that below).

To accommodate flattened and inflated type forms, two changes are needed:

in OpWriter we need to flatten an inflated type form (if possible) before writing it out as a string (this allows for the textual form of a code model to be more readable)
in OpParser we need to inflate a flattened type form (if possible) before handing the type form back to the rest of the parsing machinery (which will e.g. invoke the type factory on such inflated type form)

All Java types and references now follow this pattern:

the externalize method returns an inflated type form
the toString method returns a readable string (the one associated with the corresponding flat type form)
the ofString factory parses the flat string into an inflated form, then turn that inflated form into either a JavaType or JavaRef, as needed

Note that (3) is the only reason as to why the flattened type forms surface into the API (since the factories are public, we would need to specify what strings they accept). Crucially, these factories are only used when parsing op attributes -- where Java types and refs are just modeled as plain flat strings -- e.g.:

%6 : java.type:"int" = invoke %5 @"java.lang.Object::hashCode():int";

Note how the Java method reference after @ is encoded as a plain string, which is then parser by the InvokeOp factory using MethodRef::ofString.

Implementation

This PR moves all the support for inflated and flattened Java type forms into a single class, namely JavaTypeUtils. This class contains factories to create the various inflated forms, as well as method to parse them, to turn them into flat strings, or to turn them into JavaType or JavaRef objects.

The code in this class is a bit tedious, as I wanted to express all the functionality we needed in terms of externalized type forms. This allows us to reduce the impact of Java types on the rest of the code model (e.g. the changes to OpParser and OpWriter are literally one liners -- at the cost of making the implementation of some of these methods slightly more convoluted.

For instance, it would have been easier to define a mapping from JavaType into a flat string (as we could leverage the toString method on the various JavaType subclasses). But doing so would create some issues, as now OpWriter would need to sometimes call externalize on the type, sometimes call toString on it -- similar idiosyncrasies would show up on the OpParser side.

Future work

The next step in this journey (not addressed by this PR) would be to parse Java type/ref attributes in a more structural fashion, so that we can let go of the ofString factories.

And, after that, another possible improvement would be to augment and generalize the grammar supported by DescParser to cover more interesting structural forms -- something like:

node:
    ident '<' node* '>'
    literal

This would allow us to model all externalized type forms as more general nodes -- but will also allow us to model other kinds of interesting structure more directly, such as location information, Java literals and, possibly, even Java annotations.

Progress

Change must not contain extraneous whitespace

Reviewers

Paul Sandoz (@PaulSandoz - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/432/head:pull/432
$ git checkout pull/432

Update a local copy of the PR:
$ git checkout pull/432
$ git pull https://git.openjdk.org/babylon.git pull/432/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 432

View PR using the GUI difftool:
$ git pr show -t 432

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/432.diff

Using Webrev

Link to Webrev Comment

…nalized type

mcimadamore · 2025-05-21T10:01:02Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/TypeElement.java

@@ -45,15 +45,6 @@ static String toString(ExternalizedTypeElement t) {
                return t.identifier;
            }

-            // Unpack array-like identifier [+


This code was removed, as it was specific to Java types

bridgekeeper · 2025-05-21T10:01:11Z

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into code-reflection will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-05-21T10:01:46Z

@mcimadamore This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Regularize support for Java types/references

Reviewed-by: psandoz

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the code-reflection branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the code-reflection branch, type /integrate in a new comment.

mcimadamore · 2025-05-21T10:03:05Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/parser/OpParser.java

@@ -121,8 +124,7 @@
 */
 public final class OpParser {

-    static final TypeElement.ExternalizedTypeElement VOID =
-            new TypeElement.ExternalizedTypeElement("void", List.of());
+    static final TypeElement.ExternalizedTypeElement VOID = JavaType.VOID.externalize();


This is a workaround: the parser uses a private void type -- but such a type is not handled by either the core type factory or the Java type factory. So injecting such type into the parsed model results into issues.

mcimadamore · 2025-05-21T10:04:21Z

test/langtools/tools/javac/reflect/CodeReflectionTester.java

@@ -61,6 +69,28 @@ static void error(String msg, Object... args) {
        System.err.println("error: " + String.format(msg, args));
    }

+    static void checkModel(Member member, String found, IR ir) {


This new method allows the harness to print the model we found on the compiled class even in the face of an error when parsing the @IR annotation. This is very handy when bulk-updating the tests, as it means that parser errors will cause the harness to just dump the "expected" model for all the methods in the class (instead of throwing on the first parser error).

mcimadamore · 2025-05-21T10:05:28Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/writer/OpWriter.java

@@ -557,7 +558,7 @@ <T> void writeSeparatedList(String separator, Iterable<T> l, Consumer<T> c) {
    }

    void writeType(TypeElement te) {
-        write(te.externalize().toString());
+        write(JavaTypeUtils.flatten(te.externalize()).toString());


When writing a type element into textual form, flatten it first (to make it more readable if it's a Java type).

mcimadamore · 2025-05-21T10:06:28Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/parser/OpParser.java

@@ -601,7 +603,7 @@ String parseName() {
    }

    TypeElement.ExternalizedTypeElement parseExTypeElem() {
-        return DescParser.parseExTypeElem(lexer);
+        return JavaTypeUtils.inflate(DescParser.parseExTypeElem(lexer));


When parsing an externalized type form, inflate it fist (from its readable form, if it's a Java type).

mcimadamore · 2025-05-21T10:08:13Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/parser/impl/DescParser.java

-        return parseRecordTypeRef(s);
-    }
-
+    //    ExType:


Note that the grammar here is very simple, as we no longer have to have support for Java types in here. An externalized type form is a type identifier, followed by zero or more externalized type arguments. An identifier is a qualified identifier -- each part in the identifier can be either string literals or identifiers. Various parts can be separated either with . or :.

mcimadamore · 2025-05-21T10:10:26Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/type/ArrayType.java

-            current = at.componentType();
-        }
-        return new ExternalizedTypeElement("[".repeat(dims), List.of(current.externalize()));
+        return JavaTypeUtils.arrayType(componentType.externalize());


This pattern is common to all types and refs:

the externalize method returns an inflated externalized form built with the corresponding factory in JavaTypeUtils

the toString method returns a flat string, obtained with the corresponding method in JavaTypeUtils

the ofString factory parses a flat string and turns it into an inflated form, which is then parsed into a JavaType or JavaRef, using corresponding methods in JavaTypeUtils

mcimadamore · 2025-05-21T10:52:35Z

test/langtools/tools/javac/reflect/ForLoopTest.java

+            func @"test1" (%0 : java.type:"ForLoopTest", %1 : java.type:"java.util.List<java.util.List<java.lang.String>>")java.type:"void" -> {
+                %2 : Var<java.type:"java.util.List<java.util.List<java.lang.String>>"> = var %1 @"ll";
+                java.enhancedFor
+                    ()java.type:"java.util.List<java.util.List<java.lang.String>>" -> {


Here and elsewhere -- it seems like block names like ^expr or ^def are no longer generated. But this PR doesn't change that (in fact, I tried creating the model for a simple for loop without these changes, and I still don't see the body names).

I believe this has to do with the fact that e.g. for ops are composed of bodies, and each body in the op has a single block (thus, it is the entry block for that body). And, entry block names are skipped.

The test was probably still working fine (despite the mismatch) because we parse the IR string, then generate the textual form again before comparing it -- this will drop the entry block names -- meaning the "expected" IR will match the one generated by javac.

mcimadamore · 2025-05-21T11:30:40Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/type/impl/JavaTypeUtils.java

+        return ptypes;
+    }
+
+    //    JavaRef:


Note: the grammar of ref and types is partially overlapping -- for this reason it's slightly easier to keep the two parsing methods separate. If we wanted to unify more, we could require all refs to be prefixed with a & which would simplify things a bit (and also be compatible with what we do to disambiguate between class and method type-variable owner).

Also, note: the grammar for method ref changed, as this now requires the type separator : to be used between the params and the return type (in a way that is more similar to what's done for field refs). If we don't want to have this incompatibility we can roll back.

mcimadamore · 2025-05-21T11:32:46Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/type/CoreTypeFactory.java

-                    return t;
-                }
-            }
+            return JavaTypeUtils.toJavaType(tree);


Note how parsing an inflated type form is now done entirely inside JavaTypeUtils.

I'm not entirely sure whether the Java type factory should support both Java type and reference inflated forms. For now I went with just types -- but if we want to push this further (e.g. to cover also op attributes) we would need the factory to support both types and refs (which is a trivial change to do).

mlbridge · 2025-05-21T11:37:53Z

Webrevs

mcimadamore · 2025-05-21T15:20:30Z

The next step in this journey (not addressed by this PR) would be to parse Java type/ref attributes in a more structural fashion, so that we can let go of the ofString factories.

The patch below contains a sketch of how we might go in that direction:

https://github.com/mcimadamore/babylon/compare/externalized_Java_type_cleanup...mcimadamore:babylon:remove_of_string_type_factories?expand=1

The fact the I was able to cook it in an hour or so makes me confident about the general direction set by this PR.

mcimadamore · 2025-05-23T10:20:57Z

cr-examples/triton/src/main/java/oracle/code/triton/MLIRGenerator.java

@@ -164,7 +164,13 @@ public static String mapType(String type) {
            Map<Integer, String> idxToOp = new HashMap<>();

            for (int i = 0; i < type.length(); ++i) {
-                if (i + 3 < type.length() && type.substring(i, i + 3).equals("ptr")) {
+                if (i + 19 < type.length() && type.substring(i, i + 19).equals("java.type.primitive")) {


I had to "tolerate" the new inflated external forms here (by skipping them). The logic here seems to assume that both class and primitive types appear "naked" in the type strings, which is no longer the case.

It's likely this area (Triton code model -> Triton MLIR -> SPIRV) will bit rot as we don't have the expertise and hardware to maintain and fully test. We might need to move it to a separate area if it becomes more burdensome. So what you did seems fine.

mcimadamore · 2025-05-23T10:24:08Z

cr-examples/triton/src/test/java/oracle/code/triton/TestAddKernel.java

-                    %19 : tensor<x64, ptr<float>> = tt.splat %2;
-                    %20 : tensor<x64, ptr<float>> = tt.addptr %19 %9;
+            module ()java.type:"void" -> {
+                tt.func @"add_kernel2_ptr<java.type.primitive<float>>_ptr<java.type.primitive<float>>_ptr<java.type.primitive<float>>_int_64_void" (%0 : ptr<java.type:"float">, %1 : ptr<java.type:"float">, %2 : ptr<java.type:"float">, %3 : java.type:"int")java.type:"void" -> {


Note (in this and other tests) -- the signature of the triton function contains the literal externalized type string of the various parameter/return types. These are NOT flattened strings because e.g. Ptr::toString does not flatten (nor it knows how to, if we were to use the public API). This leads to some asymmetry, where in some cases the inflated form appears in full (as here), while in other cases the short type string appears (e.g. because the type is a Java type, and its toString is more friendly).

I had half a mind to make toString of all triton types more friendly (e.g. not rely on externalize) -- but ultimately I decided to just stick with this very basic conversion -- perhaps we can come back to this later?

This is fine IMO. If need be we can revisit later on.

mcimadamore · 2025-05-23T10:25:15Z

src/jdk.incubator.code/share/classes/jdk/incubator/code/type/CoreTypeFactory.java

+    public static final TypeElementFactory JAVA_TYPE_FACTORY = tree -> switch (JavaTypeUtils.Kind.of(tree)) {
+            case INFLATED_TYPE -> JavaTypeUtils.toJavaType(tree);
+            case INFLATED_REF -> JavaTypeUtils.toJavaRef(tree);
+            default -> throw new UnsupportedOperationException("Unsupported: " + tree);


I'm starting to think this should return null, as we used that as "composition" signal?

PaulSandoz

Very nice work. This regularizes the code and clearly separates the boundaries. It provide a stronger foundation for further improvements.

mcimadamore · 2025-05-26T09:07:24Z

/integrate

openjdk · 2025-05-26T09:33:03Z

Going to push as commit 4278b5b.

openjdk · 2025-05-26T09:33:14Z

@mcimadamore Pushed as commit 4278b5b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

mcimadamore added 14 commits May 6, 2025 12:12

Initial push

2d6a222

More principled handling of Java type strings

e81d656

All tests pass

c460965

Improve documentation of grammar in DescParser

cefd50b

Remove tabs from grammar doc in DescParser

141f7ef

Drop unused field in RecordTypeRefImpl

e4dc4d6

Re-enable all the test cases in TestJavaType

e963ea8

Fix ClassType::rawType() to take enclosing type into account

24f2a5b

Drop Java-type specific code in ExternalizedTypeElement

dedd27e

All tests pass

087a5a1

Remove parens from method type var external type strings

ac020ab

Cleanup and strengthen code that tests for the specific kind of exter…

b0c74f3

…nalized type

Cleanup exType parser

6962d67

Add Javadoc to JavaTypeUtils

f62b557

mcimadamore commented May 21, 2025

View reviewed changes

Drop redundant field

e4b0d18

mcimadamore commented May 21, 2025

View reviewed changes

Fix JavaType grammar

b76890d

mcimadamore commented May 21, 2025

View reviewed changes

mcimadamore marked this pull request as ready for review May 21, 2025 11:33

openjdk bot added the ready Pull request is ready to be integrated label May 21, 2025

openjdk bot added the rfr Pull request is ready for review label May 21, 2025

mcimadamore added 5 commits May 21, 2025 16:39

Simplify parser code to use Lexer::is

cf48f74

Merge branch 'code-reflection' into externalized_Java_type_cleanup

59e3714

Enhance type factory to create refs

067d437

Simplify JavaTypeUtils to use ExternalizedTypeElement factories

8cc1735

Fix examples

c89986c

mcimadamore commented May 23, 2025

View reviewed changes

PaulSandoz approved these changes May 23, 2025

View reviewed changes

openjdk bot added the integrated Pull request has been integrated label May 26, 2025

openjdk bot closed this May 26, 2025

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 26, 2025

Regularize support for Java types/references #432

Regularize support for Java types/references #432

Uh oh!

Conversation

mcimadamore commented May 21, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Implementation

Future work

Progress

Reviewers

Reviewing

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bridgekeeper bot commented May 21, 2025

Uh oh!

openjdk bot commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mcimadamore May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mcimadamore May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mcimadamore May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mcimadamore May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mlbridge bot commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

mcimadamore commented May 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

PaulSandoz left a comment

Choose a reason for hiding this comment

Uh oh!

mcimadamore commented May 26, 2025

Uh oh!

openjdk bot commented May 26, 2025

Uh oh!

openjdk bot commented May 26, 2025

Uh oh!

Uh oh!

mcimadamore commented May 21, 2025 •

edited by openjdk bot

Loading

openjdk bot commented May 21, 2025 •

edited

Loading

mcimadamore May 21, 2025 •

edited

Loading

mcimadamore May 21, 2025 •

edited

Loading

mcimadamore May 21, 2025 •

edited

Loading

mcimadamore May 21, 2025 •

edited

Loading

mlbridge bot commented May 21, 2025 •

edited

Loading