Skip to content

Regularize support for Java types/references #432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

mcimadamore
Copy link
Collaborator

@mcimadamore mcimadamore commented May 21, 2025

This PR regularizes the support for Java types in the code model. There are two competing constraints to balance: on the one hand, the Java types should be readable from humans when inspecting the textual form of code model; on the other hand, we don't want to bake in Java-specific type parsing in the core code model API.

Unfortunately, the current implementation scores poorly on both aspects, as (a) textual Java types can become very hard to read (e.g. .<.<NewTest, NewTest$B>, NewTest$B$C>), and (b) the parsing logic (defined in DescParser) contains several ad-hoc extensions that are only there to support Java types.

To address this, this PR introduces two different kinds of externalized type forms: inflated type forms and flattened type forms. An inflated type form closely follow the structure of the JavaType or JavaRef it models. For instance, the Java type Foo<? extends Bar> is modelled as follows:

java.type.class(Foo, java.type.primitive(void),
                  java.type.wildcard(EXTENDS,
                                     java.type.class(Bar, java.type.primitive(void))))

Inflated type forms can be flattened (using JavaTypeUtils::flatten), to derive a form that is more suitable for humans. For instance, the above inflated form is flattened as follows:

java.type:"Foo<? extends Bar>"

Conversely, flattened type forms can be inflated back (using JavaTypeUtils::inflate) to their original inflated form.

This distinction between inflated and flattened forms allow us to massively simplify DescParser -- which no longer has to worries about the syntactic vagaries of Java types. All that complexity is now pushed onto the flattened type forms support. The goal is to progressively make flattened Java type forms an "implementation detail" (more on that below).

To accommodate flattened and inflated type forms, two changes are needed:

  • in OpWriter we need to flatten an inflated type form (if possible) before writing it out as a string (this allows for the textual form of a code model to be more readable)
  • in OpParser we need to inflate a flattened type form (if possible) before handing the type form back to the rest of the parsing machinery (which will e.g. invoke the type factory on such inflated type form)

All Java types and references now follow this pattern:

  1. the externalize method returns an inflated type form
  2. the toString method returns a readable string (the one associated with the corresponding flat type form)
  3. the ofString factory parses the flat string into an inflated form, then turn that inflated form into either a JavaType or JavaRef, as needed

Note that (3) is the only reason as to why the flattened type forms surface into the API (since the factories are public, we would need to specify what strings they accept). Crucially, these factories are only used when parsing op attributes -- where Java types and refs are just modeled as plain flat strings -- e.g.:

%6 : java.type:"int" = invoke %5 @"java.lang.Object::hashCode():int";

Note how the Java method reference after @ is encoded as a plain string, which is then parser by the InvokeOp factory using MethodRef::ofString.

Implementation

This PR moves all the support for inflated and flattened Java type forms into a single class, namely JavaTypeUtils. This class contains factories to create the various inflated forms, as well as method to parse them, to turn them into flat strings, or to turn them into JavaType or JavaRef objects.

The code in this class is a bit tedious, as I wanted to express all the functionality we needed in terms of externalized type forms. This allows us to reduce the impact of Java types on the rest of the code model (e.g. the changes to OpParser and OpWriter are literally one liners -- at the cost of making the implementation of some of these methods slightly more convoluted.

For instance, it would have been easier to define a mapping from JavaType into a flat string (as we could leverage the toString method on the various JavaType subclasses). But doing so would create some issues, as now OpWriter would need to sometimes call externalize on the type, sometimes call toString on it -- similar idiosyncrasies would show up on the OpParser side.

Future work

The next step in this journey (not addressed by this PR) would be to parse Java type/ref attributes in a more structural fashion, so that we can let go of the ofString factories.

And, after that, another possible improvement would be to augment and generalize the grammar supported by DescParser to cover more interesting structural forms -- something like:

node:
    ident '<' node* '>'
    literal

This would allow us to model all externalized type forms as more general nodes -- but will also allow us to model other kinds of interesting structure more directly, such as location information, Java literals and, possibly, even Java annotations.


Progress

  • Change must not contain extraneous whitespace

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/432/head:pull/432
$ git checkout pull/432

Update a local copy of the PR:
$ git checkout pull/432
$ git pull https://git.openjdk.org/babylon.git pull/432/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 432

View PR using the GUI difftool:
$ git pr show -t 432

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/432.diff

Using Webrev

Link to Webrev Comment

@@ -45,15 +45,6 @@ static String toString(ExternalizedTypeElement t) {
return t.identifier;
}

// Unpack array-like identifier [+
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was removed, as it was specific to Java types

@bridgekeeper
Copy link

bridgekeeper bot commented May 21, 2025

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into code-reflection will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 21, 2025

@mcimadamore This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Regularize support for Java types/references

Reviewed-by: psandoz

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the code-reflection branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the code-reflection branch, type /integrate in a new comment.

@@ -121,8 +124,7 @@
*/
public final class OpParser {

static final TypeElement.ExternalizedTypeElement VOID =
new TypeElement.ExternalizedTypeElement("void", List.of());
static final TypeElement.ExternalizedTypeElement VOID = JavaType.VOID.externalize();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround: the parser uses a private void type -- but such a type is not handled by either the core type factory or the Java type factory. So injecting such type into the parsed model results into issues.

@@ -61,6 +69,28 @@ static void error(String msg, Object... args) {
System.err.println("error: " + String.format(msg, args));
}

static void checkModel(Member member, String found, IR ir) {
Copy link
Collaborator Author

@mcimadamore mcimadamore May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new method allows the harness to print the model we found on the compiled class even in the face of an error when parsing the @IR annotation. This is very handy when bulk-updating the tests, as it means that parser errors will cause the harness to just dump the "expected" model for all the methods in the class (instead of throwing on the first parser error).

@@ -557,7 +558,7 @@ <T> void writeSeparatedList(String separator, Iterable<T> l, Consumer<T> c) {
}

void writeType(TypeElement te) {
write(te.externalize().toString());
write(JavaTypeUtils.flatten(te.externalize()).toString());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing a type element into textual form, flatten it first (to make it more readable if it's a Java type).

@@ -601,7 +603,7 @@ String parseName() {
}

TypeElement.ExternalizedTypeElement parseExTypeElem() {
return DescParser.parseExTypeElem(lexer);
return JavaTypeUtils.inflate(DescParser.parseExTypeElem(lexer));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When parsing an externalized type form, inflate it fist (from its readable form, if it's a Java type).

return parseRecordTypeRef(s);
}

// ExType:
Copy link
Collaborator Author

@mcimadamore mcimadamore May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the grammar here is very simple, as we no longer have to have support for Java types in here. An externalized type form is a type identifier, followed by zero or more externalized type arguments. An identifier is a qualified identifier -- each part in the identifier can be either string literals or identifiers. Various parts can be separated either with . or :.

current = at.componentType();
}
return new ExternalizedTypeElement("[".repeat(dims), List.of(current.externalize()));
return JavaTypeUtils.arrayType(componentType.externalize());
Copy link
Collaborator Author

@mcimadamore mcimadamore May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is common to all types and refs:

  • the externalize method returns an inflated externalized form built with the corresponding factory in JavaTypeUtils
  • the toString method returns a flat string, obtained with the corresponding method in JavaTypeUtils
  • the ofString factory parses a flat string and turns it into an inflated form, which is then parsed into a JavaType or JavaRef, using corresponding methods in JavaTypeUtils

func @"test1" (%0 : java.type:"ForLoopTest", %1 : java.type:"java.util.List<java.util.List<java.lang.String>>")java.type:"void" -> {
%2 : Var<java.type:"java.util.List<java.util.List<java.lang.String>>"> = var %1 @"ll";
java.enhancedFor
()java.type:"java.util.List<java.util.List<java.lang.String>>" -> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere -- it seems like block names like ^expr or ^def are no longer generated. But this PR doesn't change that (in fact, I tried creating the model for a simple for loop without these changes, and I still don't see the body names).

I believe this has to do with the fact that e.g. for ops are composed of bodies, and each body in the op has a single block (thus, it is the entry block for that body). And, entry block names are skipped.

The test was probably still working fine (despite the mismatch) because we parse the IR string, then generate the textual form again before comparing it -- this will drop the entry block names -- meaning the "expected" IR will match the one generated by javac.

return ptypes;
}

// JavaRef:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the grammar of ref and types is partially overlapping -- for this reason it's slightly easier to keep the two parsing methods separate. If we wanted to unify more, we could require all refs to be prefixed with a & which would simplify things a bit (and also be compatible with what we do to disambiguate between class and method type-variable owner).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note: the grammar for method ref changed, as this now requires the type separator : to be used between the params and the return type (in a way that is more similar to what's done for field refs). If we don't want to have this incompatibility we can roll back.

return t;
}
}
return JavaTypeUtils.toJavaType(tree);
Copy link
Collaborator Author

@mcimadamore mcimadamore May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note how parsing an inflated type form is now done entirely inside JavaTypeUtils.

I'm not entirely sure whether the Java type factory should support both Java type and reference inflated forms. For now I went with just types -- but if we want to push this further (e.g. to cover also op attributes) we would need the factory to support both types and refs (which is a trivial change to do).

@mcimadamore mcimadamore marked this pull request as ready for review May 21, 2025 11:33
@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 21, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label May 21, 2025
@mlbridge
Copy link

mlbridge bot commented May 21, 2025

Webrevs

@mcimadamore
Copy link
Collaborator Author

The next step in this journey (not addressed by this PR) would be to parse Java type/ref attributes in a more structural fashion, so that we can let go of the ofString factories.

The patch below contains a sketch of how we might go in that direction:

https://github.com/mcimadamore/babylon/compare/externalized_Java_type_cleanup...mcimadamore:babylon:remove_of_string_type_factories?expand=1

The fact the I was able to cook it in an hour or so makes me confident about the general direction set by this PR.

@@ -164,7 +164,13 @@ public static String mapType(String type) {
Map<Integer, String> idxToOp = new HashMap<>();

for (int i = 0; i < type.length(); ++i) {
if (i + 3 < type.length() && type.substring(i, i + 3).equals("ptr")) {
if (i + 19 < type.length() && type.substring(i, i + 19).equals("java.type.primitive")) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to "tolerate" the new inflated external forms here (by skipping them). The logic here seems to assume that both class and primitive types appear "naked" in the type strings, which is no longer the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely this area (Triton code model -> Triton MLIR -> SPIRV) will bit rot as we don't have the expertise and hardware to maintain and fully test. We might need to move it to a separate area if it becomes more burdensome. So what you did seems fine.

%19 : tensor<x64, ptr<float>> = tt.splat %2;
%20 : tensor<x64, ptr<float>> = tt.addptr %19 %9;
module ()java.type:"void" -> {
tt.func @"add_kernel2_ptr<java.type.primitive<float>>_ptr<java.type.primitive<float>>_ptr<java.type.primitive<float>>_int_64_void" (%0 : ptr<java.type:"float">, %1 : ptr<java.type:"float">, %2 : ptr<java.type:"float">, %3 : java.type:"int")java.type:"void" -> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note (in this and other tests) -- the signature of the triton function contains the literal externalized type string of the various parameter/return types. These are NOT flattened strings because e.g. Ptr::toString does not flatten (nor it knows how to, if we were to use the public API). This leads to some asymmetry, where in some cases the inflated form appears in full (as here), while in other cases the short type string appears (e.g. because the type is a Java type, and its toString is more friendly).

I had half a mind to make toString of all triton types more friendly (e.g. not rely on externalize) -- but ultimately I decided to just stick with this very basic conversion -- perhaps we can come back to this later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine IMO. If need be we can revisit later on.

public static final TypeElementFactory JAVA_TYPE_FACTORY = tree -> switch (JavaTypeUtils.Kind.of(tree)) {
case INFLATED_TYPE -> JavaTypeUtils.toJavaType(tree);
case INFLATED_REF -> JavaTypeUtils.toJavaRef(tree);
default -> throw new UnsupportedOperationException("Unsupported: " + tree);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think this should return null, as we used that as "composition" signal?

Copy link
Member

@PaulSandoz PaulSandoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work. This regularizes the code and clearly separates the boundaries. It provide a stronger foundation for further improvements.

@mcimadamore
Copy link
Collaborator Author

/integrate

@openjdk
Copy link

openjdk bot commented May 26, 2025

Going to push as commit 4278b5b.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 26, 2025
@openjdk openjdk bot closed this May 26, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 26, 2025
@openjdk
Copy link

openjdk bot commented May 26, 2025

@mcimadamore Pushed as commit 4278b5b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

2 participants