-
Notifications
You must be signed in to change notification settings - Fork 28
Regularize support for Java types/references #432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regularize support for Java types/references #432
Conversation
@@ -45,15 +45,6 @@ static String toString(ExternalizedTypeElement t) { | |||
return t.identifier; | |||
} | |||
|
|||
// Unpack array-like identifier [+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was removed, as it was specific to Java types
👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into |
@mcimadamore This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
@@ -121,8 +124,7 @@ | |||
*/ | |||
public final class OpParser { | |||
|
|||
static final TypeElement.ExternalizedTypeElement VOID = | |||
new TypeElement.ExternalizedTypeElement("void", List.of()); | |||
static final TypeElement.ExternalizedTypeElement VOID = JavaType.VOID.externalize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a workaround: the parser uses a private void type -- but such a type is not handled by either the core type factory or the Java type factory. So injecting such type into the parsed model results into issues.
@@ -61,6 +69,28 @@ static void error(String msg, Object... args) { | |||
System.err.println("error: " + String.format(msg, args)); | |||
} | |||
|
|||
static void checkModel(Member member, String found, IR ir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new method allows the harness to print the model we found on the compiled class even in the face of an error when parsing the @IR
annotation. This is very handy when bulk-updating the tests, as it means that parser errors will cause the harness to just dump the "expected" model for all the methods in the class (instead of throwing on the first parser error).
@@ -557,7 +558,7 @@ <T> void writeSeparatedList(String separator, Iterable<T> l, Consumer<T> c) { | |||
} | |||
|
|||
void writeType(TypeElement te) { | |||
write(te.externalize().toString()); | |||
write(JavaTypeUtils.flatten(te.externalize()).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing a type element into textual form, flatten it first (to make it more readable if it's a Java type).
@@ -601,7 +603,7 @@ String parseName() { | |||
} | |||
|
|||
TypeElement.ExternalizedTypeElement parseExTypeElem() { | |||
return DescParser.parseExTypeElem(lexer); | |||
return JavaTypeUtils.inflate(DescParser.parseExTypeElem(lexer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When parsing an externalized type form, inflate it fist (from its readable form, if it's a Java type).
return parseRecordTypeRef(s); | ||
} | ||
|
||
// ExType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the grammar here is very simple, as we no longer have to have support for Java types in here. An externalized type form is a type identifier, followed by zero or more externalized type arguments. An identifier is a qualified identifier -- each part in the identifier can be either string literals or identifiers. Various parts can be separated either with .
or :
.
current = at.componentType(); | ||
} | ||
return new ExternalizedTypeElement("[".repeat(dims), List.of(current.externalize())); | ||
return JavaTypeUtils.arrayType(componentType.externalize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is common to all types and refs:
- the
externalize
method returns an inflated externalized form built with the corresponding factory inJavaTypeUtils
- the
toString
method returns a flat string, obtained with the corresponding method inJavaTypeUtils
- the
ofString
factory parses a flat string and turns it into an inflated form, which is then parsed into aJavaType
orJavaRef
, using corresponding methods inJavaTypeUtils
func @"test1" (%0 : java.type:"ForLoopTest", %1 : java.type:"java.util.List<java.util.List<java.lang.String>>")java.type:"void" -> { | ||
%2 : Var<java.type:"java.util.List<java.util.List<java.lang.String>>"> = var %1 @"ll"; | ||
java.enhancedFor | ||
()java.type:"java.util.List<java.util.List<java.lang.String>>" -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere -- it seems like block names like ^expr
or ^def
are no longer generated. But this PR doesn't change that (in fact, I tried creating the model for a simple for loop without these changes, and I still don't see the body names).
I believe this has to do with the fact that e.g. for ops are composed of bodies, and each body in the op has a single block (thus, it is the entry block for that body). And, entry block names are skipped.
The test was probably still working fine (despite the mismatch) because we parse the IR string, then generate the textual form again before comparing it -- this will drop the entry block names -- meaning the "expected" IR will match the one generated by javac.
return ptypes; | ||
} | ||
|
||
// JavaRef: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the grammar of ref and types is partially overlapping -- for this reason it's slightly easier to keep the two parsing methods separate. If we wanted to unify more, we could require all refs to be prefixed with a &
which would simplify things a bit (and also be compatible with what we do to disambiguate between class and method type-variable owner).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, note: the grammar for method ref changed, as this now requires the type separator :
to be used between the params and the return type (in a way that is more similar to what's done for field refs). If we don't want to have this incompatibility we can roll back.
return t; | ||
} | ||
} | ||
return JavaTypeUtils.toJavaType(tree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how parsing an inflated type form is now done entirely inside JavaTypeUtils
.
I'm not entirely sure whether the Java type factory should support both Java type and reference inflated forms. For now I went with just types -- but if we want to push this further (e.g. to cover also op attributes) we would need the factory to support both types and refs (which is a trivial change to do).
Webrevs
|
The patch below contains a sketch of how we might go in that direction: The fact the I was able to cook it in an hour or so makes me confident about the general direction set by this PR. |
@@ -164,7 +164,13 @@ public static String mapType(String type) { | |||
Map<Integer, String> idxToOp = new HashMap<>(); | |||
|
|||
for (int i = 0; i < type.length(); ++i) { | |||
if (i + 3 < type.length() && type.substring(i, i + 3).equals("ptr")) { | |||
if (i + 19 < type.length() && type.substring(i, i + 19).equals("java.type.primitive")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to "tolerate" the new inflated external forms here (by skipping them). The logic here seems to assume that both class and primitive types appear "naked" in the type strings, which is no longer the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely this area (Triton code model -> Triton MLIR -> SPIRV) will bit rot as we don't have the expertise and hardware to maintain and fully test. We might need to move it to a separate area if it becomes more burdensome. So what you did seems fine.
%19 : tensor<x64, ptr<float>> = tt.splat %2; | ||
%20 : tensor<x64, ptr<float>> = tt.addptr %19 %9; | ||
module ()java.type:"void" -> { | ||
tt.func @"add_kernel2_ptr<java.type.primitive<float>>_ptr<java.type.primitive<float>>_ptr<java.type.primitive<float>>_int_64_void" (%0 : ptr<java.type:"float">, %1 : ptr<java.type:"float">, %2 : ptr<java.type:"float">, %3 : java.type:"int")java.type:"void" -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note (in this and other tests) -- the signature of the triton function contains the literal externalized type string of the various parameter/return types. These are NOT flattened strings because e.g. Ptr::toString
does not flatten (nor it knows how to, if we were to use the public API). This leads to some asymmetry, where in some cases the inflated form appears in full (as here), while in other cases the short type string appears (e.g. because the type is a Java type, and its toString
is more friendly).
I had half a mind to make toString
of all triton types more friendly (e.g. not rely on externalize) -- but ultimately I decided to just stick with this very basic conversion -- perhaps we can come back to this later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine IMO. If need be we can revisit later on.
public static final TypeElementFactory JAVA_TYPE_FACTORY = tree -> switch (JavaTypeUtils.Kind.of(tree)) { | ||
case INFLATED_TYPE -> JavaTypeUtils.toJavaType(tree); | ||
case INFLATED_REF -> JavaTypeUtils.toJavaRef(tree); | ||
default -> throw new UnsupportedOperationException("Unsupported: " + tree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to think this should return null
, as we used that as "composition" signal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work. This regularizes the code and clearly separates the boundaries. It provide a stronger foundation for further improvements.
/integrate |
Going to push as commit 4278b5b. |
@mcimadamore Pushed as commit 4278b5b. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This PR regularizes the support for Java types in the code model. There are two competing constraints to balance: on the one hand, the Java types should be readable from humans when inspecting the textual form of code model; on the other hand, we don't want to bake in Java-specific type parsing in the core code model API.
Unfortunately, the current implementation scores poorly on both aspects, as (a) textual Java types can become very hard to read (e.g.
.<.<NewTest, NewTest$B>, NewTest$B$C>
), and (b) the parsing logic (defined inDescParser
) contains several ad-hoc extensions that are only there to support Java types.To address this, this PR introduces two different kinds of externalized type forms: inflated type forms and flattened type forms. An inflated type form closely follow the structure of the
JavaType
orJavaRef
it models. For instance, the Java typeFoo<? extends Bar>
is modelled as follows:Inflated type forms can be flattened (using
JavaTypeUtils::flatten
), to derive a form that is more suitable for humans. For instance, the above inflated form is flattened as follows:Conversely, flattened type forms can be inflated back (using
JavaTypeUtils::inflate
) to their original inflated form.This distinction between inflated and flattened forms allow us to massively simplify
DescParser
-- which no longer has to worries about the syntactic vagaries of Java types. All that complexity is now pushed onto the flattened type forms support. The goal is to progressively make flattened Java type forms an "implementation detail" (more on that below).To accommodate flattened and inflated type forms, two changes are needed:
OpWriter
we need to flatten an inflated type form (if possible) before writing it out as a string (this allows for the textual form of a code model to be more readable)OpParser
we need to inflate a flattened type form (if possible) before handing the type form back to the rest of the parsing machinery (which will e.g. invoke the type factory on such inflated type form)All Java types and references now follow this pattern:
externalize
method returns an inflated type formtoString
method returns a readable string (the one associated with the corresponding flat type form)ofString
factory parses the flat string into an inflated form, then turn that inflated form into either aJavaType
orJavaRef
, as neededNote that (3) is the only reason as to why the flattened type forms surface into the API (since the factories are public, we would need to specify what strings they accept). Crucially, these factories are only used when parsing op attributes -- where Java types and refs are just modeled as plain flat strings -- e.g.:
Note how the Java method reference after
@
is encoded as a plain string, which is then parser by theInvokeOp
factory usingMethodRef::ofString
.Implementation
This PR moves all the support for inflated and flattened Java type forms into a single class, namely
JavaTypeUtils
. This class contains factories to create the various inflated forms, as well as method to parse them, to turn them into flat strings, or to turn them intoJavaType
orJavaRef
objects.The code in this class is a bit tedious, as I wanted to express all the functionality we needed in terms of externalized type forms. This allows us to reduce the impact of Java types on the rest of the code model (e.g. the changes to
OpParser
andOpWriter
are literally one liners -- at the cost of making the implementation of some of these methods slightly more convoluted.For instance, it would have been easier to define a mapping from
JavaType
into a flat string (as we could leverage thetoString
method on the variousJavaType
subclasses). But doing so would create some issues, as nowOpWriter
would need to sometimes callexternalize
on the type, sometimes calltoString
on it -- similar idiosyncrasies would show up on theOpParser
side.Future work
The next step in this journey (not addressed by this PR) would be to parse Java type/ref attributes in a more structural fashion, so that we can let go of the
ofString
factories.And, after that, another possible improvement would be to augment and generalize the grammar supported by
DescParser
to cover more interesting structural forms -- something like:This would allow us to model all externalized type forms as more general nodes -- but will also allow us to model other kinds of interesting structure more directly, such as location information, Java literals and, possibly, even Java annotations.
Progress
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/432/head:pull/432
$ git checkout pull/432
Update a local copy of the PR:
$ git checkout pull/432
$ git pull https://git.openjdk.org/babylon.git pull/432/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 432
View PR using the GUI difftool:
$ git pr show -t 432
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/432.diff
Using Webrev
Link to Webrev Comment