Expand ARM Architecture Compatibility #5954

Open
11 tasks done
halibobo1205 opened this issue Aug 15, 2024 · 60 comments · May be fixed by #6327
Comments

@halibobo1205
Contributor

halibobo1205 commented Aug 15, 2024

Background

Java-Tron currently supports only the x86 architecture. ARM, however, has gained significant traction recently, especially in cloud computing and mobile devices. ARM processors are known for their energy efficiency and cost-effectiveness, which makes them increasingly popular in data centers, cloud computing, and edge computing scenarios. It would be good to have the option of running Java-Tron on the ARM architecture.

Key developments in ARM architecture:

ARM advantages:

Related Issues and PRs

Scope of Impact

  • Build and deployment processes
  • Core application code
  • Third-party dependencies
  • Development and testing environments

Current Progress Summary

  1. JDK version

  2. Native code

  3. Third-party dependencies

  4. Floating-point arithmetic

  5. Build and deployment process

    • Update build scripts to support ARM architecture.
    • Ensure CI/CD pipelines can be built and tested in ARM environments.
    • Docker support
@angrynurd

angrynurd commented Aug 15, 2024

I am totally in favor of extending ARM architecture compatibility. This will allow Java-Tron to run on more platforms and take advantage of the benefits of the ARM architecture, such as higher energy efficiency and lower cost.
In my opinion, we can start with the following:

  1. Prioritize ARM support for key dependencies: For example, RocksDB/LevelDB is an important database component in Java-Tron and it is critical to ensure its compatibility on ARM.
  2. Establish an ARM test environment: We need to establish a dedicated ARM test environment to ensure the stability and performance of Java-Tron on ARM.
  3. Collaborate with the community: We can work with the community to solve ARM compatibility issues and share experiences and best practices.

@tomatoishealthy
Contributor

It sounds great, but I am a novice in ARM architecture. I am curious about the challenges of supporting ARM architecture.

Can you list something like a task list in the future? It is convenient to clearly understand the current status and future challenges.

@halibobo1205
Contributor Author

halibobo1205 commented Aug 15, 2024

Here are some common considerations:

Important

  1. JDK version compatibility
    Ensure the JDK version supports the ARM architecture. Consider using ARM-optimized JDK distributions.
  • Linux on AArch64 has been supported since JDK 9 (non-LTS) through JEP 237
  • Windows on AArch64 has been supported since JDK 16 (non-LTS) through JEP 388
  • macOS on AArch64 has been supported since JDK 17 (LTS) through JEP 391

Important

2. Native code
JNI (Java Native Interface) or other native code.
These native code components need to be recompiled or upgraded for ARM architecture.

  • LevelDBJni
  • RocksDBJni
  • zksnark-java-sdk

Tip

3. Endianness
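x86 is little-endian. ARM cores can in principle run in either byte order, although mainstream 64-bit ARM operating systems typically run them in little-endian mode.
Check whether any operations in the code (such as TVM) depend on a specific endianness, especially when handling binary data.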

Tip

4. Memory alignment:
ARM architecture may have different memory alignment requirements than x86.
Check for code(such as TVM) that assumes specific memory alignments.

Tip

5. Atomic operations and concurrency
Some atomic operations(TVM) may be implemented differently on different architectures.
Review concurrent code to ensure it works correctly on ARM as well.

Caution

6. Floating-point arithmetic
ARM and x86 may have subtle differences in floating-point precision and behavior.
For applications that rely on precise floating-point calculations, comprehensive testing is necessary.

Tip

7. Performance optimization
x86-specific performance optimizations(TVM) may no longer be applicable on ARM.
Consider using ARM-specific optimization techniques.

Important

8. Third-party dependencies
Ensure all third-party libraries and dependencies support ARM architecture.
Some incompatible dependencies may need to be updated or replaced.

  • protoc-gen-grpc-java

Important

9. Build and deployment process:

  • Update build scripts to support ARM architecture.
  • Ensure CI/CD pipelines can be built and tested in ARM environments.
  • Docker support

Tip

10. Hardware feature dependencies:
Check if the code(TVM) relies on x86-specific hardware features.
Alternatives may need to be found for ARM.

Tip

11. System calls and OS interactions
If the code makes direct system calls, adjustments may be needed for ARM.

Important

12. Cross-platform testing

  • Establish comprehensive test suites to ensure the functionality works correctly on ARM.
  • Conduct performance benchmarking to compare x86 and ARM performance differences.
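
For the endianness point (item 3) above, a minimal sketch of how such a check might look in Java. The class name `EndianCheck` and its helpers are hypothetical and not part of java-tron. It relies on two standard facts: `ByteOrder.nativeOrder()` reports the byte order of the underlying platform, and `ByteBuffer` encodes in big-endian order unless told otherwise, so code that sticks to an explicit order produces identical bytes on x86 and ARM.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianCheck {

    /** Encodes a long as 8 bytes in big-endian order, independent of the CPU. */
    public static byte[] encodeBigEndian(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.BIG_ENDIAN) // explicit, even though it is the default
                .putLong(value)
                .array();
    }

    /** Decodes 8 big-endian bytes back into a long. */
    public static long decodeBigEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        // Platform facts, useful when comparing an x86 node with an ARM node.
        System.out.println("os.arch      = " + System.getProperty("os.arch"));
        System.out.println("native order = " + ByteOrder.nativeOrder());

        // The encoded form does not depend on the native order.
        byte[] encoded = encodeBigEndian(0x0102030405060708L);
        System.out.println("first byte   = " + encoded[0]);
    }
}
```

Only code that reads memory in native order (for example through JNI, or buffers switched to `ByteOrder.nativeOrder()`) would be sensitive to the platform's byte order, so those are the places worth auditing first.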

@317787106
Contributor

317787106 commented Aug 15, 2024

@halibobo1205 Do you want to support the ARM architecture and the latest JVM version at the same time, or just support the ARM architecture using JDK 8?

@halibobo1205
Contributor Author

halibobo1205 commented Aug 15, 2024

@317787106 JVM officially supports ARM:

  • Linux got support in JDK 9 by JEP 237
  • Windows got support in JDK 16 by JEP 388
  • Macs got support in JDK 17 by JEP 391

According to Oracle Java SE Support Roadmap, JDK9 and JDK16 are non-LTS, and JDK 17 is LTS. Based on the above information, I propose that ARM support JDK17 as a minimum.

Warning

This is the last planned update of JDK 17 under the NFTC. Updates after September 2024 will be licensed under the Java SE OTN License (OTN) and production use beyond the limited free grants of the OTN license will require a fee.

@angrynurd

Here are some common considerations:


Regarding JDK version compatibility.
You recommend using an ARM-optimized JDK distribution. What specific ARM-optimized JDK distributions do you recommend? What are their performance and stability advantages?

@abn2357

abn2357 commented Aug 15, 2024

When is the expected completion time for this work? It sounds like a big project.

@halibobo1205
Contributor Author

@endiaoekoe I propose that ARM support JDK17 as a minimum.

@halibobo1205
Contributor Author

@abn2357 Tron currently supports only JDK 8. Based on the information above, JDK 17 fully supports ARM, so Tron may need to upgrade to JDK 17 first, which is another big project.

@zeusoo001
Contributor

@halibobo1205 It sounds great, and I look forward to your implementation. I see that there may be subtle differences in floating point precision and behavior between ARM and x86. When supporting it, be sure to ensure data consistency. Also investigate whether there are other places that may cause data inconsistency.

@Murphytron

This issue has been added to the core devs community call #22, welcome to share the latest progress @halibobo1205, and discuss together with @endiaoekoe @tomatoishealthy @317787106 @zeusoo001 @abn2357.

@halibobo1205
Contributor Author

halibobo1205 commented Aug 20, 2024

1. JDK version compatibility
After some brief research, I found ARM 64-bit versions of JDK 8 available. cc @endiaoekoe @317787106 @abn2357

| Provider | Linux | Mac | Windows | Notes |
|---|---|---|---|---|
| Oracle | | | | Official support; requires payment for commercial use |
| Eclipse Temurin | | | | Free OpenJDK; regularly updated and supported by the Adoptium community |
| Azul Zulu | | | | Free OpenJDK; full enterprise version requires payment |
| BellSoft Liberica | | | | Free OpenJDK for all users; relatively less well-known |
| Amazon Corretto | | | | Free OpenJDK; long-term support by Amazon; Amazon runs Corretto internally on thousands of production services |

@halibobo1205
Contributor Author

halibobo1205 commented Aug 20, 2024

6. Floating-point arithmetic known issues:

Unfortunately, Tron does use Math.pow() for floating-point calculations for the Bancor trading pair in ExchangeProcessor:

  private long exchangeToSupply(long balance, long quant) {
    logger.debug("balance: " + balance);
    long newBalance = balance + quant;
    logger.debug("balance + quant: " + newBalance);

    double issuedSupply = -supply * (1.0 - Math.pow(1.0 + (double) quant / newBalance, 0.0005));
    logger.debug("issuedSupply: " + issuedSupply);
    long out = (long) issuedSupply;
    supply += out;

    return out;
  }

  private long exchangeFromSupply(long balance, long supplyQuant) {
    supply -= supplyQuant;

    double exchangeBalance =
        balance * (Math.pow(1.0 + (double) supplyQuant / supply, 2000.0) - 1.0);
    logger.debug("exchangeBalance: " + exchangeBalance);

    return (long) exchangeBalance;
  }

Test case

  @Test
  public void testPow() {
    double x = 29218;
    double q = 4761432;
    double ret = Math.pow(1.0 + x / q, 0.0005);
    double ret2 = StrictMath.pow(1.0 + x / q, 0.0005);

    System.out.printf("%s%n", doubleToHex(ret)); //  3ff000033518c576
    System.out.printf("%s%n", doubleToHex(ret2)); // 3ff000033518c575
    Assert.assertEquals(0, Double.compare(ret, ret2)); // fail in jdk8_X86, success in jdk8_ARM64
  }

  public static String doubleToHex(double input) {
    // Convert the starting value to the equivalent value in a long
    long doubleAsLong = Double.doubleToRawLongBits(input);
    // and then convert the long to a hex string
    return Long.toHexString(doubleAsLong);
  }

Tron should use StrictMath to avoid cross-platform consistency issues. To help ensure Java-Tron's portability to ARM, I suggest a new proposal to convert Math to StrictMath. cc @zeusoo001
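
To gauge how far the two implementations drift apart on a given machine, a small helper can report the gap in units in the last place (ulp) rather than only pass or fail. This is a sketch, not java-tron code: the class `PowUlpCheck` and the method `ulpDistance` are hypothetical names, and the input is the one used in the test case above.

```java
public class PowUlpCheck {

    /**
     * Distance in ulps between two finite doubles of the same sign,
     * computed from their IEEE 754 bit patterns.
     */
    public static long ulpDistance(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    public static void main(String[] args) {
        double base = 1.0 + 29218.0 / 4761432.0;
        double fast = Math.pow(base, 0.0005);         // may be JIT-intrinsified, platform dependent
        double strict = StrictMath.pow(base, 0.0005); // specified to follow fdlibm on every platform

        System.out.printf("Math.pow       = %016x%n", Double.doubleToLongBits(fast));
        System.out.printf("StrictMath.pow = %016x%n", Double.doubleToLongBits(strict));
        System.out.println("ulp distance   = " + ulpDistance(fast, strict));
    }
}
```

A distance of 0 means the platform's Math.pow agrees with StrictMath.pow for that input. Any non-zero value marks an input that would need attention when switching to StrictMath.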

@317787106
Contributor

@halibobo1205 It may be smoother to support JDK 8 on Mac ARM first, and then extend support to JDK 17 on Linux and Mac ARM.

@tomatoishealthy
Contributor

JDK version compatibility After some brief research, I found ARM 64-bit versions of JDK 8 available. cc @endiaoekoe @317787106 @abn2357

Does this mean that there is no longer a dependency between ARM architecture upgrade and JDK upgrade?

In addition, TRON only focuses on Oracle JDK, right?

@halibobo1205
Contributor Author

halibobo1205 commented Aug 21, 2024

2. Native code

JNI (Java Native Interface) or other native code.
These native code components(JNI) must be recompiled or upgraded for ARM64 architecture and may include, but not limited to, the following.

  • LevelDBJni: fusesource has stopped maintaining this project (no updates since Oct 17, 2013). It has been forked by @halibobor, thanks to @folowing

  • RocksDBJni: RocksJava supports the ARM64 architecture from 6.29.4.1+

    • The current version is 5.15.10. Upgrading to 7.7.3+ is recommended; it has been verified by syncing from 0.

    • 6.29.4.1+ is incompatible with LevelDB: org.rocksdb.RocksDBException: bad block contents in file output-directory/database/asset-issue-v2/MANIFEST-000428. 5.15.10 is compatible with LevelDB.

      • LevelDB -> opened directly by RocksDB without doing anything, or converted to RocksDB by Toolkit.jar db convert without safe mode -> ✅ rocksDB-5.15.10 -> ❌ rocksDB-6.29.4.1+ won't work
      • LevelDB -> ✅ converted to RocksDB by DBConvert.jar or by Toolkit.jar db convert --safe (safe mode) -> ✅ rocksDB-5.15.10 -> ✅ rocksDB-6.29.4.1+ is ok
      • rocksDB(rocksDB-5.15.10) -> ✅ rocksDB-6.29.4.1+ is ok
    • TODO:

      • 1. Prevent RocksDB from opening LevelDB directly
      • 2. Toolkit.jar db convert should force safe mode.
      • 3. Provide a RocksDB rewrite tool to repair databases produced either by RocksDB opening LevelDB directly or by Toolkit.jar db convert without safe mode.
  • zksnark-java-sdk has been upgraded for the ARM64 architecture since GreatVoyage-v4.7.0.1

@halibobo1205
Contributor Author

8. Third-party dependencies

Ensure all third-party libraries and dependencies support ARM architecture.
Some incompatible dependencies may need to be updated or replaced, including, but not limited to, the following.

@halibobo1205
Contributor Author

Warning

This is the last planned update of JDK 17 under the NFTC. Updates after September 2024 will be licensed under the Java SE OTN License (OTN) and production use beyond the limited free grants of the OTN license will require a fee.

To avoid subsequent charges for commercial use, I recommend switching to OpenJDK.

@halibobo1205
Contributor Author

Caution

Strong data consistency and finality
Final data consistency is required for blockchain, and it's usually guaranteed by the world state. Unfortunately, Java-Tron doesn't have a world state.
We need to think about how to ensure final data consistency.

@halibobo1205
Contributor Author

1. JDK version compatibility
Maybe try to support OpenJDK on ARM?

@halibobo1205
Contributor Author

A hard fork solution will be introduced in 4.8.0, switching floating-point calculations from Math to StrictMath.

@halibobo1205
Contributor Author

Currently, java-tron supports both LevelDB and RocksDB. On the ARM architecture, we intend to support only RocksDB, mainly due to the following considerations:

  1. Performance Advantages
    RocksDB, built on top of LevelDB, offers enhanced performance, reliability, and advanced features such as multi-threaded execution, compaction optimizations, and support for larger datasets, making it more suitable for high-throughput and low-latency use cases.

  2. Community Support
  • RocksDB has continuous investment and maintenance from Meta (Facebook)
  • Official support for RocksDB Java API
  • RocksDB community is more active in supporting ARM architecture
  • In comparison, LevelDB's community maintenance is relatively less active

  3. Feature Completeness
  • RocksDB offers richer features (e.g., column families, transaction support, TtlDB)
  • Built-in monitoring and performance diagnostic tools are more comprehensive
  • Provides more flexible configuration options for optimization on ARM architecture

  4. Future Development Trends
  • RocksDB is more widely used in blockchain domain
  • Continuously receives performance optimizations and feature updates
  • More timely support for new hardware features

  5. Ecosystem Integration
  • Better support for RocksDB in cloud-native environments
  • Better integration with modern monitoring tools
  • More mature support for containerized deployment

  6. Hardware Adaptation
  • Better optimization for new storage devices (e.g., NVMe SSD) in RocksDB
  • Better utilization of ARM architecture's specific instruction sets
  • Better support for large memory systems

This will ensure the best database usage experience on ARM architecture.
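
One way the "RocksDB only on ARM" rule could be enforced at startup is a small guard that inspects the `os.arch` system property. The sketch below is an illustration under assumptions: `DbEngineSelector`, `Engine`, and `select` are hypothetical names rather than java-tron APIs, and the only platform fact it relies on is that JDKs report 64-bit ARM as `aarch64` (the string `arm64` is accepted as well, as a precaution).

```java
import java.util.Locale;

public class DbEngineSelector {

    public enum Engine { LEVELDB, ROCKSDB }

    /** Returns true when the given os.arch value denotes 64-bit ARM. */
    public static boolean isArm64(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("aarch64") || arch.equals("arm64");
    }

    /**
     * Validates the configured engine against the platform.
     * On ARM64 only RocksDB is accepted; elsewhere the configuration is kept.
     */
    public static Engine select(String osArch, Engine configured) {
        if (isArm64(osArch) && configured == Engine.LEVELDB) {
            throw new IllegalStateException(
                "LevelDB is not supported on " + osArch + "; configure RocksDB instead");
        }
        return configured;
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " -> " + select(arch, Engine.ROCKSDB));
    }
}
```

Failing fast with a clear message is one possible design. Silently switching engines would hide the configuration mismatch from the operator.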

@halibobo1205
Contributor Author

halibobo1205 commented Nov 21, 2024

Important

On the ARM architecture, we intend to support only RocksDB

When running the CI tests against RocksDB, we found that some tests failed because LevelDB and RocksDB behave differently.

  • JVM core dump
  • RocksDB does not throw an error when the database has already been closed
    • putData
    • deleteData
    • getData
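
One possible way to make the behavior uniform is to check an "open" flag in the Java wrapper before touching the native handle, so that use after close fails with an ordinary exception. The sketch below illustrates that idea only: `GuardedStore` is a hypothetical class, and an in-memory map stands in for the native database so that the example is self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class GuardedStore implements AutoCloseable {

    // Stand-in for the native database handle (assumption for this sketch).
    private final Map<String, byte[]> backend = new ConcurrentHashMap<>();
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** Rejects any call made after close() instead of reaching the backend. */
    private void ensureOpen() {
        if (closed.get()) {
            throw new IllegalStateException("database is closed");
        }
    }

    public void putData(String key, byte[] value) {
        ensureOpen();
        backend.put(key, value);
    }

    public byte[] getData(String key) {
        ensureOpen();
        return backend.get(key);
    }

    public void deleteData(String key) {
        ensureOpen();
        backend.remove(key);
    }

    @Override
    public void close() {
        closed.set(true);
    }

    /** Returns true if a read after close() is rejected with an exception. */
    public static boolean rejectsUseAfterClose() {
        GuardedStore store = new GuardedStore();
        store.putData("k", new byte[] {1});
        store.close();
        try {
            store.getData("k");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

The flag check alone leaves a window between `ensureOpen()` and the backend call. A real wrapper would probably hold a read lock for each operation and a write lock in `close()`.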

@NewOF

NewOF commented Apr 18, 2025

Principle of X87 Instruction Simulation

  • From the available documentation and source code, the key to calculating $a^b$ in pow lies in two instructions, fyl2x and f2xm1. fyl2x is used to calculate $b \cdot \log_2 a$, while f2xm1 is used to calculate $2^x - 1$

  • Process:

    • Assume $y = a^b$ and take the base-2 logarithm of both sides of the equation: $\log_2 y = b \cdot \log_2 a$
    • That is, $\Large y = 2^{b \cdot \log_2 a}$
    • By combining these two instructions, we can calculate $a^b$
  • At present, the implementation details of the fyl2x and f2xm1 instructions are still unknown. We will attempt to simulate the calculation through Taylor expansion

  • $\log_2 x$

    • $\log_2 x$ expands as $\frac{1}{\ln 2}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(x-1)^n$ (for $0 < x \le 2$)
  • $2^x - 1$

    • $2^x - 1$ expands as $\sum_{n=1}^{\infty}\frac{(x\ln 2)^n}{n!}$
Simulation implementation(c++)

  • Using the SoftFloat library, which supports 80-bit extended double precision (http://www.jhauser.us/arithmetic/SoftFloat.html)
  • Note 1: The Float80 class is a wrapper built on the SoftFloat library
  • Note 2: Because the 48 mismatched data points are special (the exponent is always 0.0005), the calculation has been partially simplified
  Float80 ln_coe[] = {
  	Float80(0x3FFF, 0x8000000000000000), // 1.0
  	Float80(0x3FFD, 0xFFFFFFFFFFFFFFFF), // 0.5
  	Float80(0x3FFD, 0xAAAAAAAAAAAAAAAA), // 0.333...
  	Float80(0x3FFC, 0xFFFFFFFFFFFFFFFF), // 0.25
  	Float80(0x3FFC, 0xCCCCCCCCCCCCCCCC), // 0.2
  	Float80(0x3FFC, 0xAAAAAAAAAAAAAAAA), // 0.166...
  };

  Float80 taylor_ln2_float80(Float80 x) {
  	return x -
           x * x * ln_coe[1] +
           x * x * x * ln_coe[2] -
           x * x * x * x * ln_coe[3] +
           x * x * x * x * x * ln_coe[4] -
           x * x * x * x * x * x * ln_coe[5];
  }

  // ln(2)^n/n!
  Float80 exp_coe[] = {
  	Float80(0.69314718055994530942),
  	Float80(0.24022650695910071233),
  	Float80(0.05550410866482157995),
  	Float80(0.00961812910762847716),
  	Float80(0.00133335581464284434),
  };

  Float80 taylor_exp2_float80(Float80 x) {
  	return x * exp_coe[0] +
           x * x * exp_coe[1] +
           x * x * x * exp_coe[2] +
           x * x * x * x * exp_coe[3] +
           x * x * x * x * x * exp_coe[4];
  }

  double taylor_pow2_float80(double x, double y) {
  	Float80 x80(x), y80(y);
  	Float80 ln2(0x3FFE, 0xB17217F7D1CF7BBB);

  	Float80 y_lg2_x = y80 * taylor_ln2_float80(x80 - Float80(1)) / ln2;
    // For the test dataset, since y_lg2_x<1, this step omits the exponentiation of the integer part
  	Float80 exp_y_lg2_x = taylor_exp2_float80(y_lg2_x) + Float80(1);

  	return exp_y_lg2_x.to_double();
  }
  • Test data(base, exp, expected)
  Data("3ff0192278704be3", 0.0005, "3ff000033518c576"); //  4137160
  Data("3ff000002fc6a33f", 0.0005, "3ff0000000061d86"); //  4065476
  Data("3ff00314b1e73ecf", 0.0005, "3ff0000064ea3ef8"); //  4071538
  Data("3ff0068cd52978ae", 0.0005, "3ff00000d676966c"); //  4109544
  Data("3ff0032fda05447d", 0.0005, "3ff0000068636fe0"); //  4123826
  Data("3ff00051c09cc796", 0.0005, "3ff000000a76c20e"); //  4166806
  Data("3ff00bef8115b65d", 0.0005, "3ff0000186893de0"); //  4225778
  Data("3ff009b0b2616930", 0.0005, "3ff000013d27849e"); //  4251796
  Data("3ff00364ba163146", 0.0005, "3ff000006f26a9dc"); //  4257157
  Data("3ff019be4095d6ae", 0.0005, "3ff0000348e9f02a"); //  4260583
  Data("3ff0123e52985644", 0.0005, "3ff0000254797fd0"); //  4367125
  Data("3ff0126d052860e2", 0.0005, "3ff000025a6cde26"); //  4402197
  Data("3ff0001632cccf1b", 0.0005, "3ff0000002d76406"); //  4405788
  Data("3ff0000965922b01", 0.0005, "3ff000000133e966"); //  4490332
  Data("3ff00005c7692d61", 0.0005, "3ff0000000bd5d34"); //  4499056
  Data("3ff015cba20ec276", 0.0005, "3ff00002c84cef0e"); //  4518035
  Data("3ff00002f453d343", 0.0005, "3ff000000060cf4e"); //  4533215
  Data("3ff006ea73f88946", 0.0005, "3ff00000e26d4ea2"); //  4647814
  Data("3ff00a3632db72be", 0.0005, "3ff000014e3382a6"); //  4766695
  Data("3ff000c0e8df0274", 0.0005, "3ff0000018b0aeb2"); //  4771494
  Data("3ff00015c8f06afe", 0.0005, "3ff0000002c9d73e"); //  4793587
  Data("3ff00068def18101", 0.0005, "3ff000000d6c3cac"); //  4801947
  Data("3ff01349f3ac164b", 0.0005, "3ff000027693328a"); //  4916843
  Data("3ff00e86a7859088", 0.0005, "3ff00001db256a52"); //  4924111
  Data("3ff00000c2a51ab7", 0.0005, "3ff000000018ea20"); //  5098864
  Data("3ff020fb74e9f170", 0.0005, "3ff00004346fbfa2"); //  5133963
  Data("3ff00001ce277ce7", 0.0005, "3ff00000003b27dc"); //  5139389
  Data("3ff005468a327822", 0.0005, "3ff00000acc20750"); //  5151258
  Data("3ff00006666f30ff", 0.0005, "3ff0000000d1b80e"); //  5185021
  Data("3ff000045a0b2035", 0.0005, "3ff00000008e98e6"); //  5295829
  Data("3ff00e00380e10d7", 0.0005, "3ff00001c9ff83c8"); //  5380897
  Data("3ff00c15de2b0d5e", 0.0005, "3ff000018b6eaab6"); //  5400886
  Data("3ff00042afe6956a", 0.0005, "3ff0000008892244"); //  5864127
  Data("3ff0005b7357c2d4", 0.0005, "3ff000000bb48572"); //  6167339
  Data("3ff00033d5ab51c8", 0.0005, "3ff0000006a279c8"); //  6240974
  Data("3ff0000046d74585", 0.0005, "3ff0000000091150"); //  6279093
  Data("3ff0010403f34767", 0.0005, "3ff0000021472146"); //  6428736
  Data("3ff00496fe59bc98", 0.0005, "3ff000009650a4ca"); //  6432355,6493373
  Data("3ff0012e43815868", 0.0005, "3ff0000026af266e"); //  6555029
  Data("3ff00021f6080e3c", 0.0005, "3ff000000458d16a"); //  7092933
  Data("3ff000489c0f28bd", 0.0005, "3ff00000094b3072"); //  7112412
  Data("3ff00009d3df2e9c", 0.0005, "3ff00000014207b4"); //  7675535
  Data("3ff000def05fa9c8", 0.0005, "3ff000001c887cdc"); //  7860324
  Data("3ff0013bca543227", 0.0005, "3ff00000286a42d2"); //  8292427
  Data("3ff0021a2f14a0ee", 0.0005, "3ff0000044deb040"); //  8517311
  Data("3ff0002cc166be3c", 0.0005, "3ff0000005ba841e"); //  8763101
  Data("3ff0000cc84e613f", 0.0005, "3ff0000001a2da46"); //  9269124
  Data("3ff000057b83c83f", 0.0005, "3ff0000000b3a640"); //  9631452
  • Comparison of Results
exp:0.0005 base:1.00628495434413 3ff019be4095d6ae
expected: 1.0000031326481 3ff0000348e9f02a
result:   1.0000031326481 3ff0000348e9f029

exp:0.0005 base:1.00805230779141 3ff020fb74e9f170
expected: 1.00000401003852 3ff00004346fbfa2
result:   1.00000401003852 3ff00004346fbfa1
  • Early in testing, computing the series terms iteratively gave poor results. Evaluating the polynomial expansion directly with precomputed coefficients improved the match significantly. (However, two data points still do not fully match expectations.)
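
For readers who want to experiment without the SoftFloat dependency, the same two-step scheme (a truncated series for the logarithm, then a truncated series for $2^t - 1$) can be sketched in plain Java. This sketch uses 64-bit double arithmetic, so it is not expected to reproduce the x87 bit patterns. The class `TaylorPow` and its method names are hypothetical, and the truncation is only reasonable for bases close to 1 and small exponents, as in the test data above.

```java
public class TaylorPow {

    private static final double LN2 = 0.6931471805599453; // ln(2) rounded to double

    /** ln(1 + x) from the first six Taylor terms; only accurate for small |x|. */
    public static double ln1p(double x) {
        // Horner form of x - x^2/2 + x^3/3 - x^4/4 + x^5/5 - x^6/6
        return x * (1.0 + x * (-1.0 / 2 + x * (1.0 / 3
                + x * (-1.0 / 4 + x * (1.0 / 5 + x * (-1.0 / 6))))));
    }

    /** 2^t - 1 from the first five terms of sum (t ln 2)^n / n!; small |t| only. */
    public static double exp2m1(double t) {
        double u = t * LN2;
        // Horner form of u + u^2/2! + u^3/3! + u^4/4! + u^5/5!
        return u * (1.0 + u * (1.0 / 2 + u * (1.0 / 6
                + u * (1.0 / 24 + u * (1.0 / 120)))));
    }

    /** a^b computed as 2^(b * log2(a)) with the two truncated series. */
    public static double pow(double a, double b) {
        double t = b * ln1p(a - 1.0) / LN2; // b * log2(a)
        return 1.0 + exp2m1(t);
    }

    public static void main(String[] args) {
        double base = Double.longBitsToDouble(0x3ff019be4095d6aeL); // from the test data
        System.out.printf("taylor = %016x%n", Double.doubleToLongBits(pow(base, 0.0005)));
        System.out.printf("strict = %016x%n",
                Double.doubleToLongBits(StrictMath.pow(base, 0.0005)));
    }
}
```

Comparing the printed bit patterns shows how close the approximation gets in double precision. Agreement in the last bit is a much stricter requirement than agreement to many decimal places.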

@halibobo1205
Contributor Author

The diff data (48 pow calculation instances) are the cases where Math and StrictMath produce different results. If an algorithmic simulation is used instead, the implementation must be tested thoroughly for full equivalence with the Math library.

@NewOF

NewOF commented Apr 21, 2025

  • Using the simulated pow on the 48 mismatched floating-point inputs, the best result so far is 2 mismatches.
  • Comparing against historical data (block height 11 million, 673,496 records, including the surrounding numerical calculations, compared on the final exchange balance), the simulated pow has 58,072 mismatches relative to Math.pow. Using the C++ standard library pow directly gives 48 mismatches, which is consistent with StrictMath.pow (also 48 mismatches).
  • From the resources and source code found so far, we speculate that the logarithm and power operations inside the x87 instructions are also polynomial expansions that use precomputed coefficients for speed. Without further implementation details, and given how sensitive floating-point results are to small differences in the computation, an exact match is hard to achieve. Hard-coding is more feasible by comparison.

@halibobo1205
Contributor Author

In the Java HotSpot virtual machine, do_intrinsic is an important concept related to intrinsic functions.

Intrinsics in the HotSpot JVM are special, optimized implementations of commonly used Java methods. When the JVM identifies specific method calls, it may replace the standard Java implementation of these methods with more efficient native code. Math.pow() is one such method that is commonly intrinsically optimized.

Specifically for the Math.pow() method, the HotSpot JVM handles it in the following ways:

  1. The JVM has a function called do_intrinsic that determines whether a method can be replaced with an intrinsic implementation, and how to perform the replacement.

  2. For Math.pow(), when the JIT (Just-In-Time compiler) compiles code containing this method call, the do_intrinsic mechanism examines the call and may replace it with native instructions or optimized algorithms corresponding to the processor architecture.

  3. Typically, the intrinsic implementation of Math.pow() uses instructions from the CPU's floating-point unit directly (on x86 with JDK 8, the x87 instructions fyl2x and f2xm1 discussed earlier in this thread) or hand-optimized platform-specific routines, thereby avoiding the slower generic implementation.

This optimization mechanism is usually managed by the vmIntrinsics namespace in the HotSpot source code, with relevant implementations distributed across files such as src/hotspot/share/classfile/vmSymbols.hpp, src/hotspot/share/opto/library_call.cpp, and others.

do_intrinsic(_dlog, java_lang_Math, log_name, double_double_signature, F_S)

Through this intrinsic function optimization, the HotSpot JVM allows Java programs to maintain platform independence while achieving performance close to native code, which is particularly beneficial for math-computation-intensive applications.

Here's the logic for log, and pow is similar:
Image

@317787106
Contributor

@halibobo1205 When Math calculates pow(double, double), how can you determine whether the result is inconsistent with the one calculated by StrictMath? What should be done when inconsistencies are found? Could you also specify what is hardcoded?

@halibobo1205
Contributor Author

@317787106

  1. Calculate the bancor transaction based on Math.pow and StrictMath.pow, respectively, in an x86 JDK8 environment, and record if the final buyTokenQuant is inconsistent
public class ExchangeCapsule implements ProtoCapsule<Exchange> {

  public long transaction(byte[] sellTokenID, long sellTokenQuant, boolean useStrictMath) {
    long supply = 1_000_000_000_000_000_000L;
    ExchangeProcessor processor = new ExchangeProcessor(supply, useStrictMath);
    ExchangeProcessor strictProcessor = new ExchangeProcessor(supply, true);

    long buyTokenQuant = 0;
    long strictBuyTokenQuant = 0;
    long firstTokenBalance = this.exchange.getFirstTokenBalance();
    long secondTokenBalance = this.exchange.getSecondTokenBalance();

    if (this.exchange.getFirstTokenId().equals(ByteString.copyFrom(sellTokenID))) {
      buyTokenQuant = processor.exchange(firstTokenBalance,
          secondTokenBalance,
          sellTokenQuant);
      strictBuyTokenQuant = strictProcessor.exchange(firstTokenBalance,
          secondTokenBalance,
          sellTokenQuant);
      if (!useStrictMath && buyTokenQuant != strictBuyTokenQuant) {
        logAndRecord("{}\t{}\t{}\t{}\t{}", buyTokenQuant, strictBuyTokenQuant, firstTokenBalance, secondTokenBalance, sellTokenQuant); // logAndRecord pow data
      }
      this.exchange = this.exchange.toBuilder()
          .setFirstTokenBalance(firstTokenBalance + sellTokenQuant)
          .setSecondTokenBalance(secondTokenBalance - buyTokenQuant)
          .build();
    } else {
      buyTokenQuant = processor.exchange(secondTokenBalance,
          firstTokenBalance,
          sellTokenQuant);
      strictBuyTokenQuant = strictProcessor.exchange(secondTokenBalance,
          firstTokenBalance,
          sellTokenQuant);
      if (!useStrictMath && buyTokenQuant != strictBuyTokenQuant) {
        logAndRecord("{}\t{}\t{}\t{}\t{}", buyTokenQuant, strictBuyTokenQuant,secondTokenBalance, firstTokenBalance, sellTokenQuant); // logAndRecord pow data
      }
      this.exchange = this.exchange.toBuilder()
          .setFirstTokenBalance(firstTokenBalance - buyTokenQuant)
          .setSecondTokenBalance(secondTokenBalance + sellTokenQuant)
          .build();
    }
    
    return buyTokenQuant;
  }
  2. Based on the data collected in step 1, calculate the pow data to be hardcoded: issuedSupply and exchangeBalance
     
public class ExchangeProcessor {

  private long supply;
  private final boolean useStrictMath;

  public ExchangeProcessor(long supply, boolean useStrictMath) {
    this.supply = supply;
    this.useStrictMath = useStrictMath;
  }

  private long exchangeToSupply(long balance, long quant) {
    long newBalance = balance + quant;
    double issuedSupply = -supply * (1.0 - Maths.pow(1.0 + (double) quant / newBalance, 0.0005, this.useStrictMath));
    long out = (long) issuedSupply;
    supply += out;
    return out;
  }

  private long exchangeFromSupply(long balance, long supplyQuant) {
    supply -= supplyQuant;
    double exchangeBalance = balance * (Maths.pow(1.0 + (double) supplyQuant / supply, 2000.0, this.useStrictMath) - 1.0);
    return (long) exchangeBalance;
  }

  public long exchange(long sellTokenBalance, long buyTokenBalance, long sellTokenQuant) {
    long relay = exchangeToSupply(sellTokenBalance, sellTokenQuant);
    return exchangeFromSupply(buyTokenBalance, relay);
  }

}
  3. Adjust StrictMathWrapper
   private static final Map<Double, Double> powData = Collections.synchronizedMap(new HashMap<>());

  public static double pow(double a, double b) {
    double strictResult = StrictMath.pow(a, b);
    return powData.getOrDefault(a, strictResult);
  }
}
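
The snippets above call `Maths.pow(a, b, useStrictMath)` without showing it, and the lookup in step 3 is keyed by the base alone. The sketch below shows one way the pieces might fit together, under assumptions: the class name `PowTable`, its `Key` type, and keying by both base and exponent are choices made for this illustration, not confirmed java-tron code. Only one of the 48 recorded instances from the test data earlier in the thread is filled in.

```java
import java.util.HashMap;
import java.util.Map;

public class PowTable {

    /** Lookup key built from the raw bit patterns of base and exponent. */
    static final class Key {
        final long base;
        final long exp;

        Key(double a, double b) {
            this.base = Double.doubleToRawLongBits(a);
            this.exp = Double.doubleToRawLongBits(b);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).base == base && ((Key) o).exp == exp;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(base) + Long.hashCode(exp);
        }
    }

    private static final Map<Key, Double> POW_DATA = new HashMap<>();

    static {
        // (base bits, exponent) -> result bits recorded in the thread's test data.
        POW_DATA.put(new Key(Double.longBitsToDouble(0x3ff0192278704be3L), 0.0005),
                Double.longBitsToDouble(0x3ff000033518c576L));
    }

    /**
     * With useStrictMath == false this is plain Math.pow. Otherwise recorded
     * historical results take precedence over StrictMath.pow.
     */
    public static double pow(double a, double b, boolean useStrictMath) {
        if (!useStrictMath) {
            return Math.pow(a, b);
        }
        Double recorded = POW_DATA.get(new Key(a, b));
        return recorded != null ? recorded : StrictMath.pow(a, b);
    }
}
```

Keying by both arguments keeps a recorded value for exponent 0.0005 from being returned for a call with exponent 2000.0 that happens to share the same base.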

@halibobo1205
Contributor Author

If there are other ways to implement X87 Instruction Simulation, please discuss them.
Hard-coding for pow data is currently the better solution based on performance, implementation complexity, and verification difficulty.

@317787106
Contributor

@halibobo1205 I noticed that the pow results in exchangeToSupply and exchangeFromSupply are converted to long. Could this precision loss impact the handling of hardcoded special cases?

@halibobo1205
Contributor Author

@halibobo1205 I noticed that the pow results in exchangeToSupply and exchangeFromSupply are converted to long. Could this precision loss impact the handling of hardcoded special cases?

Yes. The truncation to long actually reduces the amount of pow data that needs to be hard-coded: a differing pow result only matters when it changes the final long value.
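
A tiny illustration of why the cast to long absorbs most one-ulp differences and only exposes those that straddle an integer boundary. The class name `TruncationDemo` and the sample values are made up for this sketch.

```java
public class TruncationDemo {

    /** True when both doubles truncate to the same long value. */
    public static boolean sameAfterTruncation(double a, double b) {
        return (long) a == (long) b;
    }

    public static void main(String[] args) {
        // Two values one ulp apart, far from an integer boundary.
        double x = 12345.678;
        double y = Math.nextUp(x);
        System.out.println(sameAfterTruncation(x, y));

        // Two values one ulp apart that straddle the integer 3.
        double below = Math.nextDown(3.0);
        System.out.println(sameAfterTruncation(below, 3.0));
    }
}
```

The first pair truncates to the same long, so a one-ulp disagreement there would never surface in the final result. The second pair truncates to 2 and 3, which is the kind of case that has to be recorded.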

@halibobo1205
Contributor Author

Principle of X87 Instruction Simulation

USE MPFR mpfr_pow

#include <stdio.h>
#include <gmp.h>
#include <mpfr.h>


// Precision settings
#define X87_PRECISION 64  // 64-bit mantissa for x87 80-bit format
int main(void) {
    mpfr_t base1, exp1, result1;
    mpfr_t base2, exp2, result2;
    mpfr_set_default_prec(X87_PRECISION);
    mpfr_set_default_rounding_mode(MPFR_RNDN);
    mpfr_init2(base1, X87_PRECISION);
    mpfr_init2(exp1, X87_PRECISION);
    mpfr_init2(result1, X87_PRECISION);

    mpfr_init2(base2, X87_PRECISION);
    mpfr_init2(exp2, X87_PRECISION);
    mpfr_init2(result2, X87_PRECISION);

    mpfr_set_d(base1, 1.0061363892207218, MPFR_RNDN);
    mpfr_set_d(exp1, 0.0005, MPFR_RNDN);

    mpfr_set_d(base2, 1.0000046943914231, MPFR_RNDN);
    mpfr_set_d(exp2, 2000, MPFR_RNDN);

    mpfr_pow(result1, base1, exp1, MPFR_RNDN);
    mpfr_pow(result2, base2, exp2, MPFR_RNDN);

    printf("pow(1.0061363892207218, 0.0005) = ");
    mpfr_out_str(stdout, 10, 17, result1, MPFR_RNDN);
    printf("\n");

    printf("pow(1.0000046943914231, 2000) = ");
    mpfr_out_str(stdout, 10, 17, result2, MPFR_RNDN);
    printf("\n");

    mpfr_clear(base1);
    mpfr_clear(exp1);
    mpfr_clear(result1);

    mpfr_clear(base2);
    mpfr_clear(exp2);
    mpfr_clear(result2);
    mpfr_free_cache();
    return 0;
}

❌ Unable to precisely simulate x86 pow

@halibobo1205
Contributor Author

halibobo1205 commented May 8, 2025

Principle of X87 Instruction Simulation

USE Apfloat

❌ The short answer is "no", see details:

Image

@NewOF

NewOF commented May 8, 2025

According to the documentation of the 8087 support library mentioned in the link, it provides a simulated implementation of the relevant instructions. However, since the corresponding source code is not provided, it is not possible to verify the correctness.

@halibobo1205
Contributor Author

Hard-Code:

@NewOF

NewOF commented May 8, 2025

Some constants mentioned in another link do not provide precise hexadecimal representations; instead, they use decimal floating-point numbers with insufficient precision, which are not very meaningful for our calculation scenario.

@halibobo1205
Contributor Author

Floating-point arithmetic

Important

For historical data (the Bancor trading pair), hard-code the special cases (48 pow calculation instances).

Hard-coding remains the optimal solution at this stage, though we'll continue exploring alternative approaches. I welcome your thoughts and input on this matter.

@halibobo1205 halibobo1205 linked a pull request May 13, 2025 that will close this issue
@halibobo1205
Contributor Author

All current progress is documented in PR #6327. We welcome any new ideas or suggestions for further improvement.

@halibobo1205
Contributor Author

Milestone Update (2025-05-16)

@halibobo1205
Contributor Author

Milestone Update (2025-05-23)

@abc-x-t

abc-x-t commented May 26, 2025

  1. Cross-platform testing
  • Establish comprehensive test suites to ensure the functionality works correctly on ARM.
  • Conduct performance benchmarking to compare x86 and ARM performance differences.

@halibobo1205 Hi, What's the current progress on this test? Looking forward to seeing the test results for this part, as this outcome is quite important.

@halibobo1205
Contributor Author

Cross-platform testing

@abc-x-t

  • Test suites pass on macOS and Linux for arm64
  • Conduct performance benchmarking to compare x86 and ARM performance differences: Performance benchmarks based on GravitonV2, Intel Xeon, and AMD EPYC are ready to begin, and the cloud provider is AWS.

@abc-x-t

abc-x-t commented May 28, 2025

Cross-platform testing

@abc-x-t

  • Test suites are passed on MACOS and Linux for arm64
  • [ ] Conduct performance benchmarking to compare x86 and ARM performance differences: Performance benchmarks based on GravitonV2, Intel Xeon, and AMD EPYC are ready to begin, and the cloud provider is AWS.

Great! Do we have a detailed schedule for the remaining work on this feature before it’s released? And is there any way the community can get involved in reviewing or testing?

@halibobo1205
Contributor Author

@abc-x-t Performance benchmarks based on GravitonV2, Intel Xeon, and AMD EPYC are underway! Welcome to review and testing in #6327

@abc-x-t

abc-x-t commented May 29, 2025

@abc-x-t Performance benchmarks based on GravitonV2, Intel Xeon, and AMD EPYC are underway! Welcome to review and testing in #6327

OK, thanks for the information! Maybe we can add more tests to cover more scenarios, for example the performance of JDK 8 versus JDK 17 on x86, and the opcode-level tests described in #6292.

Projects
Status: In Progress