Unhandled exception Type=Segmentation error vmState=0x00000000



Closed · edrevo opened this issue May 27, 2020 · 49 comments · Fixed by #10159

@edrevo

Java -version output

openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0, JRE 11 Linux amd64-64-Bit Compressed References 20200416_574 (JIT enabled, AOT enabled)
OpenJ9 - 05fa2d3
OMR - d4365f371
JCL - 838028fc9d based on jdk-11.0.7+10)

Summary of problem

Segmentation fault:

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F9E1F2FDED0 Handler2=00007F9E1EBEF9F0 InaccessibleAddress=0000000000000008
RDI=0000000001CDAC68 RSI=0000000000000008 RAX=0000000000000850 RBX=00007F9DBE5DF187
RCX=0000000001CDABA0 RDX=0000000000000008 R8=0000000000000000 R9=00000000FD970CA8
R10=00000000FD970CA8 R11=00000000000000FF R12=0000000001CDAB98 R13=0000000002C8F4D0
R14=00007F9DBE5DF18C R15=00007F9DF83049D0
RIP=00007F9E1F285ADB GS=0000 FS=0000 RSP=00007F9DF8304650
EFlags=0000000000010202 CS=0033 RBP=0000000000000001 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 0000000000000006 (f: 6.000000, d: 2.964394e-323)
xmm1 00000000fd9765f8 (f: 4254557696.000000, d: 2.102031e-314)
xmm2 00007f9df8304a00 (f: 4163914240.000000, d: 6.932554e-310)
xmm3 00007f9df8304a00 (f: 4163914240.000000, d: 6.932554e-310)
xmm4 43e0000000000000 (f: 0.000000, d: 9.223372e+18)
xmm5 000000003d01e9e2 (f: 1023535616.000000, d: 5.056938e-315)
xmm6 000000004aef5a72 (f: 1257200256.000000, d: 6.211394e-315)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 0000000048123dc0 (f: 1209155072.000000, d: 5.974019e-315)
xmm9 000000003f8dd600 (f: 1066259968.000000, d: 5.268024e-315)
xmm10 00000000000025c9 (f: 9673.000000, d: 4.779097e-320)
xmm11 00000000411a9820 (f: 1092261888.000000, d: 5.396491e-315)
xmm12 000000004246d6ae (f: 1111938688.000000, d: 5.493707e-315)
xmm13 0000000044de0000 (f: 1155399680.000000, d: 5.708433e-315)
xmm14 0000000045310000 (f: 1160839168.000000, d: 5.735308e-315)
xmm15 414f79ffb262a5ad (f: 2992809472.000000, d: 4.125695e+06)
Module=/opt/java/openjdk/lib/compressedrefs/libj9vm29.so
Module_base_address=00007F9E1F26B000
Target=2_90_20200416_574 (Linux 4.15.0-1082-azure)
CPU=amd64 (4 logical CPUs) (0x3e92f0000 RAM)
----------- Stack Backtrace -----------
(0x00007F9E1F285ADB [libj9vm29.so+0x1aadb])
(0x00007F9E1F27DB30 [libj9vm29.so+0x12b30])
(0x00007F9E1F339B02 [libj9vm29.so+0xceb02])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2020/05/27 14:48:26 - please wait.
JVMDUMP032I JVM requested System dump using '//core.20200527.144826.6.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.230.

 

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F9E1F2FDED0 Handler2=00007F9E1EBEF9F0 InaccessibleAddress=0000000000000008
RDI=0000000000C08618 RSI=0000000000000008 RAX=0000000000000850 RBX=00007F9DBE5DF187
RCX=0000000000C08550 RDX=0000000000000008 R8=0000000000000000 R9=00000000FD921428
R10=00000000FD921428 R11=00000000000000FF R12=0000000000C08548 R13=0000000002C8F4D0
R14=00007F9DBE5DF18C R15=00007F9DF97B29D0
RIP=00007F9E1F285ADB GS=0000 FS=0000 RSP=00007F9DF97B2650
EFlags=0000000000010202 CS=0033 RBP=0000000000000001 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 0000000000000006 (f: 6.000000, d: 2.964394e-323)
xmm1 00000000fdd32bb8 (f: 4258475008.000000, d: 2.103966e-314)
xmm2 00007f9df97b2a00 (f: 4185598464.000000, d: 6.932555e-310)
xmm3 00007f9df97b2a00 (f: 4185598464.000000, d: 6.932555e-310)
xmm4 43e0000000000000 (f: 0.000000, d: 9.223372e+18)
xmm5 000000003d01e9e2 (f: 1023535616.000000, d: 5.056938e-315)
xmm6 000000004aef5a72 (f: 1257200256.000000, d: 6.211394e-315)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 0000000047e38180 (f: 1206092160.000000, d: 5.958887e-315)
xmm9 000000003f6451c0 (f: 1063539136.000000, d: 5.254582e-315)
xmm10 42b255f50e94e148 (f: 244638016.000000, d: 2.016039e+13)
xmm11 00000000411567aa (f: 1091921792.000000, d: 5.394811e-315)
xmm12 000000004252ece6 (f: 1112730880.000000, d: 5.497621e-315)
xmm13 3f847ae147ae147b (f: 1202590848.000000, d: 1.000000e-02)
xmm14 4014000000000000 (f: 0.000000, d: 5.000000e+00)
xmm15 414f79ffb262a5ad (f: 2992809472.000000, d: 4.125695e+06)
Module=/opt/java/openjdk/lib/compressedrefs/libj9vm29.so
Module_base_address=00007F9E1F26B000
Target=2_90_20200416_574 (Linux 4.15.0-1082-azure)
CPU=amd64 (4 logical CPUs) (0x3e92f0000 RAM)
----------- Stack Backtrace -----------
(0x00007F9E1F285ADB [libj9vm29.so+0x1aadb])
(0x00007F9E1F27DB30 [libj9vm29.so+0x12b30])
(0x00007F9E1F339B02 [libj9vm29.so+0xceb02])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2020/05/27 14:48:31 - please wait.
JVMDUMP012E Error in System dump: The core file created by child process with pid = 230 was not found. Expected to find core file with name "//core"
JVMDUMP032I JVM requested Java dump using '//javacore.20200527.144826.6.0003.txt' in response to an event
JVMDUMP032I JVM requested System dump using '//core.20200527.144831.6.0002.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.231.

 

JVMDUMP010I Java dump written to //javacore.20200527.144826.6.0003.txt
JVMDUMP012E Error in System dump: The core file created by child process with pid = 231 was not found. Expected to find core file with name "//core"
JVMDUMP032I JVM requested Snap dump using '//Snap.20200527.144826.6.0004.trc' in response to an event
JVMDUMP032I JVM requested Java dump using '//javacore.20200527.144831.6.0005.txt' in response to an event
JVMDUMP010I Snap dump written to //Snap.20200527.144826.6.0004.trc
JVMDUMP007I JVM Requesting JIT dump using '//jitdump.20200527.144826.6.0006.dmp'
JVMDUMP010I JIT dump written to //jitdump.20200527.144826.6.0006.dmp
JVMDUMP013I Processed dump event "gpf", detail "".

JAVA_OPTS:

-Xtune:virtualized -Xshareclasses:cacheDir=/jvmcache,nonfatal,name=core-openj9-020 -Xaot:loadExclude={akka/stream/impl/GraphStageIsland.onIslandReady*} -Xscmx512M -Xjit:enableSelfTuningScratchMemoryUsageBeforeCompile

We have seen this when the cache had been previously populated with a different version of some of the classes. Specifically, the SCC had classes for Akka 2.5.26 that conflicted with the classes from Akka 2.6.5 that the crashing JVM needed.

My expectation would be that this problem shouldn’t generate a crash. One of two things could happen:

  1. The JVM could invalidate the SCC
  2. The JVM could opt-out of AOT

Diagnostic files

jitdump.20200527.150333.6.0004.dmp.zip

@pshipton

Access to the (jextract’ed) core file would help progress this, but I see there was an error creating it. Running java -Xcheck:dump may help determine why.

AOT is mentioned; does the problem still occur when -Xnoaot is specified?

@edrevo

It does not. -Xnoaot fixes the problem, as well as specifying a different SCC name to avoid the class conflicts.

@pshipton

@dsouzai

@edrevo could you run with

-Xaot:traceRelocatableDataDetailsRT,traceRelocatableDataRT,log=log,aotrtDebugLevel=30,rtLog=rtLog

on both runs (ie populate SCC run and load SCC run)? Btw, you’ll have to merge all your AOT options into one -Xaot string as the compiler only looks at the rightmost one.
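For example, a merged string based on the JAVA_OPTS above would look like the following (a sketch; the log paths are placeholders, not values from this issue):

-Xaot:loadExclude={akka/stream/impl/GraphStageIsland.onIslandReady*},traceRelocatableDataDetailsRT,traceRelocatableDataRT,log=/tmp/aotLog,aotrtDebugLevel=30,rtLog=/tmp/rtLog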

Also, providing the corefile is probably essential.

@DanHeidinga

@edrevo do you have a reproducible test case we can use? It would help to get this resolved.

@edrevo

Hi @DanHeidinga. After 4 hard hours trying to get the necessary conditions to repro this bug again, I have been able to get the segfault once again!

I have run the JVMs with the aot options @dsouzai mentioned. This is a k8s environment, so k8s is constantly rebooting the failing pods, which is why there are so many logs (apologies for that). Logs are quite large, so here's a link so you can download the file: https://1drv.ms/u/s!AvHYjMzLfKM5g8Vaoel_h6nPEYlMBA?e=4VEzoN

The core dump continues to fail because of apport interaction with cgroups. This is a common issue that spans both docker (kubernetes/kubernetes#48787) and k8s (kubernetes/kubernetes#48787). If there’s any JVM option that would allow the dump to bypass the normal kernel handling of crash dumps, I would be able to get a core dump.

@edrevo

I have a core dump! In case anyone stumbles on this thread from Google: if you need a core dump of OpenJ9 in a container or a Kubernetes pod, it will fail due to the apport configuration on the host. In order to solve the problem, you can run your pod/container as privileged and configure the JVM with -Xdump:directory=<host mount> so you can retrieve the dump after the container restarts (or access the host, if that is an option).
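(For context, a sketch of the host-side check and override; /proc/sys/kernel/core_pattern is a single host-wide kernel setting, which is why a container cannot escape apport without host access:)

cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P %E
echo core | sudo tee /proc/sys/kernel/core_pattern    # write plain core files instead of piping to apport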

Anyway, here’s the dump!
baikal2.zip

@DanHeidinga

Thanks @edrevo for getting the logs and core!

@dsouzai can you jump on this? Logs and core are now available

@dsouzai

@edrevo baikal2.zip doesn’t have a coredump; it only has the javacore, jitdump, and Snapfile. The jitdump is a badly named file (.dmp) because it’s not a coredump, but a tracelog.

@edrevo

😭

Are the logs I attached enough to investigate the issue?

@dsouzai

Unfortunately no :/. Those logs would've been helpful in conjunction with the coredump, because (and this is me hoping how things would go, heh) I would somehow find what was causing the seg fault, use that to find what class caused that wrong thing to happen, and then see if we relocated/validated that class in the logs you sent, and then kind of go from there.

@pshipton

Labeling as JIT since the problem doesn’t occur with -Xnoaot.

@andrewcraik

@dsouzai what’s the next step here? This is still tagged for 0.21 so are we going to try and fix it or do we need to move it to 0.22?

@dsouzai

I don’t have enough information right now; without a coredump in conjunction with those relo tracing logs, there’s not much I can do really.

@andrewcraik

Given the lack of diagnostics I think we have to defer to 0.22. FYI @pshipton in case you feel strongly otherwise, but there isn't much to go on and nothing to investigate with, so I don't think this can be fixed for M2.

@edrevo

OK, I think I do have a core dump now (core*.dmp), but it is quite large, so here’s a link to my OneDrive. Let me know if you have any problems accessing it: https://1drv.ms/u/s!AvHYjMzLfKM5g8YmEFqoYs542t6Lwg?e=b1T3Pe

Sorry for the long delay and many many thanks for being on top of this issue.

@dsouzai

Thanks for the coredumps @edrevo. I don’t know what’s wrong with them, but they seem to be missing information. For example, according to javacore.20200702.132859.6.0003.txt

2XHREGISTER      RIP: 00007FC49BA89ADB

but in gdb:

(gdb)  x/i 0x00007FC49BA89ADB
   0x7fc49ba89adb:      Cannot access memory at address 0x7fc49ba89adb

I also can’t view the addresses listed out in the native memory stack:

3XMTHREADINFO3           Native callstack:
4XENATIVESTACK               (0x00007FC49B422D32 [libj9prt29.so+0x49d32])
4XENATIVESTACK               (0x00007FC49B3F4753 [libj9prt29.so+0x1b753])
4XENATIVESTACK               (0x00007FC49B422DAE [libj9prt29.so+0x49dae])
4XENATIVESTACK               (0x00007FC49B422EA4 [libj9prt29.so+0x49ea4])
4XENATIVESTACK               (0x00007FC49B3F4753 [libj9prt29.so+0x1b753])
4XENATIVESTACK               (0x00007FC49B422C0B [libj9prt29.so+0x49c0b])
...

I think the coredumps might be truncated; I’m guessing this based on the fact that according to the javacore:

1MEMUSER       JRE: 1,563,106,856 bytes / 40491 allocations

but the coredumps are only 300M and 500M respectively; given that the heap is ~300M I would've expected the coredumps to be much bigger. Also the second javacore is truncated.

Probably worth verifying your ulimits (both host and container) as well as ensuring you have enough disk space for the coredumps.
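(A quick sketch of those checks, run both on the host and inside the container:)

ulimit -c    # max core file size; should report "unlimited"
df -h .      # free space in the directory the dumps are written to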

@dsouzai

Oh also, the output from -Xaot:traceRelocatableDataDetailsRT,traceRelocatableDataRT,log=/path/to/log,aotrtDebugLevel=30,rtLog=/path/to/rtLog (namely the log.* and rLog.* files generated) is likely going to be necessary (in conjunction with the coredump) to see why a stale class would cause a problem.

@edrevo

I realized that my previous link didn’t have the aot logs. Here’s one I got (from a different run). Tomorrow, if you need me to, I can get the rtLogs, coredump, aotlogs and all the other debug artifacts from a single run. Sorry about that.

rtLog.20200702.163150.6.zip

@dsouzai

I’ll take a look at the corefile, but yeah having the rtlogs+coredump from the same process is going to be ideal because the J9Class (the C struct that represents a Java class in the JVM) pointers that get printed out in the rtlog can then be followed in the coredump.

Actually, it might also be valuable to have the SCC file; you can find that file in the cacheDir specified on the -Xshareclasses cmd line, though this is more of a "nice to have" than absolutely necessary (...I hope :P)

@dsouzai

I believe it’s crashing here: https://github.com/eclipse/openj9/blob/8c8ce1e50962a7e8f1678d678cd69561a1a7c8bb/runtime/vm/BytecodeInterpreter.hpp#L6776

I can't automatically get the backtrace, but the corefile does encapsulate the entire address space, so I'm at least able to manually find information. The stackslot walker complains about a corrupt stack when performing the walk, but it is able to print out some useful info:

> !stackslots 0x1d48d00
<1d48d00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x0000000001D48D00 ***
<1d48d00>       ITERATE_O_SLOTS
<1d48d00>       RECORD_BYTECODE_PC_OFFSET
<1d48d00> Initial values: walkSP = 0x00000000025EFE40, PC = 0x00007F525948E4B7, literals = 0x00000000027F12D0, A0 = 0x00000000025EFEE8, j2iFrame = 0x00000000025F0190, ELS = 0x00007F523FF7ABA0, decomp = 0x0000000000000000
<1d48d00> Bytecode frame: bp = 0x00000000025EFE68, sp = 0x00000000025EFE40, pc = 0x00007F525948E4B7, cp = 0x00000000027F0D00, arg0EA = 0x00000000025EFEE8, flags = 0x0000000000000000
<1d48d00>       Method: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.onPush()V !j9method 0x00000000027F12D0
<1d48d00>       Bytecode index = 163
<1d48d00>       Using local mapper
<1d48d00>       Locals starting at 0x00000000025EFEE8 for 0x0000000000000010 slots
<1d48d00>               O-Slot: a0[0x00000000025EFEE8] = 0x00000000FFD60A10
<1d48d00>               I-Slot: t1[0x00000000025EFEE0] = 0x00000000FFD72BD0
<1d48d00>               O-Slot: t2[0x00000000025EFED8] = 0x00000000FFD72B90
<1d48d00>               I-Slot: t3[0x00000000025EFED0] = 0x00000000FFD72BD0
<1d48d00>               I-Slot: t4[0x00000000025EFEC8] = 0x00000000FFD72BC0
<1d48d00>               I-Slot: t5[0x00000000025EFEC0] = 0x00000000F06013D0
<1d48d00>               I-Slot: t6[0x00000000025EFEB8] = 0x00000000F06A6DD0
<1d48d00>               O-Slot: t7[0x00000000025EFEB0] = 0x00000000F06013D0
<1d48d00>               I-Slot: t8[0x00000000025EFEA8] = 0x00000000F06A6DD0
<1d48d00>               O-Slot: t9[0x00000000025EFEA0] = 0x00000000FFD72BE0
<1d48d00>               I-Slot: t10[0x00000000025EFE98] = 0x000000000000000A
<1d48d00>               I-Slot: t11[0x00000000025EFE90] = 0x00000000FFD72B80
<1d48d00>               I-Slot: t12[0x00000000025EFE88] = 0x00000000FFD72B40
<1d48d00>               I-Slot: t13[0x00000000025EFE80] = 0x00000000F06013D0
<1d48d00>               I-Slot: t14[0x00000000025EFE78] = 0x00000000025EFF10
<1d48d00>               I-Slot: t15[0x00000000025EFE70] = 0x00007F51F86E805C
Jul 02, 2020 2:31:34 PM com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptData encountered iterating o-slots. walkThread = 0x0000000001D48D00
com.ibm.j9ddr.CorruptDataException: Operand stack underflow in StackMap
...

Looking at the bytecodes for !j9method 0x00000000027F12D0:

> !bytecodes 0x27f12d0
...
  156 getfield 7 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.$outer Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;
  159 aload0getfield
  160 getfield 7 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.$outer Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;
  163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
...

we’re crashing on 163 invokevirtual based on the Bytecode index = 163 & pc = 0x00007F525948E4B7 printed in the stackslot above.

From the javacore:

1XHREGISTERS   Registers:
...
2XHREGISTER      RBX: 00007F525948E4B7
...
2XHREGISTER      RDX: 0000000000000008
...
2XHREGISTER      R12: 00000000025EFE40
...
2XHREGISTER      RIP: 00007F52BA8D0ADB
...
2XHREGISTER      RSP: 00007F523FF7A650
...

from gdb & objdump:

(gdb) x/10i 0x7F52BA8D0AAF
   0x7f52ba8d0aaf <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32623>:        mov    QWORD PTR [rsp+0x140],rax
   0x7f52ba8d0ab7 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32631>:        mov    rax,QWORD PTR [rsp+0x140]
   0x7f52ba8d0abf <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32639>:        mov    rdx,QWORD PTR [rsp+0x140]
   0x7f52ba8d0ac7 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32647>:        shr    rax,0x8
   0x7f52ba8d0acb <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32651>:        movzx  edx,dl
   0x7f52ba8d0ace <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32654>:        mov    rdx,QWORD PTR [r12+rdx*8]
   0x7f52ba8d0ad2 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32658>:        test   rdx,rdx
   0x7f52ba8d0ad5 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32661>:
    je     0x7f52ba8ddb97 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+86103>
   0x7f52ba8d0adb <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32667>:        mov    ebx,DWORD PTR [rdx]

Where the last instruction is where the crash happens. The associated code in the interpreter is

   UDATA volatile methodIndexAndArgCount = ramMethodRef->methodIndexAndArgCount;
   UDATA methodIndex = methodIndexAndArgCount >> 8;
   j9object_t receiver = ((j9object_t*)_sp)[methodIndexAndArgCount & 0xFF];
   if (J9_UNEXPECTED(NULL == receiver)) {
      ...
   } else {
      J9Class *receiverClass = J9OBJECT_CLAZZ(_currentThread, receiver);
      ...
   }

After cross-referencing with the core, javacore, and disassembly, it looks like r12 is _sp and rdx at the time of the crash is supposed to be an object reference. However, it ends up having the value 0x8 which is an invalid object reference, hence the crash.

Given that the stackwalker choked after 0x00000000025EFE70, I figured it was worth looking at the stack:

(gdb) x/20gx 0x25efe40
0x25efe40:      0x00000000ffd72be0      0x0000000000000008
0x25efe50:      0x00000000ffd60908      0x000000000113dfb8
0x25efe60:      0x00007f51f86e805c      0x00000000025eff10
0x25efe70:      0x00007f51f86e805c      0x00000000025eff10
0x25efe80:      0x00000000f06013d0      0x00000000ffd72b40
0x25efe90:      0x00000000ffd72b80      0x000000000000000a
0x25efea0:      0x00000000ffd72be0      0x00000000f06a6dd0
0x25efeb0:      0x00000000f06013d0      0x00000000f06a6dd0
0x25efec0:      0x00000000f06013d0      0x00000000ffd72bc0
0x25efed0:      0x00000000ffd72bd0      0x00000000ffd72b90

It looks like the stackwalker choked because 0x25efe60 and 0x25efe68 are exactly the same as 0x25efe70 & 0x25efe78, which I suppose is unexpected. I imagine this is probably relevant to the issue but I don’t know right now.

I’ll have to go deeper, maybe I’ll chase after the ramMethodRef and see where that takes me.

@DanHeidinga

From the code:

		U_16 index = *(U_16*)(_pc + 1);
		J9ConstantPool *ramConstantPool = J9_CP_FROM_METHOD(_literals);
		J9RAMVirtualMethodRef *ramMethodRef = ((J9RAMVirtualMethodRef*)ramConstantPool) + index;

From the stackslots, _pc is 0x00007F525948E4B7, and the constant pool of the method at the top of the stack is 0x00000000027F0D00, so:

(gdb) x/hx 0x00007F525948E4B7+8
0x7f525948e4bf: 0x22b6

(gdb) x/2gx 0x00000000027F0D0+0x22b6
0x281386:       0x0000000000000000      0x0000000000000000

Can you confirm that offset is correct? Given the line of code:

((J9RAMVirtualMethodRef*)ramConstantPool) + index;

The constantpool entry should be 0x00000000027F0D0 + (0x10 * 0x22b6) as each J9RAMVirtualMethodRef is two pointers worth of data.
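Spelled out as a sketch (struct shape as described in this thread, not copied verbatim from the J9 headers):

struct J9RAMVirtualMethodRef {
    UDATA methodIndexAndArgCount;   /* first slot */
    J9Method * volatile method;     /* second slot */
};                                  /* two pointers == 0x10 bytes on 64-bit */

/* the + index therefore scales by sizeof(J9RAMVirtualMethodRef), i.e. the
   byte address is (UDATA)ramConstantPool + index * 0x10 */
J9RAMVirtualMethodRef *ramMethodRef = (J9RAMVirtualMethodRef *)ramConstantPool + index;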

Can you dump the memory at 0x2A1C30 instead (assuming I’ve done the math right)?

So it looks like ramMethodRef has NULL for both methodIndexAndArgCount and method. @DanHeidinga does this mean the methodref is unresolved, or does NULL mean uninitialized?

In general, any initialized class should have the argCount portion of the methodIndexAndArgCount initialized. The method slot being null would mean unresolved.

@edrevo

@dsouzai

Can you confirm that offset is correct? Given the line of code:

((J9RAMVirtualMethodRef*)ramConstantPool) + index;

The constantpool entry should be 0x00000000027F0D0 + (0x10 * 0x22b6) as each J9RAMVirtualMethodRef is two pointers worth of data.

Can you dump the memory at 0x2A1C30 instead (assuming I’ve done the math right)?

Ah yes, missed the pointer cast.

(gdb) x/2gx 0x00000000027F0D0+0x22b60
0x2a1c30:       0x0000000000000240      0x0000000000000248

Does that look more reasonable to you @DanHeidinga? Seems like the struct J9Method *volatile method field doesn't look right.

@dsouzai

Messed up the pointer being printed out in gdb; the constant pool should've been 0x00000000027F0D00 (I dropped a 0 at the end). This is what the J9RAMVirtualMethodRef looks like:

(gdb) x/2gx 0x00000000027F0D00+0x22b60
0x2813860:      0x0000000000017800      0x00007f52bac37f00

Because 0x0000000000017800 & 0xFF still yields 0x0,

j9object_t receiver = ((j9object_t*)_sp)[methodIndexAndArgCount & 0xFF];

will still cause the receiver to be 0x8.

@DanHeidinga

How are you walking the java stack? Using !stackslots <J9VMThread>? For a crash in the interpreter, we’re not guaranteed that the values in the J9VMThread are up to date. We only store them back when we might GC, etc. Most of the time they are live in registers / locals.

It's a pain, but we need to look at the disassembly and map the VM state to the right registers (SP & PC) and the right locals.

@dsouzai

How are you walking the java stack? Using !stackslots <J9VMThread>? For a crash in the interpreter, we’re not guaranteed that the values in the J9VMThread are up to date. We only store them back when we might GC, etc. Most of the time they are live in registers / locals.

I was using the !stackslots cmd from jdmpview. However, given your comment I took a look at the assembly again assuming that we can’t trust _pc and _sp. It turns out that I kind of got lucky; the actual pc is 0x00007F525948E4BC which is at bytecode 168, which is also an invokevirtual.

Also, the data printed out in #9710 (comment) is wrong because the index is wrong: in #9710 (comment) I didn't realize that _pc is a pointer to a U_8. So, to get the right J9RAMVirtualMethodRef (finally):

(gdb) x/hx 0x00007F525948E4BC+1
0x7f525948e4bd: 0x0021

(gdb)  x/2gx 0x00000000027f0d00+0x210
0x27f0f10:      0x0000000000085001      0x00007f52bac37f00

This makes more sense; what I said above was incorrect, namely:

Because 0x0000000000017800 & 0xFF still yields 0x0,

j9object_t receiver = ((j9object_t*)_sp)[methodIndexAndArgCount & 0xFF];

will still cause the receiver to be 0x8.

methodIndexAndArgCount (0x0000000000085001 & 0xFF) has to be 0x1 in order to get the bad receiver value of 0x8. Also, 0x00007f52bac37f00 is definitely not a j9method.
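Working that through with the values above (_sp taken as R12 from the javacore):

methodIndexAndArgCount = 0x85001, so argCount = 0x85001 & 0xFF = 0x01
receiver = ((j9object_t*)_sp)[1] = *(UDATA*)(0x25EFE40 + 8) = 0x0000000000000008

which is exactly the second slot in the x/20gx stack dump earlier, and exactly the InaccessibleAddress in the crash header.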


bytecode 168 is

  168 invokevirtual 33 scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;

searching for it:

> !methodforname scala/collection/immutable/Queue.enqueue*
Searching for methods named 'scala/collection/immutable/Queue.enqueue*' in VM=0x00007F52B4016A50...
!j9method 0x0000000000E259F0 --> scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;
!j9method 0x0000000000E25A10 --> scala/collection/immutable/Queue.enqueue(Lscala/collection/immutable/Iterable;)Lscala/collection/immutable/Queue;
Found 2 method(s) named scala/collection/immutable/Queue.enqueue*

> !j9method 0x0000000000E259F0
J9Method at 0xe259f0 {
  Fields for J9Method:
        0x0: U8* bytecodes = !j9x 0x00007F525948B16C
        0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000000000E24AA0 (flags = 0x0)
        0x10: void* methodRunAddress = !j9x 0x0000000000000006
        0x18: void* extra = !j9x 0x0000000000001753
}
Signature: scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue; !bytecodes 0x0000000000E259F0
ROM Method: !j9rommethod 0x00007F525948B158
Next Method: !j9method 0x0000000000E25A10

0x0000000000E259F0 looks legitimate and there’s only one method enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue; so there aren’t any overrides. So I guess the question is still why the J9RAMVirtualMethodRef in the constant pool of the top method isn’t right.

@DanHeidinga

Is 0x00007f52bac37f00 equal to J9JavaVM.initialMethods.initialVirtualMethod? I gave you bad info earlier when I said null meant unresolved. We initialize the method to point to an initialMethod that does the resolution so we don’t have to check if it’s resolved on every invoke

@dsouzai

Heh, looks like it’s an initialSpecialMethod:

> !j9initializermethods 0x00007F52B401B6B0
J9InitializerMethods at 0x7f52b401b6b0 {
  Fields for J9InitializerMethods:
        0x0: struct J9Method* initialStaticMethod = !j9method 0x00007F52BAC3CB20 // <FAULT>
        0x8: struct J9Method* initialSpecialMethod = !j9method 0x00007F52BAC37F00 // <FAULT>
        0x10: struct J9Method* initialVirtualMethod = !j9method 0x00007F52BAC37EE0 // <FAULT>
        0x18: struct J9Method* invokePrivateMethod = !j9method 0x00007F52BAC37EC0 // <FAULT>
}

@DanHeidinga

That makes sense. The cp entry can be shared by an invokevirtual and an invokespecial or an invokestatic. So the layout is:

[vtableOffset | argCount]
[initialSpecialMethod or initialStaticMethod]

And we have a special slot in the vtable to hold the initialVirtualMethod, so we initialize the vtableOffset to point to that vtable slot. (I need to write this down in detail sometime)

So 0x0000000000085001 == [vtableOffset | argCount]. To get the method from the vtable, take the vtableOffset (0x0000000000085001 >> 8) and add it to the J9Class from the receiver.

So (UDATA)receiverClass + 0x850 will be the slot with the resolved method
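In code form, a sketch of that dispatch (not the literal interpreter source):

UDATA vtableOffset = methodIndexAndArgCount >> 8;   /* 0x85001 >> 8 == 0x850 */
J9Method *resolved = *(J9Method **)((UDATA)receiverClass + vtableOffset);

This also explains the faulting instruction: J9OBJECT_CLAZZ has to dereference the receiver to get receiverClass first (the mov ebx, DWORD PTR [rdx] in the disassembly), and with receiver == 0x8 that load faults at the InaccessibleAddress=0000000000000008 seen in the crash header.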

@dsouzai

Ah ok I see, thanks @DanHeidinga. So then the question is why the object ref is bad. From the bytecodes:

  163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
  166 aload 9
  168 invokevirtual 33 scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;

the value returned by 163 is the receiver for the invokevirtual in 168. Looks like there’s only one version of it:

> !methodforname *openTimeouts
Searching for methods named '*openTimeouts' in VM=0x00007F52B4016A50...
!j9method 0x00000000025885B0 --> akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
Found 1 method(s) named *openTimeouts
> !j9method 0x00000000025885b0
J9Method at 0x25885b0 {
  Fields for J9Method:
        0x0: U8* bytecodes = !j9x 0x00007F5259311A78
        0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000000002588150 (flags = 0x0)
        0x10: void* methodRunAddress = !j9x 0x0000000000000018
        0x18: void* extra = !j9x 0x00007F529E297900
}
Signature: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue; !bytecodes 0x00000000025885B0
ROM Method: !j9rommethod 0x00007F5259311A64
Next Method: !j9method 0x00000000025885D0

This method is JIT’d and likely AOT’d (given that the JVM was run with -Xtune:virtualized). This method is likely the one that’s causing the problem. I guess I shall continue this next week and see where it goes.

@dsouzai

One possibly relevant detail though is that the J9RAMVirtualMethodRef for 163 is still unresolved:

  163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;

index = 31 = 0x1F

(gdb) x/2gx 0x00000000027f0d00+0x1F0
0x27f0ef0:      0x0000000000042800      0x00007f52bac37f00

Considering that we already executed 163, I would’ve thought that the methodref would’ve been resolved by the time of the crash.

@DanHeidinga

One possibly relevant detail though is that the J9RAMVirtualMethodRef for 163 is still unresolved:

  163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;

index = 31 = 0x1F

(gdb) x/2gx 0x00000000027f0d00+0x1F0
0x27f0ef0:      0x0000000000042800      0x00007f52bac37f00

Considering that we already executed 163, I would’ve thought that the methodref would’ve been resolved by the time of the crash.

It’s an invokevirtual so the first slot in the cp is [vtableOffset | argCount] == 0x0000000000042800 and vtableOffset == 0x428. So definitely resolved.

@dsouzai

So definitely resolved.

Interesting; I was thinking it wasn’t because the method is 0x00007f52bac37f00 which is

 0x8: struct J9Method* initialSpecialMethod = !j9method 0x00007F52BAC37F00 // <FAULT>

@DanHeidinga

Each bytecode has unique resolution status, even though there are only two slots that can be filled in. The whole picture involves splitting some slots during classload to resolve conflicts

@dsouzai

The bytecodes of openTimeouts:

    0 aload0getfield
    1 getfield 8 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Lscala/collection/immutable/Queue;
    4 return1

The J9RAMFieldRef for the getfield 8:

(gdb) x/2gx 0x25881D0
0x25881d0:      0x000000000000003c      0xffffffffc8020002

The JIT’d code for openTimeouts:

0x7f529e297900 +0   488b442408           mov       rax, qword ptr [rsp+8]
0x7f529e297905 +5   483b6550             cmp       rsp, qword ptr [rbp+0x50] J9VMThread.stackOverflowMark
0x7f529e297909 +9   7611                 jbe       0x7f529e29791c C>> +28
0x7f529e29790b +11  4883ec08             sub       rsp, 8 <<< ^+38
0x7f529e29790f +15  4889442410           mov       qword ptr [rsp+0x10], rax
0x7f529e297914 +20  8b403c               mov       eax, dword ptr [rax+0x3c]
0x7f529e297917 +23  4883c408             add       rsp, 8
0x7f529e29791b +27  c3                   ret

The JIT’d code looks right; I guess the object the field is being loaded from is bad.

@dsouzai

Ok I take it back, the AOT code is incorrect. I thought it was correct because mov eax, dword ptr [rax+0x3c] loads from offset 0x3c and the fieldref says the offset is 0x3c. However, an object does have some data before the field at "offset 0".

These are the bytecodes around the call to the AOT’d method & the invocation where we crash:

  159 aload0getfield
  160 getfield 7 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.$outer Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;
  163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
  166 aload 9
  168 invokevirtual 33 scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;

Regarding 159&160, we’re loading fieldref 7 from the this pointer, which is arg0EA == 0x00000000FFD60A10:

> !fj9object 0x00000000FFD60A10
!J9Object 0x00000000FFD60A10 {
        struct J9Class* clazz = !j9class 0x27F1100 // akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4
        Object flags = 0x00000000;
        I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
        Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3; $outer = !fj9object 0xffd60908 (offset = 4) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4)
}

Sanity check: the class matches the class of the j9method currently being executed.

So the object at the top of the stack before the call at 163 is 0xffd60908:

> !fj9object 0xffd60908
!J9Object 0x00000000FFD60908 {
        struct J9Class* clazz = !j9class 0x2587D00 // akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3
        Object flags = 0x00000000;
        I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
        I inCount = 0x00000002 (offset = 48) (akka/stream/stage/GraphStageLogic)
        I outCount = 0x00000002 (offset = 52) (akka/stream/stage/GraphStageLogic)
        I stageId = 0x00000008 (offset = 56) (akka/stream/stage/GraphStageLogic)
        Lakka/stream/Attributes; attributes = !fj9object 0xffd608d8 (offset = 12) (akka/stream/stage/GraphStageLogic)
        Lakka/stream/stage/GraphStageWithMaterializedValue; originalStage = !fj9object 0xf0558918 (offset = 16) (akka/stream/stage/GraphStageLogic)
        [Ljava/lang/Object; handlers = !fj9object 0xffd60958 (offset = 20) (akka/stream/stage/GraphStageLogic)
        [Lakka/stream/impl/fusing/GraphInterpreter$Connection; portToConn = !fj9object 0xffd60970 (offset = 24) (akka/stream/stage/GraphStageLogic)
        Lakka/stream/impl/fusing/GraphInterpreter; _interpreter = !fj9object 0xff989158 (offset = 28) (akka/stream/stage/GraphStageLogic)
        Ljava/lang/Throwable; lastCancellationCause = !fj9object 0x0 (offset = 32) (akka/stream/stage/GraphStageLogic)
        Lscala/collection/immutable/List; callbacksWaitingForInterpreter = !fj9object 0xeddd8138 (offset = 36) (akka/stream/stage/GraphStageLogic)
        Ljava/util/concurrent/atomic/AtomicReference; akka$stream$stage$GraphStageLogic$$asyncCallbacksInProgress = !fj9object 0xffd60988 (offset = 40) (akka/stream/stage/GraphStageLogic)
        Lakka/stream/stage/GraphStageLogic$StageActor; _stageActor = !fj9object 0x0 (offset = 44) (akka/stream/stage/GraphStageLogic)
        J asyncCleanupCounter = 0x0000000000000000 (offset = 4) (akka/stream/stage/GraphStageLogic)
        Lscala/collection/immutable/Queue; akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts = !fj9object 0xffd60998 (offset = 60) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)
        Lakka/stream/stage/AsyncCallback; akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback = !fj9object 0xffd609b8 (offset = 64) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)
        Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; $outer = !fj9object 0xf0558918 (offset = 68) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)
}

According to bytecodes in #9710 (comment), the field that the JIT’d code should return is at field offset 0x3c == 60 which is

Lscala/collection/immutable/Queue; akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts = !fj9object 0xffd60998 (offset = 60) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)

However, from the stackslots way above, the object on the java stack that gets passed to the call at 168 (i.e. the receiver) is 0x00000008. If you look at 0xffd60908 carefully, you'll see that there's this field:

I stageId = 0x00000008 (offset = 56) (akka/stream/stage/GraphStageLogic)

which is what you get if you just do 0xffd60908+0x3c. The field load is off by 4 bytes.
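Spelling out the off-by-4 (a sketch; the 4-byte header before "offset 0" is inferred from the stageId slot above, not quoted from the J9 headers):

stageId:       field offset 56 -> memory offset 56 + 4 = 0x3c   (what [rax+0x3c] actually reads: 0x00000008)
openTimeouts:  field offset 60 -> memory offset 60 + 4 = 0x40   (what the load should have used)

So the 0x3c baked into the AOT body matches a layout in which openTimeouts presumably sat one slot earlier, i.e. the stale class shape.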

I believe I've reached the limit of what I can do with this corefile. I shall have to take a look at the second core @edrevo provided, since I can take a look at the rtlog and see how we passed validation. It's obvious the shape of the class is different from what the AOT'd method expects, so now we'll have to determine what validation we missed.

@dsouzai

I took a look at the latest set of cores.

bytecodes of the method that was AOT’d:

> !bytecodes 0x000000000272deb0
  Name: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts
  Signature: ()Lscala/collection/immutable/Queue;
  Access Flags (3050201): public
  Max Stack: 1
  Argument Count: 1
  Temp Count: 0

    0 aload0getfield
    1 getfield 8 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Lscala/collection/immutable/Queue;
    4 return1
...

When I run printAllStats on the SCC @edrevo provided:

...
1: 0x00007FF040D26290 ROMCLASS: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.!STALE!
        Index 36 in classpath 0x00007FF041591C04
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts_$eq Signature: (Lscala/collection/immutable/Queue;)V Address: 0x00007FF02531D990
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback Signature: ()Lakka/stream/stage/AsyncCallback; Address: 0x00007FF02531D9BC
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$$outer Signature: ()Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; Address: 0x00007FF02531D9E0
        ROMMETHOD: $anonfun$callback$2 Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Z Address: 0x00007FF02531DA00
        ROMMETHOD: $anonfun$callback$3 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;)V Address: 0x00007FF02531DA40
        ROMMETHOD: $anonfun$callback$1 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)V Address: 0x00007FF02531DA68
        ROMMETHOD: <init> Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport;)V Address: 0x00007FF02531DB2C
        ROMMETHOD: $anonfun$callback$2$adapted Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Ljava/lang/Object; Address: 0x00007FF02531DBC4
        ROMMETHOD: $anonfun$callback$1$adapted Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)Ljava/lang/Object; Address: 0x00007FF02531DBF8
        ROMMETHOD: $deserializeLambda$ Signature: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; Address: 0x00007FF02531DC2C
...
7: 0x00007FF03E6F6A14 ROMCLASS: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.!STALE!
        Index 79 in classpath 0x00007FF03E707088
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts_$eq Signature: (Lscala/collection/immutable/Queue;)V Address: 0x00007FF02531D990
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback Signature: ()Lakka/stream/stage/AsyncCallback; Address: 0x00007FF02531D9BC
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$$outer Signature: ()Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; Address: 0x00007FF02531D9E0
        ROMMETHOD: $anonfun$callback$2 Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Z Address: 0x00007FF02531DA00
        ROMMETHOD: $anonfun$callback$3 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;)V Address: 0x00007FF02531DA40
        ROMMETHOD: $anonfun$callback$1 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)V Address: 0x00007FF02531DA68
        ROMMETHOD: <init> Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport;)V Address: 0x00007FF02531DB2C
        ROMMETHOD: $anonfun$callback$2$adapted Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Ljava/lang/Object; Address: 0x00007FF02531DBC4
        ROMMETHOD: $anonfun$callback$1$adapted Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)Ljava/lang/Object; Address: 0x00007FF02531DBF8
        ROMMETHOD: $deserializeLambda$ Signature: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; Address: 0x00007FF02531DC2C
...
1: 0x00007FF03C061484 AOT: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
        for ROMClass akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.
...
8: 0x00007FF03AFD229C ROMCLASS: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.
        Index 68 in classpath 0x00007FF03B1EFA28
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts_$eq Signature: (Lscala/collection/immutable/Queue;)V Address: 0x00007FF02531D990
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback Signature: ()Lakka/stream/stage/AsyncCallback; Address: 0x00007FF02531D9BC
        ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$$outer Signature: ()Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; Address: 0x00007FF02531D9E0
        ROMMETHOD: $anonfun$callback$2 Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Z Address: 0x00007FF02531DA00
        ROMMETHOD: $anonfun$callback$3 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;)V Address: 0x00007FF02531DA40
        ROMMETHOD: $anonfun$callback$1 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)V Address: 0x00007FF02531DA68
        ROMMETHOD: <init> Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport;)V Address: 0x00007FF02531DB2C
        ROMMETHOD: $anonfun$callback$2$adapted Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Ljava/lang/Object; Address: 0x00007FF02531DBC4
        ROMMETHOD: $anonfun$callback$1$adapted Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)Ljava/lang/Object; Address: 0x00007FF02531DBF8
        ROMMETHOD: $deserializeLambda$ Signature: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; Address: 0x00007FF02531DC2C
...

The ROMMethod of akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; always has the same address. I would have thought that if a class in the SCC became stale, the new class would have a new ROMClass address (and therefore all of its methods would have new J9ROMMethod addresses). How come the J9ROMMethod address stays the same? @hangshao0 or @pshipton do you guys know?

The other point of concern is that we're loading from the constant pool of the receiver (i.e. the class of the AOT'd method). I was under the impression that we validate the class of the AOT'd method being loaded, but it doesn't look like we do; I think the assumption has always been that if vmThread->javaVM->sharedClassConfig->findCompiledMethodEx1 does not return NULL, then the shape of the ROMClass of the AOT'd method being loaded hasn't changed. I guess depending on the answer to the previous question, we may have to store the class chain of the class of the AOT'd method. However, to me it seems more intuitive that all AOT bodies of a class should be marked as stale if the class is marked as stale.

@pshipton

A class becomes stale when the classpath changes. This forces the class to be loaded from the classpath (jar) again (which may result in storing new metadata into the cache for the new classpath, if there is room), but if the ROM class loaded is the same as the one already stored in the shared cache, the same ROM class in the cache continues to be used. Classpath changes can change the RAM class, while continuing to use the same ROM class.

@dsouzai

Hm, in this case, once the class is stale, the AOT code ends up being off by 4 bytes when trying to load the field from an object. That sounds like the layout of the romclass changed, doesn't it?

@pshipton

Could one of the superclasses have affected that? The ROM class only records the name of the superclass and its own fields; it doesn't control the final layout of the object. The layout is controlled by the RAM class hierarchy, i.e. perhaps one of the superclass ROM classes changed.

@dsouzai

Yeah you’re right:

>  !j9class 0x000000000272D600
J9Class at 0x272d600 {
  Fields for J9Class:
        0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
        0x8: struct J9ROMClass* romClass = !j9romclass 0x00007F18A131D538
        0x10: void** superclasses = !j9x 0x000000000272D290
        0x18: UDATA classDepthAndFlags = 0x00000000020E0002 (34471938)
...
Class name: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3
(gdb) x/2gx 0x000000000272D290
0x272d290:      0x000000000003fd00      0x00000000010b1e00
> !j9class 0x00000000010b1e00
J9Class at 0x10b1e00 {
  Fields for J9Class:
        0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
        0x8: struct J9ROMClass* romClass = !j9romclass 0x00007F18A40B9A68
...
Class name: akka/stream/stage/GraphStageLogic
>  !shrc findclass akka/stream/stage/GraphStageLogic
!j9sharedclassconfig 0x00007F18FC183A90

Looking for class "akka/stream/stage/GraphStageLogic"
Meta data region to be used: 0x00007F18B6E74D54..0x00007F18BD711FFC
1: 0x00007F18BCD28C90 ORPHAN: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210
1: 0x00007F18BCD28C64 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
        Index 117 in !shrc classpath 0x00007F18BD591C04
2: 0x00007F18BCB4F90C ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
        Index 157 in !shrc classpath 0x00007F18BCB81C40
3: 0x00007F18BC87638C ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
        Index 90 in !shrc classpath 0x00007F18BC910D5C
4: 0x00007F18BC39F940 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
        Index 157 in !shrc classpath 0x00007F18BC3A0FBC
5: 0x00007F18BC279F04 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
        Index 240 in !shrc classpath 0x00007F18BC2F25BC
6: 0x00007F18BB06451C ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
        Index 161 in !shrc classpath 0x00007F18BB0652C4
7: 0x00007F18BA75F6C4 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
        Index 177 in !shrc classpath 0x00007F18BA7A4A4C
8: 0x00007F18B7139EF8 ORPHAN: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A40B9A68
8: 0x00007F18B7139ECC ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A40B9A68
        Index 188 in !shrc classpath 0x00007F18B71EFA28

@edrevo

Many thanks for jumping on the issue and fixing it! Do you know which version of OpenJ9 will contain the fix?

@pshipton

It missed the 0.21.0 release for July; the next release for Java 11 is the 0.23.0 release in Oct.

Contents

  1. MiniMix 5m test fails with Segmentation error vmState=0x00000000 on AIX JDK11 OpenJ9 build #12120
  2. [aix_ppc-64] SE80_GIT TestSunAttachClasses_SE80_0 - Segmentation error vmState=0x00000000 #11839
  3. Unhandled exception Using native Sass compiler #5961
  4. BridJ 0.7-SNAPSHOT gives segmentation error on IBM JRE #38
  5. Test_openjdk12_j9_extended.system_s390x_linux MathLoadTest JIT crash #6248

MiniMix 5m test fails with Segmentation error vmState=0x00000000 on AIX JDK11 OpenJ9 build #12120

Test

MixedLoadTest -test-args="timeLimit=30m"
Variation: Mode554
JVM_OPTIONS: --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED -Xcompressedrefs -Xgcpolicy:balanced -XXgc:tarokEnableExpensiveAssertions,fvtest_tarokPGCRotateCollectors -Xjit:count=0,optlevel=hot,gcOnResolve,rtResolve

Java Version

Error details copied from console output

To reproduce

Run grinder with following settings:


Started a 5x grinder at internal Jenkins, Grinder/14023/

Interesting - crash in replaceTrampoline.

@zl-wang: Please note, the problem was reproduced 3/5 times in hyc-runtimes-jenkins/Grinder/14023.

Another JIT crash was observed while running the same test on AArch64 as well:

Please advise if this requires a different issue.

Separate issue please. I didn't see this one's backtrace, and thus am not sure if it crashed at a similar place.

So it seems like there’s a problem with the machine config that is preventing the generation of a corefile.

You are right, the core file was missing in that zip.

Could you please visit the internal Grinder/14023/console?

If you scroll down to the bottom, you will see a link that ends with Grinder/14023/system_test_output.tar.gz; it will contain the core files of this defect too.

The problem is occurring due to a crash inside OMR::CodeCache::replaceTrampoline. Near the start of replaceTrampoline is a lookup into _resolvedMethodHT:
https://github.com/eclipse/omr/blob/165a4a81709504806f14bd8e540d97962cafe5f5/compiler/runtime/OMRCodeCache.cpp#L614

The method can't be found in _resolvedMethodHT, so NULL is returned and stored to entry. A bit later on there are loads/stores after dereferencing entry, which causes the crash.

If you look at those lines, resolveHashEntry is called if the method is found in _unresolvedMethodHT but is not inside _resolvedMethodHT. As it turns out, the lookup into _unresolvedMethodHT also can't find the entry. This turns out to be a mistake due to a slight difference in how the method was added to the hashtable and how the lookup was performed.

The lookup inside adjustTrampolineReservation uses the Constant Pool pointer cp and the Constant Pool index cpIndex to do the lookup. cpIndex has the type int32_t but it was originally passed in as an unsigned 18-bit value. That's well within the positive range of an int32_t, so here cpIndex is positive.

The problem is that when the entry was added to _unresolvedMethodHT, a negative cpIndex was used.

The cpIndex is stored as a signed 18-bit bit-field. 0x20000 in binary is 10 0000 0000 0000 0000. This means that 0x20000 is interpreted as -131072 and sign-extended when cast to an I_32.

The key is therefore different: the original 18-bit cpIndex was sign-extended when used to add the entry and zero-extended when used to look up the entry, so the entry cannot be found.
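To make the mismatch concrete, a standalone sketch (the struct and names here are illustrative, not OMR's actual hashtable entry; the bit-field behavior shown is what gcc/clang do, since out-of-range bit-field stores are implementation-defined):

#include <stdio.h>
#include <stdint.h>

struct Entry {
    signed int cpIndex : 18;   /* signed 18-bit bit-field, like the stored key */
};

int main(void) {
    struct Entry e;
    e.cpIndex = 0x20000;             /* bit pattern 10 0000 0000 0000 0000 */
    int32_t asStored   = e.cpIndex;  /* read back sign-extended: -131072 */
    int32_t asLookedUp = 0x20000;    /* plain int32_t stays positive: 131072 */
    printf("stored=%d lookup=%d match=%s\n",
           asStored, asLookedUp, asStored == asLookedUp ? "yes" : "no");
    return 0;
}

/* prints: stored=-131072 lookup=131072 match=no */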

I am currently looking at making a fix where the sign on cpIndex is consistent for all access (add, delete, lookup) to the hashtables.

This is a common issue, so it should be a problem on X and Z as well, not just Power and Arm.


[aix_ppc-64] SE80_GIT TestSunAttachClasses_SE80_0 - Segmentation error vmState=0x00000000 #11839

Failure link

From an internal build ibuild 464729:

Optional info

Failure output (captured from console output)

The native stack trace looks similar to an old issue #9495


This has been failing since the test was apparently un-excluded on 2020-08-04. There is a core.
@tajila

This has been failing since the test was apparently un-excluded on 2020-08-04.

Just note that it was marked by a problem tracker (PT) for a different failure, and this was discovered when the PT wasn't propagated properly in the latest ibuild.

Assuming the crash is using an uninitialized or bad monitor, the likely caller is:

The value is initialized in the core:

The attach thread is forked from java, so I don’t see how there could be a race condition where the CTM is uninitialized.

I believe the problem is that there is a duplicate libj9thr29.so in memory, one for the default build and one for compressedrefs. We were running with compressedrefs, and the crash happened in the default libj9thr29.so.

cd /service/issues/11839/core
/service/serviceStore/xa.ibm.sh core.20210222.233314.5701912.0001.dmp

/> context
Available contexts (* = currently selected context) :

Source : file:///service/issues/11839/core/core.20210222.233314.5701912.0001.dmp
*0 : PID: 0 :
JRE 1.8.0 AIX ppc64-64 (build 8.0.7.0 - pap6480sr7-20210122_01(SR7))

IBM J9 VM(JRE 1.8.0 AIX ppc64-64-Bit Compressed References 20210222_466181 (JIT enabled, AOT enabled)
OpenJ9 - bad91dc
OMR - 83a1e0b
IBM - 358762e)

/> info mod | grep -b libj9thr29.so
/bluebird/builds/bld_466181/sdk/ap6480/jre/bin/../lib/ppc64/default/libj9thr29.so @ 0x90000000515a000, sections:
0x90000000515a000 - 0x900000005174209, name: ".text", size: 0x1a209
0x9001000a0710118 - 0x9001000a0712180, name: ".data", size: 0x2068

/> gpinfo | charsfrom IAR
IAR=0900000005161838 LR=090000000515E2B0 MSR=A00000000000D032 CTR=090000000053ACE0
/> whichmod 0900000005161838
Address (0x0900000005161838) is found in the following entries from the output of the command "info mod":
/bluebird/builds/bld_466181/sdk/ap6480/jre/bin/../lib/ppc64/default/libj9thr29.so @ 0x90000000515a000, sections:
0x90000000515a000 - 0x900000005174209, name: ".text", size: 0x1a209
0x9001000a0710118 - 0x9001000a0712180, name: ".data", size: 0x2068

Unhandled exception Using native Sass compiler #5961

Java -version output

openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
Eclipse OpenJ9 VM (build openj9-0.14.2, JRE 1.8.0 Windows 10 amd64-64-Bit Compressed References 20190521_368 (JIT enabled, AOT enabled)
OpenJ9 - 4b1df46
OMR - b56045d2
JCL - a8c217d402 based on jdk8u212-b04)

Summary of problem

I replaced the Oracle JDK with the OpenJ9 build indicated above. I use the Liferay Developer Studio (based on Eclipse) to compile and build modules for deployment. After the JDK change, a Gradle build of a module fails with this error:

Gradle User Home: C:\Users\Pete.BSF_INT\.gradle
Gradle Distribution: Gradle wrapper from target build
Gradle Version: 3.0
Java Home: C:\Program Files\AdoptOpenJDK\jdk-8.0.212.04-openj9
JVM Arguments: None
Program Arguments: None
Build Scans Enabled: false
Offline Mode Enabled: false
Gradle Tasks: build

:compileJava UP-TO-DATE
:buildCSS
Using native Sass compiler
Unhandled exception
Type=Segmentation error vmState=0x00040000
Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FF8FBD72F83 ContextFlags=0010005f
Handler1=00007FF8FBD6BAB0 Handler2=00007FF91C23CA30 InaccessibleReadAddress=0000000000000010
RDI=0000000002AB9A00 RSI=0000000000000000 RAX=000000000262EA08 RBX=00000000122B74E8
RCX=0000000002AB9A00 RDX=000000000FA94128 R8=0000000000000000 R9=00007FF8FBE41D20
R10=0000000000000001 R11=00007FF8FBE41D20 R12=000000000FA94128 R13=00000000008D5B60
R14=00000000000000B2 R15=0000000000000001
RIP=00007FF8FBD72F83 RSP=000000000262E990 RBP=0000000000000000 GS=002B
FS=0053 ES=002B DS=002B
XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+000)
XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+000)
Module=C:\Program Files\AdoptOpenJDK\jdk-8.0.212.04-openj9\jre\bin\compressedrefs\j9vm29.dll
Module_base_address=00007FF8FBCF0000 Offset_in_DLL=0000000000082f83
Target=2_90_20190521_368 (Windows 10 10.0 build 17763)
CPU=amd64 (8 logical CPUs) (0x3f8732000 RAM)
----------- Stack Backtrace -----------
(0x00007FF8FBD72F83 [j9vm29+0x82f83])
Java_org_bridj_JNI_bindJavaMethodsToCFunctions+0x255 (0x00007FF913DC68E5 [bridj+0x68e5])
J9_CreateJavaVM+0x8b7c3 (0x00007FF8FBE02773 [j9vm29+0x112773])
J9_CreateJavaVM+0x8b39c (0x00007FF8FBE0234C [j9vm29+0x11234c])
(0x00007FF8FBD05430 [j9vm29+0x15430])

JVMDUMP039I Processing dump event "gpf", detail "" at 2019/05/30 16:29:38 - please wait.
JVMDUMP032I JVM requested System dump using 'E:\Workspaces\LR70\ServiceDashboard\core.20190530.162938.10468.0001.dmp' in response to an event
JVMDUMP010I System dump written to E:\Workspaces\LR70\ServiceDashboard\core.20190530.162938.10468.0001.dmp
JVMDUMP032I JVM requested Java dump using 'E:\Workspaces\LR70\ServiceDashboard\javacore.20190530.162938.10468.0002.txt' in response to an event
JVMDUMP010I Java dump written to E:\Workspaces\LR70\ServiceDashboard\javacore.20190530.162938.10468.0002.txt
JVMDUMP032I JVM requested Snap dump using 'E:\Workspaces\LR70\ServiceDashboard\Snap.20190530.162938.10468.0003.trc' in response to an event
JVMDUMP010I Snap dump written to E:\Workspaces\LR70\ServiceDashboard\Snap.20190530.162938.10468.0003.trc
JVMDUMP007I JVM Requesting JIT dump using 'E:\Workspaces\LR70\ServiceDashboard\jitdump.20190530.162938.10468.0004.dmp'
JVMDUMP010I JIT dump written to E:\Workspaces\LR70\ServiceDashboard\jitdump.20190530.162938.10468.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".
:buildCSS FAILED

FAILURE: Build failed with an exception.

  • What went wrong:
    Execution failed for task ':buildCSS'.

BridJ 0.7-SNAPSHOT gives segmentation error on IBM JRE #38

When trying to upgrade my BridJ libraries from 0.6.2 to the latest 0.7 snapshot, I run into a segmentation fault using an IBM JRE. I have already used the 0.7 version on the same OS (Linux) running an Oracle JRE with success. I've tried to analyse the core dump, and here are my findings so far:

The error is of type Segmentation error:

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FCCF33C29A0 Handler2=00007FCCF3061BD0 InaccessibleAddress=0000000000000010
[...]
Module=.../libj9vm26.so
Module_base_address=00007FCCF33A7000
Target=2_60_20140409_195732 (Linux 2.6.32-220.13.1.el6.x86_64)
CPU=amd64 (2 logical CPUs) (0xef927000 RAM)
Stack Backtrace
(0x00007FCCF33DE24C [libj9vm26.so+0x3724c])

And the corresponding core txt file gives me:
1XMCURTHDINFO Current thread
3XMTHREADINFO «main» J9VMThread:0x00000000224B0100, j9thread_t:0x00007FCCEC007BE0, java/lang/Thread:0x00000000025D9CF8, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x7515, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range from:0x00007FCCF3775000, to:0x00007FCCF4176000, size:0xA01000)
3XMCPUTIME CPU usage total: 1.623299182 secs
3XMHEAPALLOC Heap bytes allocated since last GC cycle=718152 (0xAF548)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at org/bridj/JNI.bindJavaMethodsToCFunctions(Native Method)
4XESTACKTRACE at org/bridj/NativeEntities.addDefinitions(NativeEntities.java:149)
4XESTACKTRACE at org/bridj/CRuntime.register(CRuntime.java:468)
5XESTACKTRACE (entered lock: org/bridj/CRuntime@0x0000000002557DE0, entry count: 1)
4XESTACKTRACE at org/bridj/CRuntime.register(CRuntime.java:345)
4XESTACKTRACE at org/bridj/BridJ.register(BridJ.java:363)
4XESTACKTRACE at org/bridj/BridJ.register(BridJ.java:203)
5XESTACKTRACE (entered lock: org/bridj/BridJ@0x00000000024CB1E8, entry count: 1)
4XESTACKTRACE at .../jna/TbusLibrary.<clinit>(TbusLibrary.java:28)
4XESTACKTRACE at java/lang/J9VMInternals.initializeImpl(Native Method)
4XESTACKTRACE at java/lang/J9VMInternals.initialize(J9VMInternals.java:237(Compiled Code))

The IBM snap file indicates that the class org/bridj/CallIO is loaded before the error handling is invoked:

14:25:40.271491000 0x224b0100 j9vm.221 Event sendInitialize — class org/bridj/CallIO
14:25:40.271493000 0x224b0100 j9bcverify.107 Entry >j9bcv_verifyBytecodes — org/bridj/CallIO
14:25:40.271495000 0x224b0100 j9bcverify.50 Entry >checkAllClassLoadingConstraints(ramClass=0x2270d500 — org/bridj/CallIO)
14:25:40.271496000 0x224b0100 j9bcverify.111 Exit sendClinit
14:25:40.271503000 0x224b0100 j9vm.21 Exit 0X224b0100 loader 0x7fccec042ba8 class org/bridj/LastError arbitratedLoadClass enter className
14:25:40.271508000 0x224b0100 j9vm.119 Event send loadClass(org/bridj/LastError), stringObject: 0x2238d1b0 loader: 0x25d64e8
14:25:40.271509000 0x224b0100 j9vm.28 Entry >sendLoadClass
14:25:40.271519000 0x224b0100 j9vm.351 Entry >loader 0x25d5f00 class org/bridj/LastError attemptDynamicClassLoad entry
14:25:40.271528000 0x224b0100 j9vm.353 Exit j9bcv_verifyClassStructure — class: org/bridj/LastError
14:25:40.271849000 0x224b0100 j9bcverify.18 Exit Create RAM class from ROM class 0x7fccec73e8d8 in class loader 0x7fccec042ba8
14:25:40.271936000 0x224b0100 j9vm.80 Event ROM class 0x7fccec73e8d8 is named org/bridj/LastError

I'm not sure how to analyse this error further; I could probably add some debug logging, but so far I haven't seen anything useful in my application logs.

Test_openjdk12_j9_extended.system_s390x_linux MathLoadTest JIT crash #6248

Changes since previous build

I don't see anything obvious in the diff. Could also be some OMR change. I'll try to take a look at this by EOD.

Could also be some OMR change

OMR didn’t change between the builds. Could be a latent intermittent issue then, and we got unlucky to get two failures in the same build.

Tried to reproduce this locally on my z14 Docker container, but I'm unable to reproduce it. It seems like an intermittent problem. I'll have to work off the Grinders instead and take a look at the core file to see if we can spot the problem.

@R-Santhir: perhaps this is a good non-x86 test for the parallel grinders feature. I'm sure it will save Filip some time.

This has to be caused by a recent change.

@R-Santhir: perhaps this is a good non-x86 test for the parallel grinders feature. I'm sure it will save Filip some time.

@R-Santhir I can definitely be a tester for this feature. Catch me on Slack whenever you can to explain how to run in parallel and we can give it a shot. I'm going to work on this using existing Grinders in the meantime.

Internal grinder shows a 0/20 failure rate. External grinders [1] [2] show a 1/30 failure rate. Totaling up, I'm observing a 1/50 failure rate for this issue.

Looking at the core file from the most recent failure referenced in #6248 (comment), the method in question (java/math/BigInteger.<init>([CII)V) is a hot method, which may explain the low reproducibility rate. Why we crash is not very clear, as kca is having trouble parsing the core files. Regardless, we'll need a log.

One interesting question is that the javacore reports:

However, the method it reports is not found in the backtrace of any thread! How can that be? @pshipton any ideas?

Launched 3×20 grinders to try and collect a log:

However, the method it reports is not found in the backtrace of any thread! How can that be?

I thought methods got queued for async compilation, so execution of the method could be finished by the time the compiler does the compilation.

I thought methods got queued for async compilation, so execution of the method could be finished by the time the compiler does the compilation.

This isn't a crash in a compilation thread. It's a crash in a JIT-compiled method body. It looks like we have a failure in Grinder 421. Downloading the artifact now (it's very slow), and hopefully I can extract some information from the log + core.

Seems weird then, indicating another bug somewhere. Could the method name being printed be incorrect?

Could the method name being printed be incorrect?

It would seem very unlikely, given that the PSW register does indeed point to java/math/BigInteger.<init>([CII)V and R13 points to the crashing J9VMThread*. However, when I dump the !stackslots via jdmpview or look at that thread in the javacore, I see a totally different backtrace for that J9VMThread*.

Looking at the failure from [1], the crash point is at:

Taking a look at the log:

We tried to load a byte from that location. GPR7 seems to contain some garbage in the high word. From what I can tell, GPR7 should be a 64-bit register. Now looking at the trees:

It seems we have skipped the evaluation of the i2l node at n2919n and assigned it the virtual register GPR_1545 (the child's register). The node is not marked as unneededConv, and rightfully so: we do need to perform the sign extension here because we are about to use the node within a memory reference. This seems to be the reason why we crash. I'm looking into why we skipped evaluation of this sign extension.
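A minimal sketch of that failure mode, with made-up values purely for illustration: if the i2l sign extension is skipped, the 64-bit register used in the memory reference keeps whatever stale bits were in its high word, and the computed address is garbage.

#include <cstdint>
#include <cstdio>

int main()
   {
   int32_t index = -8;                           // 32-bit child value feeding the i2l
   uint64_t staleHighWord = 0xDEADBEEFULL << 32; // leftover bits in the 64-bit register

   // Correct: i2l sign-extends, so the high word becomes all ones.
   int64_t extended = (int64_t)index;            // 0xFFFFFFFFFFFFFFF8

   // Bug: the i2l evaluation is skipped and the child's register is
   // reused directly, so the high word still holds the stale bits.
   uint64_t skipped = staleHighWord | (uint32_t)index;

   printf("with i2l:    0x%016llx\n", (unsigned long long)extended);
   printf("i2l skipped: 0x%016llx\n", (unsigned long long)skipped);
   // base + skipped lands nowhere near base + extended, hence the crash.
   return 0;
   }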

Just dropping a quick comment here that I’ve identified the bug in the code and am now working towards the fix. I’ll detail what exactly is wrong in the PR. We should hopefully have a fix tomorrow.

vipnaik

Posts: 3
Joined: Wed Dec 13, 2017 11:42 am

JVM crash problem on running performance monitoring

For monitoring the real-time performance of the JVM, we have activated the YourKit agent on the WebSphere JVM by adding the JVM generic argument below:
-agentpath:/data/yourkit/YourKit-JavaProfiler-2017.02/bin/linux-x86-64/libyjpagent.so
After adding this argument and invoking profiling, the JVM crashes after a few minutes.

In native_stderr.log we are getting the below error:
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F486FB1C270 Handler2=00007F486F426DB0 InaccessibleAddress=0000000D7F0A3F98
RDI=00000007FEF2EE68 RSI=0000000000000014 RAX=0000000D7F0A3F98 RBX=0000000000000000
RCX=000000000000011D RDX=0000000000000000 R8=0000000000000020 R9=00000007FE824458
R10=0000000000000000 R11=D8632EA33DBF6A25 R12=0000000000000020 R13=0000000580005D70
R14=0000000000000000 R15=0000000580000000
RIP=00007F486D7EFB92 GS=0000 FS=0000 RSP=0000000007644C18
EFlags=0000000000010246 CS=0033 RBP=0000000003EE1C00 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000D7F0A3F98
xmm0 00000007fe824458 (f: 4269949952.000000, d: 1.696361e-313)
xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm2 5c5c5c5c5c5c5c5c (f: 1549556864.000000, d: 8.245487e+136)
xmm3 5c5c5c5c5c5c5c5c (f: 1549556864.000000, d: 8.245487e+136)
xmm4 3d7e03121ec5f7f8 (f: 516290560.000000, d: 1.705984e-12)
xmm5 3fe8687be3b574d6 (f: 3820320000.000000, d: 7.627544e-01)
xmm6 3bbcc86800000000 (f: 0.000000, d: 6.095003e-21)
xmm7 4035c2254f4699e3 (f: 1330027008.000000, d: 2.175838e+01)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 3c74800000000000 (f: 0.000000, d: 1.778092e-17)
xmm10 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
xmm11 bec1680000000000 (f: 0.000000, d: -2.074987e-06)
xmm12 3d7dfbbcd7292bf6 (f: 3609799680.000000, d: 1.704356e-12)
xmm13 40357cd0e7026640 (f: 3875694080.000000, d: 2.148756e+01)
xmm14 3c2ce63eab883717 (f: 2877830912.000000, d: 7.833196e-19)
xmm15 40357cd0e7026640 (f: 3875694080.000000, d: 2.148756e+01)
Module=/opt/IBM/WebSphere/AppServer/java_1.8_64/jre/lib/amd64/compressedrefs/libj9jit29.so
Module_base_address=00007F486D593000

Please let me know if anybody else has faced a similar issue.

vipnaik

Posts: 3
Joined: Wed Dec 13, 2017 11:42 am

Re: JVM crash problem on running performance monitoring

Post by vipnaik » Mon Jan 08, 2018 1:47 pm

The SDK version that we are using can be found below:
8.0.0.0-WS-WASJavaSDK8-LinuxX64-IFPI89657 : 8.0.0.20171117_1428

CWSDK1004I: Profile base :
CWSDK1006I: PROFILE_COMMAND_SDK = 1.8_64
CWSDK1008I: Node iapnode0 SDK name: 1.8_64
CWSDK1009I: Server server1 SDK name: 1.8_64

As per the recommendation from YourKit support, we have applied the latest Java patch, as you can see above.

fappel

Posts: 2
Joined: Tue Nov 13, 2018 6:59 am

Re: JVM crash problem on running performance monitoring

Post by fappel » Tue Nov 13, 2018 7:22 am

Hello,

it seems that we are facing a similar problem on AIX 7.1 with IBM Java 1.8 version 8.0.3.20 and WebSphere 8.5.5.11 fixpack. The VM crashes while restarting WAS after attaching the agent. The yjp log seems to be fine. The excerpt below from the server error log indicates that the segmentation error probably happens in libj9jit28.so:

Unhandled exception
Type=Segmentation error vmState=0x00000000
[…]
Module=/opt/WebSphere/AppSystemV855/java_1.8_64/jre/lib/ppc64/compressedrefs/libj9jit28.so
Module_base_address=0900000011F26000
Target=2_80_20161013_322271 (AIX 7.1)
CPU=ppc64 (72 logical CPUs) (0x1c00000000 RAM)

Any help to solve the problem is appreciated.

Kind regards,
Frank Appel

I hit the error below while I was trying to do an 11gR2 installation on AIX 6.

./runInstaller
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 190 MB.   Actual 2030 MB    Passed
Checking swap space: must be greater than 150 MB.   Actual 16384 MB    Passed
Checking monitor: must be configured to display at least 256 colors.    Actual 16777216    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2011-02-10_09-14-33AM. Please wait ...
[oracle@test]</oracle11g/setup/11gR1_client_setup/client> Oracle Universal Installer, Version 11.1.0.6.0 Production
Copyright (C) 1999, 2007, Oracle. All rights reserved.

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000033
Handler1=09001000A631EC78 Handler2=09001000A6316538
R0=000000000000013C R1=0FFFFFFFFFFFDB80 R2=0000000112E52348 R3=0000000000000400
R4=070000000082E8B0 R5=0000000111959FE8 R6=00000000000009F8 R7=000000005D16800E
R8=000000005D168008 R9=0000000000000020 R10=0000000112EBE8C0 R11=090000001C9E8C38
R12=09000000158537A0 R13=000000011000EAA0 R14=0000000111AE6558 R15=000000011009D600
R16=0000000110F82790 R17=0000000112E4EFC0 R18=09001000A63223B0 R19=0000000000000002
R20=0000000112EBEA28 R21=0000000111AE6658 R22=000000011009D680 R23=0000000000000000
R24=BADC0FFEE0DDF00D R25=0000000000000000 R26=000000011009D600 R27=0000000110018CB0
R28=0000000000000000 R29=0700000000758CB0 R30=070000000001D540 R31=070000000082E8B0
IAR=090000001C9E8C38 LR=000000011078FC80 MSR=A00000000000D032 CTR=090000001C9E8C38
CR=2258584B20000004 FPSCR=8200400000000000 XER=2000000482004000
FPR0 4040000000000000 (f: 0.000000, d: 3.200000e+01)
FPR1 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
FPR2 4040800000000000 (f: 0.000000, d: 3.300000e+01)
FPR3 3fee666660000000 (f: 1610612736.000000, d: 9.500000e-01)
FPR4 4530000000000000 (f: 0.000000, d: 1.934281e+25)
FPR5 3e02b770efe4a9b3 (f: 4024740352.000000, d: 5.447238e-10)
FPR6 4330080036b5fcbf (f: 917896384.000000, d: 4.512397e+15)
FPR7 0000000082004000 (f: 2181054464.000000, d: 1.077584e-314)
FPR8 8000000000000000 (f: 0.000000, d: -0.000000e+00)
FPR9 40338c0000000000 (f: 0.000000, d: 1.954688e+01)
FPR10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR11 4026000000000000 (f: 0.000000, d: 1.100000e+01)
FPR12 3fe8000000000000 (f: 0.000000, d: 7.500000e-01)
FPR13 4020800000000000 (f: 0.000000, d: 8.250000e+00)
FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/tmp/OraInstall2011-02-10_09-14-33AM/jdk/jre/bin/libj9jit23.so
Module_base_address=090000001C9BE000
Target=2_30_20070131_11312_BHdSMr (AIX 6.1)
CPU=ppc64 (8 logical CPUs) (0x400000000 RAM)
JVMDUMP006I Processing Dump Event "gpf", detail "" - Please Wait.
JVMDUMP007I JVM Requesting System Dump using '/tmp/core.20110210.091448.10289326.dmp'
JVMDUMP010I System Dump written to /tmp/core.20110210.091448.10289326.dmp
JVMDUMP007I JVM Requesting Snap Dump using '/tmp/Snap0001.20110210.091448.10289326.trc'
JVMDUMP010I Snap Dump written to /tmp/Snap0001.20110210.091448.10289326.trc
JVMDUMP007I JVM Requesting Java Dump using '/tmp/javacore.20110210.091448.10289326.txt'
JVMDUMP012E Error in Java Dump: /tmp/javacore.20110210.091448.10289326.txt
JVMDUMP013I Processed Dump Event "gpf", detail "".

The solution for this error is to run the command below before the installation. Setting JAVA_COMPILER=NONE disables the JIT compiler, which sidesteps the crash in libj9jit23.so:

export JAVA_COMPILER=NONE

./runInstaller
