Finally closed a VOS case that had been open for over a year. It concerned high system CPU consumption caused by running multiple Oracle RDBMS instances on a single system. The observation was 80-90% system CPU in mpstat 1 and the following from lockstat profiling:
Profiling interrupt: 67240 events in 2.168 seconds (31017 events/sec)

Count genr cuml rcnt     nsec Hottest CPU+PIL        Caller
-------------------------------------------------------------------------------
40920  61% ---- 0.00      987 cpu[7]                 fop_ioctl
40920  61% ---- 0.00      987 cpu[7]                 ioctl
40880  61% ---- 0.00      986 cpu[7]                 read_kstat_data
40248  60% ---- 0.00     1077 cpu[7]                 syscall_trap
38780  58% ---- 0.00      947 cpu[2]                 mutex_vector_enter
32478  48% ---- 0.00      947 cpu[5]                 kstat_hold_bykid
32477  48% ---- 0.00      947 cpu[5]                 kstat_hold
13516  20% ---- 0.00     1845 cpu[102]               (usermode)
 6466  10% ---- 0.00     1904 cpu[423]               syscall_trap32
 6169   9% ---- 0.00      926 cpu[3]                 kstat_rele
 4738   7% ---- 0.00     1626 cpu[96]                thread_start
 2420   4% ---- 0.00     1359 cpu[135]+11            idle
 2317   3% ---- 0.00     1464 cpu[135]+11            disp_getwork
 2122   3% ---- 0.00     2764 cpu[101]               fop_read
 1388   2% ---- 0.00     2510 cpu[101]               vx_read
 1379   2% ---- 0.00     2509 cpu[101]               vx_read1
 1352   2% ---- 0.00     2503 cpu[101]               vx_cache_read
 1267   2% ---- 0.00     2459 cpu[418]               trap
 1215   2% ---- 0.00     3059 cpu[128]               fop_write
 1082   2% ---- 0.00     2339 cpu[418]               utl0
-------------------------------------------------------------------------------
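When comparing several captures like the one above, it can help to pull the kstat-related callers out of the profile programmatically. The following is only a rough sketch: it assumes each data row has exactly seven whitespace-separated fields (Count, genr, cuml, rcnt, nsec, CPU+PIL, Caller), and the names `ProfileRow`, `parse_profile_rows` and `callers_matching` are made up for illustration, not part of lockstat.

```python
from typing import List, NamedTuple


class ProfileRow(NamedTuple):
    count: int      # number of profiling samples attributed to the caller
    genr_pct: int   # the "genr" percentage column, without the % sign
    nsec: int       # the "nsec" column
    cpu: str        # hottest CPU (and PIL, if any), e.g. "cpu[135]+11"
    caller: str     # kernel function name, or "(usermode)"


def parse_profile_rows(text: str) -> List[ProfileRow]:
    """Parse data rows of a lockstat profile; headers and rules are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a data row has 7 fields and starts with an integer count.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        count, genr, _cuml, _rcnt, nsec, cpu, caller = fields
        rows.append(
            ProfileRow(int(count), int(genr.rstrip("%")), int(nsec), cpu, caller)
        )
    return rows


def callers_matching(rows: List[ProfileRow], needle: str) -> List[ProfileRow]:
    """Return the rows whose caller name contains the given substring."""
    return [row for row in rows if needle in row.caller]


# A few rows copied from the output above.
SAMPLE = """\
Count genr cuml rcnt     nsec Hottest CPU+PIL        Caller
40880  61% ---- 0.00      986 cpu[7]                 read_kstat_data
32478  48% ---- 0.00      947 cpu[5]                 kstat_hold_bykid
13516  20% ---- 0.00     1845 cpu[102]               (usermode)
 2420   4% ---- 0.00     1359 cpu[135]+11            idle
"""

if __name__ == "__main__":
    for row in callers_matching(parse_profile_rows(SAMPLE), "kstat"):
        print(f"{row.caller:20s} {row.count:6d} samples ({row.genr_pct}%)")
```

Filtering on the substring "kstat" is a crude heuristic; it will miss callers such as mutex_vector_enter that are part of the same code path but do not carry kstat in their name.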
Around 60% of the samples sit in the ioctl path through read_kstat_data and kstat_hold, most of them in mutex_vector_enter, which points at contention on a mutex while reading kstats. Originally I raised CR 6734910, "kstat_hold doesn't scale well on large systems", to track this. However, it seemed that Oracle could make better use of the kstat interface, which resulted in Oracle bug and patch 8531434, "KSTAT CALLS BY MMNL/CJQ0 INCUR HIGH SYSTEM CPU WHEN RUNNING NUMEROUS INSTANCES", being logged and fixed for 10.2.0.4.0. 11g doesn't appear to be affected by this issue. So, to avoid a performance hit whilst running multiple Oracle instances on a single host, you can either apply the patch or use the workaround of increasing the Oracle parameter _job_queue_interval (e.g. from 5 to 30), at the potential cost of losing granularity in the performance statistics.