Good morning,
I am testing a Data Guard environment at the moment, and I found something in the alert.log that I can't solve:
WARNING: Heavy swapping observed on system in last 5 mins.
Heavy swapping can lead to timeouts, poor performance, and instance eviction.
Errors in file D:\ORACLE\diag\rdbms\test_stby\test_stby\trace\test_stby_dbrm_4964.trc (incident=209923):
ORA-00700: internal warning, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\ORACLE\diag\rdbms\test_stby\test_stby\incident\incdir_209923\test_stby_dbrm_4964_i209923.trc
2019-05-24T03:19:40.786866+02:00
Dumping diagnostic data in directory=[cdmp_20190524031940], requested by (instance=1, osid=4964 (DBRM)), summary=[incident=209923].
Sometimes I get an additional message in between:
WARNING: Heavy swapping observed on system in last 5 mins.
Heavy swapping can lead to timeouts, poor performance, and instance eviction.
ERROR: Unable to normalize symbol name for the following short stack (at offset 182):
dbgexProcessError()+398<-dbgePostErrorKGE()+2191<-dbkePostKGE_kgsf()+90<-kgeadse()+457<-kgerinv_internal()+86<-kgesoftnmierr()+768<-kskvmstatact()+1602<-kskdbrmtoutact()+259<-ksbcti..1()+348<-ksbabs()+15825<-ksbrdp()+1556<-opirip()+1539<-opidrv()+960<-sou2o()+112<-opimai_real()+222<-opimai()+231<-BackgroundThreadStart()+702<-00007FFE7BFA84D4<-00007FFE7C25E851
It seems this situation occurs only on the database that currently has the standby role. The Data Guard environment consists of one primary DB and one standby DB (standalone, no RAC) running in Maximum Availability mode.
I found some solutions online which suggest increasing the overall memory, or decreasing sga_target, pga_aggregate_target and/or pga_aggregate_limit.
Even the trace files suggest this.
I reduced some values (see the sketch after the configuration below), but still the same result. The problem does not occur every hour, but a couple of times in 24 hours.
Here is the memory configuration:
memory_max_target big integer 0
memory_target big integer 0
pga_aggregate_limit big integer 10848M
pga_aggregate_target big integer 5424M
sga_target big integer 9600M <— I decreased this value from 17GB for testing
sga_max_size big integer 10016M <— I decreased this value from 17GB for testing
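For reference, a sketch of how such a reduction can be applied (values taken from the configuration above; assuming an spfile, and sga_max_size is static, so that change only takes effect after a restart):
-- Shrink the SGA and PGA targets; adjust the values to your system.
ALTER SYSTEM SET sga_target = 9600M SCOPE = BOTH;
ALTER SYSTEM SET pga_aggregate_target = 5424M SCOPE = BOTH;
ALTER SYSTEM SET sga_max_size = 10016M SCOPE = SPFILE; -- restart required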
Here is some more information:
total physical memory: 32767 MB
bytes swapped in/out in last 5 mins: 779775 KB
cpu utilization: 1%
SGA_TARGET: 9600 MB
PGA_AGGREGATE_LIMIT: 10848 MB
maximum PGA allocated since startup: 373 MB
total PGA currently allocated: 373 MB
PGA_AGGREGATE_TARGET: 5424 MB
ORA-00700: internal warning, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], [], [], [], [], [], []
Heavy swapping observed in last 5 mins: [pct of total memory][bytes]
total swpin [ 6.94%][2330311K], total swpout [ 0.00%][0K]
vm stats captured every 30 secs for last 5 mins:
swpin:swpout:
[0.02%][7566K][0.00%][0K]
[0.30%][102901K][0.00%][0K]
[0.27%][91913K][0.00%][0K]
[0.08%][29549K][0.00%][0K]
[5.01%][1683433K][0.00%][0K]
[0.51%][172509K][0.00%][0K]
[0.64%][215786K][0.00%][0K]
[0.02%][8386K][0.00%][0K]
[0.02%][8190K][0.00%][0K]
[0.03%][10073K][0.00%][0K]
2019-05-23T18:18:47.069181+02:00
Incident 204890 created, dump file: D:ORACL….
The server uses only 12GB of 32GB.
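For cross-checking, the instance's own memory accounting can be read from standard views (a sketch; both views report bytes, converted to MB here):
select name, bytes/1024/1024 mb from v$sgainfo where name = 'Maximum SGA Size';
select name, value/1024/1024 mb from v$pgastat where name in ('maximum PGA allocated', 'total PGA allocated');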
Do you have any idea what the issue could be? Any suggestions?
Thank you!
Best regards
IronMike
█ 24.08.2017 19:46
Good afternoon!
When running a particular operation in 1C Intalev Corporate Management 7.3, 1C fails with an error:
Errors:
———————————————————————————
22.08.2017 15:41:49
Unrecoverable error
Error executing POST request to resource /e1cib/logForm:
Meanwhile, the alert log shows:
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x0] [PC:0x143C106F8, kkexGetUStat()+60]
ERROR: Unable to normalize symbol name for the following short stack (at offset 213):
dbgexProcessError()+200<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+2269<-dbkePostKGE_kgsf()+77<-kgeade()+562<-kgerelv()+151<-kgerev()+45<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1862<-sss_xcpt_EvalFilter()+174<-.1.6_8+59<-00007FFEA78A6B96<-00007FFEA78BAB6D<-00007FFEA7859933<-00007FFEA78B9C8A<-kkexGetUStat()+60<-kkexFunc()+241<-kkeOperCost1()+2297<-kkeOperCost1()+665<-kkeOperCost1()+665<-kkeOperCost()+207<-PGOSF692_kkeTraverseVCExpr()+296<-qkebTraverseExpr()+208<-qkebTraverseExpr()+162<-kkeGetVCColumnCostCorrection()+71<-kkePredCost()+453<-kkeFiltPredRankCost()+290<-kkeIdxFiltCosts()+255<-kkeidc()+7041<-kketac()+644<-kkonxc()+1988<-kkotap()+1776<-kkojnp()+3397<-kkocnp()+80<-kkooqb()+1564<-kkoOrExpand()+1155<-kkoqbc()+6884<-apakkoqb()+202<-apaqbdDescendents()+483<-apaqbdList()+79<-apaqbdDescendents()+799<-apaqbdList()+79<-apaqbdDescendents()+296<-apaqbdList()+79<-apaqbdDescendents()+296<-apaqbdList()+79<-apaqbd()+17<-apadrv()+881<-opitca()+2614<-PGOSF521_kksFullTypeCheck()+95
Errors in file D:\APP\ORACLE\diag\rdbms\intalev\intalev\trace\intalev_ora_5492.trc (incident=6179):
ORA-07445: exception encountered: core dump [kkexGetUStat()+60] [ACCESS_VIOLATION] [ADDR:0x0] [PC:0x143C106F8] [UNABLE_TO_READ] []
Incident details in: D:\APP\ORACLE\diag\rdbms\intalev\intalev\incident\incdir_6179\intalev_ora_5492_i6179.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Aug 24 15:07:10 2017
Dumping diagnostic data in directory=[cdmp_20170824150710], requested by (instance=1, osid=5492), summary=[incident=6179].
Thu Aug 24 15:07:13 2017
Sweep [inc][6179]: completed
Sweep [inc2][6179]: completed
I am attaching the traces mentioned above. The vendor's support replied that they have not tested the new version against Oracle. Given my limited experience, I am out of ideas. I would appreciate some help.
Windows Server 2016 Standard
Intel Xeon E3-1270 v5
64 GB RAM
Oracle DB 11.2.0.4 + ASM
█ 24.08.2017 20:39
Run it as administrator…
For starters…
█ 24.08.2017 20:48
Same result.
Thu Aug 24 20:45:22 2017
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x0] [PC:0x143C106F8, kkexGetUStat()+60]
Errors in file D:\APP\ORACLE\diag\rdbms\intalev\intalev\trace\intalev_ora_6800.trc (incident=6180):
ORA-07445: exception encountered: core dump [kkexGetUStat()+60] [ACCESS_VIOLATION] [ADDR:0x0] [PC:0x143C106F8] [UNABLE_TO_READ] []
Incident details in: D:\APP\ORACLE\diag\rdbms\intalev\intalev\incident\incdir_6180\intalev_ora_6800_i6180.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Aug 24 20:46:59 2017
Dumping diagnostic data in directory=[cdmp_20170824204659], requested by (instance=1, osid=6800), summary=[incident=6180].
Thu Aug 24 20:47:01 2017
Sweep [inc][6180]: completed
Sweep [inc2][6180]: completed
█ 24.08.2017 21:28
Right off the bat, and from my phone, I can ask one question straight away: when was Windows last rebooted? And with ASM on top of that… You do like trouble…
█ 24.08.2017 21:57
It was rebooted before and after. As for ASM, I do not see a higher-performing alternative for six HDDs within the budget.
█ 25.08.2017 07:08
The problem is not ASM but Windows, where nobody chases performance or keeps valuable databases; as long as it works, RAID 10 is considered good enough.
It rebooted abnormally, because of the budget, I take it?
Run VALIDATE STRUCTURE CASCADE on all the tables involved in the query; a sketch follows.
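A sketch of that check (schema and table names are placeholders):
-- Verify the physical integrity of a table and all of its indexes:
ANALYZE TABLE some_schema.some_table VALIDATE STRUCTURE CASCADE;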
█ 25.08.2017 09:01
My apologies, I forgot to mention: the database is freshly created and empty, the optimizer is enabled, and the problem occurs during the initial data load (specifically, publishing the chart of accounts).
█ 25.08.2017 10:51
If the database is new, then I suggest getting rid of ASM.
█ 26.08.2017 10:10
I pulled the database out of ASM using RMAN: the error remained.
I recreated the database from scratch without ASM and loaded the dump via 1C: the error remained.
I created a file-based database and loaded the dump via 1C: no error.
Could you at least briefly explain the nature of this error? I did not understand much from the manuals. In search results this error most often appears together with ORA-600 on 11.2.0.3 or 11.2.0.1.
█ 26.08.2017 19:06
The nature of this error is exactly the same as ORA-600; even the ora-600 error lookup tool searches by both error numbers… I am on my phone right now, but on MetaLink this will most likely be reduced to a bug in ASM on Windows, or in Windows itself. There is no point digging deeper; on Windows, I repeat again, only toy databases.
Here I have listed all the issues that occurred after a 12.2 upgrade from 11.2.0.4.
Problem:
12.2 DBCA does not update oratab with the database name in a RAC environment.
Solution:
This is expected behaviour for RAC database creation from 12.2.0.1 onward. Use srvctl to get the instance name and Oracle home information; a sketch follows the doc reference below.
For a single-instance database, the entry still gets added to /etc/oratab.
12.2 Real Application Cluster DBCA Do Not Update Oratab (Doc ID 2278584.1)
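A sketch of the srvctl lookups (the database name is a placeholder):
$ srvctl config database -db orcl   # shows the Oracle home, spfile location, etc.
$ srvctl status database -db orcl   # lists the instances and the nodes they run on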
Problem:
The application hangs for one particular user, in his own schema, when he tries to open the application.
Solution 1:
Deploy a new application build that runs this SQL at login:
alter session set optimizer_features_enable='11.2.0.4';
It worked after the application was deployed with this code.
Solution 2:
Another option is to create a SQL patch for SQL ID 31xs7d7yj3km4; a sketch follows the doc reference below.
How to Create a SQL Patch to add Hints to Application SQL Statements ( Doc ID 1931944.1 )
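A sketch of creating such a patch (12.2 syntax; the hint text and the patch name are illustrative):
DECLARE
  l_patch VARCHAR2(128);
BEGIN
  l_patch := DBMS_SQLDIAG.CREATE_SQL_PATCH(
    sql_id    => '31xs7d7yj3km4',
    hint_text => 'OPTIMIZER_FEATURES_ENABLE(''11.2.0.4'')',
    name      => 'patch_31xs7d7yj3km4');
END;
/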
Problem
Background process SCM0 consuming excessive CPU
Solution
Disable the SCM0 process:
alter system set "_dlm_stats_collect" = 0 scope = spfile sid = '*';
show parameter _dlm_stats_collect;
12.2 RAC DB Background process SCM0 consuming excessive CPU (Doc ID 2373451.1)
Problem
'acknowledge over PGA limit' wait event
Solution
Reset the parameter:
ALTER SYSTEM SET PGA_AGGREGATE_LIMIT=0 SID='*' SCOPE=BOTH;
show parameter pga;
12c: ‘acknowledge over PGA limit’ Wait Event (Doc ID 2138882.1)
Problem
Change .profile (update the environment for the new Oracle home).
Problem
Check database links and fix any issues.
Problem
Login issues
Solution
Connect to the container database and change the parameter:
alter system set sec_case_sensitive_logon=false scope=both sid='*';
The upgrade set sec_case_sensitive_logon to TRUE, so we set it back to FALSE.
Problem
Invalid password issue with 10g clients after the upgrade
Solution
Modify sqlnet.ora on the server side according to the client versions in use:
SQLNET.ALLOWED_LOGON_VERSION_CLIENT=10
SQLNET.ALLOWED_LOGON_VERSION_SERVER=10
Problem
A .NET application returns strange results
Solution
The .NET application returns truncated strings with the 12c client, but works with the 11g client. In Toad it works with the 12c client as well, so it is a strange issue.
Problem
Connection issues for users with 11g clients: ORA-28040: No matching authentication protocol
Solution:
Add the entries below to the server-side sqlnet.ora on both nodes:
SQLNET.ALLOWED_LOGON_VERSION_CLIENT=10
SQLNET.ALLOWED_LOGON_VERSION_SERVER=10
Problem
Invalid objects in the SYS schema: V$XS_SESSION_ROLE is invalid in the PDB but not in the CDB
Solution:
bash-4.2$ sqlplus sys/@db12 as sysdba
SQL> alter session set container=db11g;
Session altered.
SQL> show con_name;
CON_NAME
——————————
db11g
SQL>
SQL> DROP VIEW SYS.V$XS_SESSION_ROLE;
DROP VIEW SYS.V$XS_SESSION_ROLE
*
ERROR at line 1:
ORA-02030: can only select from fixed tables/views
If you get the above error, use option 2, recommended by Oracle development (at the time of writing it is not documented by Oracle):
SQL> update obj$ set status = 1 where name = 'V$XS_SESSION_ROLE' and subname = '$DEPRECATED$';
1 row updated.
select owner, object_name, object_type, status from dba_objects where object_name = 'V$XS_SESSION_ROLE';
SYS V$XS_SESSION_ROLE VIEW VALID
PUBLIC V$XS_SESSION_ROLE SYNONYM VALID
Problem:
UTL_FILE directory
The UTL_FILE_DIR initialization parameter is deprecated in Oracle Database 12c Release 2 (12.2.0.1) and may not be supported in a future release.
In the past, accessible directories for PL/SQL file I/O were specified in the initialization file using the UTL_FILE_DIR parameter. However, UTL_FILE_DIR access is no longer recommended. Oracle recommends that you instead use the directory object feature, which replaces UTL_FILE_DIR. Directory objects offer more flexibility and granular control to the UTL_FILE application administrator, can be maintained dynamically (that is, without shutting down the database), and are consistent with other Oracle tools.
Solution:
Change all the PL/SQL code to use database directory objects; a sketch follows.
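A sketch of the replacement (the directory name, path, and grantee are placeholders):
-- Create a directory object once, as a DBA:
CREATE OR REPLACE DIRECTORY app_out_dir AS '/u01/app/output';
GRANT READ, WRITE ON DIRECTORY app_out_dir TO app_user;
-- Then reference the directory object in PL/SQL instead of a literal path:
DECLARE
  fh UTL_FILE.FILE_TYPE;
BEGIN
  fh := UTL_FILE.FOPEN('APP_OUT_DIR', 'report.txt', 'w');
  UTL_FILE.PUT_LINE(fh, 'example line');
  UTL_FILE.FCLOSE(fh);
END;
/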
- Check instance, database, version
select * from gv$instance;
select * from gv$database;
select * from gv$version;
Problem
- Alert log error
Buffer Cache Full DB Caching mode changing from FULL CACHING ENABLED to FULL CACHING DISABLED
Full DB Caching disabled: DEFAULT_CACHE_SIZE should be at least 283902 MBs bigger than current size.
db11g(3):Errors in file /acfs_mnt/diag/rdbms/trace/1_j006_17433166.trc:
ORA-12012: error on auto execute of job "abc"."PRC_JOB"
ORA-29280: invalid directory path
ORA-06512: at “SYS.UTL_FILE”, line 41
ORA-06512: at “SYS.UTL_FILE”, line 478
ORA-06512: at “MED.PRC_JOB”, line 63
Errors in file /acfs_mnt/diag/rdbms///trace/1_j001_9961876.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_37"
ORA-20001: Statistics Advisor: Invalid task name for the current user
ORA-06512: at “SYS.DBMS_STATS”, line 47207
ORA-06512: at “SYS.DBMS_STATS_ADVISOR”, line 882
ORA-06512: at “SYS.DBMS_STATS_INTERNAL”, line 20059
ORA-06512: at “SYS.DBMS_STATS_INTERNAL”, line 22201
ORA-06512: at “SYS.DBMS_STATS”, line 47197
Solution
Please refer
ORA-12012 Error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_<NN>" in 12.2 Database (Doc ID 2127675.1)
Next Steps:-
Please run dbms_stats.init_package() in the container database to create the tasks correctly:
$ sqlplus / as sysdba
EXEC dbms_stats.init_package();
Once done, please execute the query below:
column name format A35
set linesize 120
select name, ctime, how_created
from sys.wri$_adv_tasks
where owner_name = 'SYS'
and name in ('AUTO_STATS_ADVISOR_TASK', 'INDIVIDUAL_STATS_ADVISOR_TASK');
Output of the query will look like this:
NAME                                CTIME      HOW_CREATED
----------------------------------- ---------- ------------------------------
AUTO_STATS_ADVISOR_TASK 14-APR-16 CMD
INDIVIDUAL_STATS_ADVISOR_TASK 14-APR-16 CMD
Problem
- db11g(3):ERROR: Unable to normalize symbol name for the following short stack (at offset 494):
dbgexProcessError()+228<-dbgeExecuteForError()+116<-dbgePostErrorKGE()+2056<-dbkePostKGE_kgsf()+88<-kgeade()+444<-kgesev()+148<-kgesec1()+60<-pppjmpl_local()+328<-pppjmp()+708<-pdz2Mz2_Assert_Print()+1364<-pdz2M14_AssertO()+176<-pdy3M90_Allocate_One_Constant()+996<-pdy3M02_Allocate_Constants()+1460<-pdy1F82_Write_MCode()+164<-pdy1F01_Driver()+264<-pdw0F82_Run_Code_Gen()+208<-pdw0F01_Code_Gen()+372<-phpcog()+28<-phpcmp()+1616<-pcicmp0()+472<-kkxcmp0()+968<-rpiswu2()+580<-kkxcmp()+248<-IPRA.$kkpalt()+2892<-kkpalt()+160<-opiexe()+20268<-opiosq0()+4972<-kpooprx()+412<-kpoal8()+828<-opiodr()+1248<-ttcpip()+616<-opitsk()+1888<-opiino()+948<-opiodr()+1248<-opidrv()+996<-sou2o()+184<-opimai_real()+680<-ssthrdmain()+460<-main()+208<-__start()+112
Errors in file /acfs_mnt/diag/rdbms//trace/_1_ora_53674430.trc (incident=66072) (PDBNAME=db11g):
ORA-06544: PL/SQL: internal error, arguments: [*** ASSERT at file pdy3.c, line 1931; Layout Mismatch – 2 < 16; PKG_FORM_PARM___76748[1625, 1]], [], [], [], [], [], [], []
db11g(3):Incident details in: /acfs_mnt/diag/rdbms///incident/incdir_66072/1_ora_53674430_i66072.trc
Errors in file /acfs_mnt/diag/rdbms///trace/1_ora_53674430.trc (incident=66074) (PDBNAME=db11g):
ORA-06544: PL/SQL: internal error, arguments: [*** ASSERT at file pdy3.c, line 1931; Layout Mismatch – 2 < 16; PKG_FORM_76748[1625, 1]], [], [], [], [], [], [], []
db11g(3):Incident details in: /acfs_mnt/diag/rdbms///incident/incdir_66074/1_ora_53674430_i66074.trc
2018-01-12T10:55:32.311247-06:00
WARNING: too many parse errors, count=600 SQL hash=0x50366ef0
PARSE ERROR: ospid=50987508, error=942 for statement:
2018-01-12T10:55:32.316163-06:00
SELECT av.version_dt FROM rob.application_version av WHERE av.app_cd = :1
Additional information: hd=70001008dae60b8 phd=70001007196ea38 flg=0x101476 cisid=157 sid=157 ciuid=157 uid=157
2018-01-12T10:55:32.321188-06:00
—– PL/SQL Call Stack —–
object line object
handle number name
Solution:
The developers need to change the SQL or code.
Problem
Redo warning
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
SRDC – Required Diagnostic Data Collection for Redo Log Corruption ( Doc ID 1671530.1 )
Kindly take a look at – An Internal Routine Has Requested A Dump Of Selected Redo – Message Appears In Alert Log ( Doc ID 2295361.1 )
Solution:
Ignore it if there is no error after this message, but keep monitoring the log in case an error does appear.
Problem
Partition message in the alert log (as per SR):
TABLE SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY: ADDED INTERVAL PARTITION SYS_P
These are partitions of an internal table that collects AWR statistics; see Alert Log Shows Messages Starting "TABLE WRI$_OPTSTAT_* ADDED INTERVAL PARTITION…" (Doc ID 2018434.1).
ORA-00060: deadlock resolved; details in file /acfs_mnt/diag/rdbms//1/trace/_1_j002_32178392.trc
2018-03-10T10:12:17.283105-06:00
Errors in file /acfs_mnt/diag/rdbms//_1/trace/_1_j002_32178392.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_5115"
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at “SYS.DBMS_SPACE”, line 2741
ORA-06512: at “SYS.DBMS_HEAT_MAP_INTERNAL”, line 716
ORA-06512: at “SYS.DBMS_HEAT_MAP_INTERNAL”, line 1164
ORA-06512: at “SYS.DBMS_HEAT_MAP”, line 228
ORA-06512: at “SYS.DBMS_SPACE”, line 2747
2018-03-10T10:32:13.788957-06:00
SOLUTION / ACTION PLAN
Your issue looks similar to Doc ID 2321020.1
In 12.2 Auto Space Advisor Job Fails With ORA-60 ( Doc ID 2321020.1 )
Until Bug 24687075 gets fixed, you can disable the space advisor job as a workaround:
SQL> EXEC DBMS_AUTO_TASK_ADMIN.DISABLE('AUTO SPACE ADVISOR', NULL, NULL);
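To confirm it is disabled afterwards, a quick check against the standard autotask view:
SQL> select client_name, status from dba_autotask_client where client_name = 'auto space advisor';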
It is never good to experience unplanned database downtime, but it happens sometimes. What we encountered this time was "Instance terminated by ARC1, pid = 10353", which brought down one of our production databases.
Alert log shows:
ORA-1092 : opitsk aborting process
Instance terminated by ARC1, pid = 10353
......
System state dump requested by (instance=1, osid=29144 (ARC1)), summary=[abnormal instance termination].
System State dumped to trace file /oracle11/diag/rdbms/db1/DB1/trace/DB1_diag_28895.trc
Instance terminated by ARC1, pid = 29144
......
Errors in file /oracle11/diag/rdbms/db1/DB1/trace/DB1_m000_16420.trc:
ORA-12751: cpu time or run time policy violation
WARNING: aiowait timed out 1 times
ERROR: Unable to normalize symbol name for the following short stack (at offset 139):
dbgexProcessError()+176<-dbgePostErrorKGE()+1348<-dbkePostKGE_kgsf()+48<-kgeade()+640<-kgerelv()+240<-kgerec4()+80<-kjdgpstackdmp()+892<-_$c1A.kjdglblkrdmpint()+216<-ksikblkrdmpi()+240<-ksqgtlctx()+9760<-ksqgelctx()+800<-kcc_get_enqueue()+544<-kccocx()+716<-kcc_begin_txn_internal()+76<-krsa_cftxn_begin()+2752<-krse_arc_complete()+540<-krse_arc_driver_core()+4940<-krse_arc_driver()+1292<-kcrrwkx()+21916<-kcrrwk()+1560<-ksbabs()+1348<-ksbrdp()+1616<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
Errors in file /oracle11/diag/rdbms/db1/DB1/trace/DB1_arc1_10353.trc (incident=77612):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 9876'
Incident details in: /oracle11/diag/rdbms/db1/DB1/incident/incdir_77612/DB1_arc1_10353_i77612.trc
Errors in file /oracle11/diag/rdbms/db1/DB1/trace/DB1_ora_10256.trc (incident=77940):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 9876'
Incident details in: /oracle11/diag/rdbms/db1/DB1/incident/incdir_77940/DB1_ora_10256_i77940.trc
Killing enqueue blocker (pid=9876) on resource CF-00000000-00000000 by (pid=10353) by killing session 438.1
Killing enqueue blocker (pid=9876) on resource CF-00000000-00000000 by (pid=10353) by terminating the process
ARC1 (ospid: 10353): terminating the instance due to error 2103
DB1_m000_16420.trc shows:
DDE: Problem Key 'ORA 12751' was flood controlled (0x2) (no incident)
ORA-12751: cpu time or run time policy violation
ORA-12751: cpu time or run time policy violation
KEBM: MMON slave action policy violation. krammonsl_; viol=1; err=12751
The findings above gave us a couple of messages and clues:
- ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by ‘inst 1, osid 9876’
- ORA-12751: cpu time or run time policy violation
- KEBM: MMON slave action policy violation. krammonsl_; viol=1; err=12751
- Killing enqueue blocker (pid=9876) on resource CF-00000000-00000000 by (pid=10353) by terminating the process
Finding no. 1 simply guided us to adjust the size of the redo log files to avoid contention and waiting, which is described in my previous post "ORA-00494 Error enqueue [CF] held for too long causing database hung".
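A common way to gauge how often the logs are switching (a sketch against the standard v$log_history view):
-- Count log switches per hour; very frequent switches suggest undersized redo logs.
select to_char(first_time, 'YYYY-MM-DD HH24') as hour, count(*) as switches
from v$log_history
group by to_char(first_time, 'YYYY-MM-DD HH24')
order by 1;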
Findings no. 2 and no. 3 led us to Oracle Support document "DOC ID 1671412.1: Alert Log Shows 'ORA-12751: cpu time or run time policy violation' and Associated MMON Trace Shows 'KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751'". The symptoms showing in our system are exactly what this document mentions.
From this document, “This is due to Bug 9040676 MMON ACTION POLICY VIOLATION. ‘BLOCK CLEANOUT OPTIM, UNDO SEGMENT SCAN’.
Sometimes when there are long and large transactions in the database, MMON starts scanning undo tablespace aggressively, causing the errors and AWR not being generated. The MMON Process is directly related to the AWR; this process is responsible for the collection of statistics for the Automatic Workload Repository (AWR). MMON may suspend actions where there was a large queue of background tasks waiting for service or in a case of server resource exhaustion. This explains why AWR report could not be generated in that period of time.”
The workaround for solving this is to issue command below to disable the cleanout optimization routine,
SQL> alter system set "_smu_debug_mode"=134217728 scope=both;
Finding no. 4 is a little bit tricky. It looked like killing the enqueue blocker caused the termination of the instance. Oracle Support document "Database Crashes With ORA-00494 (Doc ID 753290.1)" describes one possible cause that matches the symptoms found in our system.
Cause #1: LGWR has killed the CKPT process, causing the instance to crash. From the alert.log we can see that the database waited too long for a CF enqueue, so this error is reported:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 38356'
Then LGWR killed the blocker, in this case the CKPT process, which caused the instance to crash. Checking the alert.log further, we can see that the redo log files were switching very frequently (almost every minute).
Also, this document offers three solutions:
- re-size redo log files to reduce the contention on the control files
- check the storage hosting the database, as the collected data points to an I/O issue
- adjust hidden init parameter _kill_enqueue_blocker
Solution no. 1 was applied, and we also worked through solution no. 3. I have simply included solution no. 3 here, as it explains the options well.
Solution#3:
This kill-blocker interface / ORA-494 was introduced in 10.2.0.4. The new mechanism will kill *any* kind of blocking process, non-background or background.
- The difference is that if the enqueue holder is a non-background process, the instance can keep functioning without it even after it is killed.
- If the holder is a background process, for example LGWR, killing the holder leads to an instance crash.
If you want to avoid the kill of the blocker (background or non-background process), you can set
_kill_controlfile_enqueue_blocker=false.
This means that no type of blocker will be killed any more, although the resolution to this problem should focus on why the process is holding the enqueue for so long. You may instead prefer to avoid killing only background processes, since they are vital to the instance, while still allowing non-background blockers to be killed.
This has been addressed in a secondary bug, unpublished Bug 7914003 'KILL BLOCKER AFTER ORA-494 LEADS TO FATAL BG PROCESS BEING KILLED', which was closed as not a bug.
In order to prevent a background blocker from being killed, you can set the following init.ora parameter to 1 (the default is 3); see the sketch after the value list below.
_kill_enqueue_blocker=1
With this parameter set, if the enqueue holder is a background process it will not be killed, and therefore the instance will not crash. If the enqueue holder is not a background process, the 10.2.0.4 mechanism will still try to kill it.
The reason the ORA-494 kill-blocker interface is kept is that, in most cases, customers would rather have the instance crash than suffer a cluster-wide hang.
_kill_enqueue_blocker = { 0 | 1 | 2 | 3 }
0. Disables the mechanism; no foreground or background blocker process holding the enqueue will be killed.
1. Enables the mechanism but kills only foreground blocker processes; background processes are not affected.
2. Enables the mechanism but kills only background blocker processes.
3. Default value. Enables the mechanism and kills any blocker process holding the enqueue.
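A sketch of applying it (a hidden parameter, so change it only under Oracle Support's guidance; an spfile is assumed and a restart is required):
SQL> alter system set "_kill_enqueue_blocker" = 1 scope = spfile;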
My database crashed on my dad’s birthday.
There’s no correlation to be made there, let alone causation.
“The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.” Stephen Jay Gould.
And if you haven’t read it, Tom Kyte has a great Oracle related article on the matter. His old AskTom link is dead, but here is one from Oracle China — I think the transmission is by carrier pigeon, or a 9000 baud modem, but give it a few minutes. Similar to opening a sensis website, really — except it’s all text.
Now my DBA hat is really small. I can't open this up like that Jeff Smith fellow.
Nope, the first thing I did was inform my boss that my laptop had blue-screened (a first on this sucker), not long after which I noticed Glassfish was gone and the Oracle DB simmering on the ashes.
After a quick layman's look, following a hunch, I found my SYSTEM tablespace was lacking elbow room.
I thought I’d lost the original figures in another crash (opening the lid from hibernation), but I found them in my e-mail to Penny.
SQL> @free sys%
NAME                KBytes        Used        Free    Used  Largest
------------------ ----------- ----------- --------- ------ ---------
SYSAUX                768,000     630,848    137,152   82.1    40,960
SYSTEM              1,249,280   1,244,032      5,248   99.6     5,120
With my uneducated eye, I deduced that SYSTEM at 99.6% used was what they call a "vital clue".
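For anyone without Penny's @free script, a rough sketch of an equivalent query (the real script may differ):
select df.tablespace_name name, round(df.bytes/1024) kbytes,
       round((df.bytes - nvl(fs.bytes, 0))/1024) used,
       round(nvl(fs.bytes, 0)/1024) free,
       round((df.bytes - nvl(fs.bytes, 0))*100/df.bytes, 1) pct_used,
       round(nvl(fs.largest, 0)/1024) largest
from  (select tablespace_name, sum(bytes) bytes
       from dba_data_files group by tablespace_name) df,
      (select tablespace_name, sum(bytes) bytes, max(bytes) largest
       from dba_free_space group by tablespace_name) fs
where df.tablespace_name = fs.tablespace_name (+)
and   df.tablespace_name like upper('&1');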
So from another script I had hidden away (another little tip Penny gave me one day), I made a bigger canvas, so to speak:
alter tablespace system add datafile 'C:\app\Scott\oracle\11.2.0\oradata\sw11g\SYSTEM02.dbf' SIZE 1000M;
And once Penny got back to me with another suggestion:
select owner, segment_name, segment_type, bytes/(1024*1024) size_m
from dba_segments
where tablespace_name = 'SYSTEM'
and bytes/(1024*1024) > 1
order by size_m desc
/

OWNER SEGMENT_NAME    SEGMENT SIZE_M
----- --------------- ------- ------
SYS   AUD$            TABLE      360
SYS   IDL_UB1$        TABLE      288
SYS   SOURCE$         TABLE      120
SYS   IDL_UB2$        TABLE       40
SYS   C_TOID_VERSION# CLUSTER     23
SYS   C_OBJ#_INTCOL#  CLUSTER     22
SYS   ARGUMENT$       TABLE       16
SYS   I_SOURCE1       INDEX       15
SYS   C_OBJ#          CLUSTER     14
She palmed me off to the relevant documentation, where I could do some trimming (11.2.0.1, for those watching at home).
Here's what I came up with:
BEGIN
  DBMS_AUDIT_MGMT.SET_DEBUG_LEVEL(DBMS_AUDIT_MGMT.TRACE_LEVEL_ERROR); -- didn't work?
  DBMS_AUDIT_MGMT.init_cleanup(
    audit_trail_type         => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
    default_cleanup_interval => 24*7 /* hours */);
END;
/
BEGIN
  DBMS_AUDIT_MGMT.create_purge_job(
    audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
    audit_trail_purge_interval => 24*30 /* hours */,
    audit_trail_purge_name     => 'PURGE_ALL_AUDIT_TRAILS',
    use_last_arch_timestamp    => FALSE);
END;
/
I was originally having problems seeing the trace, as I didn't have enough temporary space (or something similar; I lost the actual message) in SYSAUX, so I gave that tablespace another 300M datafile.
Now my free space looks like I have enough room to swing a dinosaur, and I haven’t had any velociraptors opening doors since.
SQL> @free sys%
NAME                KBytes        Used        Free    Used  Largest
------------------ ----------- ----------- ---------- ------ ----------
SYSAUX              1,075,200   1,015,680      59,520   94.5     43,008
SYSTEM              2,273,280     875,840   1,397,440   38.5  1,022,976
                   ----------- ----------- ----------
sum                 3,348,480   1,891,520   1,456,960
I saw Jeff Smith's article about his ORA-3113 issue via Twitter. After locating my alert log, I couldn't track down anything wrong from where I first saw issues, but I'm not used to reading these logs.
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
ERROR: Unable to normalize symbol name for the following short stack (at offset 199):
dbgexProcessError()+193<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+1726<-dbkePostKGE_kgsf()+75<-kgeade()+560<-kgerev()+125<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1869<-sss_xcpt_EvalFilter()+174<-.1.4_5+59<-00000000775A85A8<-00000000775B9D0D<-00000000775A91AF<-00000000775E1278<-kgllkal()+151<-kglLockCursor()+188<-kxsGetLookupLock()+146<-kkscsCheckCursor()+326<-kkscsSearchChildList()+1067<-kksfbc()+12294<-kkspsc0()+2117<-kksParseCursor()+181<-opiosq0()+2538<-kpooprx()+357<-kpoal8()+940<-opiodr()+1662<-PGOSF523_kpoodrc()+32<-rpiswu2()+2757<-kpoodr()+717<-xupirtrc()+2739<-upirtrc()+124<-kpurcsc()+150<-kpuexec()+9766<-OCIStmtExecute()+70<-kewrose_oci_stmt_exec()+79<-kewrgwxf1_gwrsql_exft_1()+407<-kewrgwxf_gwrsql_exft()+620<-kewrews_execute_wr_sql()+72<-kewrftbs_flush_table_by_sql()+210<-kewrft_flush_table()+150<-kewrftec_flush_table_ehdlcx()+454<-kewrfat_flush_all_tables()+1021<-kewrfos_flush_onesnap()+167<-kewrfsc_flush_snapshot_c()+613<-kewrafs_auto_flush_slave()+548<-kebm_slave_main()+856<-ksvrdp()+2506<-opirip()+965<-opidrv()+909<-sou2o()+98
Sun Feb 26 19:16:53 2012
Errors in file c:\app\scott\diag\rdbms\sw11g\sw11g\trace\sw11g_smon_7520.trc (incident=73309):
ORA-00600: internal error code, arguments: [25027], [2], [2965385640], [], [], [], [], [], [], [], [], []
Errors in file c:\app\scott\diag\rdbms\sw11g\sw11g\trace\sw11g_m002_7944.trc (incident=73461):
ORA-07445: exception encountered: core dump [kgllkal()+151] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFFFFFFFFFF] [PC:0x92A2D07] [UNABLE_TO_READ] []
Incident details in: c:\app\scott\diag\rdbms\sw11g\sw11g\incident\incdir_73309\sw11g_smon_7520_i73309.trc
Incident details in: c:\app\scott\diag\rdbms\sw11g\sw11g\incident\incdir_73461\sw11g_m002_7944_i73461.trc
Non-fatal internal error happenned while SMON was doing cursor transient type cleanup. SMON encountered 1 out of maximum 100 non-fatal internal errors.
Sun Feb 26 19:17:01 2012
Trace dumping is performing id=[cdmp_20120226191701]
Maybe it will help someone one day, or someone can add further detail.
References
- CNOUG’s Archiver
- Stephen Jay Gould Wiki
- Auditing Enhancements 11gR2
- That Jeff Smith