alert日志遇到 ORA-04030 的分析

2019年9月16日 17:59:34Linux大全评论831 views阅读模式

昨天去一客户那里做巡检，遇到了经典的ora-04030错误，alert日志如下：

Tue Oct 28 09:57:46 2014
Errors in file /Oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177302):
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177302/wmsdb_ora_33358038_i177302.trc
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177303):
ORA-04030: out of process memory when trying to allocate 169040 bytes (pga heap,kgh stack)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177303/wmsdb_ora_33358038_i177303.trc
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177302/wmsdb_ora_33358038_i177302.trc:
ORA-04030: out of process memory when trying to allocate 169040 bytes (pga heap,kgh stack)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177304):
ORA-04030: out of process memory when trying to allocate 160 bytes (pga heap,control file cache)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177304/wmsdb_ora_33358038_i177304.trc
Tue Oct 28 09:57:53 2014
Trace dumping is performing id=[cdmp_20141028095753]
Tue Oct 28 09:57:54 2014
Sweep Incident[177303]: completed
Tue Oct 28 09:57:57 2014
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177302/wmsdb_ora_33358038_i177302.trc:
ORA-04030: out of process memory when trying to allocate 160 bytes (pga heap,control file cache)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177305):
ORA-04030: out of process memory when trying to allocate 16776728 bytes (QERHJ hash-joi,QERHJ Hash Table Entries)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177305/wmsdb_ora_33358038_i177305.trc
Trace dumping is performing id=[cdmp_20141028095802]
......

该客户的数据库是Oracle 11.1.0.7，运行在p550主机上，16个逻辑CPU，16G内存，AIX 6.1 64位系统，共分配了8G内存给SGA，2G内存给PGA，由memory_target=10G参数动态调整，同时，memory_max_target也设置成10G，表示Oracle最多可以占用OS的内存为10G（物理内存的62.5%）。这2个参数是11g新增的，可以动态分配SGA和PGA，而在10g中仅只能通过设置sga_target来动态管理sga中的各内存组件，pga是要另外手动设置的。

SQL> show parameters target

NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
archive_lag_target integer 0
db_flashback_retention_target integer 1440
fast_start_io_target integer 0
fast_start_mttr_target integer 0
memory_max_target big integer 10G
memory_target big integer 10G
pga_aggregate_target big integer 2G
sga_target big integer 8G

网上google了一下，很多关于ora-04030的错误的描述都大同小异，主要由以下几种：
A. 对于32 BIT系统，有SGA 1.7G限制
B. 某些OS系统本身也有一些内存参数限制
C. OS系统本身物理内存+Swap的限制

对于A，由于系统是64 bit的，没有1.7G的限制；
对于B，用ulimit查看的结果为ulimited;
对于C，OS本身的内存为16G，不过用topas查看，内存使用率已经为90%

通常ORA-04030的问题一般是PGA过度分配造成的（对应的操作是sort/hash_join）。从9i开始，pga_aggregate_target指定了所有session总共使用的最大PGA上限，这里为2G。如果该值被设定了则默认的workarea_size_policy=auto, sort_area_size/sort_area_retained_size将被忽略。那么直接减小pga_aggregate_target就能解决一部分ORA-04030问题。

另外，该错误意味着Oracle服务器进程无法从操作系统分配更多内存。该内存由PGA（Program Global Area）组成，其内容取决于服务器配置。对于专用的服务器进程，内存包含堆栈以及用于保存用户会话数据、游标信息和排序区的UGA（User Global Area）。在多线程配置中（共享服务器），UGA被分配在SGA（System Global Area）中，所以在这种配置下UGA不是造成ORA-4030错误的原因。因此，ORA-04030表示进程需要更多内存（堆栈 UGA 或 PGA）来执行其任务。

由于发生了这个错误，因此无法从操作系统分配内存。这个错误可能是进程本身导致的，例如进程需要过多的内存，或者一些其他原因导致操作系统内存被耗尽，例如SGA太大或系统虚拟内存（物理Memory + Swap）中要容纳的进程过多。许多操作系统会对单个进程能够获取的内存量加以限制，以便自我保护。

查阅了一篇官方文档，是这么说的：

APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.3 and later --11.2.0.3版本居然也没有修复这个bug!
Information in this document applies to any platform.
***Checked for currency 24-July-2014***

SYMPTOMS
1. A session crashes with:
ORA-04030:out of process memory when trying to allocate 4194344 bytes(QERHJ hash-join,QERHJ list array)
2. Review of the instance parameters set reveals:
"_pga_max_size"=614400KB

CAUSE
The cause of this problem has been identified in:
Bug:13447197 - ORA-04030: OUT OF PROCESS MEMORY WHEN TRYING TO ALLOCATE 262168 BYTES (QERGH HAS
suspended as related to:
unpublished Bug:9506362 - ORA-04030: OUT OF PROCESS MEMORY (QERHJ HASH-JOI,KLLCQAS:KLLSLTBA)
Both bugs have been suspended due to lack of reproducibility.

--这是一个在11g上的bug

SOLUTION
As the bugs have been suspended due to lack of information to perform analysis, the only possible
workaround is to set the _PGA_MAX_SIZE instance parameter to a smaller value.

--唯一可能的解决方案是设置隐含参数“_PGA_MAX_SIZE”为一个较小的值

This parameter determines the maximum size which can be used for per-process PGA memory. The default value
is 200MB and the range of valid values is from 10MB up to 4TB-1.

--这个参数可以指定为每一个进程分配地PGA内存，缺省值为200M，合法的值为10M~(4T-1)

The per-process PGA memory can be limited by setting the _PGA_MAX_SIZE to a smaller value which internally
forces the hash-join to use the less memory and avoid the ORA-4030, like in:
SQL> alter system set "_pga_max_size"=100M;

--用“_PGA_MAX_SIZE”强制hash-join用更少的内存，来避免ora-04030，如设置为100M。

有关避免此错误的一般建议

1.一些操作需要大量的内存，对于排序问题，减少SORT_AREA_SIZE会有所帮助。Oracle服务器进程会将PGA中的SORT_AREA_SIZE字节分配给排序操作。如果完成搜索需要更多内存，服务器进程将会使用temporary segment。这意味着减少SORT_AREA_SIZE会对需要大量排序操作的查询性能产生影响。
2.对于9i及更高版本，通过将参数WORKAREA_SIZE_POLICY设置为AUTO，以及在初始化文件中指定PGA_AGGREGATE_TARGET的大小，即可启用自动SQL执行内存管理功能。使用自动PGA内存管理将有助于减少发生ORA-04030错误的可能性。注意，OpenVMS操作系统在Oracle 9i版本上不支持PGA_AGGREGATE_TARGET，但是在Oracle 10g版本上是支持的。
3.PL/SQL程序也可分配大量内存，因此可能需要重写应用程序的某些部分。尽管PL/SQL表非常容易使用，但它确实需要在PGA中分配内存。
4.查看optimizer策略，一些访问路径可能会因排序操作、较多行上的函数使用等原因而需要更多内存。
5.在某些操作系统上（例如 Microsoft Windows），可能要降低 SGA 的大小以便于PGA获得更大的内存。
6.确保操作系统和Oracle的内存限制设置合理。
7.确保有足够的可用内存（物理Memory和Swap）。

在CentOS 6.4下安装Oracle 11gR2(x64) http://www.linuxidc.com/Linux/2014-02/97374.htm

Oracle 11gR2 在VMWare虚拟机中安装步骤 http://www.linuxidc.com/Linux/2013-09/89579p2.htm

Debian 下安装 Oracle 11g XE R2 http://www.linuxidc.com/Linux/2014-03/98881.htm