1. 性能报告采集

1.1 STATSPACK报告生成

生成快照

SQL> exec statspack.snap;

建立JOB定期生成快照

SQL> @?/rdbms/admin/spauto.sql

生成报告

SQL> @?/rdbms/admin/spreport

SQL语句显示不完整处理,修改脚本中参数num_rows_per_hash,默认为5。

1.2 AWR报告生成

抓取AWR快照

SQL> exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

本实例

SQL> @?/rdbms/admin/awrrpt

其他实例

SQL> @?/rdbms/admin/awrrpti

RAC相关信息

SQL> @?/rdbms/admin/awrgrpt

1.3 ASH报告生成

SQL> @?/rdbms/admin/ashrpt

1.4 SQL报告生成

SQL> @?/rdbms/admin/awrsqrpt

1.5 ADDM报告生成

SQL> @?/rdbms/admin/addmrpt

1.6 SQL Monitor报告生成

生成脚本

set trimspool on
set arraysize 512
set trim on
set pagesize 0
set linesize 1000k
set long 1000000
set longchunksize 1000000
spool sqlmon.html
select /*+ noparallel */ dbms_sqltune.report_sql_monitor (sql_id=>'input your SQL ID', report_level=>'ALL', type=>'ACTIVE') from dual;
spool off

1.7 ASH信息导出

查询最早的ASH时间点

SQL> SELECT sample_count, oldest_sample_time FROM v$ash_info;

1.7.1 方法一(数据泵)

备份ASH表

create table bak_ash_160920 tablespace users 
as select * from gv$active_session_history 
where sample_time > sysdate - 3/24 ;

导出ASH表

exp \’/ as sysdba \’tables=bak_ash_160920 file=/tmp/bak_ash_160920.dmp

1.7.2 方法二(Trace文件)

导出ASH信息到Trace,Level等级代表导出最近的分钟数,示例为60分钟。

SQL> ALTER SYSTEM SET events 'immediate trace name ashdump level 60';

查看ASH Trace路径

SQL> SELECT value FROM v$diag_info WHERE name = 'Default Trace File';

导入其他数据库方法: 建立表结构

SQL> CREATE TABLE ashdump AS SELECT * FROM SYS.WRH$_ACTIVE_SESSION_HISTORY WHERE rownum < 0;

生成控制文件

$ sed -n '1,/^Step 2:/d;/^Step 3:/,$d;p' ash.trc > ashldr.ctl

加载ASH数据

$ sqlldr \' / as sysdba \' control=ashldr.ctl data=ash.trc errors=1000000

1.8 慢SQL抓取

抓取内存中执行时间超过5秒的SQL

set lin 200 pages 2000
select sql_id,sql_exec_id,count(*) sec from v$active_session_history
where sample_time > to_date('2017-07-28 09:00:00','YYYY-MM-DD HH24:MI:SS')
  and sample_time < to_date('2017-07-28 11:00:00','YYYY-MM-DD HH24:MI:SS')
  and sql_exec_id is not null
group by sql_id,sql_exec_id having count(*) > 5 order by 3 desc;

1.9 ash 定位问题

SELECT NVL(a.event, 'ON CPU') AS event,
       COUNT(*) AS total_wait_time
FROM   v$active_session_history a
WHERE  a.sample_time > SYSDATE - 5/(24*60) -- 5 mins
GROUP BY a.event
ORDER BY total_wait_time DESC;

EVENT                                                            TOTAL_WAIT_TIME
---------------------------------------------------------------- ---------------
db file sequential read                                                      750
log file parallel write                                                       43
log file sync                                                                 42
db file parallel read                                                         32
control file sequential read                                                  22
ON CPU                                                                        21
db file parallel write                                                        21
log file switch (private strand flush incomplete)                              8
Disk file operations I/O                                                       1
control file parallel write                                                    1
buffer busy waits                                                              1

11 rows selected.
SQL>

SELECT NVL(a.event, 'ON CPU') AS event,
       COUNT(*)*10 AS total_wait_time
FROM   dba_hist_active_sess_history a
WHERE  a.sample_time > SYSDATE - 1 
GROUP BY a.event
ORDER BY total_wait_time DESC;

EVENT                                                            TOTAL_WAIT_TIME
---------------------------------------------------------------- ---------------
db file sequential read                                                     3860
ON CPU                                                                      1500
control file sequential read                                                 990
direct path read temp                                                        580
direct path read                                                             560
log file parallel write                                                      280
db file parallel write                                                       270
Disk file operations I/O                                                     240
log file switch completion                                                   150
log file sync                                                                130
db file parallel read                                                        130
.
.
.

26 rows selected.

SQL>

2. EVENT跟踪

https://feed.askmaclean.com/archives/oracle-is-hanging-dont-forget-hanganalyze.html

sqlplus / as sysdba
 oradebug setmypid
 oradebug unlimit
 oradebug hanganalyze 3
 oradebug dump ashdumpseconds 30
 oradebug systemstate 266
 oradebug tracefile_name

2.1 EVENT 10046

ORADEBUG

ORADEBUG SETMYPID;
ORADEBUG EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12;
select * from dual;
ORADEBUG TRACEFILE_NAME;
ORADEBUG EVENT 10046 TRACE NAME CONTEXT OFF;

SQLPLUS

session 级别

alter session set events '10046 trace name context forever,level X';
alter session set events '10046 trace name context off';

system 级别

alter system set events '10046 trace name context forever,level X';
alter system set events '10046 trace name context off';

使用TKPROF工具整理Trace文件

按照执行时间排序输出10个SQL

$ tkprof xx_ora_34471968.trc sort=exeela print=10

2.2EVENT 10053

ORADEBUG

ORADEBUG SETMYPID;
ORADEBUG EVENT 10053 TRACE NAME CONTEXT FOREVER, LEVEL 12;
select * from dual;
ORADEBUG TRACEFILE_NAME;
ORADEBUG EVENT 10053 TRACE NAME CONTEXT OFF;

SQLPLUS

session 级别

alter session set events '10053 trace name context forever,level X';
alter session set events '10053 trace name context off';

system 级别

alter system set events '10053 trace name context forever,level X';
alter system set events '10053 trace name context off';

2.3 ErrorStack

ORADEBUG

oradebug setmypid
oradebug tracefile_name
oradebug event 1000 trace name errorstack level 10;
oradebug event 1000 trace name errorstack off;

2.4 DBMS_MONITOR包

开启跟踪

execute dbms_monitor.client_id_trace_enable (client_id => '<client_identifier>', waits => TRUE, binds => TRUE);

确认跟踪

col primary_id format a56 
select trace_type, primary_id, waits, binds from dba_enabled_traces;

关闭跟踪

execute dbms_monitor.client_id_trace_disable (client_id => '<PRIMARY_ID>');

合并跟踪文件

trcsess output=<output file name> clientid=<client_identifier> service=<service name> *.trc

生成报告

tkprof tracefile outputfile sys=no record=filename

3. DB信息采集

3.1 查看进程PGA消耗

set lin 200 pages 2000
col memory for a25
col spid for a6
col program for a40
SELECT p.spid, s.sid, p.program, substr(n.name,1,25) memory, 
ROUND(s.value/1024/1024) as MBytes
FROM v$sesstat s, v$statname n, v$process p, v$session vs
WHERE s.statistic# = n.statistic#
AND n.name LIKE '%pga memory%'
AND s.sid=vs.sid
AND vs.paddr=p.addr
AND s.value > 20000000 /* --remove this line to view all process size */
order by spid,memory;

输出示例

.1542347985523.png

3.2 Hanganalyze

登录数据库

sqlplus '/ as sysdba'

当无法正常登录数据库时

sqlplus -prelim '/ as sysdba'

单实例

oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait 15s before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit

RAC

oradebug setmypid
oradebug setinst all
oradebug unlimit
oradebug -g def hanganalyze 3
-- Wait 15s before getting the second hanganalyze
oradebug -g def hanganalyze 3
oradebug tracefile_name
exit

11202后,如果生成的Trace文件报错。

HANG ANALYSIS:
ERROR: Can not perform hang analysis dump without a process state object and a session state object.
( process=(nil), sess=(nil) )

解决方法

sqlplus - prelim / as sysdba
oradebug setospid < use an existing process id, for example PMON/SMON>
oradebug hanganalyze 3

3.3 Systemstate

登录数据库

sqlplus '/ as sysdba'

当无法正常登录数据库时

sqlplus -prelim '/ as sysdba'

命令如下

oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
-- Wait 15s before getting the second systemstate
oradebug dump systemstate 266
oradebug tracefile_name
exit

3.4 Processstate

登录数据库

sqlplus '/ as sysdba'

取到PID

SQL> select p.pid,s.program from v$session s,v$process p 
where s.paddr=p.addr and s.sid=121;

命令如下

SQL> oradebug setorapid XX
SQL> oradebug dump processstate 266
SQL> oradebug tracefile_name

3.5 DBMS_MONITOR

按照ACTION进行跟踪

begin
dbms_monitor.serv_mod_act_trace_enable
(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',action_name=>'Automatic Report Flush',
waits => true,binds => true);
end;
/
begin
dbms_monitor.serv_mod_act_trace_disable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Automatic Report Flush');
end;
/

4. GI信息采集

4.1 跟踪SRVCTL命令

export SRVM_TRACE=TRUE
script /tmp/srvctl_trace.txt
srvctl unresponsive command.
exit
export SRVM_TRACE=

4.2 CHM信息收集

ROOT用户打包收集信息

# diagcollection.pl -collect -crshome /space/sys_software/oracle/grid  -chmoshome /space/sys_software/oracle/grid -chmos -incidenttime 09/09/2017 15:10:00 -incidentduration 00:30

GRID用户收集信息

$ oclumon dumpnodeview -n db03 -v -s "2017-09-09 15:10:00" -e "2017-09-09 15:30:00"

5. OS级别跟踪

5.1 系统调用跟踪汇总

1.png
2.png

5.2 Linux

生成跟踪输出

strace -o /tmp/test_sqlplus.txt sqlplus / as sysdba
strace -frT -o /tmp/strace-lsnr.log -p <PID>

生成跟踪输出,包括子进程,输出文件为test_sqlplus.txt.<pid>。

strace -f -o /tmp/test_sqlplus.txt sqlplus / as sysdba

参数:

-f  包括子进程
-ff 每个子进程一个输出文件
-T  包括时间
-o  输出文件

5.3 AIX

生成跟踪输出

truss -aefdDo /tmp/test_sqlplus.txt sqlplus / as sysdba
truss -aefdDo /tmp/test_sqlplus.txt -p <PID>

生成调用时间分析

truss -c sqlplus '/ as sysdba'
truss -c <PID>

查看进程的环境变量

ps eauwww <PID>

5.4 HP-UX

跟踪特定进程

tusc -aef -o /tmp/tusc-lsnr.log -T "%H:%M:%S" -p <PID >