Month: August 2014

WAITEVENT: “log file sync” Reference Note (Doc ID 34592.1)

Versions: 7.0 – 11.1 | Documentation: 11g, 10g
When a user session (foreground process) COMMITs (or rolls back), the session’s redo information needs to be flushed to the redo logfile. The user session posts LGWR to write all required redo from the log buffer to the redo log file. When LGWR has finished, it posts the user session back. The user session waits on this wait event while waiting for LGWR to post it back to confirm that all redo changes are safely on disk.

This may be described further as the time user session/foreground process spends waiting for redo to be flushed to make the commit durable. Therefore, we may think of these waits as commit latency from the foreground process (or commit client generally).
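A quick way to gauge this commit latency system-wide is to compare ‘log file sync’ against ‘log file parallel write’ (LGWR’s own I/O time). A minimal sketch, querying v$system_event (10g+, which exposes TIME_WAITED_MICRO):

```sql
-- Compare the foreground-visible commit latency with LGWR's write time
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000) AS time_waited_ms,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
  FROM v$system_event
 WHERE event IN ('log file sync', 'log file parallel write');
```

If the average ‘log file sync’ wait is much larger than ‘log file parallel write’, the time is being lost outside the I/O itself (scheduling, posting), not on the disk.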


Troubleshooting ‘Log File Sync’ Waits

One of our customers contacted me asking for reference information on troubleshooting Oracle ‘log file sync’ waits.

I think this information is worth a short blog post.

Reasons:
‘log file sync’ waits occur when sessions wait for redo data to be written to disk
typically this is caused by slow writes
or by committing too frequently in the application
CPU starvation (very high demand ⇒ LGWR stuck on the run queue)
improper operating system configuration (check 169706.1)
bugs in Oracle (especially with the RAC option) and in 3rd-party software (like ODM/DISM)
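To test the “committing too frequently” hypothesis, the commit rate can be read from v$sysstat; sampling these counters twice and dividing the delta by the interval gives commits per second (a rough sketch, not a complete diagnostic):

```sql
-- Cumulative counters since instance startup; sample twice for a rate
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('user commits', 'user rollbacks', 'redo synch writes');
```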

Recommendations:
tune the LGWR process for good throughput, especially when ‘log file parallel write‘ is high too:
do not put redo logs on RAID 5 without a good write cache
do not put redo logs on Solid State Disk (SSD)
It looks like the last recommendation was based on old experience with SSDs, which is obsolete now; even Oracle now recommends SSDs for redo logs (1566935.1 Implementing Oracle E-Business Suite 12.1 Databases on Oracle Database Appliance):

“Move REDO log files to +REDO diskgroup on Solid State Disks (SSDs).”

if CPUs are oversubscribed (check the run queue with vmstat):
check for non-Oracle system activity, like GZIP or BZIP2 running during business hours…
lower the instance’s CPU usage (for example, tune SQL to reduce LIOs)
increase LGWR priority (renice or _high_priority_processes)
decrease the COMMIT count for applications with many short transactions
use COMMIT [BATCH] NOWAIT (10g+) when possible
do some processing with NOLOGGING (or maybe even with _disable_logging=TRUE if just benchmarking performance impact), but think about database recoverability
lower the system’s CPU usage or increase LGWR priority
check whether 3rd-party software, or utilities like RMAN, are active on the same disks where the redo logs are placed (trace/systemstate dump files, etc.)
trace LGWR as the last option for troubleshooting OS/3rd-party issues 😉
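The COMMIT [BATCH] NOWAIT advice above can be sketched as follows; t_log is a hypothetical staging table, and note that NOWAIT trades durability-on-commit for latency, so it only suits data you can afford to lose on instance failure:

```sql
-- Asynchronous commits inside a bulk load (10gR2+ PL/SQL syntax)
BEGIN
  FOR i IN 1 .. 10000 LOOP
    INSERT INTO t_log (id) VALUES (i);   -- t_log: hypothetical table
    COMMIT WRITE BATCH NOWAIT;           -- do not wait for LGWR on each commit
  END LOOP;
END;
/
```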

References:
34592.1 WAITEVENT: “log file sync” Reference Note
34583.1 WAITEVENT: “log file parallel write” Reference Note
1376916.1 Troubleshooting: ‘log file sync’ Waits
223117.1 Troubleshooting I/O-related waits
857576.1 How to Minimise Waits for ‘Log File Sync’
1064487.1 Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)
1318709.1 AIX: Things To Check When Seeing Long Log File Sync Time in 11.2.
1205673.1 ‘Log File Sync’ problem on a Sun Server: A Typical Source for LOGFILE SYNC Performance Problems
1523164.1 SPARC: Reducing High Waits on ‘log file sync’ on Oracle Solaris SPARC by Increasing Priority of Log Writer
13551402.8 High “log file parallel write” and “log file sync” after upgrading 11.2 with Veritas/Symantec ODM
1278149.1 Intermittent Long ‘log file sync’ Waits, LGWR Posting Long Write Times, I/O Portion of Wait Minimal
1229104.1 LOG FILE SYNC WAITS SPIKES DURING RMAN ARCHIVELOG BACKUPS
1462942.1 Adaptive Switching Between Log Write Methods can Cause ‘log file sync’ Waits
Kevin Closson: “Manly Men Only Use Solid State Disk For Redo Logging. LGWR I/O is Simple, But Not LGWR Processing”
Jeremy Schneider: “Adaptive Log File Sync: Oracle, Please Don’t Do That Again”
Riyaj Shamsudeen: “Tuning ‘log file sync’ wait events”
Gwen Shapira: “De-Confusing SSD (for Oracle Databases)”
Guy Harrison: “Using Solid State Disk to optimize Oracle databases”
SSD Performance Blog

Adaptive Log File Sync

Disclaimer: Much of what follows is pure speculation on my part. It could be completely wrong, and I’m putting it out there in the hopes that it’ll eventually be proven one way or the other.

The Summary

  • Underscore parameter _use_adaptive_log_file_sync
    • Default value changed in 11.2.0.3 from FALSE to TRUE
    • Dynamic parameter
  • Enables a new method of communication for LGWR to notify foreground processes of commit
    • Old method used semaphores, LGWR had to explicitly “post” every waiting process
    • New method has the FG processes sleep and “poll” to see if commit is complete
    • Advantage is to free LGWR from CPU work required to inform lots of processes about commits
  • LGWR dynamically switches between old and new method based on load and responsiveness
    • Method can switch frequently at runtime, max frequency is 3 switches per minute (configurable)
    • Switch is logged in LGWR tracefile, we have seen several switches per day
  • Few problems in general; the possible issues seem to be in RAC and/or the switching process itself
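Whether the feature is currently enabled can be checked from a SYS session via the x$ fixed tables (version-dependent internals, so treat this as a sketch; on 11.2.0.3+ the default should show TRUE):

```sql
-- Read the hidden parameter _use_adaptive_log_file_sync (SYS only)
SELECT i.ksppinm AS parameter,
       v.ksppstvl AS value
  FROM x$ksppi i, x$ksppcv v
 WHERE i.indx = v.indx
   AND i.ksppinm = '_use_adaptive_log_file_sync';
```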

Clusterware Startup Sequence

The cluster startup sequence can roughly be explained as follows:
INIT, i.e. init.ohasd (with respawn), starts the OHASD process (Oracle High Availability Services Daemon). This daemon in turn spawns four important processes.

Level 1: OHASD Spawns:

  • cssdagent – Agent responsible for spawning CSSD.
  • orarootagent – Agent responsible for managing all root owned ohasd resources.
  • oraagent – Agent responsible for managing all oracle owned ohasd resources.
  • cssdmonitor – Monitors CSSD and node health (along with the cssdagent).

Level 2: OHASD rootagent spawns:

  • CRSD – Primary daemon responsible for managing cluster resources.
  • CTSSD – Cluster Time Synchronization Services Daemon
  • Diskmon
  • ACFS (ASM Cluster File System) Drivers

Level 2: OHASD oraagent spawns:

  • MDNSD – Used for DNS lookup
  • GIPCD – Used for inter-process and inter-node communication
  • GPNPD – Grid Plug & Play Profile Daemon
  • EVMD – Event Monitor Daemon
  • ASM – Resource for monitoring ASM instances

Level 3: CRSD spawns:

  • orarootagent – Agent responsible for managing all root owned crsd resources.
  • oraagent – Agent responsible for managing all oracle owned crsd resources.

Level 4: CRSD rootagent spawns:

  • Network resource – To monitor the public network
  • SCAN VIP(s) – Single Client Access Name Virtual IPs
  • Node VIPs – One per node
  • ACFS Registry – For mounting ASM Cluster File System
  • GNS VIP (optional) – VIP for GNS

Level 4: CRSD oraagent spawns:

  • ASM Resource – ASM Instance(s) resource
  • Diskgroup – Used for managing/monitoring ASM diskgroups.
  • DB Resource – Used for monitoring and managing the DB and instances
  • SCAN Listener – Listener for single client access name, listening on SCAN VIP
  • Listener – Node listener listening on the Node VIP
  • Services – Used for monitoring and managing services
  • ONS – Oracle Notification Service
  • eONS – Enhanced Oracle Notification Service
  • GSD – For 9i backward compatibility
  • GNS (optional) – Grid Naming Service – Performs name resolution

[Diagram: 11.2 Clusterware startup sequence]

Data Guard cascading standby (one-to-many topology)

Since Oracle 11.2.0.2, Data Guard supports cascaded redo transport: redo can flow from A to B, then from B to C, from B to D, and so on.

A cascading standby offloads from the primary the work of shipping redo to multiple standby databases.

A cascading standby can ship redo to up to 30 standby databases, which can be any mix of physical, logical, and snapshot standbys.

A cascaded standby lags somewhat, because the cascading is not initiated by the primary: only after the cascading standby has received the primary’s redo and archived the standby log locally can the redo be cascaded to the other standbys. In the A→B, B→C, B→D example above, B is the cascading standby, and C and D are the cascaded standbys.
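On the cascading standby B, this is configured with a redo transport destination that is valid for standby redo logs while B is in the standby role. The DB_UNIQUE_NAMEs below (prim_a, stdb_b, stdb_c) are assumptions; this is a sketch, not a complete Data Guard configuration:

```sql
-- On B, the cascading standby:
ALTER SYSTEM SET log_archive_config =
  'DG_CONFIG=(prim_a,stdb_b,stdb_c)';

-- Cascade received redo on to C once it has been archived locally
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stdb_c ASYNC
   VALID_FOR=(STANDBY_LOGFILES,STANDBY_ROLE)
   DB_UNIQUE_NAME=stdb_c';
```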


Synchronizing data with snapshots (materialized views)

To keep the application tables consistent across the whole system, a customer may require that the application tables be maintained centrally, i.e. on a single server. For example, the application tables are maintained at site 1 (database rptdw, user kr), while the other sites (database etldb1, user etlmon, and database etldb2, user etlmon) use these tables directly. In other words, site 1 can insert, delete, and update the application tables, while the other sites can only query them.

Snapshots satisfy this requirement exactly. Note that this snapshot is not the snapshot of the ORA-01555 “snapshot too old” error,
but the SNAPSHOT object, known as a materialized view since 9i.
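A minimal sketch of that setup, run on one of the consumer databases (etldb1); the link name, table name, credentials, and refresh interval are assumptions, and a fast-refreshable materialized view also needs a materialized view log on the master table:

```sql
-- On the master site rptdw (user kr): enable fast refresh for the table
-- CREATE MATERIALIZED VIEW LOG ON app_conf;

-- On the consumer site etldb1 (user etlmon):
CREATE DATABASE LINK rptdw_link
  CONNECT TO kr IDENTIFIED BY "secret"    -- hypothetical credentials
  USING 'rptdw';

CREATE MATERIALIZED VIEW app_conf         -- app_conf: hypothetical table
  REFRESH FAST
  START WITH SYSDATE NEXT SYSDATE + 1/24  -- refresh hourly
  AS SELECT * FROM kr.app_conf@rptdw_link;
```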

Failover with an Oracle 11g physical standby

Failover is an important Data Guard feature; when a major disaster strikes, it is undoubtedly the DBA’s lifeline. The failover procedure must therefore be firmly memorized.

Failover steps:
1. Stop redo apply:
alter database recover managed standby database cancel;
2. Terminate standby redo transport and finish recovery:
alter database recover managed standby database finish force;
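The excerpt stops here; for reference, the standard documented 11g sequence then converts the standby to the primary role and opens it:

```sql
-- Convert the physical standby to the primary role
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;

-- Open the new primary
ALTER DATABASE OPEN;
```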

© 2019 Init dba
