Category: LGWR

Troubleshooting ‘Log File Sync’ Waits

One of our customers asked me for reference information on troubleshooting Oracle ‘log file sync’ waits.

I think this information is worth a short blog post.

Reasons:
‘Log file sync’ waits occur when sessions wait for redo data to be written to disk. Typical causes:
  • slow redo writes
  • committing too frequently in the application
  • CPU overload (very high demand, so LGWR ends up on the run queue)
  • improper operating system configuration (check 169706.1)
  • bugs in Oracle (especially with the RAC option) and in 3rd-party software (like ODM/DISM)
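
A common first step in telling the I/O-related reasons above from the CPU-related ones is to compare the average ‘log file sync’ wait with the average ‘log file parallel write’ wait over the same interval. Here is a minimal Python sketch of that comparison. The function name and the 0.5 cut-off are my own assumptions for illustration, not Oracle-documented values, and the averages are expected to come from something like an AWR report.

```python
def classify_log_file_sync(sync_avg_ms, write_avg_ms, io_share_cutoff=0.5):
    """Rough triage of 'log file sync' waits (illustrative only).

    sync_avg_ms     -- average 'log file sync' wait, in milliseconds
    write_avg_ms    -- average 'log file parallel write' wait, same interval
    io_share_cutoff -- hypothetical threshold, not an Oracle-documented value
    """
    if sync_avg_ms <= 0:
        raise ValueError("sync_avg_ms must be positive")

    # Share of the commit wait that is explained by the redo write itself.
    io_share = write_avg_ms / sync_avg_ms

    if io_share >= io_share_cutoff:
        return "mostly redo write I/O: look at storage first"
    return "mostly outside the write: look at CPU, run queue, posting, bugs"


# Made-up numbers: 8 of 10 ms spent in the write vs. 1 of 10 ms.
print(classify_log_file_sync(10.0, 8.0))
print(classify_log_file_sync(10.0, 1.0))
```

If most of the commit wait is spent inside the redo write, storage is the place to start. If the write is fast but the sync is slow, the time is going somewhere between LGWR and the foreground sessions.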

Recommendations:
  • tune the LGWR process for good throughput, especially when ‘log file parallel write’ is high too:
    • do not put redo logs on RAID 5 without a good write cache
    • do not put redo logs on Solid State Disk (SSD)
It looks like the last recommendation was based on old experience with SSDs and is now obsolete. Even Oracle recommends using SSDs for redo logs (1566935.1 Implementing Oracle E-Business Suite 12.1 Databases on Oracle Database Appliance):

“Move REDO log files to +REDO diskgroup on Solid State Disks (SSDs).”

  • if CPUs are overloaded (check the run queue with vmstat):
    • check for non-Oracle system activity, like gzip or bzip2 running during business hours
    • lower the instance’s CPU usage (for example, tune SQL to reduce logical I/Os)
    • increase LGWR priority (renice or _high_priority_processes)
  • decrease the number of COMMITs in applications with many short transactions
  • use COMMIT WRITE [BATCH] NOWAIT (10g+) when possible
  • do some processing with NOLOGGING (or maybe even with _disable_logging=TRUE if you are only benchmarking the performance impact), but think about database recoverability
  • lower the system’s CPU usage or increase LGWR priority
  • check for activity from 3rd-party software or utilities like RMAN on the same disks as the redo logs, such as trace or systemstate dump files being written there
  • trace LGWR as a last resort when troubleshooting OS/3rd-party issues 😉
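
To make the “check the run queue with vmstat” step a bit more concrete, here is a small Python sketch that pulls the `r` column out of Linux-style vmstat output and flags samples where more processes were runnable than there are CPUs. The sample text, the function names and the CPU count of 8 are all made up for illustration. Treating "r greater than CPU count" as overload is a rule of thumb rather than a hard limit.

```python
# Hypothetical capture of `vmstat 5` on a Linux box (numbers are invented).
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 801232  10240 512000    0    0     5    12  150  300 10  5 85  0  0
12  1      0 799000  10240 512100    0    0     0   900 2500 9000 85 12  1  2  0
"""


def run_queue_samples(vmstat_text):
    """Return the run-queue length ('r' column) from each data line."""
    r_index = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "r" in fields and "b" in fields:
            # Column header line: remember where 'r' sits.
            r_index = fields.index("r")
        elif r_index is not None and fields and all(f.isdigit() for f in fields):
            # Purely numeric line after the header: one vmstat sample.
            samples.append(int(fields[r_index]))
    return samples


def overloaded_samples(samples, cpu_count):
    """Keep only samples with more runnable processes than CPUs."""
    return [r for r in samples if r > cpu_count]


samples = run_queue_samples(SAMPLE_VMSTAT)
print(samples, overloaded_samples(samples, cpu_count=8))
```

When the run queue regularly exceeds the CPU count, LGWR may be waiting for a CPU like everything else, which is where the priority and CPU-reduction items above come in.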

References:
34592.1 WAITEVENT: “log file sync” Reference Note
34583.1 WAITEVENT: “log file parallel write” Reference Note
1376916.1 Troubleshooting: ‘log file sync’ Waits
223117.1 Troubleshooting I/O-related waits
857576.1 How to Minimise Waits for ‘Log File Sync’
1064487.1 Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)
1318709.1 AIX: Things To Check When Seeing Long Log File Sync Time in 11.2.
1205673.1 ‘Log File Sync’ problem on a Sun Server: A Typical Source for LOGFILE SYNC Performance Problems
1523164.1 SPARC: Reducing High Waits on ‘log file sync’ on Oracle Solaris SPARC by Increasing Priority of Log Writer
13551402.8 High “log file parallel write” and “log file sync” after upgrading 11.2 with Veritas/Symantec ODM
1278149.1 Intermittent Long ‘log file sync’ Waits, LGWR Posting Long Write Times, I/O Portion of Wait Minimal
1229104.1 LOG FILE SYNC WAITS SPIKES DURING RMAN ARCHIVELOG BACKUPS
1462942.1 Adaptive Switching Between Log Write Methods can Cause ‘log file sync’ Waits
Kevin Closson: “Manly Men Only Use Solid State Disk For Redo Logging. LGWR I/O is Simple, But Not LGWR Processing”
Jeremy Schneider: “Adaptive Log File Sync: Oracle, Please Don’t Do That Again”
Riyaj Shamsudeen: “Tuning ‘log file sync’ wait events”
Gwen Shapira: “De-Confusing SSD (for Oracle Databases)”
Guy Harrison: “Using Solid State Disk to optimize Oracle databases”
SSD Performance Blog

Adaptive Log File Sync

Disclaimer: Much of what follows is pure speculation on my part. It could be completely wrong, and I’m putting it out there in the hopes that it’ll eventually be proven one way or the other.

The Summary

  • Underscore parameter _use_adaptive_log_file_sync
    • Default value changed in 11.2.0.3 from FALSE to TRUE
    • Dynamic parameter
  • Enables a new method of communication for LGWR to notify foreground processes of commit
    • Old method used semaphores, LGWR had to explicitly “post” every waiting process
    • New method has the FG processes sleep and “poll” to see if commit is complete
    • Advantage is to free LGWR from CPU work required to inform lots of processes about commits
  • LGWR dynamically switches between old and new method based on load and responsiveness
    • Method can switch frequently at runtime, max frequency is 3 switches per minute (configurable)
    • Switch is logged in LGWR tracefile, we have seen several switches per day
  • Few problems in general, possible issues seem to be in RAC and/or the switching process itself
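
The difference between the two notification methods in the summary can be pictured with a toy Python model. This is purely illustrative and, like the rest of this post, speculative: it says nothing about how Oracle implements either method. The function names, the use of threads and the 1 ms poll interval are my own assumptions. The only point is that in the post/wait style the writer does one notification per waiting session, while in the polling style it updates shared state once and the sessions find out on their own.

```python
import threading
import time


def post_wait(n_sessions):
    """Toy model of the old method: the writer posts each waiter in turn."""
    events = [threading.Event() for _ in range(n_sessions)]
    finished = []

    def session(i):
        events[i].wait()            # foreground sleeps until explicitly posted
        finished.append(i)

    threads = [threading.Thread(target=session, args=(i,)) for i in range(n_sessions)]
    for t in threads:
        t.start()

    posts = 0
    for ev in events:               # writer-side work grows with waiter count
        ev.set()
        posts += 1
    for t in threads:
        t.join()
    return posts, len(finished)


def polling(n_sessions, poll_interval=0.001):
    """Toy model of the new method: waiters sleep and re-check shared state."""
    state = {"commit_done": False}
    finished = []

    def session(i):
        while not state["commit_done"]:
            time.sleep(poll_interval)   # foreground polls instead of being posted
        finished.append(i)

    threads = [threading.Thread(target=session, args=(i,)) for i in range(n_sessions)]
    for t in threads:
        t.start()

    state["commit_done"] = True     # a single update, however many waiters
    for t in threads:
        t.join()
    return 1, len(finished)


# Each call returns (writer-side notifications, sessions released).
print(post_wait(50))
print(polling(50))
```

The trade-off shows up even in the toy: polling takes work off the writer, but each session now learns about the commit up to one poll interval late.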

© 2019 Init dba
