I recently had a case with a customer concerned over the time taken to dd an Oracle redo log file even though Oracle had been shutdown. Finding this strange I thought I’d use TNF probes to see what was happening.
#!/bin/sh -x # Script to collect TNF data whilst dd is happening on Oracle log file # # Andy H 19/09/06 # TMPDIR=${TMPDIR:-/var/tmp/tnf} output_file=$TMPDIR/ktrace.tnf command="dd if=/u01/oracle/oradata/TEST/redo01.log of=/u01/oracle/oradata/TEST/tmp.tmp" start_prex() { echo "Enabling kernel tracing" prex -k <$TMPDIR/pid.out wait $pid fi } # Extract kernel trace buffer ktrace_xtract() { echo "Extracting kernel trace buffer" tnfxtract $output_file || fail "Could not extract kernel trace buffer" ls -l $output_file } # Get kernel symbols ktrace_sym() { echo "Collecting kernel symbol information" /usr/ccs/bin/nm -fx /dev/ksyms | tr "|" " " > $TMPDIR/nm.out } # tar and gzip data ktrace_gzip() { echo "Packaging collected data" /usr/sbin/tar cf /tmp/tnf.tar $TMPDIR /usr/bin/gzip /tmp/tnf.tar echo "Collection complete, please send in /tmp/tnf.tar.gz for analysis" } start_prex run_command ktrace_xtract ktrace_sym stop_prex ktrace_gzip
Following using some clever sorting awk script from Tim Uglow I compared the TNF output from the normal dd timing and then the slow one.
64.922000 - 2562 1 0x300042edc20 0 page_unmap vnode: 0x600062ee440 offset: 157253632 64.924500 - 2562 1 0x300042edc20 0 page_unmap vnode: 0x600062ee440 offset: 157261824 64.927100 - 2562 1 0x300042edc20 0 page_unmap vnode: 0x600062ee440 offset: 157270016 64.929200 - 2562 1 0x300042edc20 0 page_unmap vnode: 0x600062ee440 offset: 157278208 64.931500 - 2562 1 0x300042edc20 0 page_unmap vnode: 0x600062ee440 offset: 157286400 101.904300 - 2562 1 0x300042edc20 0 syscall_end rval1: 4 rval2: 0 errno: 0 creat64 98.822400 No error 101.908200 - 2562 1 0x300042edc20 0 thread_queue tid: 0x300042edc20 cpuid: 0 priority: 0 queue_length: 0x0 101.908500 - 2562 1 0x300042edc20 0 thread_state tid: 0x300042edc20 state: 8 waiting for CPU 101.924000 - 2562 1 0x300042edc20 0 syscall_start sysnum: 137 sysconfig 101.924500 - 2562 1 0x300042edc20 0 syscall_end rval1: 8192 rval2: 18466816 errno: 0 sysconfig 0.000500 No error 101.928700 - 2562 1 0x300042edc20 0 syscall_start sysnum: 17 brk 101.931200 - 2562 1 0x300042edc20 0 syscall_end rval1: 0 rval2: 134217728 errno: 0 brk 0.002500 No error 101.932500 - 2562 1 0x300042edc20 0 syscall_start sysnum: 17 brk 101.941300 - 2562 1 0x300042edc20 0 syscall_end rval1: 0 rval2: 134217728 errno: 0 brk 0.008800 No error 101.943200 - 2562 1 0x300042edc20 0 thread_state state: 4 asleep in user data fault 101.945400 - 2562 1 0x300042edc20 0 address_fault address: 0x28000 fault_type: 0 access: 1 101.947900 - 2562 1 0x300042edc20 0 anon_zero address: 0x28000 101.967700 - 2562 1 0x300042edc20 0 thread_state state: 0 in userland 101.978500 - 2562 1 0x300042edc20 0 syscall_start sysnum: 98 sigaction 101.980600 - 2562 1 0x300042edc20 0 syscall_end rval1: 0 rval2: 0 errno: 0 sigaction 0.002100 No error 101.985800 - 2562 1 0x300042edc20 0 syscall_start sysnum: 3 read 102.000000 - 2562 1 0x300042edc20 0 syscall_end rval1: 512 rval2: 8193 errno: 0 read 0.014200 No error 102.009800 - 2562 1 0x300042edc20 0 syscall_start sysnum: 4 write 102.073000 - 2562 1 0x300042edc20 0 syscall_end rval1: 512 rval2: 8962 errno: 0 write 0.063200 No error 102.074500 - 2562 1 0x300042edc20 0 syscall_start sysnum: 3 read 102.079700 - 2562 1 0x300042edc20 0 syscall_end rval1: 512 rval2: 8193 errno: 0 read 0.005200 No error 102.082300 - 2562 1 0x300042edc20 0 syscall_start sysnum: 4 write 102.088200 - 2562 1 0x300042edc20 0 syscall_end rval1: 512 rval2: 8962 errno: 0 write 0.005900 No error 102.089200 - 2562 1 0x300042edc20 0 syscall_start sysnum: 3 read 102.093600 - 2562 1 0x300042edc20 0 syscall_end rval1: 512 rval2: 8193 errno: 0 read 0.004400 No error 102.096200 - 2562 1 0x300042edc20 0 syscall_start sysnum: 4 write 102.129600 - 2562 1 0x300042edc20 0 syscall_end rval1: 512 rval2: 8962 errno: 0 write 0.033400 No error 102.130900 - 2562 1 0x300042edc20 0 syscall_start sysnum: 3 read 102.136100 - 2562 1 0x300042edc20 0 syscall_end rval1: 512 rval2: 8193 errno: 0 read 0.005200 No error 102.138600 - 2562 1 0x300042edc20 0 syscall_start sysnum: 4 write
So each of the read() / writes() happen one after each other…..but what about on the so called broken one:
59.433900 - 2635 1 0x300041792c0 3 page_unmap vnode: 0x600062ee440 offset: 157253632 59.436200 - 2635 1 0x300041792c0 3 page_unmap vnode: 0x600062ee440 offset: 157261824 59.438300 - 2635 1 0x300041792c0 3 page_unmap vnode: 0x600062ee440 offset: 157270016 59.440300 - 2635 1 0x300041792c0 3 page_unmap vnode: 0x600062ee440 offset: 157278208 59.442600 - 2635 1 0x300041792c0 3 page_unmap vnode: 0x600062ee440 offset: 157286400 95.378500 - 2635 1 0x300041792c0 3 syscall_end rval1: 4 rval2: 0 errno: 0 creat64 91.600000 No error 95.380600 - 2635 1 0x300041792c0 3 thread_queue tid: 0x300041792c0 cpuid: 3 priority: 0 queue_length: 0x0 95.380800 - 2635 1 0x300041792c0 3 thread_state tid: 0x300041792c0 state: 8 waiting for CPU 95.398100 - 2635 1 0x300041792c0 3 syscall_start sysnum: 137 sysconfig 95.398600 - 2635 1 0x300041792c0 3 syscall_end rval1: 8192 rval2: 18466816 errno: 0 sysconfig 0.000500 No error 95.404300 - 2635 1 0x300041792c0 3 syscall_start sysnum: 17 brk 95.406100 - 2635 1 0x300041792c0 3 syscall_end rval1: 0 rval2: 134217728 errno: 0 brk 0.001800 No error 95.407300 - 2635 1 0x300041792c0 3 syscall_start sysnum: 17 brk 95.415700 - 2635 1 0x300041792c0 3 syscall_end rval1: 0 rval2: 134217728 errno: 0 brk 0.008400 No error 95.417700 - 2635 1 0x300041792c0 3 thread_state state: 4 asleep in user data fault 95.419500 - 2635 1 0x300041792c0 3 address_fault address: 0x28000 fault_type: 0 access: 1 95.421600 - 2635 1 0x300041792c0 3 anon_zero address: 0x28000 95.440600 - 2635 1 0x300041792c0 3 thread_state state: 0 in userland 95.453200 - 2635 1 0x300041792c0 3 syscall_start sysnum: 98 sigaction 95.454800 - 2635 1 0x300041792c0 3 syscall_end rval1: 0 rval2: 0 errno: 0 sigaction 0.001600 No error 95.460900 - 2635 1 0x300041792c0 3 syscall_start sysnum: 3 read 95.483700 - 2635 1 0x300041792c0 3 strategy device: 118,22 block: 2262752 size: 0x200 buf: 0x60008105278 flags: 0x2200061 95.522200 - 2635 1 0x300041792c0 3 thread_block reason: 0x60008105338 stack:95.523200 - 2635 1 0x300041792c0 3 thread_state state: 7 asleep for other reason 95.523400 - 2635 1 0x300041792c0 3 thread_state tid: 0x2a10001fcc0 state: 1 in syscall/pagefault 95.939700 - 2635 1 0x300041792c0 17 syscall_end rval1: 512 rval2: 8193 errno: 0 read 0.478800 No error 95.972400 - 2635 1 0x300041792c0 17 syscall_start sysnum: 4 write 96.070000 - 2635 1 0x300041792c0 17 syscall_end rval1: 512 rval2: 8962 errno: 0 write 0.097600 No error 96.072000 - 2635 1 0x300041792c0 17 syscall_start sysnum: 3 read 96.078800 - 2635 1 0x300041792c0 17 strategy device: 118,22 block: 2262753 size: 0x200 buf: 0x60008105278 flags: 0x2200061 96.105100 - 2635 1 0x300041792c0 17 thread_block reason: 0x60008105338 stack: 96.106100 - 2635 1 0x300041792c0 17 thread_state state: 7 asleep for other reason 96.106300 - 2635 1 0x300041792c0 17 thread_state tid: 0x2a1005d1cc0 state: 1 in syscall/pagefault
Interesting we appear to be blocking for a reason….ahhh following from a quick chat with John Alderson and reading his blog entry this now becomes clear.
Basically directio is still persistant on the file after Oracle is shutdown. DIRECTIO_ON allows you to force directio per-file and is used potentially by databases if their app tunable is so set (which in our case it is). The flag is set in the inode’s i_flag and hangs around cached within the inode in the DNLC and will effect files until the inode is flushed from the DNLC.