debugging delete with referential integrity

I have been working on a performance issue on an 11.2.0.4 RAC database. I really wanted to get to the bottom of the issue and when I did not know how to proceed any further I turned to twitter to ask for a brainstorm of suggestions and things to try.

This started a tremendous chain of tweets from a bunch of very bright people that came up with their own theories and suggestions. But since twitter is very limited in showing tests, code and results I would like to keep these in this blog post, maybe this can be of help to some people later.

There is a job that is supposed to clean up lots of old records in a number of tables. One of them is a parent table to 26 others that have foreign key (FK) contraints to that one parent. The job deletes all the childs first and then removes the parent instead of doing it with a cascading delete. This job caught my attention when this delete failed with an ORA-01555 error like this:

ORA-01555 caused by SQL statement below (SQL ID: 3s7tjx9jgp9wq, Query Duration=44341 sec, SCN: 0x0d0e.20888160):
delete from parent where parent.jobid <= 29211404 and id not in (select lost_child.parentid from lost_child)

So for some reason this ran for more than 12 hours and then bailed because it ran out of UNDO. The select part and predicate of this are not the problem, I was able to confirm that this runs just fine (ran that as a select and noted a few of the ids). Then I tried a single-row delete like this:

delete from parent where id = 8569043636980;

... and waited 15 minutes for this to finish. During this time I checked a few things. First of all, the execution plan of that delete is really simple and easy:

============
Plan Table
============
----------------------------------------+-----------------------------------+
| Id  | Operation           | Name      | Rows  | Bytes | Cost  | Time      |
----------------------------------------+-----------------------------------+
| 0   | DELETE STATEMENT    |           |       |       |     3 |           |
| 1   |  DELETE             | CONTEXT   |       |       |       |           |
| 2   |   INDEX UNIQUE SCAN | CONTEXT_PK|     1 |    15 |     3 |  00:00:01 |
----------------------------------------+-----------------------------------+

No issue there. I checked if all the child tables had an index built on their referencing column as was also suggested by a number of people:

Without such an index the database needs to perform a full scan and some nasty locks. Yes, all child columns have an index on them that was also usable etc. I later even verified that a SELECT FROM CHILD WHERE PARENTID = xxx was using an efficient index range scan.

Good point, bad stats (like saying the index is tiny) might lead to a wrong access path. I verified that the stats on the index and object were close enough to reality and also double-checked by just gathering more up-to-date stats with a high sample percentage just to be on the safe side. This did not change the behaviour.

Next I used snapper to check what that session was actually doing and waiting on. Snapper is one of my favourite tools. It is just one sql script to run (no need for installation) and just samples v$session and or v$sessstat to generate a simple but clean output. Instead of snapper I could have just aswell used a query like this repeatedly for a similar result in this case:

SELECT wait_event, p1, p2 from v$session where username = 'BROST';

But this is how it looked like in snapper:

SQL> @snapper ash=sql_id+wait_event+p1+p2 3 1 user=brost
Sampling SID user=brost with interval 3 seconds, taking 1 snapshots...

-- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

------------------------------------------------------------------------------------
Active%| SQL_ID          | EVENT                           | P1          | P2
------------------------------------------------------------------------------------
    2% | 6th6t45cj194z   | db file sequential read         | 61         | 1366560
    2% | 6th6t45cj194z   | db file sequential read         | 61         | 1366822
    2% | 6th6t45cj194z   | db file sequential read         | 61         | 1366690
    2% | 6th6t45cj194z   | db file sequential read         | 61         | 1366627
    2% | 6th6t45cj194z   | db file sequential read         | 61         | 1366374
    2% | 6th6t45cj194z   | db file sequential read         | 61         | 1367008
    2% | 6th6t45cj194z   | db file sequential read         | 61         | 1366176

--  End of ASH snap 1, end=2015-10-01 15:21:12, seconds=3, samples_taken=24

PL/SQL procedure successfully completed.

So this session spent most of it's time on index reads. P1 and P2 for this event are file# and block# and I checked which object this was accessing by looking at the output of this:

SQL> select segment_name from dba_extents where file_id = 61 and block_id <= 1366560 and block_id+blocks >= 1366560;

SEGMENT_NAME
--------------------------------------------------------------------
SOMECHILD_FK_IDX

This is much more index IO than what was expected. The delete of course has to check if there are still rows in the child table that reference the parent but that should be easy enough with just one small index range scan that should access maybe 3 or 4 index blocks. Not hundreds or thousands.

I verified this with a SQL trace of my next session which looked like this:

PARSING IN CURSOR #140005611652528 len=50 dep=0 uid=84 oct=7 lid=84 tim=1443698726379928 hv=630041575 ad='144d67dbe8' sqlid='72x2hahksvaz7'
delete from parent where id = 8569043855342
END OF STMT
PARSE #140005611652528:c=4000,e=8109,p=0,cr=4,cu=0,mis=1,r=0,dep=0,og=1,plh=723637861,tim=1443698726379928
WAIT #140005611652528: nam='gc cr grant 2-way' ela= 104 p1=66 p2=1120784 p3=1 obj#=571538 tim=1443698726380207
WAIT #140005611652528: nam='db file sequential read' ela= 6011 file#=66 block#=1120784 blocks=1 obj#=571538 tim=1443698726386252
WAIT #140005611652528: nam='db file sequential read' ela= 1263 file#=66 block#=1121580 blocks=1 obj#=571538 tim=1443698726387568
WAIT #140005611652528: nam='Disk file operations I/O' ela= 2 FileOperation=2 fileno=6 filetype=2 obj#=571538 tim=1443698726387638
WAIT #140005611652528: nam='db file sequential read' ela= 188 file#=61 block#=828449 blocks=1 obj#=571537 tim=1443698726387920
WAIT #140005611652528: nam='db file sequential read' ela= 1481 file#=61 block#=831159 blocks=1 obj#=571537 tim=1443698726389449
WAIT #140005611652528: nam='db file sequential read' ela= 14945 file#=63 block#=2641659 blocks=1 obj#=571442 tim=1443698726404494
WAIT #140005611652528: nam='db file sequential read' ela= 196 file#=61 block#=1198513 blocks=1 obj#=571442 tim=1443698726404775
WAIT #140005611652528: nam='db file sequential read' ela= 164 file#=61 block#=994333 blocks=1 obj#=571442 tim=1443698726404982
WAIT #140005611652528: nam='db file sequential read' ela= 5051 file#=61 block#=994382 blocks=1 obj#=571442 tim=1443698726410061
WAIT #140005611652528: nam='db file sequential read' ela= 224 file#=61 block#=512028 blocks=1 obj#=571491 tim=1443698726410329
WAIT #140005611652528: nam='gc current grant 2-way' ela= 128 p1=63 p2=2688356 p3=16777217 obj#=571491 tim=1443698726410502
WAIT #140005611652528: nam='db file sequential read' ela= 202 file#=63 block#=2688356 blocks=1 obj#=571491 tim=1443698726410734
WAIT #140005611652528: nam='db file sequential read' ela= 194 file#=61 block#=579075 blocks=1 obj#=571480 tim=1443698726411003
WAIT #140005611652528: nam='db file sequential read' ela= 9158 file#=61 block#=580112 blocks=1 obj#=571480 tim=1443698726420196
WAIT #140005611652528: nam='db file sequential read' ela= 8910 file#=61 block#=579219 blocks=1 obj#=571480 tim=1443698726429149
WAIT #140005611652528: nam='gc current grant 2-way' ela= 193 p1=66 p2=3775612 p3=16777217 obj#=571478 tim=1443698726429393
WAIT #140005611652528: nam='db file sequential read' ela= 321 file#=66 block#=3775612 blocks=1 obj#=571478 tim=1443698726429741
WAIT #140005611652528: nam='db file sequential read' ela= 183 file#=63 block#=2585605 blocks=1 obj#=571478 tim=1443698726429970
WAIT #140005611652528: nam='db file sequential read' ela= 9740 file#=66 block#=5408 blocks=1 obj#=571478 tim=1443698726439753
WAIT #140005611652528: nam='gc current grant 2-way' ela= 167 p1=66 p2=511508 p3=16777217 obj#=571467 tim=1443698726440012
WAIT #140005611652528: nam='db file sequential read' ela= 211 file#=66 block#=511508 blocks=1 obj#=571467 tim=1443698726440250
WAIT #140005611652528: nam='db file sequential read' ela= 18852 file#=63 block#=2684478 blocks=1 obj#=571467 tim=1443698726459140
WAIT #140005611652528: nam='db file sequential read' ela= 6443 file#=63 block#=2684564 blocks=1 obj#=571467 tim=1443698726465668
WAIT #140005611652528: nam='db file sequential read' ela= 1335 file#=61 block#=529269 blocks=1 obj#=571463 tim=1443698726467098
WAIT #140005611652528: nam='db file sequential read' ela= 9838 file#=61 block#=529221 blocks=1 obj#=571463 tim=1443698726476967
WAIT #140005611652528: nam='db file sequential read' ela= 8840 file#=61 block#=529729 blocks=1 obj#=571463 tim=1443698726485844
WAIT #140005611652528: nam='db file sequential read' ela= 9246 file#=63 block#=2704838 blocks=1 obj#=571463 tim=1443698726495134
WAIT #140005611652528: nam='gc current grant 2-way' ela= 177 p1=61 p2=574851 p3=16777217 obj#=571452 tim=1443698726495361
WAIT #140005611652528: nam='db file sequential read' ela= 3437 file#=61 block#=574851 blocks=1 obj#=571452 tim=1443698726498822
WAIT #140005611652528: nam='gc current grant 2-way' ela= 208 p1=61 p2=574867 p3=16777217 obj#=571452 tim=1443698726499075
WAIT #140005611652528: nam='db file sequential read' ela= 15642 file#=61 block#=574867 blocks=1 obj#=571452 tim=1443698726514744
WAIT #140005611652528: nam='gc current grant 2-way' ela= 151 p1=66 p2=2048228 p3=16777217 obj#=571450 tim=1443698726514985
WAIT #140005611652528: nam='db file sequential read' ela= 4806 file#=66 block#=2048228 blocks=1 obj#=571450 tim=1443698726519852
WAIT #140005611652528: nam='gc current grant 2-way' ela= 139 p1=66 p2=2048252 p3=16777217 obj#=571450 tim=1443698726520071
WAIT #140005611652528: nam='db file sequential read' ela= 7082 file#=66 block#=2048252 blocks=1 obj#=571450 tim=1443698726527177
WAIT #140005611652528: nam='db file sequential read' ela= 322 file#=61 block#=1366366 blocks=1 obj#=571445 tim=1443698726527554
WAIT #140005611652528: nam='db file sequential read' ela= 359 file#=61 block#=1366303 blocks=1 obj#=571445 tim=1443698726527967
WAIT #140005611652528: nam='db file sequential read' ela= 157 file#=61 block#=1366304 blocks=1 obj#=571445 tim=1443698726528189
WAIT #140005611652528: nam='db file sequential read' ela= 121 file#=61 block#=1366305 blocks=1 obj#=571445 tim=1443698726528332
WAIT #140005611652528: nam='db file sequential read' ela= 139 file#=61 block#=1366306 blocks=1 obj#=571445 tim=1443698726528491
WAIT #140005611652528: nam='db file sequential read' ela= 254 file#=61 block#=1366243 blocks=1 obj#=571445 tim=1443698726528793
WAIT #140005611652528: nam='db file sequential read' ela= 187 file#=61 block#=1366244 blocks=1 obj#=571445 tim=1443698726529035
*******
--- 42000 more waits for the same obj#
*******
EXEC #140005611652528:c=3179000,e=49561351,p=43636,cr=4,cu=798556,mis=0,r=1,dep=0,og=1,plh=723637861,tim=1443698775941316
STAT #140005611652528 id=1 cnt=0 pid=0 pos=1 obj=0 op='DELETE  CONTEXT (cr=4 pr=43636 pw=0 time=49561244 us)'
STAT #140005611652528 id=2 cnt=1 pid=1 pos=1 obj=571538 op='INDEX UNIQUE SCAN CONTEXT_PK (cr=4 pr=2 pw=0 time=7538 us cost=3 size=15 card=1)'
WAIT #140005611652528: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=571445 tim=1443698775941408

The interesting thing to look at here are the waits and the obj# that are being waited on. We can see that we are fetching a few blocks for an obj# at a time and then move on to the next one. Those are the lookups on some of the other 25 child tables and are what I would expect. But then we see more than 42000 waits on blocks of that index that I already found through snapper.

The question now is: why is the database not just doing an index range scan but a full scan, how can I see this and what do I do about it? I have tried 10053 and 10054 traces but they only show me the parsing of the main delete statement not the work that has to be done for the referential integrity checks. I quickly confirmed with Mauro that there are indeed no recursive statements as these would be easy to track.

After playing around with this a lot, the deletes get executed rather quickly now but that is just because the index is cached in the buffer cache by now. Autotrace still shows that we are touching a ton of datablocks:

SQL> set autotrace on
SQL> delete from parent where id = 8569043609010;

1 row deleted.

Execution Plan
----------------------------------------------------------
Plan hash value: 723637861

-------------------------------------------------------------------------------
| Id  | Operation	   | Name	| Rows	| Bytes | Cost (%CPU)| Time    |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |		|     1 |    15 |     3   (0)| 00:00:01|
|   1 |  DELETE 	   | CONTEXT	|	|	|	     |         |
|*  2 |   INDEX UNIQUE SCAN| CONTEXT_PK |     1 |    15 |     3   (0)| 00:00:01|
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"=8569043609010)

Statistics
----------------------------------------------------------
	 15  recursive calls
     799221  db block gets
	  8  consistent gets
	 56  physical reads
	928  redo size
	838  bytes sent via SQL*Net to client
	803  bytes received via SQL*Net from client
	  3  SQL*Net roundtrips to/from client
	  1  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

I was able to reproduce that number in a number of runs. I would expect a few hundred logical IOs, a few for each of the 26 indexes being used but certainly not hundreds of thousands. I then checked how big the index is and to my surprise it is even bigger than the number of blocks we read on execution. So we are not really reading all of the blocks of it. Could this be a lead?

SQL> select sum(blocks) from dba_extents where segment_name = 'SOMECHILD_FK_IDX';

SUM(BLOCKS)
-----------
    1347840

Martin Widlake provided an explanation for this by saying that only leaf blocks need to be read during a full scan. And according to the statistics, the index only has about 400.000 leaf blocks. That would mean the index might have some issues and explains the numbers above but does not explain why we are reading so much of this one index.

Here is just another test showing that the index on the child is OK and can easily be used for that kind of lookup:

SQL> select * from CHILD where parentid = 8569043843215;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3409892154

-----------------------------------------------------------------------------------------
| Id  | Operation		    | Name	| Rows	| Bytes | Cost (%CPU)| Time	|
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |		|     1 |    71 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHILD     |     1 |    71 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN	    | CHILD_IDX	|     1 |	|     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PARENTID"=8569043843215)

Statistics
----------------------------------------------------------
	  0  recursive calls
	  0  db block gets
	  5  consistent gets
	  0  physical reads
	  0  redo size
	916  bytes sent via SQL*Net to client
	509  bytes received via SQL*Net from client
	  1  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	  0  rows processed

SQL> delete from PARENT where id = 8569043843215;

1 row deleted.

Execution Plan
----------------------------------------------------------
Plan hash value: 723637861

---------------------------------------------------------------------------------
| Id  | Operation	   | Name	| Rows	| Bytes | Cost (%CPU)| Time	|
---------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |		|     1 |    15 |     3   (0)| 00:00:01 |
|   1 |  DELETE 	   | PARENT	|	|	|	     |		|
|*  2 |   INDEX UNIQUE SCAN| PARENT_PK  |     1 |    15 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=8569043843215)

Statistics
----------------------------------------------------------
	 15  recursive calls
     798612  db block gets
	  8  consistent gets
     371635  physical reads
	828  redo size
	836  bytes sent via SQL*Net to client
	803  bytes received via SQL*Net from client
	  3  SQL*Net roundtrips to/from client
	  1  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

Mauro demonstrated a similar case where the issue was delayed block cleanout that led to heavy IO on a child index. But I was able to confirm that block cleanout is not the issue here. To check for that you look at v$mystat after executing the statement which was a good lead by itself because I rarely use that method and look at those session stats. These are the stats after a single row delete from the parent (I removed all the stats that I think are completely irrelevant for this problem to make it a bit more readable):

SQL> delete from parent where id = 8569043842942;

1 row deleted.

SQL> select n.name , s.value from v$mystat s, v$statname n where s.statistic# = n.statistic# order by 1;

NAME								      VALUE
---------------------------------------------------------------- ----------
CPU used by this session					       1731
CPU used when call started					       1731
CR blocks created							  0
DB time 							      35750
buffer is not pinned count						 28
buffer is pinned count							 14
calls to get snapshot scn: kcmgss					 22
calls to kcmgas 							  1
calls to kcmgcs 							  5
calls to kcmgrs 							  0
cleanout - number of ktugct calls					  0
cleanouts and rollbacks - consistent read gets				  0
cleanouts only - consistent read gets					  0
commit cleanouts							  1
commit cleanouts successfully completed 				  1
consistent gets 							 63
consistent gets - examination						 12
consistent gets direct							  0
consistent gets from cache						 63
consistent gets from cache (fastpath)					 49
current blocks converted for CR 					  0
data blocks consistent reads - undo records applied			  0
db block changes							 13
db block gets							     798615
db block gets direct							  0
db block gets from cache					     798615
db block gets from cache (fastpath)					 20
db corrupt blocks detected						  0
db corrupt blocks recovered						  0
deferred (CURRENT) block cleanout applications				  0
deferred CUR cleanouts (index blocks)					  0
dirty buffers inspected 						469
execute count								 24
file io wait time						  348691109
free buffer inspected						     455080
free buffer requested						     443541
hot buffers moved to head of LRU				     307755
index fast full scans (direct read)					  0
index fast full scans (full)						  0
index fast full scans (rowid ranges)					  0
index fetch by key							  1
index reclamation/extension switch					  0
index scans kdiixs1							 31
opened cursors cumulative						 25
opened cursors current							  2
parse count (hard)							  1
parse count (total)							 25
physical read total IO requests 				     443540
physical read total bytes					 3633479680
recursive calls 							194
redo size							       2564
sorts (disk)								  0
sorts (memory)								 17
sorts (rows)								 56
table fetch by rowid							  4
table fetch continued row						  0
table scan blocks gotten						  3
table scan rows gotten							123
table scans (cache partitions)						  0
table scans (direct read)						  0
table scans (long tables)						  0
table scans (rowid ranges)						  0
table scans (short tables)						  4
user calls								 21

This just confirms the number of logical IOs and shows that the time spent by this session is due to IO waits. According to this we are not full scanning any indexes though. I also produced a fresh tracefile after a night of sleeping on it during which most blocks got flushed out of the buffer cache by other things. I had yet another look at that new tracefule before trying to build a reproducible test case. Which I guess is the next thing to dive further into this even though I would like to avoid trying to reproduce this in a lab environment which is always a lot of work and might or might not lead to a result.

I tried to reproduce and went through the trouble of dumping and importing those objects into another system and re-ran the deletes. No issues there, everything was just like I would have expected. Which is good on one hand (means I am not crazy or have wrong expectations) but also annoying since I am still not much closer to finding the root cause.

In the end I had to give up the approach of drilling down until I find the root cause. I would have loved to see why so many index blocks were being accessed but could not find anything in any of the traces I performed. My suspicion (especially after being unable to reproduce with the same data but different physical layout) was that this has something to do with how the index looks like. I did an online index rebuild of that object, performed another delete and yay, now it is running like expected with only a handful of logical reads.

SQL> delete from PARENT where id = 8569043675642;

1 row deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 723637861

---------------------------------------------------------------------------------
| Id  | Operation	   | Name	| Rows	| Bytes | Cost (%CPU)| Time	|
---------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |		|     1 |    15 |     3   (0)| 00:00:01 |
|   1 |  DELETE 	   | PARENT	|	|	|	     |		|
|*  2 |   INDEX UNIQUE SCAN| PARENT_PK  |     1 |    15 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=8569043675642)


Statistics
----------------------------------------------------------
	 15  recursive calls
	886  db block gets
	  8  consistent gets
	  1  physical reads
	784  redo size
	849  bytes sent via SQL*Net to client
	803  bytes received via SQL*Net from client
	  3  SQL*Net roundtrips to/from client
	  1  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

Leave a Reply

Your email address will not be published. Required fields are marked *