相同SQL在不同实例结果竟然不同,你知道吗?

1 问题背景

这是一个遗留的老库,4节点12.2的RAC。我们每天都会通过EM对数据库所有PDB的表空间使用量进行巡检,针对使用率较高的表空间将和业务方进行沟通并扩容。但是最近发现一个非常奇怪的现象,即根据表空间当前使用数据量和数据文件自动增长的最大值比值得出的已用空间使用率(Available Space Used(%))没有产生变化了,但是其余的值比如已用的分配空间占用率(Allocated Space Used (%))、分配大小(Allocated Size (GB))、已用空间(Space Used(GB))、数据文件数量(Datafiles)等其他数据却又是变化的,且可以通过这些数值又可以人工算出正确的已用空间使用率结果:而奇怪的是,进入PDB中又会发现已用用空间使用率结果是正确的。

2 前期排查

其实这个页面的后台语句。

复制
select * from ( WITH df AS ( SELECT con_id, tablespace_name, SUM(bytes) bytes, COUNT(*) cnt, DECODE(SUM(DECODE(autoextensible, NO, 0, 1)), 0, NO, YES) autoext FROM cdb_data_files GROUPBY con_id, tablespace_name), um AS (SELECT con_id, tablespace_name, used_space ub, used_percent FROM cdb_tablespace_usage_metrics), pdb AS (SELECT con_id, NAMEFROM v$containers) SELECT p.NAME, d.tablespace_name, TO_CHAR (u.used_percent, 99999990.00), NVL ((a.bytes - NVL (f.bytes, 0)) / a.bytes * 100, 0), a.autoext, NVL (a.bytes, 0) / 1024 / 1024 / 1024, NVL (a.bytes - NVL (f.bytes, 0), 0) / 1024 / 1024 / 1024, NVL (f.bytes, 0) / 1024 / 1024 / 1024, d.STATUS, a.cnt, d.contents, d.extent_management, d.segment_space_management FROM cdb_tablespaces d, df a, um u, pdb p, (SELECT CON_ID, tablespace_name, SUM(bytes) bytesFROM cdb_free_space GROUPBY CON_ID, tablespace_name) f WHERE d.tablespace_name = a.tablespace_name (+) AND d.tablespace_name = f.tablespace_name (+) AND d.tablespace_name = u.tablespace_name (+) ANDNOT d.contents = UNDO ANDNOT (d.extent_management = LOCALAND d.contents = TEMPORARY) AND p.con_id = d.con_id AND p.con_id = u.con_id (+) AND p.con_id = a.con_id (+) AND p.con_id = f.con_id (+) UNIONALL SELECT p.NAME, d.tablespace_name, TO_CHAR (u.used_percent, 99999990.00), NVL ((u.ub * d.block_size) / tf.bytes * 100, 0), tf.autoext, NVL (tf.bytes, 0) / 1024 / 1024 / 1024, NVL (u.ub * d.block_size, 0) / 1024 / 1024 / 1024, (NVL (tf.bytes, 0) - NVL (u.ub * d.block_size, 0)) / 1024 / 1024 / 1024, d.STATUS, tf.cnt, d.contents, d.extent_management, d.segment_space_management FROM cdb_tablespaces d, um u, pdb p, ( SELECT con_id, tablespace_name, SUM(bytes) bytes, COUNT(*) cnt, DECODE(SUM(DECODE(autoextensible, NO, 0, 1)), 0, NO, YES) autoext FROM cdb_temp_files GROUPBY con_id, tablespace_name) tf WHERE d.tablespace_name = tf.tablespace_name (+) AND d.tablespace_name = u.tablespace_name (+) AND d.extent_management = LOCAL AND d.contents = TEMPORARY AND p.con_id = d.con_id AND p.con_id = u.con_id (+) AND p.con_id = tf.con_id (+) UNIONALL SELECT p.NAME, d.tablespace_name, TO_CHAR (u.used_percent, 99999990.00), NVL ((u.ub * d.block_size) / a.bytes * 100, 0), a.autoext, NVL (a.bytes, 0) / 1024 / 1024 / 1024, NVL (u.ub * d.block_size, 0) / 1024 / 1024 / 1024, (NVL (a.bytes, 0) - NVL (u.ub * d.block_size, 0)) / 1024 / 1024 / 1024, d.STATUS, a.cnt, d.contents, d.extent_management, d.segment_space_management FROM cdb_tablespaces d, df a, um u, pdb p WHERE d.tablespace_name = a.tablespace_name (+) AND d.tablespace_name = u.tablespace_name (+) AND d.contents = UNDO AND p.con_id = d.con_id AND p.con_id = u.con_id (+) AND p.con_id = a.con_id (+) ) orderby3;1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.49.50.51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68.69.70.71.72.73.74.75.76.77.78.79.80.81.82.83.84.85.86.87.88.89.90.91.92.93.94.95.96.97.98.99.100.101.102.103.104.105.106.107.108.

这里在外面嵌套了一层用于排序。因为计算结果没有更新,一开始的排查方向是EM的缓存没有清理,但是对EM的各项设置进行检查后,并没有发现相关问题。随即在各个节点上执行该SQL,发现在节点1上执行结果有问题,在EM上将表空间查询操作指定到其他实例结果也是正确的。随即又开了个和数据库相关的SR。

3 深入排查

在数据库SR的指引下,收集了SQLHC的相关诊断信息,然后给了一大堆hint:

复制
select /*+ BEGIN_OUTLINE_DATA IGNORE_OPTIM_EMBEDDED_HINTS OPTIMIZER_FEATURES_ENABLE(12.2.0.1) DB_VERSION(12.2.0.1) ALL_ROWS OUTLINE_LEAF(@"SEL$07BDC5B4") MERGE(@"SEL$4" >"SEL$3") OUTLINE_LEAF(@"SEL$ABDE6DFF") MERGE(@"SEL$6" >"SEL$5") OUTLINE_LEAF(@"SEL$DFD66ADD") MERGE(@"SEL$CF5359D5" >"SEL$7") OUTLINE_LEAF(@"SEL$22C746FF") OUTLINE_LEAF(@"SEL$513E9771") OUTLINE_LEAF(@"SEL$522E92D8") OUTLINE_LEAF(@"SEL$42DFC41A") MERGE(@"SEL$12" >"SEL$11") OUTLINE_LEAF(@"SEL$1F78930A") MERGE(@"SEL$10" >"SEL$2") OUTLINE_LEAF(@"SEL$513E9770") OUTLINE_LEAF(@"SEL$522E92D7") OUTLINE_LEAF(@"SEL$29F99543") MERGE(@"SEL$16" >"SEL$15") OUTLINE_LEAF(@"SEL$1CF66C63") MERGE(@"SEL$14" >"SEL$13") OUTLINE_LEAF(@"SEL$22C746FE") MATERIALIZE(@"SEL$07BDC5B4") OUTLINE_LEAF(@"SEL$513E976F") MATERIALIZE(@"SEL$ABDE6DFF") OUTLINE_LEAF(@"SEL$522E92D6") MATERIALIZE(@"SEL$DFD66ADD") OUTLINE_LEAF(@"SEL$DC4B4145") MERGE(@"SEL$18" >"SEL$17") OUTLINE_LEAF(@"SET$1") OUTLINE_LEAF(@"SEL$1") OUTLINE(@"SEL$3") OUTLINE(@"SEL$4") OUTLINE(@"SEL$5") OUTLINE(@"SEL$6") OUTLINE(@"SEL$7") OUTLINE(@"SEL$CF5359D5") MERGE(@"SEL$9" >"SEL$8") OUTLINE(@"SEL$07BDC5B4") MERGE(@"SEL$4" >"SEL$3") OUTLINE(@"SEL$ABDE6DFF") MERGE(@"SEL$6" >"SEL$5") OUTLINE(@"SEL$DFD66ADD") MERGE(@"SEL$CF5359D5" >"SEL$7") OUTLINE(@"SEL$11") OUTLINE(@"SEL$12") OUTLINE(@"SEL$2") OUTLINE(@"SEL$10") OUTLINE(@"SEL$15") OUTLINE(@"SEL$16") OUTLINE(@"SEL$13") OUTLINE(@"SEL$14") OUTLINE(@"SEL$17") OUTLINE(@"SEL$18") OUTLINE(@"SEL$8") OUTLINE(@"SEL$9") NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1") PQ_CONCURRENT_UNION(@"SET$1") NO_ACCESS(@"SEL$DC4B4145" "P"@"SEL$17") FULL(@"SEL$DC4B4145" "K"@"SEL$18") NO_ACCESS(@"SEL$DC4B4145" "A"@"SEL$17") NO_ACCESS(@"SEL$DC4B4145" "U"@"SEL$17") LEADING(@"SEL$DC4B4145" "P"@"SEL$17" "K"@"SEL$18" "A"@"SEL$17" "U"@"SEL$17") USE_HASH(@"SEL$DC4B4145" "K"@"SEL$18") USE_HASH(@"SEL$DC4B4145" "A"@"SEL$17") USE_HASH(@"SEL$DC4B4145" "U"@"SEL$17") PQ_DISTRIBUTE(@"SEL$DC4B4145" "K"@"SEL$18" HASH HASH) PQ_DISTRIBUTE(@"SEL$DC4B4145" "A"@"SEL$17" HASH HASH) PQ_DISTRIBUTE(@"SEL$DC4B4145" "U"@"SEL$17" HASH HASH) NO_ACCESS(@"SEL$1CF66C63" "P"@"SEL$13") FULL(@"SEL$1CF66C63" "K"@"SEL$14") NO_ACCESS(@"SEL$1CF66C63" "U"@"SEL$13") NO_ACCESS(@"SEL$1CF66C63" "TF"@"SEL$13") LEADING(@"SEL$1CF66C63" "P"@"SEL$13" "K"@"SEL$14" "U"@"SEL$13" "TF"@"SEL$13") USE_HASH(@"SEL$1CF66C63" "K"@"SEL$14") USE_HASH(@"SEL$1CF66C63" "U"@"SEL$13") USE_HASH(@"SEL$1CF66C63" "TF"@"SEL$13") PQ_DISTRIBUTE(@"SEL$1CF66C63" "K"@"SEL$14" HASH HASH) PQ_DISTRIBUTE(@"SEL$1CF66C63" "U"@"SEL$13" HASH HASH) PQ_DISTRIBUTE(@"SEL$1CF66C63" "TF"@"SEL$13" HASH HASH) PX_JOIN_FILTER(@"SEL$1CF66C63" "TF"@"SEL$13") NO_ACCESS(@"SEL$1F78930A" "P"@"SEL$2") FULL(@"SEL$1F78930A" "K"@"SEL$10") NO_ACCESS(@"SEL$1F78930A" "A"@"SEL$2") NO_ACCESS(@"SEL$1F78930A" "U"@"SEL$2") NO_ACCESS(@"SEL$1F78930A" "F"@"SEL$2") LEADING(@"SEL$1F78930A" "P"@"SEL$2" "K"@"SEL$10" "A"@"SEL$2" "U"@"SEL$2" "F"@"SEL$2") USE_HASH(@"SEL$1F78930A" "K"@"SEL$10") USE_HASH(@"SEL$1F78930A" "A"@"SEL$2") USE_HASH(@"SEL$1F78930A" "U"@"SEL$2") USE_HASH(@"SEL$1F78930A" "F"@"SEL$2") PQ_DISTRIBUTE(@"SEL$1F78930A" "K"@"SEL$10" HASH HASH) PQ_DISTRIBUTE(@"SEL$1F78930A" "A"@"SEL$2" HASH HASH) PQ_DISTRIBUTE(@"SEL$1F78930A" "U"@"SEL$2" HASH HASH) PQ_DISTRIBUTE(@"SEL$1F78930A" "F"@"SEL$2" HASH HASH) PX_JOIN_FILTER(@"SEL$1F78930A" "F"@"SEL$2") FULL(@"SEL$522E92D8" "T1"@"SEL$522E92D8") FULL(@"SEL$513E9771" "T1"@"SEL$513E9771") FULL(@"SEL$22C746FF" "T1"@"SEL$22C746FF") FULL(@"SEL$42DFC41A" "K"@"SEL$12") GBY_PUSHDOWN(@"SEL$42DFC41A") USE_HASH_AGGREGATION(@"SEL$42DFC41A") FULL(@"SEL$522E92D7" "T1"@"SEL$522E92D7") FULL(@"SEL$513E9770" "T1"@"SEL$513E9770") FULL(@"SEL$29F99543" "K"@"SEL$16") GBY_PUSHDOWN(@"SEL$29F99543") USE_HASH_AGGREGATION(@"SEL$29F99543") FULL(@"SEL$522E92D6" "T1"@"SEL$522E92D6") FULL(@"SEL$513E976F" "T1"@"SEL$513E976F") FULL(@"SEL$22C746FE" "T1"@"SEL$22C746FE") FULL(@"SEL$DFD66ADD" "X$CON"@"SEL$9") FULL(@"SEL$ABDE6DFF" "K"@"SEL$6") FULL(@"SEL$07BDC5B4" "K"@"SEL$4") USE_HASH_AGGREGATION(@"SEL$07BDC5B4") USE_PARTITION_WISE_GBY(@"SEL$07BDC5B4") END_OUTLINE_DATA */ * from ( WITH df AS ( SELECT ...1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.49.50.51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68.69.70.71.72.73.74.75.76.77.78.79.80.81.82.83.84.85.86.87.88.89.90.91.92.93.94.95.96.97.98.99.100.101.102.103.104.105.106.107.108.109.110.111.112.113.114.115.116.117.118.119.120.121.122.123.124.125.126.

对比不同节点的SQL的实际执行计划的Outline信息(这里不做展示),这里是通过hint的方式SQL执行将指向到正确的执行计划,输出结果也回归正常。SR的回复则是:

实例 1 hard parse 出来的执行计划产生了错误结果,这个执行计划可能是一个不正确的执行计划。一个错误的执行计划有可能产生错误的记录条数,也可能产生正确的结果条数,但是每个记录中的 sum / count 数据项却不正确。这都是错误执行计划可能导致的结果。

4 尝试解决

既然执行计划有误,SQL PLAN会缓存在Shared Pool中,那么是不是可以通过清理Shared Pool的执行计划缓存来解决这一问题呢:

复制
-- 查询语句的相关信息 SELECT sql_text, plan_hash_value, address, hash_value FROM v$sqlarea WHERE sql_id = 1fr0p0hnav1bq;1.2.3.4.
复制
-- 清理执行计划缓存 -- EXEC DBMS_SHARED_POOL.PURGE(ADDRESS,HASH_VALUE, C); EXEC DBMS_SHARED_POOL.PURGE(0000000A32100428,682460534, C);1.2.3.

再次查询,结果恢复正常:目前得到的消息,这一现象仅会出现在对系统视图、元数据的复杂查询中。将对应几条语句的执行计划缓存都清理过后,直接执行语句没问题了,但EM显示还是有点问题,相关问题还得继续处理。

总结

这是一个比较奇怪的从EM中发现的现象,目前已解决了数据库层面的问题。

THE END