sql - How to find bad references in a table in Oracle -
i have data problem need clean up. have 2 tables storing "package" information, 1 table documents , 1 table audit information. have entries in package tables reference documents no longer exist , have been replaced (same name different id) , want write query find bad ones , new document should replace them. thing linking these 2 string value in audit table stores document name (not id).
i've setup sample schema here: http://sqlfiddle.com/#!4/997bda/1
- package_s single values package in our application
- package_r repeating values package in our application
- (these joined same value in id column)
- audit_info audit information in package
- docs documents can attached package
this query finds packages bad attachments (may more 1 per package)
select distinct ps.pkgname, pr.doc_list package_s ps, package_r pr ps.id = pr.id , not exists ( select 1 docs pr.doc_list = id ) order 1,2 asc ;
i need build query following rules:
- i need return @ least package id, position value , new document id (i build update statement put new document id in row matching package id / position in package_r table)
- the way document name audit information is: substr(description,0,instr(description,'[')-2)
- if document added , removed, should ignored (string_1)
- string_2 must not 'supporting'
- the new document must match state = 'master' latest = 1 pub = '0'
right have semi-working script works on per package basis, problem affecting 2000+ packages. find audit entries don't match documents correctly attached package , search names in document table. problem since there no direct link between package , document tables, if there multiple problem attachments on 1 package, each "new" document returned once per position value, i.e.
package id bad doc id position new doc id p1 d1 -1 d1-new p1 d1 -1 d4-new p1 d4 -2 d1-new p1 d4 -2 d4-new
it doesn't matter new id goes position value, duplication result problem makes hard mass generate update scripts, manual filtering required.
this complex , unique data issue, appreciated.
this query works according informations provided:
with ai ( select a1.audited_id id, dc.id doc_id, dc.docname, row_number() on (partition a1.audited_id order dc.id) rn audit_info a1 join docs dc on dc.state = 'master' , dc.latest = 1 , dc.pub = '0' , dc.docname = substr(a1.description, 1, instr(a1.description, '[')-2) string_1 = 'added' , string_2 <> 'supporting' , not exists ( select * audit_info a2 a2.audited_id = a1.audited_id , string_1 = 'removed' , a2.description = a1.description ) , not exists ( -- here matching docs eliminated select 1 package_r pr pr.id = a1.audited_id , pr.doc_list = dc.id ) ), p ( select ps.id, ps.pkgname, pr.doc_list, pr.position, row_number() on (partition ps.id order doc_list) rn package_s ps join package_r pr on pr.id = ps.id not exists ( select * docs pr.doc_list = docs.id ) ) select p.id, p.pkgname, p.doc_list, p.position , ai.docname, ai.doc_id p join ai on ai.id = p.id , p.rn = ai.rn order p.id, p.doc_list, ai.doc_id
output:
id pkgname doc_list position docname doc_id -- ------- -------- -------- ------- ------ p1 000001 d3 -3 doc3 d3-new p1 000001 d4 -4 doc4 d4-new p2 000002 d5 -2 doc5 d5-new p4 000004 d6 -1 doc6 d6-new
edit: answers issues reported in comments
it identifying packages not have bad values, , doc_list column blank,
note query (my subquery p
) identyfing packages query, added counter there. guess process/application or manually cleared column doc_list
in package_r
. if don't want such entries, add condition and trim(doc_list) not null
in subquery p
.
for ones gets right on package part (they have bad value) bringing wrong docname/doc_id replace bad value with, different doc_id in list.
i understand partially. can add such entries examples (in fiddle or edit question , add problematic input rows , expected output them?)
"it doesn't matter new id goes position value". assignment made way - if had 2 old docs names "abc", "def" , corrected docs have names "xxa", "de12" linked "abc"->"de12" , "def"->"xxa" (alphabetical ordering seems more rational totally random). make assigning random change order ...
order null
in both row_number() functions.
Comments
Post a Comment