sql - How to find bad references in a table in Oracle -


i have data problem need clean up. have 2 tables storing "package" information, 1 table documents , 1 table audit information. have entries in package tables reference documents no longer exist , have been replaced (same name different id) , want write query find bad ones , new document should replace them. thing linking these 2 string value in audit table stores document name (not id).

i've setup sample schema here: http://sqlfiddle.com/#!4/997bda/1

  • package_s single values package in our application
  • package_r repeating values package in our application
    • (these joined same value in id column)
  • audit_info audit information in package
  • docs documents can attached package

this query finds packages bad attachments (may more 1 per package)

select distinct ps.pkgname, pr.doc_list package_s ps, package_r pr ps.id = pr.id , not exists (   select 1 docs   pr.doc_list = id   ) order 1,2 asc ; 

i need build query following rules:

  • i need return @ least package id, position value , new document id (i build update statement put new document id in row matching package id / position in package_r table)
  • the way document name audit information is: substr(description,0,instr(description,'[')-2)
  • if document added , removed, should ignored (string_1)
  • string_2 must not 'supporting'
  • the new document must match state = 'master' latest = 1 pub = '0'

right have semi-working script works on per package basis, problem affecting 2000+ packages. find audit entries don't match documents correctly attached package , search names in document table. problem since there no direct link between package , document tables, if there multiple problem attachments on 1 package, each "new" document returned once per position value, i.e.

package id      bad doc id      position      new doc id p1              d1              -1            d1-new p1              d1              -1            d4-new p1              d4              -2            d1-new p1              d4              -2            d4-new 

it doesn't matter new id goes position value, duplication result problem makes hard mass generate update scripts, manual filtering required.

this complex , unique data issue, appreciated.

this query works according informations provided:

with ai (   select a1.audited_id id, dc.id doc_id, dc.docname,       row_number() on (partition a1.audited_id order dc.id) rn     audit_info a1     join docs dc       on dc.state = 'master' , dc.latest = 1 , dc.pub = '0'         , dc.docname = substr(a1.description, 1, instr(a1.description, '[')-2)     string_1 = 'added' , string_2 <> 'supporting'       , not exists (         select * audit_info a2           a2.audited_id = a1.audited_id , string_1 = 'removed'             , a2.description = a1.description )       , not exists (  -- here matching docs eliminated         select 1 package_r pr            pr.id = a1.audited_id , pr.doc_list = dc.id ) ), p (   select ps.id, ps.pkgname, pr.doc_list, pr.position,       row_number() on (partition ps.id order doc_list) rn     package_s ps       join package_r pr on pr.id = ps.id     not exists ( select * docs pr.doc_list = docs.id )    ) select p.id, p.pkgname, p.doc_list, p.position     , ai.docname, ai.doc_id   p join ai on ai.id = p.id , p.rn = ai.rn   order p.id, p.doc_list, ai.doc_id 

output:

id  pkgname  doc_list  position  docname  doc_id --  -------  --------  --------  -------  ------ p1  000001   d3              -3  doc3     d3-new p1  000001   d4              -4  doc4     d4-new p2  000002   d5              -2  doc5     d5-new p4  000004   d6              -1  doc6     d6-new 

edit: answers issues reported in comments

it identifying packages not have bad values, , doc_list column blank,

note query (my subquery p) identyfing packages query, added counter there. guess process/application or manually cleared column doc_list in package_r. if don't want such entries, add condition and trim(doc_list) not null in subquery p.

for ones gets right on package part (they have bad value) bringing wrong docname/doc_id replace bad value with, different doc_id in list.

i understand partially. can add such entries examples (in fiddle or edit question , add problematic input rows , expected output them?)

"it doesn't matter new id goes position value". assignment made way - if had 2 old docs names "abc", "def" , corrected docs have names "xxa", "de12" linked "abc"->"de12" , "def"->"xxa" (alphabetical ordering seems more rational totally random). make assigning random change order ... order null in both row_number() functions.


Comments

Popular posts from this blog

css - SVG using textPath a symbol not rendering in Firefox -

Java 8 + Maven Javadoc plugin: Error fetching URL -

datatable - Matlab struct computations -