sql - How to join a large table (1M+) with a reference table in Teradata? -


i have following 2 tables-
tablea (3 million rows; "data table")
tableb (2300 rows; "reference table")

schemas -
tablea - id, field1, field2, field3, num_of_actual_items, num_of_possible_items
tableb - field1, field2, field3, num_of_possible_items

background
there 14000 unique ids in tablea.
[select count(distinct id) tablea] gives 14000 answer.

there 2300 possible combinations of field1, field2 , field3 , of them corresponding num_of_possible_items listed in tableb.

tablea not contain entries id when num_of_actual_items given "field1-field2-field3" combination 0. this trying fix.

ideally tablea should have 32,200,000 rows (14000 ids x 2300 combinations). query have follows:

select a1.id, a1.field1, a1.field2, a1.field3, a1.num_of_actual_items,  a1.num_of_possible_items tablea a1 union select distinct a2.id, b.field1, b.field2, b.field3, 0  num_of_actual_items, b.num_of_possible_items tablea a2, tableb b a2.field1 || a2.field2 || a2.field3 <> b.field1 || b.field2 ||  b.field3 

the above query give 2 rows each id , corresponding field1-field2-field3 combination (one real num_of_actual_items , 1 artificially added 0 entry).
in next step, duplicate rows can removed doing aggregate (max of num_of_actual_items) , problem solved.

however, solution works when tablea has 5000-10000 rows. when try full 3 million rows, "out of spool space" error.

the above query populates volatile table , have created indices on field1,field2 & field3 , collected stats on 3.

any ideas can optimize/change query such large volumes?
using teradata.

edit: have added suggestions andrew (limiting rows don't match) , dnoeth (returning distinct rows opposed all) still no avail.
keep running out of spool space.

the 2nd query within union cross join results in 2300 * 3000000 rows before distinct.

you must apply distinct before join:

select a1.id, a1.field1, a1.field2, a1.field3, a1.num_of_actual_items,  a1.num_of_possible_items tablea a1 union select a2.id, b.field1, b.field2, b.field3, 0 num_of_actual_items,   b.num_of_possible_items (select distinct id tablea) a2, tableb b 

Comments

Popular posts from this blog

css - SVG using textPath a symbol not rendering in Firefox -

Java 8 + Maven Javadoc plugin: Error fetching URL -

datatable - Matlab struct computations -