sql - How to join a large table (1M+) with a reference table in Teradata? -
i have following 2 tables-
tablea (3 million rows; "data table")
tableb (2300 rows; "reference table")
schemas -
tablea - id, field1, field2, field3, num_of_actual_items, num_of_possible_items
tableb - field1, field2, field3, num_of_possible_items
background
there 14000 unique ids in tablea.
[select count(distinct id) tablea] gives 14000 answer.
there 2300 possible combinations of field1, field2 , field3 , of them corresponding num_of_possible_items listed in tableb.
tablea not contain entries id when num_of_actual_items given "field1-field2-field3" combination 0. this trying fix.
ideally tablea should have 32,200,000 rows (14000 ids x 2300 combinations). query have follows:
select a1.id, a1.field1, a1.field2, a1.field3, a1.num_of_actual_items, a1.num_of_possible_items tablea a1 union select distinct a2.id, b.field1, b.field2, b.field3, 0 num_of_actual_items, b.num_of_possible_items tablea a2, tableb b a2.field1 || a2.field2 || a2.field3 <> b.field1 || b.field2 || b.field3
the above query give 2 rows each id , corresponding field1-field2-field3 combination (one real num_of_actual_items , 1 artificially added 0 entry).
in next step, duplicate rows can removed doing aggregate (max of num_of_actual_items) , problem solved.
however, solution works when tablea has 5000-10000 rows. when try full 3 million rows, "out of spool space" error.
the above query populates volatile table , have created indices on field1,field2 & field3 , collected stats on 3.
any ideas can optimize/change query such large volumes?
using teradata.
edit: have added suggestions andrew (limiting rows don't match) , dnoeth (returning distinct rows opposed all) still no avail.
keep running out of spool space.
the 2nd query within union cross join results in 2300 * 3000000 rows before distinct.
you must apply distinct
before join:
select a1.id, a1.field1, a1.field2, a1.field3, a1.num_of_actual_items, a1.num_of_possible_items tablea a1 union select a2.id, b.field1, b.field2, b.field3, 0 num_of_actual_items, b.num_of_possible_items (select distinct id tablea) a2, tableb b
Comments
Post a Comment