database - Fastest way to move data from one table to another in Postgres -
background info
on postgres server have few tables accessed business intelligence applications ideally should stay available of time. tables flushed , reloaded our etl pipeline on nightly basis (i know... due legacy setup can't use incremental updates here). loading process takes pretty long , not bullet-proof stable yet.
to increase availability of tables, decided use staging tables load downstream data our etl pipeline first, , if loaded successfully, copy data on actual production tables.
here copy function created purpose:
create or replace function guarded_copy(src text, dest text) returns void $$ declare c1 integer; c2 integer; begin execute format('select count(*) %i', src) c1; execute format('select count(*) %i', dest) c2; if c1>=c2 execute format('truncate table %i cascade;', dest); execute format('insert %i select * %i;', dest, src); end if; end $$ language plpgsql volatile;
the idea truncate dest
table , load data src
table, if src
table(staging table) has more rows dest
table(actual production table). , works.
note actual production table(dest
) has constraints , indexes, while staging table(src
) configured no indexes or constraints speed loading process etl.
problem
the issue function above data copy can expensive because of indexes , constraints on dest
table.
question
- what's better way achieve same goal?
- i thinking of dropping/disabling index on
dest
before data copy step , add right after. how in sql function? - i thinking swapping 2 tables renaming them, requires indexes on 1 table being copied other. how can inside function?
edit 1
postgres version:
postgresql 9.2.6 on x86_64-unknown-linux-gnu, compiled gcc (ubuntu/linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
table constraints:
on table dest
have (unique) primary key on column id
, , indexes on timestamp columns.
edit 2
this question , this question helped. option 3 above, think code below close want.
create or replace function guarded_swap(src text, dest text) returns void $$ declare c1 integer; c2 integer; _query text; begin execute format('select count(*) %i', src) c1; execute format('select count(*) %i', dest) c2; if c1>=c2 -- create indexes in src table _query in select format('%s;', replace(pg_get_indexdef(ix.indexrelid), dest, src)) pg_class t, pg_class i, pg_index ix t.oid = ix.indrelid , i.oid = ix.indexrelid , t.relkind = 'r' , i.relkind = 'i' , t.oid= dest::regclass order t.relname, i.relname loop execute _query; end loop; -- drop indexes in dest table _query in select format('drop index %s;', i.relname) pg_class t, pg_class i, pg_index ix t.oid = ix.indrelid , i.oid = ix.indexrelid , t.relkind = 'r' , i.relkind = 'i' , t.oid= dest::regclass order t.relname, i.relname loop execute _query; end loop; -- create constraints in src table _query in select format ('alter table %s add constraint %s %s;', src, replace(conname, dest, src), pg_get_constraintdef(oid)) pg_constraint contype = 'p' , conrelid = dest::regclass loop execute _query; end loop; -- drop constraints in dest table _query in select format ('alter table %s drop constraint if exists %s;', dest, conname) pg_constraint conrelid = dest::regclass loop execute _query; end loop; -- swap table names execute format('alter table %i rename %i;', dest, concat(dest, '_old')); execute format('alter table %i rename %i;', src, dest); execute format('alter table %i rename %i;', concat(dest, '_old'), src); end if; end $$ language plpgsql volatile;
edit 3
another thought: pks , fks might unnecessary on tables used analytics purposes. indexes concern here.
create or replace function guarded_swap(src text, dest text) returns void $$ declare c1 integer; c2 integer; _idx_name text; _query text; _qs text[]; begin execute format('select count(*) %i', src) c1; execute format('select count(*) %i', dest) c2; if c1>=c2 -- drop indexes in dest table _idx_name, _query in select i.relname, format('%s;', pg_get_indexdef(ix.indexrelid)) pg_class t, pg_class i, pg_index ix t.oid = ix.indrelid , i.oid = ix.indexrelid , t.relkind = 'r' , i.relkind = 'i' , t.oid= dest::regclass order t.relname, i.relname loop _qs := array_append(_qs, _query); execute format('drop index if exists %s;', _idx_name); end loop; -- swap table names execute format('alter table %i rename %i;', dest, concat(dest, '_old')); execute format('alter table %i rename %i;', src, dest); execute format('alter table %i rename %i;', concat(dest, '_old'), src); -- build indexes in dest table foreach _query in array _qs loop execute _query; end loop; end if; end $$ language plpgsql volatile;
if don't have other objects depending on tables (like views , foreign keys), it's simple operation drop existing table dest
, rename new table src
.
that's item 3. in list, without consideration:
but requires indexes on 1 table being copied other.
you don't "copy" index. create new identical indexes on new table src
in background. table ready , switcheroo matter of milliseconds. however, if go route concurrent load on table, must prepared of these error messages in concurrent transactions:
error: not open relation oid 123456
related answer detailed explanation , sample code on dba.se:
Comments
Post a Comment