Call a pipeline from a pipeline in Amazon Data Pipeline
My team at work is looking for a replacement for a rather expensive ETL tool that, at this point, we are using as a glorified scheduler. Most of the integrations offered by the ETL tool we have improved upon with our own Python code, so what we really need is its scheduling ability. One option we are looking at is AWS Data Pipeline, which we are currently piloting.
My problem is this: imagine you have two datasets to load, products and sales. Each of these datasets requires a number of steps to load (get the source data, call a Python script to transform it, load it into Redshift). However, products needs to be loaded before sales runs, because we need product cost, etc. to calculate margin. Is it possible to have a "master" pipeline in Data Pipeline that calls products first, waits for its successful completion, and then calls sales? If so, how? I'm also open to other product suggestions if Data Pipeline is not well suited to this type of workflow. I appreciate the help.
I think I can relate to this use case. Either way, Data Pipeline does not do this kind of dependency management on its own, but it can be simulated using file preconditions.
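To make the idea concrete, here is a minimal pipeline-definition fragment for the sales (child) pipeline, using Data Pipeline's `S3KeyExists` precondition so its load activity only runs once a trigger file exists. The bucket, key, script name, and resource id are hypothetical placeholders:

```json
{
  "objects": [
    {
      "id": "ProductsDonePrecondition",
      "type": "S3KeyExists",
      "s3Key": "s3://my-etl-bucket/triggers/products._SUCCESS"
    },
    {
      "id": "LoadSalesActivity",
      "type": "ShellCommandActivity",
      "command": "python load_sales.py",
      "precondition": { "ref": "ProductsDonePrecondition" },
      "runsOn": { "ref": "Ec2Instance" }
    }
  ]
}
```

The last activity in the products pipeline would write `products._SUCCESS` to that S3 location, which releases the sales pipeline.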
In this approach, a child pipeline depends on a trigger file being present (as a precondition) before starting. The master pipeline creates the trigger files based on logic executed in its activities, and a child pipeline can in turn create other trigger files to start subsequent pipelines downstream.
Another solution is to use the Simple Workflow product. It has the features you are looking for, but it does require custom coding using the Flow SDK.