Tuesday, January 26, 2016

Keyed & Keyless Partitions in IBM DataStage

This post is about the IBM DataStage Partition methods:

Keyless Partitioning: rows are distributed independently of data values.
- Same: the existing partitioning is not altered
- Round Robin: rows are distributed evenly among partitions
- Random: each row is assigned to a partition by a random algorithm
- Entire: each partition gets the entire dataset (rows are duplicated)

Keyed Partitioning: rows are distributed based on values in specified keys.
- Hash: rows with the same key column values go to the same partition
- Modulus: each row of the input dataset is assigned to a partition as determined by a specified numeric key column
- Range: similar to Hash, but the partition mapping is user-determined and partitions are ordered
- DB2: matches DB2 EEE partitioning
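
To make the methods above concrete, here is a minimal Python sketch of a few of them. It is only an illustration, not DataStage code: the function names, the sample rows, and the use of CRC32 as the hash function are all assumptions made for the example, and DataStage's own hash algorithm is not shown here.

```python
import random
import zlib


def round_robin(rows, n):
    """Keyless: deal rows to partitions in turn, like dealing cards."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def random_partition(rows, n, seed=None):
    """Keyless: pick a partition at random for every row."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[rng.randrange(n)].append(row)
    return parts


def entire(rows, n):
    """Keyless: every partition receives a full copy of the data."""
    return [list(rows) for _ in range(n)]


def hash_partition(rows, n, key):
    """Keyed: rows with the same key value land in the same partition.
    CRC32 is a stand-in hash chosen for this sketch (an assumption)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[digest % n].append(row)
    return parts


def modulus(rows, n, key):
    """Keyed: partition number = numeric key value mod partition count."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


# Hypothetical sample data: six rows, three distinct customers.
rows = [{"id": i, "cust": c} for i, c in enumerate(["a", "b", "a", "c", "b", "a"])]

for name, parts in [
    ("round robin", round_robin(rows, 3)),
    ("entire", entire(rows, 3)),
    ("hash on cust", hash_partition(rows, 3, "cust")),
    ("modulus on id", modulus(rows, 3, "id")),
]:
    print(name, [len(p) for p in parts])
```

Same is not shown because it simply passes the existing partitions through unchanged. Range and DB2 are omitted because they depend on a user-supplied range map and on the database's own partitioning, respectively.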

Auto Partitioning:

With Auto, the DataStage framework inserts the partitioning algorithm necessary to ensure correct results.
- Generally gives preference to ROUND-ROBIN or SAME before any stage with "Auto" partitioning
- Inserts HASH on stages that require matched key values (e.g. Join, Merge, Remove Duplicates)
- Inserts ENTIRE on Normal (not Sparse) Lookup reference links. This is NOT always appropriate for MPP/clusters.
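
The rules above can be read as a small decision table. The sketch below is a toy model of that table, not the engine's actual logic: the function name `auto_partitioner`, its parameters, and the order in which the rules are checked are assumptions made for illustration.

```python
# Stages named in the post as needing matched key values.
KEYED_STAGES = {"Join", "Merge", "Remove Duplicates"}


def auto_partitioner(stage, upstream_parallel, normal_lookup_reference=False):
    """Return the partitioning method this toy model of "Auto" would pick.

    stage                   -- stage type name, e.g. "Join" or "Transformer"
    upstream_parallel       -- True if the incoming link is already parallel
    normal_lookup_reference -- True for a Normal (not Sparse) Lookup reference link
    """
    if normal_lookup_reference:
        return "Entire"
    if stage in KEYED_STAGES:
        return "Hash"
    # Otherwise: Same for parallel-to-parallel, Round Robin for sequential-to-parallel.
    return "Same" if upstream_parallel else "Round Robin"


print(auto_partitioner("Join", upstream_parallel=True))
print(auto_partitioner("Lookup", upstream_parallel=True, normal_lookup_reference=True))
print(auto_partitioner("Transformer", upstream_parallel=False))
print(auto_partitioner("Transformer", upstream_parallel=True))
```

Note that the model has no rule for what happens inside a Transformer, which mirrors the point that Auto cannot see your business logic.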

Since DataStage has limited awareness of your data and business rules, explicitly specify HASH partitioning when needed, that is, when processing requires groups of related records to be in the same partition.

- DataStage has no visibility into Transformer logic

- Hash is required before Sort and Aggregator stages
Auto generally chooses Round Robin when going from sequential to parallel.
It generally chooses Same when going from parallel to parallel.
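
To see why grouping logic needs Hash rather than whatever Auto picks, the following Python sketch counts rows per customer independently in each partition, the way a parallel grouping stage would. The data, function names, and CRC32 hash are assumptions made for the example and are not DataStage internals.

```python
import zlib
from collections import Counter


def round_robin(rows, n):
    """Keyless: deal rows to partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Keyed: same key value always maps to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[digest % n].append(row)
    return parts


def count_per_partition(parts, key):
    """Each partition aggregates on its own, with no view of the others."""
    return [Counter(row[key] for row in part) for part in parts]


def split_keys(counts):
    """Keys that show up in more than one partition's result."""
    seen = Counter(k for c in counts for k in c)
    return {k for k, n_parts in seen.items() if n_parts > 1}


# Hypothetical input: customer "a" appears 3 times, "b" twice, "c" once.
rows = [{"cust": c} for c in ["a", "b", "a", "c", "b", "a"]]

rr_counts = count_per_partition(round_robin(rows, 2), "cust")
hash_counts = count_per_partition(hash_partition(rows, 2, "cust"), "cust")

print("round robin:", rr_counts, "split keys:", split_keys(rr_counts))
print("hash:       ", hash_counts, "split keys:", split_keys(hash_counts))
```

With Round Robin, customers "a" and "b" are spread over both partitions, so each partition emits only a partial count for them. With Hash on the grouping key, every customer is counted in exactly one partition.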
