1. Reduce the number of transformations. There is always overhead
involved in moving data between transformations.
2. Consider allocating more shared memory for mappings with a large number of transformations. Session shared memory between 12 MB and 40 MB should suffice.
3. Calculate once, use many times.
- Avoid calculating or testing the same value over and over.
- Calculate it once in an expression, and set a True/False flag.
- Within an expression, use variable ports to calculate a value that can be used multiple times within that transformation.
4. Only connect what is used.
- Delete unnecessary links between transformations to minimize the amount of data moved, particularly in the Source Qualifier.
- This also helps with maintenance: if a transformation needs to be reconnected, it is best to have only the necessary ports set as input and output.
- In Lookup transformations, change unused ports to be neither input nor output. This makes the transformations cleaner looking, and it keeps the generated SQL override as small as possible, which cuts down on the amount of cache necessary and thereby improves performance (a trimmed lookup override sketch follows this list).
5. Watch the data types.
- The engine automatically converts compatible types.
- Sometimes data conversion is excessive. Data types are automatically converted when types differ between connected ports. Minimize data type changes between transformations by planning the data flow before developing the mapping.
6. Facilitate reuse.
- Plan for reusable transformations upfront.
- Use variables: both mapping variables and variable ports. Variable ports are especially beneficial when they can be used to calculate a complex expression or perform an unconnected lookup call only once instead of multiple times.
- Use mapplets to encapsulate multiple reusable transformations.
- Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.
7. Only manipulate data that needs to be moved and transformed.
- Reduce the number of non-essential records that are passed through the entire mapping.
- Use active transformations that reduce the number of records as early in the mapping as possible (i.e., place Filters and Aggregators as close to the source as possible).
- Select the appropriate driving/master table when using joins. The table with the fewer rows should be the driving/master table for a faster join.
8. Utilize single-pass reads.
- Redesign mappings to use one Source Qualifier to populate multiple targets. This way the server reads the source only once. If you have different Source Qualifiers for the same source (e.g., one for delete and one for update/insert), the server reads the source once for each Source Qualifier.
9. Remove or reduce field-level stored procedures.
- If you use field-level stored procedures, the PowerCenter server has to call the stored procedure for every row, which slows performance.
10. Sort the input data before passing it to Joiner and Aggregator transformations (see the pre-sorting sketch after this list).
11. In a Lookup, use a customized query instead of the default query. (Use '--' to comment out the lookup's default ORDER BY clause; see the ORDER BY override sketch after this list.)
12. Avoid unnecessary columns/ports in the SQL query.
13. Filter unnecessary data as close to the source qualifier as possible. (For relational sources, include the filter condition in the SQL query; see the Source Qualifier sketch after this list.)
14. In a Joiner, use the source with the smaller amount of data as the master.
15. In case of mapping partitioning, place the Aggregator transformation before the partition point.
16. Use a Router transformation instead of multiple Filter transformations.
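
A minimal sketch of the trimmed lookup override idea from tips 4 and 12: select only the ports the mapping actually uses so the lookup cache stays as small as possible. The CUSTOMER_DIM table and its columns are hypothetical names for illustration.

    -- Trimmed lookup override: only the ports the mapping uses,
    -- so the lookup cache stays as small as possible.
    SELECT CUST.CUSTOMER_ID   AS CUSTOMER_ID,
           CUST.CUSTOMER_NAME AS CUSTOMER_NAME
    FROM   CUSTOMER_DIM CUST

Every column dropped from the override is a column that no longer has to be fetched and cached for each lookup row.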
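For tip 13, a sketch of a Source Qualifier override that pushes the filter into the database so unwanted rows never enter the mapping pipeline. The ORDERS table, its columns, and the Oracle-style date syntax are assumptions for illustration.

    -- Source Qualifier override: filter at the database so
    -- unwanted rows never reach the mapping.
    SELECT ORD.ORDER_ID,
           ORD.CUSTOMER_ID,
           ORD.ORDER_AMT
    FROM   ORDERS ORD
    WHERE  ORD.STATUS = 'ACTIVE'
      AND  ORD.ORDER_DATE >= TO_DATE('2024-01-01', 'YYYY-MM-DD')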
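For tip 10, one way to deliver pre-sorted data is to add an ORDER BY on the group-by or join keys in the Source Qualifier override and then enable the Sorted Input option on the downstream Aggregator or Joiner. A sketch, again with hypothetical table and column names:

    -- Pre-sort on the group-by / join key so the Aggregator or
    -- Joiner can run with Sorted Input enabled.
    SELECT ORD.CUSTOMER_ID,
           ORD.ORDER_AMT
    FROM   ORDERS ORD
    ORDER BY ORD.CUSTOMER_ID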
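And the '--' trick from tip 11: the server appends its own ORDER BY over all lookup ports to a lookup override, so you supply a shorter ORDER BY yourself and end the override with two dashes to comment out the generated clause. A sketch using the same hypothetical customer table:

    -- Custom ORDER BY on the condition column only; the trailing
    -- '--' comments out the ORDER BY the server would append.
    SELECT CUST.CUSTOMER_ID   AS CUSTOMER_ID,
           CUST.CUSTOMER_NAME AS CUSTOMER_NAME
    FROM   CUSTOMER_DIM CUST
    ORDER BY CUST.CUSTOMER_ID --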