The
Aggregator Transformation is an Active and Connected transformation.
The
Aggregator transformation is used to performs aggregate calculations, such as
averages, count first, last, max, median, min, percentile, stddev, sums and
variance.
The
Integration Service performs aggregate calculations as it reads and stores data
group and row data in an aggregate cache. The Aggregator transformation is
unlike the Expression transformation but is use to Aggregator transformation to
perform calculations on groups.
The
aggregate is also you use conditional clauses to filter rows, providing more
flexibility than SQL language.
Components of the Aggregator Transformation
The
Aggregator is an active transformation that changes the number of rows in the
pipeline. The Aggregator transformation has the following components and
options:
- Aggregate
cache. The Integration Service
stores data in the aggregate cache until it completes aggregate
calculations. The Integration Service stores group values in an index
cache and it stores row data in the data cache.
- Aggregate
expression. Enter an expression in an
output port. The expression can include non-aggregate expressions and
conditional clauses.
- Group
by port. Indicate how to create
groups. You can configure an input, input/output, output, or variable port
for the group. When grouping data, the Aggregator transformation outputs
the last row of each group unless otherwise specified.
- Sorted
input. Select this option to
improve session performance. To use sorted input, you must pass data to
the Aggregator transformation sorted by group by port, in ascending or
descending order.
Note: The Integration Service uses memory to process an
Aggregator transformation with sorted ports. The Integration Service does not
use cache memory. You do not need to configure cache memory for Aggregator
transformations that use sorted ports.
1. Aggregate Caches:
When
you run a session that having an Aggregator transformation, the Integration
Service creates the index and the data caches in memory to process the
transformation. If the Integration Service requires more space, it stores
overflow values in cache files.
You
can configure the index and the data caches in the Aggregator transformation or
in the session properties. Or, you can configure the Integration Service to
determine the cache size at run time.
a.
Index Cache:
The index cache holds group information from the group by ports. If we are
using Group By on Port like 'DEPTNO', then this Index cache will
stores values of Grouped ports like 10, 20, 30 etc.
b.
Data Cache:
The DATA CACHE size is generally larger than the INDEX CACHE.
It
will store information like
- Variable port (if any created)
- Non-Grouped Input/Output Port
- Port for which aggregate
functions are used
- Non group by input ports used
in non-aggregate output expression
2. Aggregate Expressions
An
aggregate expression can include conditional clauses and non-aggregate
functions. The expression can also include one aggregate function within
another aggregate function, such as: MAX( COUNT( ITEM ))
The
result of an aggregate expression varies based on the group by ports in the
transformation. If you're not defined any group by port(s) then the Integration
service will return only one row as out. It finds the total quantity of items
sold:
SUM(
QUANTITY )
If
you use the same expression, and you group by the ITEM port, the Integration
Service returns the total quantity of items sold, by item. You can create an aggregate
expression in any output port and use multiple aggregate ports in a
transformation.
You
can nest one aggregate function within another aggregate function. When you use
any of these functions, you must use them in an expression within an Aggregator
transformation.
Aggregate
Function are averages, count first, last, max, median, min, percentile, stddev,
sums and variance.
Normal Aggregate Functions:
max(salary)
or min(salary), avg(salary), sum(salary), count(employee)...etc.
Nested Aggregate Functions:
You
can include multiple single-level or multiple nested functions in different
output ports in an Aggregator transformation. You cannot include both
single-level and nested functions in an Aggregator transformation.
If
an Aggregator transformation contains a single-level function in any output
port, you cannot use a nested function in any other port in that
transformation. When you include single-level and nested functions in the same
Aggregator transformation, then the mapping or mapplet which be invalid.
Example:
- max(salary) is valid
- min(max(salary)) is in-valid.
- max(count(product)) is valid
- min(max(count(product))) is
in-valid.
If you want both single-level and nested functions, create
separate Aggregator transformations.
Conditional Clauses
We
use conditional clauses in the aggregate expression to reduce the number of
rows used in the aggregation. The conditional clause can be any clause that
evaluates to TRUE or FALSE.
Example:
The following expression is to calculate the total commissions of employees who
exceeded their quarterly quota:
Non-Aggregate Functions
We
can also use non-aggregate functions in the aggregate expression.
Example:
The following expression returns the highest number of items sold for each item
(grouped by item). If no items were sold, the expression returns 0.
Null Values in Aggregate Functions
To
handle the Null values in Aggregate functions you have to configure the
Integration Service. You can choose to treat null values in aggregate functions
as NULL or zero. By default, the Integration Service treats null values as NULL
in aggregate functions.
The
Aggregator transformation lets you define groups for aggregations, rather than
performing the aggregation across all input data. The Group by port is the one
which indicate the Integration Service how to create group.
Example:
Finding the total company sales, you can find the total sales grouped by
region.
To
define a group for the aggregate expression, go to port tab and select the
appropriate input, input/output, output, and variable ports and select it as
group by in the Aggregator transformation.
The
Integration Service produces one row for each group. If you do not group
values, the Integration Service returns one row for all input rows. The
Integration Service typically returns the last row of each group (or the last
row received).
4. Using Sorted Input
We
use Sorted Input to improve session performance, If you want to use sorted
input we must sort the input data by group by port in ascending or descending
order before we are passing it to aggregator transformation.
By
using this sorted input option we are telling to aggregator transformation that
we have already sorted incoming data by group by port.
To
use this option "Go the Port Tab" and select "Sorted
Input".
In-case
you have checked sorted input option and the input data is not sorted before
passing to it. Then the session will get failed.
Do
not use sorted input if either of the following conditions are true:
- The aggregate expression uses
nested aggregate functions.
- The session uses incremental
aggregation.
If
you use sorted input and do not sort data correctly, the session fails.
Note:
In
case Flat File sources, use the Sorter transformation to sort data in the
mapping before passing it to the Aggregator transformation.
In
case relational source you can include the order by clause in sql override in
source qualifier or you can also use the number of sorted port in source
qualifier.
Sample
Mapping:
We
will be create a mapping using EMP table from scott to get SUM_OF_SALARY,
MAX_SALARY, MIN_SALARY, COUNT_OF_EMPLOYEE in each department.
1.
Connect and Open the folder if not already opened.
2.
Select Tools --> Mapping Designer
3.
Select Mappings --> Create
It
will pop-up "Mapping Name". Enter the mapping name of your
choice" and Click on 'OK'. Example: " m_emp_salary_aggr".
4.
Drag the Source and Target definitions into workspace if they are already
exist. If not click here to know how to create or import Table definitions.
Source:
Emp
Target:
EMP_SALAY_AGGR a) That will appear you 'Select the transformation type to create:'
b)
Select 'Aggregator' from drop down and 'Enter a new name for this
transformation:' as "aggr_emp_salary"
c) Click 'Create' and
'Done'
or
a) Click
on Aggregator Transformation icon marked below in below snapshot.
b)
Click in the workspace in Mapping Designer.
c) Select AGGTRANS in workspace and Right Click --> Edit.
d)
In Transformation tab --> Click on 'Rename' highlighted above which will
pop-up 'Rename Transformation'. Enter the Transformation Name:
"aggr_emp_salary"
e)
Click on 'OK'
f) Click on
'Apply' and 'OK'.
6).
Drag required ports from 'SQ_EMP' Source Qualifier to 'aggr_emp_salary'
Aggregator Transformation.
7)
Select 'aggr_emp_salary' Aggregator Transformation and Right Click --> Edit
--> Ports Tab.
a) Create 4 output ports SUM_OF_SALARY, MAX_SALARY, MIN_SALARY and COUNT_OF_EMPLOYEE
a) Create 4 output ports SUM_OF_SALARY, MAX_SALARY, MIN_SALARY and COUNT_OF_EMPLOYEE
b)
Open Expression Edit for "SUM_OF_SALARY"
Click
'Ok'
Click 'Ok'
Click 'Ok'
Click
'Ok'
Click
'Apply' and 'Ok'.
8). Drag required
port from 'aggr_emp_salary' to EMP_SALAY_AGGR' Target port as below:
9)
Select 'Mapping' from Menu --> Validate.
10)
Select 'Repository' from Menu --> Save.
Note:
The Integration Service uses memory to
process an Aggregator transformation with sorted ports. The Integration Service
does not use cache memory. You do not need to configure cache memory for
Aggregator transformations that use sorted ports.
Tips:
- Sorted input reduces the amount
of data cached during the session and improves session performance. Use
this option with the Sorter transformation to pass sorted data to the
Aggregator transformation.
- Limit the number of connected
input/output or output ports to reduce the amount of data the Aggregator
transformation stores in the data cache.
nice blog
ReplyDeletevery informative...
ReplyDeleteThank u so much for the detailed and nice explanation...
ReplyDeleteGood post! Thanks for sharing this information I appreciate it. God bless!
ReplyDeleteaggregate spend
All the major points have been covered....Many thanks..nice one
ReplyDeleteThanks a lot .. Articles are all really good.. Appreciate your help.
ReplyDeleteGOWTHAM ITS VERY NICE YAAR
ReplyDeleteAwesome!
ReplyDeleteHi GOWTHAM
ReplyDeleteim new to informatica could help me out how to use first ,last in aggregator with small example
regard's
Reavan
Hi Reavan,
ReplyDeleteGenerally we use this first and last functions based on the requirement. By default informatica while return last value from the group.
By using the First function is used to get the first value from the group and the Last function is used to get the Last value from the group. If you want you can use filter condition for that group. This is an optional.
Thanks,
Gowtham
Hi Gowtham,
ReplyDeletePlease share the document with above details so that its easy to read it offline.
Also , we are unable to open the images above.
Thanks
Avinash
Hi Avyankush,
DeleteIf your using chrome then you save all this pages into pdf.
This is very good information
ReplyDeleteinformatica online training, informatica training in bangalore, informaitca training
Hi Gowtham Sir,
ReplyDeleteIf u disable the cache type and write any query what happen at that time
if u not select any group by port then aggregatore what return to the another transformation
Hi Siva Kumar,
DeleteYour 1st question is not clear to me.
If you have not selected any group by ported then the aggregator will return the last row from the input records.
Hi ,
ReplyDeletemax(count(product)) is valid -- How is this valid ?