The Aggregate Operator |
|
The Aggregate operator processes multiple input records as a group, producing a new set of data.
The scope of the Aggregate operator is different to that of the Formula operator. The latter deals with a single row of data, ie a distinct record. The Aggregate operator, however, makes comparisons and performs calculations across rows of data, eg aggregating several different results for a single student.
Therefore, the Aggregate operator contracts the data fed into it. It allows you to take multiple records and combine them according to your requirements, outputting the combined information.
# |
Feature |
---|---|
1 |
Enter a useful description of the function of the Aggregate operator, as this is what will appear in the Aggregate operator box in the Steps pane. |
2 |
The Available fields pane contains all of the fields that are available to the Aggregate operator from the upstream operators. These may include user defined fields from other operators, such as a Formula operator. These fields are selected from to form your Group by fields. |
3 |
Fields from the Available fields pane are moved into the Group by pane by clicking the icon between the two panes. Double clicking the field will also insert it. |
4 |
In the Group by pane the fields by which the data is to be aggregated, or grouped are entered, their order being irrelevant. In this example, the data is being grouped by student code and the year. Academic cycle fields usually need to be included in order to be able to display cycle specific data downstream, eg the student’s year level, although the field Ct.YRLV.Name or Ct.YRLV.Code could have been used for this purpose also. On the other hand, including the academic cycle may restrict the scope available to a downstream Expand Operator which you require to extract data across years. |
5 |
In the Formula expressions table, you create new fields from the data coming into the aggregate operator. Each new field is given a name, which should be unique and easily identifiable. A calculation is entered, that determines what data is aggregated and under what conditions, the result of which being held in the new field. The new aggregated fields become available in the downstream operators, eg the Print operator. You may enter each formula directly into the cell in the table or click the Browse button to bring up the Calculation Editor where you can compose the formula. |
6 |
In most case, the Only output summary information checkbox is ticked. This means that only the aggregated rows of data will be output. Otherwise, every row of data which contributed to the aggregation is output, including the aggregated new fields. Where you want to include the raw data as well as the aggregated fields, deselect this checkbox. For example, you may have an Aggregate operator that counts the number of As, Bs, Cs, etc for each student, across all of their subjects. If you wanted to include details of the subjects of the student and the grade for each, as well as the count fields in the Print operator, you would deselect the Only output summary information checkbox. |
The Group by pane is where you insert the fields by which you want to aggregate or combine the multiple rows of data, ie for each unique combination of the fields inserted, the calculations are to be performed. Take the example where you want to count the numbers of As, Bs, Cs, etc:
The fields used for grouping are extremely important for downstream operators also, as they determine the data that will be available to these, eg for inclusion in the Print operator. This is because the Aggregate operator contracts the data stream which is fed into it, so that the output is limited, based on the grouping fields used. For example, if grouping by student and you do not include the YEAR.Name or similar annual cycle field in the Group by pane, then student fields which vary across years, eg the student’s year level and home group may not be available for inclusion in any of the downstream operators such as the Print operator.
|
It is not always necessary to set up new fields in the Aggregate operator as there are certain situations where the Aggregate operator is used to simply summarise the data coming into it for the purposes of downstream operators, and in particular the Expand operator. For example, the data from the Source operator may be restricted to a particular year or a particular set of results for students. In order to be able to expand this information to other years’ results or less specific information, we need to contract back to a less specific level. This requires an understanding of entities, whereby the Students entity is the least specific and the Results entity the most detailed entity. When the source data is so restricted and so specific, there is nothing else that can be expanded upon. On the other hand, if all we know is the student’s code, all data about the student code, for all years, is potentially available for us to expand upon.
|