Calculate Column Aggregate
Contents
Calculate Column Aggregate
This document has been revised for Presence version 3.6.
The Column Aggregate Node performs the following statistical functions on a column of data in the Data Table:
- Mean
- Sum
- Minimum
- Maximum
- Count
- Median ^{1}
- Standard Deviation ^{1}
^{1} Features available in Presence 3.6 and later.
To include this Node in your Task drop a "Calculate Column Aggregate" Task Element into your Task design from the Data Access Task Elements list, after which you will be presented with the following dialog box:
Configuration Options
There are six configuration options to set for this Node:
- Data Column - Choose the Data Column that you wish to perform the calculation on.
- Aggregate Type - Choose which type of aggregate function you wish to perform
- Variable - The name variable which should store the result
- Variable Scope - see Variables for a description of variable scope.
- Rules - set the Rules for runtime evaluation (see #Applying Rules below)
- Treat non-numerics as zero - tells the runtime engine how to treat non-numeric values - see the section on #Non-Numeric values below.
Calculation Types
Average
Statistical mean. This is calculated by adding together all values in the column and dividing the result by the number of records.
Example: The average of 10,15,12,11,19,7,18 is 13.142857
Median
This is the middle number if all numbers are arranged in ascending order (assuming an odd number of values). If there is an even number of values the mean is taken of the two middle values.
Examples:
The median of 10, 15, 12, 11, 19, 7, 18 is 11
The median of 6, 10, 15, 22, 48, 52 is 13.5 (the mean of 15 and 22)
Standard Deviation
Standard deviation is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the "average" (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data are spread out over a large range of values.
This is achieved as follows:
- Calculate the mean
- For each value calculate the difference between that value and the mean
- Square that number
- Add them all together
- Divide by total number of values
- Take square root
Example: The standard deviation of 2,4,6,8,10 is 2.8243
Which equals (where m = mean, or 6.0):
- (2-m)^{2} = -4^{2} = 16
+ (4-m)^{2} = -2^{2} = 4
+ (6-m>^{2} = 0^{2} = 0
+ (8-m)^{2} = 2^{2} = 4
+ (10-m)^{2} = 4^{2} = 16
= 40
40 / 5 = 8
Square root of 8 = 2.8243
Sum
This is the total of all the values in the data range.
Example: The sum of 10,15,12,11,19,7,18 is 92.
Minimum
This is the lowest value in the data range.
Example: The minimum of 10,15,12,11,19,7,18 is 7.
Maximum
This is the highest value in the data range.
Example: The maximum of 10,15,12,11,19,7,18 is 19.
Count
This is a count of all records.
Non-Numeric values
By default for all operations other than "Count", all non-numeric values will be ignored when calculating results. For example, consider the following values:
- 7
- 10
- 14
- 13
- 19
- Steve
- 27
- 32
- 11
- Frank
- NULL
If Presence is asked to calculate a statistical function over these values, only the numeric values will be considered. The string values ("Steve" and "Frank") and the NULL value will be discarded prior to the calculation.
However if "Treat non-numerics as zero" is checked these values will be converted to zero before performing the calculation, resulting in:
- 7
- 10
- 14
- 13
- 19
- 0
- 27
- 32
- 11
- 0
- 0
Applying Rules
Rules allow you to selectively include results in the final calculation. For example, suppose we want to calculate the average age of a group of cinema goers but don't want to include anyone under the age of 12. In this instance we'd set up a Rule to exclude anyone that falls into that criteria. The conditions would be:
- var{AGE} / is / greater than or equal to / 12
Or, we could state the same thing as:
- var{AGE} / is not / less than / 12
Both equate to the same condition but are expressed differently.
To establish this rule, click on the "Set Rules" button, which will launch the Rules dialog (see below). The column named "AGE" is visible in the left hand pane. Drag this into the main window in the main clause, and a New Condition dialog box is shown. Clicking "Okay" will create the condition and the Rule will now be in place.
See also: Presence Rules.
Data Filter | Require Columns | Append Data Column | Multiple Column Appender
Drop Column | Drop Row(s) | Calculate Column Aggregate | Dataset Splitter | Merge Data
Create Data Table | Clear Data Table | Sort Data Table | Drop Duplicate Rows | Store Data Table | Retrieve Data Table
Task Elements > Data Table Nodes > Calculate Column Aggregate