Calculate Column Aggregate

From PresenceWiki
Revision as of 13:07, 20 January 2011 by Mattpryor (Talk | contribs)

Jump to: navigation, search

Calculate Column Aggregate

This document has been revised for Presence version 3.6.

The Column Aggregate Node performs the following statistical functions on a column of data in the Data Table:

  • Mean
  • Sum
  • Minimum
  • Maximum
  • Count
  • Median 1
  • Standard Deviation 1

1 Features available in Presence 3.6 and later.

To include this Node in your Task drop a "Calculate Column Aggregate" Task Element into your Task design from the Data Access Task Elements list, after which you will be presented with the following dialog box:


http://www.international-presence.com/wikidocs/images/column_aggregate_dialog.png


There are four configuration options to set for this Node:

  • Data Column - Choose the Data Column that you wish to perform the calculation on.
  • Aggregate Type - Choose which type of aggregate function you wish to perform
  • Variable - The name variable which should store the result
  • Variable Scope - see Variables for a description of variable scope.
  • Rules - set the Rules for runtime evaluation (see #Applying Rules below)

Average

Statistical mean. This is calculated by adding together all values in the column and dividing the result by the number of records.

Example: The average of 10,15,12,11,19,7,18 is 13.142857

Median

This is the middle number if all numbers are arranged in ascending order (assuming an odd number of values). If there is an even number of values the mean is taken of the two middle values.

Examples:

The median of 10, 15, 12, 11, 19, 7, 18 is 11

The median of 6, 10, 15, 22, 48, 52 is 13.5 (the mean of 15 and 22)

Standard Deviation

Standard deviation is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the "average" (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data are spread out over a large range of values.

This is achieved as follows:

  1. Calculate the mean
  2. For each value calculate the difference between that value and the mean
  3. Square that number
  4. Add them all together
  5. Divide by total number of values
  6. Take square root

Example: The standard deviation of 2,4,6,8,10 is 2.8243

Which equals (where m = mean, or 6.0):

(2-m)2 = -42 = 16

+ (4-m)2 = -22 = 4

+ (6-m>2 = 02 = 0

+ (8-m)2 = 22 = 4

+ (10-m)2 = 42 = 16

= 40

40 / 5 = 8

Square root of 8 = 2.8243

Sum

This is the total of all the values in the data range.

Example: The sum of 10,15,12,11,19,7,18 is 92.

Minimum

This is the lowest value in the data range.

Example: The minimum of 10,15,12,11,19,7,18 is 7.

Maximum

This is the highest value in the data range.

Example: The maximum of 10,15,12,11,19,7,18 is 19.

Count

This is a count of all records.

Non-Numeric values

All non-numeric values will be ignored when calculating these results.

Applying Rules

Rules allow you to selectively include results in the final calculation. For example, suppose we want to calculate the average age of a group of cinema goers but don't want to include anyone under the age of 12. In this instance we'd set up a Rule to exclude anyone that falls into that criteria. The conditions would be:

var{AGE} >= 12

Data Filter | Require Columns | Append Data Column | Multiple Column Appender

Drop Column | Drop Row(s) | Calculate Column Aggregate | Dataset Splitter | Merge Data

Create Data Table | Clear Data Table | Sort Data Table | Drop Duplicate Rows | Store Data Table | Retrieve Data Table


Task Elements > Data Table Nodes > Calculate Column Aggregate