Difference between revisions of "Calculate Column Aggregate"

From PresenceWiki
Jump to: navigation, search
(Standard Deviation)
 
(13 intermediate revisions by one other user not shown)
Line 18: Line 18:
  
  
http://www.international-presence.com/wikidocs/images/column_aggregate_dialog.png
+
[[file:column_aggregate_dialog.png]]
  
 +
=== Configuration Options ===
  
There are four configuration options to set for this Node:
+
There are six configuration options to set for this Node:
  
 
* '''Data Column''' - Choose the Data Column that you wish to perform the calculation on.
 
* '''Data Column''' - Choose the Data Column that you wish to perform the calculation on.
Line 28: Line 29:
 
* '''Variable Scope''' - see [[Variables]] for a description of variable scope.
 
* '''Variable Scope''' - see [[Variables]] for a description of variable scope.
 
* '''Rules''' - set the [[Rules]] for runtime evaluation (see [[#Applying Rules]] below)
 
* '''Rules''' - set the [[Rules]] for runtime evaluation (see [[#Applying Rules]] below)
 +
* '''Treat non-numerics as zero''' - tells the runtime engine how to treat non-numeric values - see the section on [[#Non-Numeric values]] below.
  
=== Average ===
+
=== Calculation Types ===
 +
 
 +
==== Average ====
  
 
Statistical mean. This is calculated by adding together all values in the column and dividing the result by the number of records.
 
Statistical mean. This is calculated by adding together all values in the column and dividing the result by the number of records.
Line 35: Line 39:
 
Example: The average of 10,15,12,11,19,7,18 is 13.142857
 
Example: The average of 10,15,12,11,19,7,18 is 13.142857
  
=== Median ===
+
==== Median ====
  
 
This is the middle number if all numbers are arranged in ascending order (assuming an odd number of values). If there is an even number of values the mean is taken of the two middle values.
 
This is the middle number if all numbers are arranged in ascending order (assuming an odd number of values). If there is an even number of values the mean is taken of the two middle values.
  
Example: The median of 10,15,12,11,19,7,18 is 11
+
Examples:  
  
=== Standard Deviation ===
+
The median of 10, 15, 12, '''11''', 19, 7, 18 is 11
 +
 
 +
The median of 6, 10, '''15, 22''', 48, 52 is 13.5 (the mean of 15 and 22)
 +
 
 +
==== Standard Deviation ====
  
 
Standard deviation is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the "average" (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data are spread out over a large range of values.
 
Standard deviation is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the "average" (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data are spread out over a large range of values.
Line 74: Line 82:
 
Square root of 8 = 2.8243
 
Square root of 8 = 2.8243
  
=== Sum ===
+
==== Sum ====
  
 
This is the total of all the values in the data range.
 
This is the total of all the values in the data range.
Line 80: Line 88:
 
Example: The sum of 10,15,12,11,19,7,18 is 92.
 
Example: The sum of 10,15,12,11,19,7,18 is 92.
  
=== Minimum ===
+
==== Minimum ====
  
 
This is the lowest value in the data range.
 
This is the lowest value in the data range.
Line 86: Line 94:
 
Example: The minimum of 10,15,12,11,19,7,18 is 7.
 
Example: The minimum of 10,15,12,11,19,7,18 is 7.
  
=== Maximum ===
+
==== Maximum ====
  
 
This is the highest value in the data range.
 
This is the highest value in the data range.
Line 92: Line 100:
 
Example: The maximum of 10,15,12,11,19,7,18 is 19.
 
Example: The maximum of 10,15,12,11,19,7,18 is 19.
  
=== Count ===
+
==== Count ====
  
 
This is a count of all records.
 
This is a count of all records.
  
=== Non-Numeric values ===
+
== Non-Numeric values ==
  
All non-numeric values will be ignored when calculating these results.
+
By default for all operations other than "Count", all non-numeric values will be ignored when calculating results. For example, consider the following values:
 +
 
 +
* 7
 +
* 10
 +
* 14
 +
* 13
 +
* 19
 +
* Steve
 +
* 27
 +
* 32
 +
* 11
 +
* Frank
 +
* NULL
 +
 
 +
If Presence is asked to calculate a statistical function over these values, only the numeric values will be considered. The string values ("Steve" and "Frank") and the NULL value will be discarded prior to the calculation.
 +
 
 +
However if "Treat non-numerics as zero" is checked these values will be converted to zero before performing the calculation, resulting in:
 +
 
 +
* 7
 +
* 10
 +
* 14
 +
* 13
 +
* 19
 +
* '''0'''
 +
* 27
 +
* 32
 +
* 11
 +
* '''0'''
 +
* '''0'''
  
 
==== Applying Rules ====
 
==== Applying Rules ====
Line 104: Line 140:
 
Rules allow you to selectively include results in the final calculation. For example, suppose we want to calculate the average age of a group of cinema goers but don't want to include anyone under the age of 12. In this instance we'd set up a Rule to exclude anyone that falls into that criteria. The conditions would be:
 
Rules allow you to selectively include results in the final calculation. For example, suppose we want to calculate the average age of a group of cinema goers but don't want to include anyone under the age of 12. In this instance we'd set up a Rule to exclude anyone that falls into that criteria. The conditions would be:
  
:var{AGE} >= 12
+
:var{AGE} / is / greater than or equal to / 12
 +
 
 +
Or, we could state the same thing as:
 +
 
 +
:var{AGE} / is not / less than / 12
 +
 
 +
Both equate to the same condition but are expressed differently.
 +
 
 +
To establish this rule, click on the "Set Rules" button, which will launch the Rules dialog (see below). The column named "AGE" is visible in the left hand pane. Drag this into the main window in the main clause, and a New Condition dialog box is shown. Clicking "Okay" will create the condition and the Rule will now be in place.
 +
 
 +
[[file:calc_column_aggregate_rule.png]]
 +
 
 +
See also: Presence [[Rules]].
 +
 
 +
 
  
 
{{DataTableNodes}}
 
{{DataTableNodes}}

Latest revision as of 13:25, 21 August 2015

Calculate Column Aggregate

This document has been revised for Presence version 3.6.

The Column Aggregate Node performs the following statistical functions on a column of data in the Data Table:

  • Mean
  • Sum
  • Minimum
  • Maximum
  • Count
  • Median 1
  • Standard Deviation 1

1 Features available in Presence 3.6 and later.

To include this Node in your Task drop a "Calculate Column Aggregate" Task Element into your Task design from the Data Access Task Elements list, after which you will be presented with the following dialog box:


Column aggregate dialog.png

Configuration Options

There are six configuration options to set for this Node:

  • Data Column - Choose the Data Column that you wish to perform the calculation on.
  • Aggregate Type - Choose which type of aggregate function you wish to perform
  • Variable - The name variable which should store the result
  • Variable Scope - see Variables for a description of variable scope.
  • Rules - set the Rules for runtime evaluation (see #Applying Rules below)
  • Treat non-numerics as zero - tells the runtime engine how to treat non-numeric values - see the section on #Non-Numeric values below.

Calculation Types

Average

Statistical mean. This is calculated by adding together all values in the column and dividing the result by the number of records.

Example: The average of 10,15,12,11,19,7,18 is 13.142857

Median

This is the middle number if all numbers are arranged in ascending order (assuming an odd number of values). If there is an even number of values the mean is taken of the two middle values.

Examples:

The median of 10, 15, 12, 11, 19, 7, 18 is 11

The median of 6, 10, 15, 22, 48, 52 is 13.5 (the mean of 15 and 22)

Standard Deviation

Standard deviation is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the "average" (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data are spread out over a large range of values.

This is achieved as follows:

  1. Calculate the mean
  2. For each value calculate the difference between that value and the mean
  3. Square that number
  4. Add them all together
  5. Divide by total number of values
  6. Take square root

Example: The standard deviation of 2,4,6,8,10 is 2.8243

Which equals (where m = mean, or 6.0):

(2-m)2 = -42 = 16

+ (4-m)2 = -22 = 4

+ (6-m>2 = 02 = 0

+ (8-m)2 = 22 = 4

+ (10-m)2 = 42 = 16

= 40

40 / 5 = 8

Square root of 8 = 2.8243

Sum

This is the total of all the values in the data range.

Example: The sum of 10,15,12,11,19,7,18 is 92.

Minimum

This is the lowest value in the data range.

Example: The minimum of 10,15,12,11,19,7,18 is 7.

Maximum

This is the highest value in the data range.

Example: The maximum of 10,15,12,11,19,7,18 is 19.

Count

This is a count of all records.

Non-Numeric values

By default for all operations other than "Count", all non-numeric values will be ignored when calculating results. For example, consider the following values:

  • 7
  • 10
  • 14
  • 13
  • 19
  • Steve
  • 27
  • 32
  • 11
  • Frank
  • NULL

If Presence is asked to calculate a statistical function over these values, only the numeric values will be considered. The string values ("Steve" and "Frank") and the NULL value will be discarded prior to the calculation.

However if "Treat non-numerics as zero" is checked these values will be converted to zero before performing the calculation, resulting in:

  • 7
  • 10
  • 14
  • 13
  • 19
  • 0
  • 27
  • 32
  • 11
  • 0
  • 0

Applying Rules

Rules allow you to selectively include results in the final calculation. For example, suppose we want to calculate the average age of a group of cinema goers but don't want to include anyone under the age of 12. In this instance we'd set up a Rule to exclude anyone that falls into that criteria. The conditions would be:

var{AGE} / is / greater than or equal to / 12

Or, we could state the same thing as:

var{AGE} / is not / less than / 12

Both equate to the same condition but are expressed differently.

To establish this rule, click on the "Set Rules" button, which will launch the Rules dialog (see below). The column named "AGE" is visible in the left hand pane. Drag this into the main window in the main clause, and a New Condition dialog box is shown. Clicking "Okay" will create the condition and the Rule will now be in place.

Calc column aggregate rule.png

See also: Presence Rules.



Data Filter | Require Columns | Append Data Column | Multiple Column Appender

Drop Column | Drop Row(s) | Calculate Column Aggregate | Dataset Splitter | Merge Data

Create Data Table | Clear Data Table | Sort Data Table | Drop Duplicate Rows | Store Data Table | Retrieve Data Table


Task Elements > Data Table Nodes > Calculate Column Aggregate