POP Scanner

From PresenceWiki
Revision as of 06:41, 15 August 2014 by Mattpryor (Talk | contribs)


This is an expansion of the Mail Scanner page and provides more detailed information. We need to consolidate the two at some point.


The POP Scanner integrates incoming emails into a Presence Task.

The POP Scanner allows you to specify one or more mailboxes to scan either using a static account name or dynamic variables - that is, using account names and details which have been extracted from a database.

The scanning process then lets you either output all e-mail messages or filter them based on conditions, in much the same way as you can query an ODBC/JDBC database or XML file with Presence.

The output from the POP Scanner can then be integrated with a Presence Task in exactly the same way as other data sources to provide alerts or data integration facilities.

Example uses of a Pop Scanner may include but are not limited to the following:

  • Online Email Client - Use Presence to pick up emails and publish them to a browser-based system. Coupled with the Send Email Action, this gives you everything you need to build your own online email system.
  • Spam Filtering - Use Presence to pick up and remove mail identified as spam (via a SQL or XML query). Once emails have been filtered, Presence can forward the emails that have not been identified as spam to the intended recipients.
  • Bi-directional Alerting - A Presence Task could be set up to send an email asking a question. Based on the reply, a task could start a process, gather more information (remote report requests) or simply store the response in a database.
  • Email to SMS forwarding - Automatically push the content (or part of the content) of important e-mails out to mobile devices.

To add a POP Scanner to a Task, select it from the right-click menu or drag the POP Scanner icon from the Data Access list.

A POP Scanner is an Anonymous Task Element, meaning that when you create a POP Scanner it is created only for the current task and does not get created as a list entry in the same way an XML, SQL or Object monitor would. You can copy an anonymous Task element from one task to another using CTRL+C to copy and CTRL+V to paste. For more information see the section on Anonymous Task Elements.

When you double click or drop a Pop Scanner on a Task the Pop Scanner Editor dialog is shown, opening the Info tab:


In the Info tab we can provide a Name and Description for this node.

Account Details

In the Account Details section we provide the connection details for the POP Mail Server and which mailbox(es) we wish to query.

There are two ways in which the account details can be specified - using static mode or dynamic mode. There is also a debug mode which can be enabled.

Static Mode

To use the POP Scanner in Static mode, select the POP Server from the list of available POP Server Resources. If you have not already set up a POP resource, you can use the Add New button to open the Add POP Server Resource dialog.

Assuming an existing POP server resource is being used, the values for Name, Account, Host and Port are populated automatically - the Password is not.

The password is not populated for security reasons and will ensure that use of a POP scanner in a task will only be available to those who know the password for the POP account.

Type the password to complete the Pop account connection details.


Dynamic Mode

In Dynamic mode you can choose to use values from a variable, database column or even a Presence function to define the connection to the mail server and which mailbox to query. To enable Dynamic Mode tick the Override Values box as shown below.


When the override values have been checked you can use the CTRL+Space key combination to view & select the variables and columns on the current Task’s Presence Context to provide the account details. Alternatively you can type in the details manually.

Once the override values tick box has been enabled another option becomes visible - Use Hidden Password. This option will enable you to see or mask the values in the password field. This is useful if you are using a Presence Variable or Column value as the password - in this case you would want to see the reference to this variable data instead of a password mask. For tightest security you would use a password-protected variable (see variable un-lockers for more information).

Again, for security reasons, once you use a masked password, turning off the Use Hidden Password option will not show the previously masked password.

Info > Test Tab

This allows you to test the connection parameters and show informational messages on the status of the POP account. If the host cannot be contacted or any information is wrong then a message will be displayed. Once a connection is established the pop scanner will check the number of messages in the mailbox and display this count. No messages will be retrieved at this point.
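The behaviour described above (connect, report any failure, count the waiting messages, retrieve nothing) can be pictured with a short Python sketch. This is not how Presence itself is implemented. It uses the standard poplib module, and the function name check_mailbox and the injectable connect parameter are assumptions made for illustration.

```python
import poplib


def check_mailbox(host, port, account, password, connect=poplib.POP3, timeout=30):
    """Log in and report the message count without retrieving any message."""
    try:
        conn = connect(host, port, timeout=timeout)
    except OSError as exc:
        # Host unreachable, wrong port, DNS failure and so on.
        return f"Cannot contact {host}:{port}: {exc}"
    try:
        conn.user(account)
        conn.pass_(password)
        # STAT returns (message count, mailbox size); no message is downloaded.
        count, _size = conn.stat()
        return f"Connected: {count} message(s) waiting"
    except poplib.error_proto as exc:
        # The server answered but rejected the account name or password.
        return f"Login rejected: {exc}"
    finally:
        try:
            conn.quit()
        except (OSError, poplib.error_proto):
            pass


# Hypothetical usage against a real server:
# print(check_mailbox("mail.example.com", 110, "sales", "secret"))
```

Passing the connection class in as a parameter keeps the sketch testable without a live mail server.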


Debug Mode

At the bottom of the Account section there is also a Debug from file checkbox. This will switch the displayed account information to a path location. You can then import and process POP files which have been exported either by a POP Scanner using the debug option or by the Presence iMPS e-Mail Gateway.

For testing purposes you may wish to save messages directly as MIME-encoded files (output as *.eml files) and load them back into the POP Scanner to establish the content when defining or debugging a task.

If the Debug option is being used the following options are displayed.


When you have one or more email files in a folder, you can point the Path to this location and the POP Scanner will retrieve messages from these MIME files.

The Extensions setting is a comma-separated filter list that ensures you only pick up MIME-encoded files. In this case we only want to load eml files (Outlook).
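As a rough illustration of what the Path and Extensions settings do, the following Python sketch reads every file in a folder whose extension appears in a comma-separated list and parses it as a MIME message. The function name load_debug_messages is hypothetical, and the sketch only mirrors the idea; it is not the Presence implementation.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def load_debug_messages(folder, extensions="eml"):
    """Parse every file in `folder` whose extension is in the comma-separated list."""
    wanted = {
        "." + ext.strip().lstrip(".").lower()
        for ext in extensions.split(",")
        if ext.strip()
    }
    messages = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in wanted:
            with path.open("rb") as handle:
                messages.append(BytesParser(policy=policy.default).parse(handle))
    return messages
```

Files with other extensions in the same folder are simply skipped, which is the point of the filter list.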

Timeout Options

Before a POP Scanner checks a mailbox, it makes an internal check to see if another task is currently using the POP Account Name/ Server Host/ Password combination. If this combination is in use, it will keep trying until the lock has been released or a timeout occurs.
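The lock check can be sketched as a small in-memory lock table keyed by the account/host/password combination. The sketch below is a single-threaded illustration only: the class name MailboxLocks and its parameters are assumptions, and it also models the forced release that the second timeout option in this section describes.

```python
import time


class MailboxLocks:
    """In-memory lock table keyed by (account, host, password)."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._held = {}      # key -> time at which the lock was taken
        self._clock = clock
        self._sleep = sleep

    def acquire(self, key, wait_minutes, max_hold_minutes, poll_seconds=1.0):
        """Return True once the lock is taken, False if waiting timed out."""
        deadline = self._clock() + wait_minutes * 60
        while True:
            now = self._clock()
            taken_at = self._held.get(key)
            # A lock held longer than max_hold_minutes is treated as abandoned.
            if taken_at is None or now - taken_at >= max_hold_minutes * 60:
                self._held[key] = now
                return True
            if now >= deadline:
                return False
            self._sleep(poll_seconds)

    def release(self, key):
        """Release the lock; releasing a lock that is not held is harmless."""
        self._held.pop(key, None)
```

The two minute values correspond to the two spinner controls: how long to keep retrying, and how long a lock may be held before it is released regardless.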

If POP Account /Debug Folder is in use, then timeout after: Set this spinner control for the number of minutes the POP scanner will continue to try and connect to a mailbox when a lock is detected.

Release lock on POP Account / debug Folder regardless, after: When the POP scanner establishes a connection on a mailbox it will apply an internal lock on it for the Account Name/ Server Host/ Password combination. Use this value to set the maximum minutes the account should be locked for. This option is to overcome potential task failures and will release the mailbox/file lock.

Failed to Connect Options

If a POP account cannot be contacted, either through network failure or incorrect connection parameters, Presence can deal with this in two ways: it can throw an error back to the task, or it can ignore the mailbox and continue processing other mailboxes. For single mailbox monitoring it is recommended that Treat As Error is used; for a POP Scanner that looks at multiple mailboxes, use Ignore Errors.

If Ignore Errors is being used your task should deal with the possibility of blank or null values coming from the POP query - for example use a Decision Node to check for blank or null fields.
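The difference between the two options can be illustrated with a short Python sketch. The function names, the row layout and the choice to emit a row with a null subject for an unreachable mailbox are all assumptions made for illustration; the source only states that blank or null values may reach the task.

```python
def scan_mailboxes(mailboxes, fetch, on_failure="error"):
    """Collect rows from several mailboxes.

    on_failure="error"  -> re-raise the connection failure (Treat As Error)
    on_failure="ignore" -> skip the mailbox and carry on (Ignore Errors)
    """
    rows = []
    for box in mailboxes:
        try:
            rows.extend(fetch(box))
        except OSError:
            if on_failure == "error":
                raise
            # Assumption: an ignored mailbox shows up as a row with null fields.
            rows.append({"mailbox": box, "subject": None})
    return rows


def is_blank(value):
    """The kind of check a Decision Node would make on a possibly empty field."""
    return value is None or str(value).strip() == ""
```

With Ignore Errors, every downstream step should run something like is_blank before relying on a field.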

Scan Tab

On the scan tab you can specify options for filtering by the content of the received e-mails, and what to do with messages in the POP Mailbox after processing.

When Presence evaluates any e-mail message it considers the message to be made up of up to three separate components. Any one, or all three of these components can be scanned and filtered by the POP Scanner.

These three components are the Message Headers, Message Body and Message Parts.

Message Headers make up the information fields of the message and include (but are not limited to) Subject, Sender, Recipient, Return Path, unique Message ID and so on. Note that different e-mail packages may include different Headers. The bare minimum requirement for all e-mails (according to the RFC-822 internet standard) is to include the Sender (FROM), a Recipient (TO, CC or BCC) and the Date of the message.

The unique Message ID should also be present. It is normally generated by the sending mail client or server, although some receiving mail servers add one when it is missing. Other headers are optional and may or may not be present. Be aware of this before filtering on headers.

The Message Body is the text content of the e-mail. A single e-mail can include multiple body fields (for example, one in plain text and one formatted as HTML), but the actual text content will usually remain the same. Presence allows you to treat these as separate items - searching either the plain text version or the HTML code.

Message Parts are the separate components of a multipart MIME format message. In actual fact the message body fields are considered to be parts also, but for convenience Presence treats these as a separate component. Other Parts include attached documents, embedded images, and any other binary files.
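To make the three components concrete, here is a minimal Python sketch that splits a raw message into headers, bodies and other parts using the standard email package. The function name split_components and the rule used to tell a body from a part (a text/plain or text/html section that is not marked as an attachment) are assumptions; Presence may draw the line differently.

```python
from email import message_from_string, policy


def split_components(raw):
    """Return (headers, bodies, parts) for a raw RFC 822 / MIME message."""
    msg = message_from_string(raw, policy=policy.default)
    headers = dict(msg.items())
    bodies, parts = {}, []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold other parts and carry no content themselves
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html") and not part.is_attachment():
            bodies.setdefault(ctype, part.get_content())
        else:
            parts.append((ctype, part.get_filename()))
    return headers, bodies, parts


raw = "From: a@example.com\nTo: b@example.com\nSubject: Hi\n\nHello\n"
headers, bodies, parts = split_components(raw)
```

For a message with a plain text body, an HTML alternative and one attached file, bodies would hold two entries and parts one.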

The POP Scanner gives you the flexibility to filter based on any of these three components providing an incredibly powerful tool for filtering or searching incoming e-mail messages.

Message Removal Options

When the POP Scanner retrieves messages from a mailbox it can receive the messages without removing them from the server. This is useful during testing, or if you simply want to leave the messages in the mailbox to be retrieved by a standard email client.

Note that in the case where you do leave the messages in the mailbox you should enable the Message 'Read Once' Options described below.


The Message Removal Options are as follows:

Delete in Test - When using the POP scanner Test tab to test retrieval of messages, these options control whether or not the original messages on the POP server are deleted during the test:

  • Never - Messages are not deleted - they remain permanently on the POP Server until deleted by the e-mail client.
  • Always - Messages are always deleted from the server after processing.
  • If Passes Scan - When message headers or parts are being monitored you can choose to scan these for search words and phrases. In this case the message is removed from the server only where the search term is found.
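The three deletion options reduce to a simple decision per message. The Python sketch below shows that decision; the function name and the option strings are hypothetical labels for the settings listed above.

```python
def should_delete(mode, passed_scan):
    """Decide whether a processed message is removed from the POP server."""
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode == "if_passes_scan":
        # Only messages in which the search terms were found are removed.
        return passed_scan
    raise ValueError(f"unknown deletion mode: {mode!r}")
```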

Delete In Task - As above, but these options control deletion of POP messages after processing by the Presence Task rather than in the POP Scanner Test tab. These options take effect whenever the Presence Task is run - either in debug mode or on the live task queue.

Limit Emails

When scanning emails for search terms, we can limit the number of emails considered for the search. We can also limit the number of emails that are returned.

Maximum of n Emails considered - only search through n emails for our search terms.

Maximum of n Emails returned - only return n emails from the mail box.

Be aware that messages which have not been deleted because they did not match the message removal options as described above are still included in the count of messages considered and returned. If you select to never remove POP messages you should therefore enable the Message 'Read Once' Options described below.
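The interaction of the two limits can be sketched as follows. This is an illustration under the assumption that messages are examined in mailbox order and that scanning stops as soon as either limit is reached; the function and parameter names are hypothetical.

```python
def scan_with_limits(messages, matches, max_considered, max_returned):
    """Examine at most max_considered messages and return at most max_returned."""
    returned = []
    for considered, message in enumerate(messages, start=1):
        if considered > max_considered:
            break
        if matches(message):
            returned.append(message)
            if len(returned) >= max_returned:
                break
    return returned
```

Because messages left on the server are counted again on every run, old messages at the front of the mailbox can use up the "considered" allowance before any new message is reached.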

Message 'Read Once' Options

If you select the option 'Never Delete In Task' you should also enable the Read Once options. This ensures that the POP Scanner does not keep looking at messages which have already been processed.
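The source does not say how Read Once is implemented. One common way to get this behaviour with POP3 is to remember the unique ID that the server reports for each message (the UIDL command) and skip IDs seen before. The Python sketch below shows that approach as an assumption, not as a description of Presence internals.

```python
def parse_uidl(lines):
    """Turn UIDL response lines such as b"1 abc123" into (number, unique_id) pairs."""
    pairs = []
    for line in lines:
        number, unique_id = line.split(None, 1)
        pairs.append((int(number), unique_id.decode("ascii")))
    return pairs


def unread_only(listing, seen):
    """Return the messages whose unique ID is not in `seen`, then remember them."""
    fresh = [(number, uid) for number, uid in listing if uid not in seen]
    seen.update(uid for _, uid in fresh)
    return fresh
```

In Python the lines would come from the second element of the tuple returned by poplib.POP3.uidl(), and the seen set would need to be stored between runs.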

Override Character Set

Generally speaking, received e-mails specify the character set used by the message in the MIME content and Presence will therefore use the correct character set.

If an email is read that does not contain a known character set then you can override the standard character set used by Presence by selecting from the list of options supported by the operating system. Generally you will not need this option because the required character set is usually mentioned in the mime content.

In cases where you do need to change the character set it is advisable to use Unicode.
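The fallback logic can be pictured like this: use the character set declared in the message when it is one the system knows, and otherwise use the override. The Python sketch below uses UTF-8 as the override in line with the Unicode advice above; the function name and parameters are hypothetical.

```python
import codecs


def decode_payload(raw_bytes, declared_charset=None, override_charset="utf-8"):
    """Decode message bytes, falling back to the override for unknown charsets."""
    charset = declared_charset or override_charset
    try:
        codecs.lookup(charset)
    except LookupError:
        # The message named a character set this system does not know.
        charset = override_charset
    return raw_bytes.decode(charset, errors="replace")
```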

Filter Conditions tabs

Headers Tab

Here we can specify filter conditions for the Message Header part of each e-mail message.

By default the most commonly used headers are displayed. These standard headers are:

  • Subject - The subject field of the email.
  • Return path - The real originating server of the message.
  • From - The supposed originator of the message.
  • To - the intended recipient of the message.

Alternatively, click the Add button to add another Message Header from the list provided. Note, however, that the list provided by the POP Scanner is not exhaustive. If you wish to filter based on a message header that is not in the list, you can add your own.

To add your own header select any from the list of Presence provided headers and type over the header identifier.

Also remember that an email message can still be valid without including all of the headers listed by Presence.

The Remove button will remove the header from the list.

Finally, if you have a POP mailbox containing a representative sample of the type of e-mails you will be receiving you can use the Populate button. This causes the POP Scanner to run a full scan of all the e-mail messages in the mailbox, identifying all of the message headers available.

Filter Words

On each of the headers you can add a search word or comma separated list of filter or search words. The search words will be compared regardless of the case of any text found. For example, comparing the search term 'Hello do you want to make a million in the NEXT 5 Mins' to 'HELLO DO YOU WANT to MAKE a MILLION in the NEXT 5 MINS.' would return a match.

Contains

The Contains drop down list can be used to change how the search word or words are processed.

  • All - All of the search words listed must be present in the Message Header specified.
  • At Least One - Any of the search words listed must be present in the Message Header specified.
  • None - None of the search words listed may be present in the Message Header specified.
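The three Contains modes, combined with the case-insensitive comparison described under Filter Words, can be sketched in Python as below. The function name is hypothetical, and plain substring matching is an assumption; Presence may match on whole words.

```python
def header_matches(value, filter_words, contains="all"):
    """Apply a comma-separated word list to a header value, ignoring case."""
    text = value.lower()
    words = [w.strip().lower() for w in filter_words.split(",") if w.strip()]
    hits = [word in text for word in words]
    if contains == "all":
        return all(hits)
    if contains == "at_least_one":
        return any(hits)
    if contains == "none":
        return not any(hits)
    raise ValueError(f"unknown contains mode: {contains!r}")
```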

When a search term is evaluated for the headers you can choose to ignore the whole email, or just that particular Message Header. If you choose to ignore one particular Message Header, and you have included it on the output list, then the output will be a blank string.

If you have selected to ignore the email then the email will be ignored if the header IS there but the search conditions for it are not met.
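The two "ignore" choices can be sketched as follows. The sketch follows one reading of the text above: a filter only applies when the header is present, a failed filter either drops the whole email or blanks that one header, and all names here are hypothetical.

```python
def apply_header_filter(headers, name, passes, on_fail="email"):
    """Return the output row, or None when the whole email is ignored."""
    row = dict(headers)
    if name not in headers:
        return row        # header absent: the filter does not come into effect
    if passes(headers[name]):
        return row
    if on_fail == "email":
        return None       # ignore the entire message
    row[name] = ""        # ignore just this header: output a blank string
    return row
```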

Note that only headers listed on this Tab will be available for output as a column. So, if you need to output the sender's name then you will need to add the FROM header. No search terms are required when you add new headers.

Body Tab

An email body is just another part of an email like an attachment but mail clients recognize the body part as the part for user reading. In fact an e-mail may consist of multiple body parts of a message - a plain-text part, HTML part etc.

The POP Scanner makes it easy to search and filter emails based on the content of the body. The body part will always be available as an output from the POP Scanner. By default four rows for specifying Body filters have been added. This is for convenience only, however, as none actually need to be used. If you choose to ignore an email unless certain conditions are met, and the email is kept because one of its bodies met the conditions, then the bodies that did not meet the conditions also pass this filter.


Content drop down list

The first drop down list can be used to change how the search word or words are processed.

  • All - All of the search words listed must be present in the Message Body specified.
  • At Least One - Any of the search words listed must be present in the Message Body specified.
  • None - None of the search words listed may be present in the Message Body specified.

'of' word search list

The word search area can contain one or more search words or phrases. Words and phrases can include the * character as a wildcard as in the following example.
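As an illustration of wildcard matching only (not the product's own example), the Python sketch below turns a phrase containing * into a regular expression in which * stands for any run of characters. Case-insensitive substring matching is an assumption carried over from the Headers tab, and the function names are hypothetical.

```python
import re


def phrase_to_regex(phrase):
    """Translate a search phrase with * wildcards into a compiled pattern."""
    pieces = [re.escape(piece) for piece in phrase.split("*")]
    return re.compile(".*".join(pieces), re.IGNORECASE | re.DOTALL)


def body_contains(body, phrase):
    """True when the phrase, with wildcards expanded, occurs anywhere in the body."""
    return phrase_to_regex(phrase).search(body) is not None
```

Under these assumptions, a phrase such as invoice*attached would match a body containing "Invoice number 12345 attached".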


'in the...' drop down list

This drop down list can be used to change how the search word or words are processed.

  • content - look for the search word(s) or phrases in the entire content of the Message Body - i.e. both the text content and the HTML code.
  • html <code> - look for the search word(s) or phrases only in the HTML code of the Message Body - i.e. between the <> characters.
  • html >text< - look for the search word(s) or phrases only in the plain text content of the Message Body - i.e. outside the <> characters.
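The three search scopes can be approximated with a regular expression that separates what lies inside the <> characters from what lies outside them. This Python sketch is a simplification (it does not handle comments, scripts or a > inside an attribute value), and the names used are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")


def search_scope(body, where):
    """Return the portion of an HTML body that a search should look at."""
    if where == "content":
        return body                           # text and HTML code together
    if where == "code":
        return " ".join(TAG.findall(body))    # only what is between < and >
    if where == "text":
        return TAG.sub(" ", body)             # only what is outside < and >
    raise ValueError(f"unknown scope: {where!r}")
```

Searching the code scope is useful for terms that only appear in link targets, while the text scope avoids false matches on tag names and attributes.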

'else ignore...' drop down list

Controls what to do in the event of not finding the specified search words or phrases.

  • ignore Email - Ignore the entire e-mail message.
  • ignore Body - Ignore just this particular Body part. In this instance the message will still be passed on for output. If you have specified the Body as one of the output columns on the Output tab then this field will be passed on as a blank string.

If you choose to ignore the email, Parts (on the next Tab) will still be considered.

If you have selected to ignore the email then the email will be ignored if the header IS there but the search conditions for it are not met.

The Remove button will remove the header from the list; conversely, the Add button will add a new Header to the list.

Parts Tab

As stated, an email may be composed of many parts which include the message body in plain text and HTML format (discussed in the previous section) but also includes any attachments, embedded images, etc.

The POP scanner allows you to search these parts and output them. For instance a part may contain an image or document that you want to capture.

To capture a document file, for instance, you would add an 'application/octet-stream' entry to your parts list and capture this information into a Presence Column or Variable for upload to a database or XML file.

The POP scanner can capture each part or content type contained within an email, and these parts can then be output as Presence variables and columns.


As with the other tabs described, the Parts tab defaults to four rows for output. The first row (the * part) looks for any part other than the body part; you can use a wildcard to extract attachments or any other parts from the email. If there is more than one attachment, multiple rows will be created for the email in the output: Presence duplicates the other data extracted from the email on each row.

So if you had an email with two attachments then the output columns may look something like the following:

PS_Subject | PS_Body | PS_PartNumber
Re: your mortgage application | Get a good mortgage with us at a very reasonable rate. | 1
Re: your mortgage application | Get a good mortgage with us at a very reasonable rate. | 2

Of course, you would include the part name and data to actually get the data from the attachments.
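The row duplication can be sketched in Python as one output row per captured part, with the message-level columns repeated. The function name is hypothetical, and returning a single row with an empty part number when there are no parts is an assumption.

```python
def expand_rows(subject, body, parts):
    """Build one output row per part, repeating the subject and body on each."""
    if not parts:
        # Assumption: a message without matching parts still yields one row.
        return [{"PS_Subject": subject, "PS_Body": body, "PS_PartNumber": None}]
    return [
        {"PS_Subject": subject, "PS_Body": body, "PS_PartNumber": number}
        for number, _part in enumerate(parts, start=1)
    ]


rows = expand_rows(
    "Re: your mortgage application",
    "Get a good mortgage with us at a very reasonable rate.",
    ["first attachment", "second attachment"],
)
```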

The search and filtering options work in exactly the same way as on the Headers tab.

'Ignore the email anyway' option

If the parts mentioned in the parts tab aren't in the message anyway then these filters will not come into effect and the message will pass the parts filtering.

There is, however, an option to Ignore the email anyway if the 'ignore email' part is not found in the message. For example, if we were ignoring all emails that don't have text attachments starting with the name 'results', then we may also want to ignore all emails that don't have text attachments at all.

Output Tab

The POP Scanner outputs data in much the same way as the SQL Query, XML Query, and Object Monitor nodes. The output from a POP Scanner can be used in your task in exactly the same way. To identify columns output from the POP Scanner, each column name is prefixed with 'PS_'.

The columns available for selection are dependent upon the headers, parts and bodies selected in the earlier screens. To select any of these columns simply drag from the left of the screen to the 'Columns included in output' working list.

When the columns have been selected you can change the sort order and change the column aliases (that is, the column name).


'Store Messages for debug' option

At the top of the tab there is an option to output the email file directly to a folder. You can use these files for testing purposes. Your Presence Support representative may ask you to enable this option for debugging.

Test Tab

The Testing area of the POP scanner allows you to run a quick test against the POP account and verify that the results are as you would expect.


When you click the Test button the POP Scanner Test screen is presented. Click the Start button to begin querying the POP mailbox. The Progress window will display login messages, count of messages stored in the mailbox, and a final status.


The columns you have selected in the Output Tab will be displayed as requested.

Note that some of the columns (such as the body of the message) will usually be multi-line fields and that to view the full contents of the field you will need to double-click within the field you wish to view.

There are two additional tabs on the Test area: Results (a table of columns) and Summary.

The Summary expresses in plain English exactly what query you have defined over the POP accounts.