Parse File Action

Latest revision as of 15:43, 19 August 2015


Parse File Action Node

NOTE: THIS NODE HAS BEEN REPLACED WITH THE Flat File Parser

The purpose of this node is to convert data found within a file into a dataset.

The file types that can be converted are:

  • RTF Table
  • RTF
  • Text
  • HTML
  • HTML Table
  • CSV Table

Parsing Text - An Example

Given the text below, let's say we want to extract specific data from it into a table:

Customer.png

For the OrderID, we have selected 5 characters after "OrderID=" and set the repeat to true (as there will be more than one).

We do the same for the CustomerID but without the repeat. A regular expression is then generated for this search.

We then set the Telephone column to be the characters between Tel= and \r\n (a new line).
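
As a rough illustration of what these searches amount to, here is a minimal Python sketch. The sample text is hypothetical (the real Customer.txt is only shown as an image), and the expressions are hand-written equivalents, not necessarily the exact ones Presence generates.

```python
import re

# Hypothetical stand-in for Customer.txt; the real file content is not reproduced here.
text = "CustomerID=C1001\r\nTel=01234 567890\r\nOrderID=A0001\r\nOrderID=A0002\r\n"

# "Chars After" with 5 characters and repeat enabled: collect every match.
order_ids = re.findall(r"OrderID=(.{5})", text)

# Same search type without repeat: only the first match is used.
customer_id = re.search(r"CustomerID=(.{5})", text).group(1)

# "Chars Between": everything between Tel= and the line break.
telephone = re.search(r"Tel=(.*?)\r\n", text).group(1)

print(order_ids, customer_id, telephone)
```

The repeat option is modelled here by `re.findall`, while a non-repeating search stops at the first match.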

Finally we add a search for County, although you will notice that there is no County text in the Customer.txt file.

Customersscan.png

On the Error Handling tab we can decide what action to take when a field cannot be found.

In this case we are looking for County text that is not there. We can get Presence to throw an exception, continue and ignore the missing field, or add the column but leave it empty.
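
The three choices can be pictured with a small Python sketch. The function name `extract` and the option names `exception`, `ignore` and `empty` are made up for illustration; they are not Presence settings.

```python
import re

def extract(text, column, pattern, on_missing="empty"):
    """Search text for pattern and return {column: value}.

    on_missing (hypothetical option names):
      "exception" - raise an error when the field is not found
      "ignore"    - carry on and leave the column out
      "empty"     - add the column with an empty value
    """
    match = re.search(pattern, text)
    if match:
        return {column: match.group(1)}
    if on_missing == "exception":
        raise LookupError(f"field {column!r} not found")
    if on_missing == "ignore":
        return {}
    return {column: ""}

# Hypothetical sample text with no County entry.
sample = "Tel=01234 567890\r\n"
print(extract(sample, "County", r"County=(.*?)\r\n"))
```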

Customersexceptions.png

Finally when we run our test we get the following:-

Customersresults.png

Parsing HTML, CSV or RTF Tables - An Example

If your data is in a table format then it will be easier to parse as the data is already separated into cells.

Once we have chosen the type of file we are going to parse and its URL, we can click Populate.

This pre-parses the table in the file and adds a list of available cells to the table.

You should also see that the drop-down box in the Search1 cell now contains all the available cells.

You then probably need to delete the rows for the cells you aren't interested in, along with the cells that contain data as opposed to the cells that contain fields.

We can then get the parser to search for a cell by giving it either a location, such as pt(0,0), or text to search for, such as TEAM NAME.

When it finds this cell we can then return one of the following:

Cell Above - The cell above this cell will be added to the datatable.

Cell Right - The cell to the right of this cell will be added to the datatable.

Cell Below - The cell below this cell will be added to the datatable.

Cell Left - The cell to the left of this cell will be added to the datatable.

Chars At Cell - This cell itself will be added to the datatable.
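
The options above can be sketched in Python as offsets from the cell that was found. The table contents are hypothetical, and treating pt(0,0) as a (row, column) pair is an assumption; the coordinate order Presence uses is not stated here.

```python
# Offsets as (row change, column change) for each search type.
OFFSETS = {
    "Cell Above": (-1, 0),
    "Cell Right": (0, 1),
    "Cell Below": (1, 0),
    "Cell Left": (0, -1),
    "Chars At Cell": (0, 0),
}

def find_cell(table, search):
    """Locate a cell by (row, col) tuple or by its text; return (row, col) or None."""
    if isinstance(search, tuple):
        return search
    for r, row in enumerate(table):
        for c, value in enumerate(row):
            if value == search:
                return (r, c)
    return None

def lookup(table, search, search_type):
    """Return the cell selected by search_type relative to the found cell, or None."""
    pos = find_cell(table, search)
    if pos is None:
        return None
    dr, dc = OFFSETS[search_type]
    r, c = pos[0] + dr, pos[1] + dc
    if 0 <= r < len(table) and 0 <= c < len(table[r]):
        return table[r][c]
    return None  # the requested neighbour falls outside the table

# Hypothetical pre-parsed table.
table = [["TEAM NAME", "Rovers"], ["LOCATION", "Leeds"]]
print(lookup(table, "TEAM NAME", "Cell Right"))
print(lookup(table, (0, 0), "Cell Below"))
```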

Given the RTF file we have below, let us say we want to extract the 2 Contact Telephone Numbers, the Team Name and the Team Location.

Rtffile.png

We would set up our scan criteria as follows:-

Fileparserscan.png

Notice that "Repeat" is selected for "telephone Numbers".

This is because we know there is more than one telephone number and so we'd like Presence to keep returning them until it runs out.

By editing the repeat options, we can relate each telephone number to a new row in the datatable or a new column (in the form of TEL_1,TEL_2 etc).
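
The difference between the two repeat layouts can be sketched in Python as follows. The function names and sample values are hypothetical, and copying the other columns into every new row is an assumption about how the datatable is filled.

```python
def repeat_as_rows(base, column, values):
    """One datatable row per repeated value; the other columns are copied (assumption)."""
    return [{**base, column: value} for value in values]

def repeat_as_columns(base, prefix, values):
    """A single row with numbered columns such as TEL_1, TEL_2."""
    row = dict(base)
    for i, value in enumerate(values, start=1):
        row[f"{prefix}_{i}"] = value
    return row

# Hypothetical extracted data.
base = {"TEAM_NAME": "Rovers"}
numbers = ["01234 567890", "01234 567891"]

print(repeat_as_rows(base, "TEL", numbers))
print(repeat_as_columns(base, "TEL", numbers))
```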

Fileparseroutput.png

Here are the results:

Fileparserresults.png

Explanation of Search Type options

  • Cell Above - Searches for the cell defined in Search1, then returns the data in the cell above it.
  • Cell Right - Searches for the cell defined in Search1, then returns the data in the cell to the right of it.
  • Cell Below - Searches for the cell defined in Search1, then returns the data in the cell below it.
  • Cell Left - Searches for the cell defined in Search1, then returns the data in the cell to the left of it.
  • Chars At Cell - Searches for the cell defined in Search1, then returns the data in it.
  • Chars Before - This will create a regular expression that returns the number of characters specified in Chars1 before the search string.
  • Chars After - This will create a regular expression that returns the number of characters specified in Chars1 after the search string.
  • Chars Between - This will create a regular expression that returns the characters between the strings specified in Search1 and Search2.
  • Chars At Positions - This will return the characters from the index specified in Chars1 to the index specified in Chars2.
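
For readers who want to see what the character-based search types might look like as expressions, here is a minimal Python sketch. The helper names are hypothetical and the patterns are plausible equivalents, not the exact text Presence generates. Treating the positions as zero-based with an exclusive end index is also an assumption.

```python
import re

def chars_before(search, count):
    """Pattern capturing `count` characters immediately before `search`."""
    return rf"(.{{{count}}}){re.escape(search)}"

def chars_after(search, count):
    """Pattern capturing `count` characters immediately after `search`."""
    return rf"{re.escape(search)}(.{{{count}}})"

def chars_between(start, end):
    """Pattern capturing the shortest run of characters between `start` and `end`."""
    return rf"{re.escape(start)}(.*?){re.escape(end)}"

def chars_at_positions(text, first, last):
    """Characters from index `first` up to, but not including, index `last` (assumption)."""
    return text[first:last]

# Hypothetical sample text.
sample = "OrderID=A0001;Tel=01234 567890\r\n"
print(re.search(chars_after("OrderID=", 5), sample).group(1))
print(re.search(chars_before(";Tel", 5), sample).group(1))
print(re.search(chars_between("Tel=", "\r\n"), sample).group(1))
print(chars_at_positions(sample, 0, 7))
```

`re.escape` keeps characters such as `.` or `(` in the search string from being read as regular expression syntax.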

Regular Expressions

In the case of Chars Before, Chars After and Chars Between, a regular expression is created in the "Regular Expression" column.

Presence then uses the text in this "Regular Expression" column to search the file.

If you are familiar with regular expressions then you can edit this text for a more specific search.

There is also a button by the regular expression options that allows you to enable or disable the following:

CASE_INSENSITIVE Enables case-insensitive matching, so Telephone would match TELephone when checked.

MULTILINE In standard regular expression engines this makes ^ and $ match at the start and end of each line, rather than only at the start and end of the whole text.

DOTALL By default the regular expression '.' matches any character except a line terminator. When DOTALL is checked, '.' matches line terminators as well, which lets a match span several lines.
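
The effect of each flag can be tried out in Python, whose `re` module has equivalent flags (`re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`). This is only a sketch on hypothetical text, and it assumes the Presence options behave like the standard flags of the same names.

```python
import re

# Hypothetical sample text spanning several lines.
text = "Name=Ann\nTELephone=123\nNotes=first\nsecond\nEnd"

# CASE_INSENSITIVE: without the flag "Telephone" does not match "TELephone".
strict = re.search(r"Telephone=(\d+)", text)
relaxed = re.search(r"Telephone=(\d+)", text, re.IGNORECASE)

# MULTILINE: ^ matches at the start of every line, not just the start of the text.
first_only = re.findall(r"^(\w+)=", text)
every_line = re.findall(r"^(\w+)=", text, re.MULTILINE)

# DOTALL: '.' also matches line breaks, so the match can span lines.
one_line = re.search(r"Notes=(.*)End", text)
spanning = re.search(r"Notes=(.*)End", text, re.DOTALL)

print(strict, relaxed.group(1))
print(first_only, every_line)
print(one_line, repr(spanning.group(1)))
```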


See Also


Send Email | Send SMS | Send Fax | Broadcast Messages
Read Text File | Read Binary File | Write Text File | Write Binary File | Parse File Action
Rename File | Copy File | Delete File | Parse File Action
Generate Bar Code | Read Bar Code
Dynamic Task Call | Call Native Program | FTP Upload | Scorecard Collector | Create Graph | AS400 Action
Socket Client Action | Socket Server Action
JMS Producer | JMS Consumer

