Difference between revisions of "Parse File Action"

From PresenceWiki

Revision as of 11:50, 15 November 2010

http://www.international-presence.com/images/docs/fileparser/fileparser_header.png

Parse File Action Node

The purpose of this node is to convert data found within a file into a dataset.

The files that we can convert are:-

  • RTF Table
  • RTF
  • Text
  • HTML
  • HTML Table
  • CSV Table

Parsing Text - An Example

Given the text below, let's say we want to extract specific data from it into a table:-

http://www.international-presence.com/images/docs/fileparser/customer.png

For the OrderID, we have selected 5 characters after "OrderID=" and set the repeat to true (as there will be more than one).

We do the same for the CustomerID but without the repeat. A regular expression is then generated for this search.

We then set the Telephone column to be the characters between Tel= and \r\n (a new line).
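The three searches above can be approximated outside Presence with ordinary regular expressions. The following is only a sketch in Python: the sample text is hypothetical (the real Customer.txt is shown only in the screenshot), and the patterns are an assumption about what the generated expressions roughly look like, not the exact text Presence produces.

```python
import re

# Hypothetical sample text; the real Customer.txt appears only as a screenshot.
SAMPLE = (
    "CustomerID=C1001\r\n"
    "Tel=01234 567890\r\n"
    "OrderID=A0001\r\n"
    "OrderID=A0002\r\n"
)

# Repeat = true: collect every match (5 characters after "OrderID=").
order_ids = re.findall(r"OrderID=(.{5})", SAMPLE)

# Repeat = false: stop at the first match.
customer_match = re.search(r"CustomerID=(.{5})", SAMPLE)
customer_id = customer_match.group(1) if customer_match else None

# Characters between "Tel=" and the end-of-line sequence \r\n.
tel_match = re.search(r"Tel=(.*?)\r\n", SAMPLE)
telephone = tel_match.group(1) if tel_match else None
```

With repeat enabled every OrderID is collected, whereas the non-repeating searches stop at the first hit.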

Finally we add a search for County, although you will notice that there is no County text in the Customer.txt file.

http://www.international-presence.com/images/docs/fileparser/customersscan.png

On the Error Handling tab we can decide on what action to take when the field cannot be found.

In this case we are looking for County text that is not there.

We can get Presence to throw an exception, continue and ignore the missing field, or add the column but set it to be empty.
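The three error-handling choices can be pictured with a small sketch. This is not how Presence is implemented; the function name `extract_row`, the policy names "throw", "ignore" and "empty", and the sample text are all hypothetical and chosen only for illustration.

```python
import re

class FieldNotFound(Exception):
    """Raised when a search finds nothing and the policy is 'throw'."""

def extract_row(text, searches, on_missing="empty"):
    """Build one row from {column: regex}; each regex has one capture group.

    on_missing: 'throw' raises, 'ignore' leaves the column out,
    'empty' adds the column with an empty string.
    """
    row = {}
    for column, pattern in searches.items():
        match = re.search(pattern, text)
        if match:
            row[column] = match.group(1)
        elif on_missing == "throw":
            raise FieldNotFound(column)
        elif on_missing == "empty":
            row[column] = ""
        # 'ignore': skip the column and carry on
    return row

# Hypothetical text with no County entry.
SAMPLE = "CustomerID=C1001\r\nTel=01234 567890\r\n"
SEARCHES = {
    "CustomerID": r"CustomerID=(.{5})",
    "County": r"County=(.*?)\r\n",
}

row_with_empty = extract_row(SAMPLE, SEARCHES, on_missing="empty")
row_ignored = extract_row(SAMPLE, SEARCHES, on_missing="ignore")
```

Here `row_with_empty` keeps a County column with no value, while `row_ignored` has no County column at all.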

http://www.international-presence.com/images/docs/fileparser/customersexceptions.png

Finally when we run our test we get the following:-

http://www.international-presence.com/images/docs/fileparser/customersresults.png

Parsing HTML, CSV or RTF Tables - An Example

If your data is in a table format then it will be easier to parse as the data is already separated into cells.

Once we have chosen the type of file we are going to parse and entered its URL, we can hit Populate.

This pre-parses the table in the file and adds a list of available cells to the table.

You should also see that the dropdown box in the Search1 cell now contains all the available cells.
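As a rough picture of what this pre-parse step produces, the sketch below lists every non-empty cell of a CSV table together with its position. The CSV content and the function name `list_cells` are hypothetical, and the (column, row) ordering is an assumption; the source does not say which way round pt(0,0)-style coordinates are.

```python
import csv
import io

def list_cells(csv_text):
    """Return [(column, row, text), ...] for every non-empty cell."""
    cells = []
    for row_index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for col_index, text in enumerate(row):
            if text.strip():
                cells.append((col_index, row_index, text.strip()))
    return cells

# Hypothetical table: field labels on the left, data values on the right.
CSV_TEXT = "TEAM NAME,Rovers\nLOCATION,Leeds\n"

cells = list_cells(CSV_TEXT)
```

The resulting list mixes label cells such as TEAM NAME with data cells such as Rovers, which is why some rows usually need deleting afterwards.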

You will then probably need to delete the rows for the cells you aren't interested in, including the cells that contain data values rather than field names.

We can then get the parser to search for a cell by giving it either a location, such as pt(0,0) or text to search for, such as TEAM NAME.

When it finds this cell we can then return:-

Cell Above - The cell above this cell will be added to the datatable.

Cell Right - The cell to the right of this cell will be added to the datatable.

Cell Below - The cell below this cell will be added to the datatable.

Cell Left - The cell to the left of this cell will be added to the datatable.

Chars At Cell - This cell itself will be added to the datatable.
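The cell search and the five return options can be sketched as a lookup on a grid of cells. This is an illustration only: the grid data, the function names and the (column, row) reading of a location such as pt(0,0) are assumptions, not Presence's actual behaviour.

```python
# Offsets as (column change, row change) for each return option.
OFFSETS = {
    "Cell Above": (0, -1),
    "Cell Right": (1, 0),
    "Cell Below": (0, 1),
    "Cell Left": (-1, 0),
    "Chars At Cell": (0, 0),
}

def find_cell(grid, search):
    """Locate a cell by a (column, row) tuple or by its exact text."""
    if isinstance(search, tuple):
        return search
    for row_index, row in enumerate(grid):
        for col_index, text in enumerate(row):
            if text == search:
                return (col_index, row_index)
    return None

def lookup(grid, search, search_type):
    """Return the text of the cell selected by search_type, or None."""
    found = find_cell(grid, search)
    if found is None:
        return None
    d_col, d_row = OFFSETS[search_type]
    col, row = found[0] + d_col, found[1] + d_row
    if 0 <= row < len(grid) and 0 <= col < len(grid[row]):
        return grid[row][col]
    return None  # the neighbouring cell falls outside the table

# Hypothetical table contents.
GRID = [
    ["TEAM NAME", "Rovers"],
    ["LOCATION", "Leeds"],
]

team_name = lookup(GRID, "TEAM NAME", "Cell Right")
below_origin = lookup(GRID, (0, 0), "Cell Below")
```

Searching by text and searching by location end at the same step: once the cell is found, the chosen option decides which neighbouring cell is returned.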

Given the RTF file we have below, let us say we want to extract the 2 Contact Telephone Numbers, the Team Name and the Team Location.

http://www.international-presence.com/images/docs/fileparser/rtffile.png

We would set up our scan criteria as follows:-

http://www.international-presence.com/images/docs/fileparser/fileparserscan.png

Notice that "Repeat" is selected for "telephone Numbers".

This is because we know there is more than one telephone number and so we'd like Presence to keep returning them until it runs out.

By editing the repeat options, we can relate each telephone number to a new row in the datatable or a new column (in the form of TEL_1,TEL_2 etc).
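The two repeat layouts can be sketched as follows. The function names and sample values are hypothetical, and copying the non-repeated columns into every row is an assumption; the source only states that repeated values go to new rows or to numbered columns such as TEL_1 and TEL_2.

```python
def repeat_as_rows(base_row, column, values):
    """One output row per repeated value; other columns are copied."""
    return [dict(base_row, **{column: value}) for value in values]

def repeat_as_columns(base_row, prefix, values):
    """One output row with numbered columns: PREFIX_1, PREFIX_2, ..."""
    row = dict(base_row)
    for index, value in enumerate(values, start=1):
        row[f"{prefix}_{index}"] = value
    return row

# Hypothetical values.
BASE = {"TEAM_NAME": "Rovers"}
NUMBERS = ["01234 567890", "01234 567891"]

as_rows = repeat_as_rows(BASE, "TEL", NUMBERS)
as_columns = repeat_as_columns(BASE, "TEL", NUMBERS)
```

The row layout suits an unknown number of repeats, while the column layout keeps everything for one record on a single row.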

http://www.international-presence.com/images/docs/fileparser/fileparseroutput.png

Here are the results:-

http://www.international-presence.com/images/docs/fileparser/fileparserresults.png

Explanation of Search Type options

  • Cell Above - Searches for the cell defined in Search1, then returns the data in the cell above it.
  • Cell Right - Searches for the cell defined in Search1, then returns the data in the cell to the right of it.
  • Cell Below - Searches for the cell defined in Search1, then returns the data in the cell below it.
  • Cell Left - Searches for the cell defined in Search1, then returns the data in the cell to the left of it.
  • Chars At Cell - Searches for the cell defined in Search1, then returns the data in it.
  • Chars Before - This will create a regular expression that will return the number of characters specified in Chars1 before the search string.
  • Chars After - This will create a regular expression that will return the number of characters specified in Chars1 after the search string.
  • Chars Between - This will create a regular expression that will return the characters between the strings specified in Search1 and Search2.
  • Chars At Positions - This will return the characters from the index specified in Chars1 to the index specified in Chars2.
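For the character-based search types, the kind of regular expression involved can be sketched in Python. These builders are an assumption about the general shape of such patterns, not the exact text Presence generates; the function names are hypothetical, and treating the Chars At Positions indexes as zero-based with an exclusive end is also an assumption.

```python
import re

def chars_before_pattern(search, count):
    """Capture `count` characters immediately before the search string."""
    return "(.{%d})%s" % (count, re.escape(search))

def chars_after_pattern(search, count):
    """Capture `count` characters immediately after the search string."""
    return "%s(.{%d})" % (re.escape(search), count)

def chars_between_pattern(search1, search2):
    """Capture the shortest run of characters between the two strings."""
    return "%s(.*?)%s" % (re.escape(search1), re.escape(search2))

def chars_at_positions(text, start, end):
    """No regular expression needed: take the characters by index."""
    return text[start:end]

# Hypothetical text.
TEXT = "Tel=01234 567890\r\nOrderID=A0001\r\n"

before = re.search(chars_before_pattern("=A0001", 7), TEXT).group(1)
after = re.search(chars_after_pattern("OrderID=", 5), TEXT).group(1)
between = re.search(chars_between_pattern("Tel=", "\r\n"), TEXT).group(1)
```

Escaping the search strings keeps characters such as "." or "(" in the search text from being read as regular expression syntax.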

Regular Expressions

In the case of Chars Before, Chars After and Chars Between, a regular expression is created in the "Regular Expression" column.

Presence then uses the text in this "Regular Expression" column to search the file.

If you are familiar with regular expressions then you can edit this text for a more specific search.

There is also a button by the regular expression options, that allows you to enable/disable the following:-

CASE_INSENSITIVE Enables case-insensitive matching so Telephone would match with TELephone when checked.

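MULTILINE This allows the search to work across multiple lines: in standard regular expression engines, ^ and $ then match at the start and end of every line rather than only at the start and end of the whole text.

DOTALL By default the regular expression '.' matches any character except a line terminator. When the DOTALL flag is specified, '.' matches line terminators as well.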
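The effect of the three flags can be tried out with any standard regular expression engine. The sketch below uses Python's `re` module, whose `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL` flags are assumed here to behave like the options of the same meaning in Presence; the sample text is hypothetical.

```python
import re

# Hypothetical text.
TEXT = "Name=Smith\nTELephone=01234\nNotes=line one\nline two\nEnd"

# CASE_INSENSITIVE: "Telephone" also matches "TELephone".
strict = re.search(r"Telephone=(\d+)", TEXT)
relaxed = re.search(r"Telephone=(\d+)", TEXT, re.IGNORECASE)

# MULTILINE: ^ matches at the start of every line, not just the start of TEXT.
first_only = re.findall(r"^(\w+)=", TEXT)
every_line = re.findall(r"^(\w+)=", TEXT, re.MULTILINE)

# DOTALL: '.' also matches line terminators, so one match can span lines.
one_line = re.search(r"Notes=(.*)End", TEXT)
spanning = re.search(r"Notes=(.*)End", TEXT, re.DOTALL)
```

Without the flags, the first and last searches find nothing and the middle one sees only the first line.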