Difference between revisions of "Parse File Action"

From PresenceWiki
Revision as of 13:12, 12 November 2010

http://www.international-presence.com/images/docs/fileparser/fileparser_header.png

Parse File Action Node

The purpose of this node is to convert data found within a file into a dataset.

The files that we can convert are:-

  • RTF Table
  • RTF
  • Text
  • HTML
  • HTML Table
  • CSV Table
Parsing HTML or RTF Tables

Once we have chosen the type of file we are going to parse and entered its URL, we can hit "Populate".

This pre-parses the table in the file and adds a list of the available cells to the table.

You should also see that the dropdown box in the Search1 cell now contains all the available cells.

You will then probably want to delete the rows for the cells you aren't interested in, including the cells that hold data values rather than field names.

We can then get the parser to search for a cell by giving it either a location, such as pt(0,0), or text to search for, such as TEAM NAME.

When it finds this cell it will then return:-

  • Cell Above
  • Cell Right
  • Cell Below
  • Cell Left
  • Chars At Cell
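
The cell search described above can be pictured with a small Python sketch. This is not Presence's implementation: the function names, the sample table and the reading of pt(0,0) as a zero-based (row, column) pair are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple, Union

Table = List[List[str]]

# (row offset, column offset) for each table search type named in this page.
OFFSETS = {
    "Cell Above": (-1, 0),
    "Cell Right": (0, 1),
    "Cell Below": (1, 0),
    "Cell Left": (0, -1),
    "Chars At Cell": (0, 0),
}


def find_cell(table: Table, search: Union[str, Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Locate a cell by position (assumed to be row, column) or by its exact text."""
    if isinstance(search, tuple):
        row, col = search
        if 0 <= row < len(table) and 0 <= col < len(table[row]):
            return row, col
        return None
    for r, cells in enumerate(table):
        for c, text in enumerate(cells):
            if text.strip() == search:
                return r, c
    return None


def lookup(table: Table, search: Union[str, Tuple[int, int]], search_type: str) -> Optional[str]:
    """Find the Search1 cell, then return the neighbouring cell for the search type."""
    found = find_cell(table, search)
    if found is None:
        return None
    d_row, d_col = OFFSETS[search_type]
    r, c = found[0] + d_row, found[1] + d_col
    if 0 <= r < len(table) and 0 <= c < len(table[r]):
        return table[r][c]
    return None  # the neighbour falls outside the table


# Hypothetical table, loosely modelled on the team example in this page.
sample = [["TEAM NAME", "Support"], ["TEAM LOCATION", "London"]]
print(lookup(sample, "TEAM NAME", "Cell Right"))   # Support
print(lookup(sample, (0, 0), "Chars At Cell"))     # TEAM NAME
```

Searching for the field name and returning "Cell Right" is the pattern that suits tables where each label sits to the left of its value.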

Given the RTF file shown below, let us say we want to extract the two contact telephone numbers, the team name and the team location.

http://www.international-presence.com/images/docs/fileparser/rtffile.png

http://www.international-presence.com/images/docs/fileparser/fileparserscan.png

Notice that "Repeat" is selected for "telephone Numbers".

This is because we know there is more than one telephone number and so we'd like Presence to keep returning them.

By editing the repeat options, we can map each telephone number either to a new row in the datatable or to a new column (in the form of TEL_1, TEL_2, etc.).
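
The two repeat layouts can be sketched in Python as follows. The function names, field names and telephone numbers are hypothetical; only the TEL_1, TEL_2 column naming comes from the text above.

```python
from typing import Dict, List


def repeat_as_rows(base: Dict[str, str], field: str, values: List[str]) -> List[Dict[str, str]]:
    """One output row per repeated value; the other fields are copied into every row."""
    return [{**base, field: value} for value in values]


def repeat_as_columns(base: Dict[str, str], prefix: str, values: List[str]) -> Dict[str, str]:
    """A single output row with numbered columns: PREFIX_1, PREFIX_2, ..."""
    row = dict(base)
    for index, value in enumerate(values, start=1):
        row[f"{prefix}_{index}"] = value
    return row


# Hypothetical values standing in for the data found in the file.
team = {"TEAM_NAME": "Support"}
numbers = ["555-0100", "555-0101"]

print(repeat_as_rows(team, "TEL", numbers))
# [{'TEAM_NAME': 'Support', 'TEL': '555-0100'}, {'TEAM_NAME': 'Support', 'TEL': '555-0101'}]
print(repeat_as_columns(team, "TEL", numbers))
# {'TEAM_NAME': 'Support', 'TEL_1': '555-0100', 'TEL_2': '555-0101'}
```

The row layout suits an unknown number of repeats, while the column layout keeps one record per file at the cost of a variable column count.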

http://www.international-presence.com/images/docs/fileparser/fileparseroutput.png

Here are the results:

http://www.international-presence.com/images/docs/fileparser/fileparserresults.png

Chars Before

Chars After

Chars Between

Chars At Positions

Explanation of Search Type options
  • Cell Above - Searches for the cell defined in Search1, then returns the data in the cell above it.
  • Cell Right - Searches for the cell defined in Search1, then returns the data in the cell to the right of it.
  • Cell Below - Searches for the cell defined in Search1, then returns the data in the cell below it.
  • Cell Left - Searches for the cell defined in Search1, then returns the data in the cell to the left of it.
  • Chars At Cell - Searches for the cell defined in Search1, then returns the data in it.
  • Chars Before - This will create a regular expression that will return the number of characters specified in Chars1 before the search string.
  • Chars After - This will create a regular expression that will return the number of characters specified in Chars1 after the search string.
  • Chars Between - This will create a regular expression that will return the characters between the strings specified in Search1 and Search2.
  • Chars At Positions - This will return the characters from the index specified in Chars1 to the index specified in Chars2.
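
As a rough illustration of the character-based search types, the Python sketch below builds one plausible regular expression for each. The patterns Presence actually generates are not shown on this page, so these are assumptions, as are the function names, the sample text and the choice of zero-based, end-exclusive indexes for Chars At Positions.

```python
import re
from typing import Optional


def chars_before(text: str, search: str, count: int) -> Optional[str]:
    """Up to `count` characters immediately before the first `search`."""
    match = re.search(rf"(.{{0,{count}}}){re.escape(search)}", text, re.DOTALL)
    return match.group(1) if match else None


def chars_after(text: str, search: str, count: int) -> Optional[str]:
    """Up to `count` characters immediately after the first `search`."""
    match = re.search(rf"{re.escape(search)}(.{{0,{count}}})", text, re.DOTALL)
    return match.group(1) if match else None


def chars_between(text: str, search1: str, search2: str) -> Optional[str]:
    """Shortest run of characters between `search1` and the next `search2`."""
    match = re.search(rf"{re.escape(search1)}(.*?){re.escape(search2)}", text, re.DOTALL)
    return match.group(1) if match else None


def chars_at_positions(text: str, start: int, end: int) -> str:
    """Characters from index `start` up to, but not including, `end` (an assumption)."""
    return text[start:end]


# Hypothetical line of text standing in for the parsed file.
line = "Team Name: Support; Tel: 555-0100"
print(chars_after(line, "Tel: ", 8))          # 555-0100
print(chars_before(line, ";", 7))             # Support
print(chars_between(line, "Name: ", ";"))     # Support
print(chars_at_positions(line, 0, 4))         # Team
```

The search strings are passed through `re.escape` so that characters such as "." or "(" in Search1 and Search2 are matched literally.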


Regular Expressions

In the case of Chars Before, Chars After and Chars Between, a regular expression is created in the "Regular Expression" column.

It is only this column that ormation that is contained between '<' & '>' tags.

>text< Content Only Search - Will only search for data that lies outside the markup tags of a web page.