Data Extraction, Hands-On Example
FTForm has two operating sections:
- The first builds an XML file from existing input, normally Print Line files direct from your ERP or other systems. If your system outputs its own XML file, ignore this tutorial.
- The second formats from the XML input, and Form Building Tutorial covers this.
The remainder of this tutorial shows how to generate an XML output from your
input, and for this discussion we'll use a sample standard Purchase Order from
QAD. Please review what we're starting with here, QAD
PO with Notes, then load FTForm Plus. We'll turn this input
document into XML, in the process learning how to define Data Extraction.
Housekeeping: Please load QAD PO.zip
file and unzip it where you can access the tutorial files.
This is the initial screen, from this select New (where shown), then
set the Document Name to QAD PO Example, set Input as
shown and select Data file as QAD-PO.txt from the unzipped
QAD-PO.zip.
This shows Page 1 of the two page original file with most variations.
Always check with an expert system user to make sure you have all data
represented when defining XML from system data. The most common mistake with
FormTrap forms is running into data that it has no knowledge of.
Define the master first, in the area which you'll see open over the
text.
Shift the pink boundaries so it covers the entire header. In this
example, there is a Header Comment line, so highlight to the end of Remarks
(two lines down from the Remarks:
literal - see "with notes") and pull the top down to the first line ( P U R C H A S E O R D E R).
We need to "recognize" this page, so identify what makes it unique.
We'll use "P U R C H A S E"
(identifies this document), "Page: 1"
(Identifies this as the first page) and "Remarks:"
(confirms the document) to identify this page. Select New Field (see below)
and drag the I-Bar over text as shown.
This window opens, in which you give the field a Name.
The named text shows with it's name overhead.
Right click on the field, and select as shown to make this a "Rule". This rule
requires an "Exact match" with the highlighted field.
Rule fields show in Green rather than yellow and all three Rules need to be
true to recognize this area (i.e. master area).
Double-clicking on a field shows its properties, right-click and
selecting Properties of area allows you to see fields and rules for
this area (see later).
Define all of the other fields on the header and see below for the field list
and the finished header record.
You can "lock into" field creation by pressing the I-Bar icon in the Field group. Press again to
unlock from field creation.
Now test this using the Test
button at top right, you should see this:
OK, draw breath, make a cup of coffee, SAVE THIS PROJECT and let's
continue ...
Now we need to define what we DO NOT want. These are typical:
- First page detail heading (which comes after any Header Comments).
- Second and subsequent page headers.
- Inter-page connectors, typically Continued at the end of Page 1 and CONT or similar, commonly with a product code on Page 2 (not in the data file, ignore for this tutorial).
FormTrap allows deletion of redundant lines before processing the remainder of
the page. We'll define the subsequent page header(s) first.
Show Page 2 by pressing Show next page.
Select redundant area from Area, New pull down list.
Select down to and excluding the Detail Line Header, and set the
rules (these are similar to master, except the rule on Page Number is
NOT Page: 1 (change the last character of Match (evaluate what)
to 1 and tick Apply NOT to result of matching).
Note: Areas are normally placed on top of each other, with one "active"
at a time. This is difficult to work with, toggle to an active only
area view.
Remove the redundant area by pressing Normalize text icon .
Note that you can bring them back with View original text icon (in case you need to make
changes later). Press this, then return to Normalized view.
What else to get rid of? The Detail Heading on both pages is redundant
(and indistinguishable from Comments), so get rid of this as well:
Now we'll define Totals. These are part of the Master, so press New
(in Area), and select master record area. Move down the page and
select from the total comment line down to the "Authorized .." line. Now
define and add rules for the total area (use Non-Taxable and Authorized)
and define the other fields.
We'll explore the output field construction. Creat a new field for Taxable Amount and either pause after selecting the area or press Mouse Right and Properties of selected field. These two prompts "massage" the field before it is output:
- Normalize selects a RegEx expression that operates on the field. These can be modified, we'll stick with the ones provided during this tutorial. Numeric fields normally use Trim numeric, Date fields normally use None and Text fields may optionally use Trim. A description how your selection operates on the data is below Normalize.
- Parser selects how the field is parsed (inspected). Failure to parse stops FormTrap execution, so this MUST be correct.
This is the completed Total area, of Master.
Do the test again, you should see this, with all of the Trailer fields
included as Master fields:
Product Records and Areas
Next we'll generate the Product Area, so select New (area) and detail
record area (new detail record), and define just the test fields as a
date Slash and the Decimal Points as tests.
This defines the Rules for the Product ...
... and here is the Product record with all fields defined.
These icons indicate how the field is parsed and formatted for output. You
can review above (see Total record) or see here.
Look at the "with notes" file, we have up to four (and down to none) lines following Product. Here is the definition for the first (Revision) line, comprising one rule, on the literal "Revision:" with just the two fields:
Define the site line, optional line, with two fields, one as a rule testing
for Site:.
Define the Supplier line, optional line, with two fields, one as a rule
testing for Supplier Item:.
Here is the Manufacturer record, last of the optional lines and showing all
records so far. It has one rule and two additional fields.
The next line required to define a product is the product name, "KICKPLATE
...". This record is always present and is defined with rule
"not empty" on the single field.
Following this line may be optional comment lines, two tests, one for blanks
and the other for not empty.
Comments are to be treated as one long paragraph, so the individual lines are stored in the one output field name with a space between what were individual lines. See the difference between Prod Name and Product Comment as below under the Options: and Field merging: prompts.
This is the XML for the first and third products in the file:
A comparison at this point with Version 7 FTDesign is illustrative, it took
16 different structures to define this situation (4 optional lines, zero to 4
present), while here it takes just the one structure, with each element
defined just once, including Product Comments which can now be line wrapped if
required. The very superior efficiency of FTForm Plus data definition is
obvious. FTForm Plus also identifies floating totals, document comments and
similar structure easily.
Finally, we require the Header Comment defined, this is a master record
area and is recognized by testing the comment area for non-blank.
Ticking Header comment on master record stops looking for Header
Comment lines once the first of the detail lines is found.
The comment is defined similar to Product Comments as "Repetitive" and
"Space separated", meaning additional comment lines are added to a long
block of text.
This is the Header Comment record defined, as part of the final XML file
produced ...
... and the XML component of the file (in the master record):
View the output XML file for this tutorial by looking at the unzipped file xml_tutorial_output.txt.
A few comments on what you can do with RegEx
Regular Expressions are used as the expression logic in FormTrap due to its power to isolate anything you might require and for standards of its documentation.
The common Regular Expressions are included in the list shown by the Wizard, and you are free to add your own expressions to this list. Some expressions contain FIXME as a placeholder for "Equal to", "Not equal to", "Contains" and "Does not contain" and others. For these, please select, then change FIXME to whatever you want to test for (or not for).
Expression you may want to add are for dual purpose lines, such as these (from Invoices that are also Credits). To trap BOTH kind of totals, we use this as the RegEx.
Documentation of RegEx is on the web at this address: http://regular-expressions.info/
Support is not available under our Support and Upgrades contract with customers. If you have issues with RegEx and/or require new RegEx expressions made for you we will do this at our then current consulting rates, in 15 minutes increments (current rates for calendar 2018 are $50 / 15 minutes in USD, in AUD $80/15 minites incl. G.S.T.).