Configuration



The configuration of the comparison is specified in an XML file which defines:
  • On which condition two XML nodes can be considered as representing the same element
  • On which condition two XML nodes can be considered as updated

Note that specifying a configuration for the comparison is optional. Without a configuration, any difference in the content of a node will lead to considering it as a different node.

Grammar

The schema of the file is nodeRules.xsd

File overview

The file contains rule elements. Each rule defines:
  • Which XML nodes the rule is applied on
  • Which attributes of the node are used to specify the node identity
  • Which attributes, if any, only represent text descriptions
  • Which attributes, if any, have their differences ignored
  • Whether the ordering of these nodes is significant in the comparison

Comparison rules

The comparison is based on rules.
  • The rule to apply for the comparison is defined depending on the name of the XML node
  • If no rules can be applied on a node, then a default rule will be applied

Note: two nodes which don't have the same node name will be considered as different regardless of the rule.

Root of the configuration file

The default rule specifies how nodes will be compared if no rule can be applied.

There will always be a default rule even if no defaultRule element is defined. For example, here the default rule considers two nodes as the same node if their node names are identical:
   <nodeRules defaultComparisonMode="SameNodeName">
   </nodeRules>
The simplest way to define the default rule is through the attributes of the nodeRules root element of the configuration:
  • "defaultComparisonMode": specifies the default comparison mode between nodes (default is AnyDiffOther). See also comparison mode for more information
  • "orderIsSignificant": specifies if by default the ordering of the nodes is significant (default is true). See also nodes order for more information
  • "excludeCDATA": specifies if by default the CDATA content of nodes is excluded from the comparison (default is false). See also excluding CDATA for more information
  • "removeCDATANewLines": specifies if new lines in the CDATA content are removed by default for the comparison (default is false)
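As a sketch, several of these attributes can be combined on the root element. The values chosen here are only illustrative: by default nodes are matched by their node name, their ordering is not significant, and their CDATA content is left out of the comparison:
   <nodeRules defaultComparisonMode="SameNodeName" orderIsSignificant="false" excludeCDATA="true">
   </nodeRules>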

Default rule

The default rule is defined as a defaultRule element in the configuration file. It overrides the properties defined at the root of the configuration file.

The attributes of the default rule are:
  • "comparisonMode": specifies the comparison mode between nodes (default is AnyDiffOther). See also comparison mode for more information
  • "orderIsSignificant": specifies if the ordering of the nodes is significant (default is true). See also nodes order for more information
  • "excludeCDATA": specifies if the CDATA content of nodes is excluded from the comparison (default is false). See also excluding CDATA for more information
  • "removeCDATANewLines": specifies if new lines in the CDATA content are removed by default for the comparison (default is false)
The children of the default rule are:
  • "identification": specifies the attributes which will identify the node
  • "excludeAttributes": specifies the attributes which are excluded from the comparison (if there are any)
  • "descriptions": specifies the attributes which are considered as descriptions (if there are any)
For example here the identification of a node is specified by the name attribute by default:
   <nodeRules>
      <defaultRule>
         <identification>
            <attribute name="name" />            
         </identification>
      </defaultRule> 
   </nodeRules>
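As a further sketch, the attributes of the default rule can be combined with one of its children. The timestamp attribute name is hypothetical and only used for illustration: by default nodes are matched by their node name, their ordering is not significant, and differences in the timestamp attribute are ignored:
   <nodeRules>
      <defaultRule comparisonMode="SameNodeName" orderIsSignificant="false">
         <excludeAttributes>
            <attribute name="timestamp" />
         </excludeAttributes>
      </defaultRule>
   </nodeRules>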

Rule

Each rule is defined as a rule element in the configuration file. A rule overrides the properties defined at the root of the configuration file.

The attributes of a rule are:
  • "comparisonMode": specifies the comparison mode between nodes (default is AnyDiffOther). See also comparison mode for more information
  • "orderIsSignificant": specifies if the ordering of the nodes is significant (default is true). See also nodes order for more information
  • "excludeCDATA": specifies if the CDATA content of nodes is excluded from the comparison (default is false). See also excluding CDATA for more information
  • "removeCDATANewLines": specifies if new lines in the CDATA content are removed by default for the comparison (default is false)
The children of a rule are:
  • "extendsDefaultRule" or "extendsRule": specifies if the rule extends another rule
  • "appliesOn": specifies the names of the nodes for which the rule will be applied
  • "identification": specifies the attributes which will identify the node
  • "excludeAttributes": specifies the attributes which are excluded from the comparison (if there are any)
  • "descriptions": specifies the attributes which are considered as descriptions (if there are any)
  • "CDATA": specifies the rules to compare the nodes CDATA (if there is a CDATA content)
For example here the identification of a node with the "MyNode" name is specified by the name attribute:
   <nodeRules>
      <rule>
         <appliesOn>
            <nodeName name="MyNode" />            
         </appliesOn>         
         <identification>
            <attribute name="name" />            
         </identification>
      </rule> 
   </nodeRules>

Rules specification

Rule extension

The extendsDefaultRule element specifies that the rule extends the default rule.

For example here the rule extends the default rule by setting the order of nodes as not significant for the comparison:
   <nodeRules>
      <defaultRule>
         <identification>
            <attribute name="name" />            
         </identification>
      </defaultRule>
      <rule orderIsSignificant="false">
         <extendsDefaultRule />
         <appliesOn>
            <nodeName name="MyNode" />
         </appliesOn>  
      </rule>   
   </nodeRules>
The extendsRule element specifies that the rule extends another rule.

Rule node list

The appliesOn element specifies on which node names the rule is applied.

For example:
   <nodeRules>
      <rule>
         <appliesOn>
            <nodeName name="MyNode" />            
            <nodeName name="OtherNode" />                        
         </appliesOn>         
         <identification>
            <attribute name="name" />            
         </identification>
      </rule> 
   </nodeRules>

Nodes identification

The identification element specifies the attributes which define the node identification.

For example here name and id define the node identification for nodes which have the node name MyNode:
   <nodeRules>
      <rule>
         <appliesOn>
            <nodeName name="MyNode" />                               
         </appliesOn>         
         <identification>
            <attribute name="name" />    
            <attribute name="id" />           
         </identification>
      </rule> 
   </nodeRules>

If the identification is defined, the comparison mode will be considered as "OnAttributes" regardless of its specified value. If it is not defined, the nodes comparison will use the comparisonMode property value.
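The following sketch illustrates this behavior. The rule declares the "SameNodeName" comparison mode, but because an identification element is present, nodes named MyNode would be compared as if the mode were "OnAttributes", using the id attribute:
   <nodeRules>
      <rule comparisonMode="SameNodeName">
         <appliesOn>
            <nodeName name="MyNode" />
         </appliesOn>
         <identification>
            <attribute name="id" />
         </identification>
      </rule>
   </nodeRules>
In practice the comparisonMode attribute would be left out of such a rule, since it has no effect there.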

Excluded attributes

The excludeAttributes element specifies the attributes which will be excluded from the comparison.

For example here the value attribute is excluded from the comparison for the description element:
   <rule>
      <appliesOn>
         <nodeName name="description" />
      </appliesOn>  
      <excludeAttributes>
         <attribute name="value" />    
      </excludeAttributes>
   </rule>

Description attributes

Main Article: Description attributes

The descriptions element specifies attributes which will be considered as descriptions. The comparison of these attributes can be a little more lax than the comparison for other attributes.

The attributes of this element are:
  • "removeNewLines": true if new lines must be removed for the comparison (false by default)
  • "trimType": specifies if the content must be right trimmed ("TrimRight"), left trimmed ("TrimLeft"), trimmed at the left and right ("Trim"), or not trimmed at all ("No") (it is "No" by default)
This element can have the following children:
  • The applyRegex children (optional) specify a regex which can be applied to the attribute value. More than one regex replacement can be defined
  • The description child (mandatory) specifies the attribute of the element which will be considered as a description
For example:
   <rule>
      <appliesOn>
         <nodeName name="description" />
      </appliesOn>  
      <descriptions trimType="Trim">
         <applyRegex replaceFrom="\s+" replaceTo=" " />
         <applyRegex replaceFrom="&#10;" replaceTo="&#xA;" />            
         <description name="value" /> 
      </descriptions>
   </rule>
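As another sketch, the removeNewLines and trimType attributes can be used without any regex. The comment node name and the text attribute name are hypothetical. With this rule, new lines would be removed and the end of the value would be trimmed before comparing the text attribute:
   <rule>
      <appliesOn>
         <nodeName name="comment" />
      </appliesOn>
      <descriptions removeNewLines="true" trimType="TrimRight">
         <description name="text" />
      </descriptions>
   </rule>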

CDATA content

Main Article: CDATA content

The CDATA element specifies how the CDATA content will be compared.

The attributes of this element are:
  • "excludeCDATA": specifies if the CDATA content of nodes is excluded from the comparison (default is false). See also excluding CDATA for more information
  • "removeNewLines": true if new lines must be removed for the comparison (false by default)
  • "trimType": specifies if the content must be right trimmed ("TrimRight"), left trimmed ("TrimLeft"), trimmed at the left and right ("Trim"), or not trimmed at all ("No") (it is "No" by default)
This element can have the following children:
  • The applyRegex children (optional) specify a regex which can be applied to the CDATA content. More than one regex replacement can be defined
For example:
   <nodeRules>
      <defaultRule>
         <identification>
            <attribute name="name" />
         </identification>   
         <CDATA trimType="TrimRight">
            <applyRegex replaceFrom="\s+" replaceTo=" " />
         </CDATA>       
      </defaultRule>  
   </nodeRules>
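As a sketch, the same element can also be used in a rule applied on specific nodes, since CDATA is listed among the children of a rule. The script node name is hypothetical. Here new lines are removed from the CDATA content and the content is trimmed at the left and right before the comparison:
   <nodeRules>
      <rule>
         <appliesOn>
            <nodeName name="script" />
         </appliesOn>
         <CDATA removeNewLines="true" trimType="Trim" />
      </rule>
   </nodeRules>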

Rules properties

Comparison mode

The comparisonMode property specifies the comparison mode. It can have the following values:
  • "AnyDiffOther": any differences between two nodes (node name or attributes values) will lead to considering that the two nodes are different
  • "SameNodeName": only the node name is taken into account to detect if the nodes are different. If two nodes have the same node name but different attribute values, they will be considered updated
  • "OnAttributes": the list of identity attributes of the nodes will be used to detect if the nodes are the same or they are different nodes
For example, suppose the following nodes at the left and at the right:
<theNode name="titi"/>
<theNode name="toto"/>
They will be considered as:
  • Different nodes if the comparisonMode property is set to "AnyDiffOther"
  • The same nodes if the comparisonMode property is set to "SameNodeName". Note that in this case the nodes will still be considered as updated, except if the name attribute is excluded from the comparison
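A configuration sketch for the second case, assuming the excludeAttributes element is used to leave the name attribute out of the comparison. With this rule the two theNode nodes would be considered as the same node, and not as updated:
   <nodeRules>
      <rule comparisonMode="SameNodeName">
         <appliesOn>
            <nodeName name="theNode" />
         </appliesOn>
         <excludeAttributes>
            <attribute name="name" />
         </excludeAttributes>
      </rule>
   </nodeRules>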

Nodes order

The orderIsSignificant property specifies if two nodes which represent the same element are considered as updated when they are at a different index in their parent.

For example here the child_1 node does not have the same index at the left and at the right:
Left:
  • root
    • parent
      • child_1
      • child_2
Right:
  • root
    • parent
      • child_2
      • child_1
In this case:
  • If the orderIsSignificant property is true, they will be considered as updated
  • If the orderIsSignificant property is false, they will be considered as identical
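As a sketch, the ordering could be made not significant for these nodes with a rule such as the following. It assumes that the rule must be applied on the child nodes themselves rather than on their parent, which should be checked against the behavior of the tool:
   <nodeRules>
      <rule orderIsSignificant="false">
         <appliesOn>
            <nodeName name="child_1" />
            <nodeName name="child_2" />
         </appliesOn>
      </rule>
   </nodeRules>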

Excluding CDATA

The excludeCDATA property specifies if the CDATA content of nodes is used for the comparison. It will not be used if its value is true.

This attribute can be specified either:
  • As an attribute at the rule level
  • As an attribute at the CDATA level
This definition:
   <nodeRules>
      <defaultRule excludeCDATA="true">
         <identification>
            <attribute name="name" />
         </identification>   
      </defaultRule>  
   </nodeRules>
is equivalent to:
   <nodeRules>
      <defaultRule>
         <identification>
            <attribute name="name" />
         </identification>   
         <CDATA excludeCDATA="true" />
      </defaultRule>  
   </nodeRules>


xmldiff Copyright (c) 2024 Herve Girod. All rights reserved.