Direct use

DataSift provides a set of simple predefined validations and transformations that can be used directly, without any configuration. This is the purpose of the DataSiftValidationUtils and DataSiftTransformationUtils classes. For example:

      
boolean isInteger = DataSiftValidationUtils.stringIsInteger("-1232");
Integer myInteger = DataSiftTransformationUtils.stringAsInteger("-1232");
      

Note, however, that this way of using DataSift makes very little use of its power, and may even add unnecessary overhead to your application if all you need is this kind of occasional, simple data validation.
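
For comparison, the following is a minimal plain-JDK sketch of an occasional integer check and conversion, the kind of need for which the framework may be more than required. The class and method names are hypothetical and are not part of DataSift; the exact rules DataSift applies (for example, to surrounding whitespace) are not described here and may differ.

```java
import java.util.Optional;

final class PlainIntegerChecks {

    // Hypothetical stand-in for a "string is integer" check; not DataSift code.
    static boolean isInteger(String text) {
        return asInteger(text).isPresent();
    }

    // Hypothetical stand-in for a "string as integer" conversion.
    // Returns an empty Optional for null or non-numeric input.
    static Optional<Integer> asInteger(String text) {
        if (text == null) {
            return Optional.empty();
        }
        try {
            // Trimming is a choice made for this sketch only.
            return Optional.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-1232"));
        System.out.println(asInteger("12a").isPresent());
    }
}
```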

Normal use

Normal use of DataSift means instantiating an org.auelproject.datasift.DataSift object, adding specs, resolvers, validators or transformers to it if necessary (either via configuration files or API calls), and using the configured instance to validate or transform the data we want (the target objects).

...
DataSift dataSift = new DataSift();
ValidationResult validationResult = dataSift.validate(myObject,"ValidationSpec");
if (validationResult.isSuccessful()) {
    ...
} else {
    ...
}
...
      

With this ValidationResult object, we can:

  • Know whether the validation has been completely successful or not, by calling isSuccessful().


  • Find out which validation items succeeded and which failed when the validation was not completely successful, by calling the getSuccessfulValidationNames() and getFailedValidationNames() methods.


  • Get the map of validation results, a java.util.Map object which will have the names of the spec validator items as keys and a Boolean object as value for each entry.
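
As a rough illustration of the last point, the sketch below walks a map shaped like the one described (item names as keys, Boolean outcomes as values) and separates the names by outcome. The map is built by hand as a stand-in, because the name of the accessor that returns it on ValidationResult is not given here; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ValidationMapSketch {

    // Collects the item names whose Boolean value equals the wanted outcome.
    static List<String> namesWithOutcome(Map<String, Boolean> results, boolean wanted) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : results.entrySet()) {
            if (Boolean.valueOf(wanted).equals(entry.getValue())) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the map a ValidationResult would expose.
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("loginLength", Boolean.TRUE);
        results.put("nameNotNull", Boolean.FALSE);
        results.put("ageNumber", Boolean.TRUE);

        System.out.println("passed: " + namesWithOutcome(results, true));
        System.out.println("failed: " + namesWithOutcome(results, false));
    }
}
```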


Transformations follow the same pattern, using transform() and a TransformationResult object:

...
DataSift dataSift = new DataSift();
TransformationResult transformationResult = dataSift.transform(myObject,"TransformationSpec");
if (transformationResult.isSuccessful()) {
    Map resultsByItem = transformationResult.getResults();
    ...
} else {
    ...
}
...
      

And with this TransformationResult object, we can:

  • Know whether the transformation has been completely successful or not, by calling isSuccessful().


  • Find out which transformation items succeeded and which failed when the transformation was not completely successful, by calling the getSuccessfulTransformationNames() and getFailedTransformationNames() methods.


  • Get the map of transformation results, a java.util.Map object which will have the names of the spec transformation items as keys and the results of the transformations as values. If a transformation item has failed, the value in the map entry will be the generated TransformationNotPossibleException.
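
Because a failed item stores its exception as the map value, code that reads the results map has to check the type of each value before using it. The sketch below shows one way to do this, under stated assumptions: the map is built by hand, IllegalArgumentException stands in for TransformationNotPossibleException, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TransformationMapSketch {

    // Returns the transformed value, or the fallback when the entry holds an exception.
    static Object valueOrFallback(Map<String, Object> results, String itemName, Object fallback) {
        Object value = results.get(itemName);
        return (value instanceof Throwable) ? fallback : value;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the map returned by getResults().
        Map<String, Object> results = new LinkedHashMap<>();
        results.put("ageAsInteger", Integer.valueOf(42));
        // Stand-in for the TransformationNotPossibleException of a failed item.
        results.put("heightAsInteger", new IllegalArgumentException("not a number"));

        for (String itemName : results.keySet()) {
            System.out.println(itemName + " -> " + valueOrFallback(results, itemName, "n/a"));
        }
    }
}
```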


Process definitions: Specs

DataSift defines specs to determine the way in which an object will be processed during validation or transformation operations. These specs are structures with the following attributes:

  • (Optionally) A name which will identify the spec.


  • A resolver: This resolver will determine the way in which the data will be accessed in the object being processed (e.g. use the predefined BeanResolver to access the fields of a JavaBean). This resolver can optionally take a number of configuration parameters.


  • A number of processing items, each of which defines a validation or transformation operation (depending on the spec type) to be performed on the target object. Each item declares:


    • Its name.


    • The validator (or transformer) to be applied (more on this later).


    • A number of configuration parameters for the selected validator (or transformer), if they are needed.


    • A number of data parameters to which the validation (or transformation) will be applied. These parameters are given a name and defined with a selector, which the resolver uses to find the data (for example, a selector "name" passed to the bean resolver results in looking for a getName() method on the target object).


    • (Optionally) the name of its iterator item. This is meant for items that are nested in another item (an iterator). The iterator triggers its nested validations or transformations on every element of the iteration target (array, Collection, Map...).
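
The selector-to-getter mapping mentioned above can be sketched with plain reflection. This is only an illustration of the idea, not the actual BeanResolver code: the class names, the Person example bean and the error handling are all assumptions made for this sketch.

```java
import java.lang.reflect.Method;

final class BeanSelectorSketch {

    // Example target bean with a conventional public getter.
    public static final class Person {
        private final String name;

        public Person(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    // Turns a selector such as "name" into the getter name "getName".
    static String getterNameFor(String selector) {
        return "get" + Character.toUpperCase(selector.charAt(0)) + selector.substring(1);
    }

    // Looks up the public getter for the selector and invokes it on the target.
    static Object resolve(Object target, String selector) {
        try {
            Method getter = target.getClass().getMethod(getterNameFor(selector));
            return getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot resolve selector: " + selector, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(new Person("Ayla"), "name"));
    }
}
```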


Both validation and transformation specs can be built programmatically by instantiating an org.auelproject.datasift.config.SpecConfig object, or declaratively in XML like this:

      
<validation-spec resolver="BeanResolver">
    
  <validation name="loginLength" validator="StringLength">
    <config name="acceptNull" value="false"/>
    <config name="minLength" value="4"/>
    <config name="maxLength" value="15"/>
    <data name="data" selector="login"/>
  </validation>
    
  <validation name="nameNotNull" validator="StringNotEmpty">
    <data name="data" selector="name"/>
  </validation>
    
  <validation name="ageNumber" validator="StringIsInteger">
    <data name="data" selector="age"/>
  </validation>
  
  <validation name="childrenAgesIter" validator="ArrayIterator">
    <config name="successCondition" value="AND"/>
    <data name="data" selector="childrenAges"/>
  </validation>
  
  <validation name="childrenAgeIsInteger" validator="StringIsInteger" iterator="childrenAgesIter">
    <data name="data" selector="value"/>
  </validation>
      
</validation-spec>
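
The iterator item in the spec above (childrenAgesIter with its nested integer check) can be pictured as a loop that applies the nested validation to every array element and then combines the outcomes. The sketch below is a plain-Java illustration and not the ArrayIterator implementation; in particular, treating any successCondition other than "AND" as "at least one element passes" is an assumption, and all names are hypothetical.

```java
import java.util.function.Predicate;

final class IteratorItemSketch {

    // Applies the nested check to every element and combines the outcomes.
    // "AND" requires every element to pass; any other value is treated here
    // as "at least one element passes", which is an assumption of this sketch.
    static boolean validateEach(String[] elements, Predicate<String> nested, String successCondition) {
        boolean all = true;
        boolean any = false;
        for (String element : elements) {
            boolean ok = nested.test(element);
            all &= ok;
            any |= ok;
        }
        return "AND".equals(successCondition) ? all : any;
    }

    // Nested check: does the text parse as an integer?
    static boolean isInteger(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] childrenAges = {"3", "7", "x"};
        System.out.println(validateEach(childrenAges, IteratorItemSketch::isInteger, "AND"));
    }
}
```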
      

Specs can be declared:

  • In the default specs file datasift-specs.xml.


  • In any other user defined specs file, which the framework will read on demand.


  • In a user defined single spec file, which the framework will try to load automatically.


  • Programmatically, adding a spec to the framework from the application code.


Default specs file: datasift-specs.xml

If, on initialization, DataSift finds a file called datasift-specs.xml on its classpath, any specs it contains will be loaded and made available to every DataSift instance created from then on.

The format of the datasift-specs.xml file is like this (see also the examples bundled with the software):


<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE datasift-specs PUBLIC
    "-//AuelProject Datasift//DTD Datasift Specs Configuration 1.0//EN"
    "http://www.datasift.org/dtds/datasift-specs_1_0.dtd">

<datasift-specs>

  <validation-spec name="[anyName]" resolver="[anyResolver]">
  
    <resolver-config>
      <config name="[anyResolverConfigParam]" value="[anyValue]"/>
      ...
    </resolver-config>
    
    <validation name="[anyName]" validator="[anyValidator]">
      <config name="[anyValidatorConfigParam]" value="[anyValue]"/>
      <config name="[anyValidatorConfigParam]" value="[anyValue]"/>
      ...
      <data name="[anyName]" selector="[anyField]"/>
      ...
    </validation>
    
    <validation name="[anyName]" validator="[anyValidator]" iterator="[anyIteratorName]">
      <data name="[anyName]" selector="[anyField]"/>
    </validation>
    
    ...
    ...
      
  </validation-spec>
  
  ...
  ...

  <transformation-spec name="[anyName]" resolver="[anyResolver]">
  
    <resolver-config>
      <config name="[anyResolverConfigParam]" value="[anyValue]"/>
      ...
    </resolver-config>
    
    <transformation name="[anyName]" transformer="[anyTransformer]">
      <config name="[anyTransformerConfigParam]" value="[anyValue]"/>
      <config name="[anyTransformerConfigParam]" value="[anyValue]"/>
      ...
      <data name="data" selector="[anyField]"/>
      ...
    </transformation>
    
    <transformation name="[anyName]" transformer="[anyTransformer]" iterator="[anyIteratorName]">
      <data name="[anyName]" selector="[anyField]"/>
    </transformation>
      
  </transformation-spec>
  
  ...
  ...

</datasift-specs>
        

Loading user specs files on demand

At any moment, the user may ask a DataSift instance to load and parse (from the application classpath) a custom specs file with the same structure as the default specs file. This is done with:

        
DataSift dataSift = new DataSift();
dataSift.configureSpecsFromFile("mySpecsFile.xml");
        

...which will result in the specs defined in that file being registered and made accessible for any further use of this DataSift instance.

Automatically loading single spec files

Besides registering specs via datasift-specs.xml or any other file defined by the user, when DataSift is asked to use a spec that has not been previously registered, it will try to load it from a single-spec file on the classpath with the same name as the requested spec. For example, if we executed the following code:

        
DataSift dataSift = new DataSift();
ValidationResult result = dataSift.validate(myObject,"ANewSpec");
        

...and the spec called "ANewSpec" had not been previously registered in that DataSift instance (which, in this case, could only have happened through the default specs file), the framework would try to load a file called ANewSpec.dsv.xml from the classpath, parse it, and register the validation spec it contains under the name "ANewSpec" for further use.

The file extension for automatically loaded files is .dsv.xml for validation spec files, and .dst.xml for the transformation spec ones.
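
The naming rule above amounts to appending the right extension to the spec name and looking the result up on the classpath. A minimal sketch follows; the class and method names are hypothetical, and using the thread context class loader is an assumption, since the class loader DataSift actually consults is not stated here.

```java
final class SingleSpecFileSketch {

    static final String VALIDATION_SUFFIX = ".dsv.xml";
    static final String TRANSFORMATION_SUFFIX = ".dst.xml";

    // Builds the single-spec file name for a spec name, e.g. "ANewSpec.dsv.xml".
    static String resourceNameFor(String specName, boolean validation) {
        return specName + (validation ? VALIDATION_SUFFIX : TRANSFORMATION_SUFFIX);
    }

    // True when the resource is visible to the context class loader (an assumption).
    static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        return loader != null && loader.getResource(resourceName) != null;
    }

    public static void main(String[] args) {
        String resource = resourceNameFor("ANewSpec", true);
        System.out.println(resource + " found: " + isOnClasspath(resource));
    }
}
```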

These single spec files have a format very similar to the one shown above, except that they lack a spec name (the file name without the extension is used instead) and can contain only one validation or transformation spec per file. This is the format for a single validation spec file:

        
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE validation-spec PUBLIC
    "-//AuelProject Datasift//DTD Datasift Validation Spec Configuration 1.0//EN"
    "http://www.datasift.org/dtds/datasift-validation-spec_1_0.dtd">

<validation-spec resolver="[anyResolver]">
  
  <resolver-config>
    <config name="[anyResolverConfigParam]" value="[anyValue]"/>
    ...
  </resolver-config>
    
  <validation name="[anyName]" validator="[anyValidator]">
    <config name="[anyValidatorConfigParam]" value="[anyValue]"/>
    <config name="[anyValidatorConfigParam]" value="[anyValue]"/>
    ...
    <data name="[anyName]" selector="[anyField]"/>
  </validation>
    
  <validation name="[anyName]" validator="[anyValidator]" iterator="[anyIteratorName]">
    <data name="[anyName]" selector="[anyField]"/>
  </validation>
      
</validation-spec>
        

...and for the single transformation spec files:

        
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE transformation-spec PUBLIC
    "-//AuelProject Datasift//DTD Datasift Transformation Spec Configuration 1.0//EN"
    "http://www.datasift.org/dtds/datasift-transformation-spec_1_0.dtd">

<transformation-spec resolver="[anyResolver]">
  
  <resolver-config>
    <config name="[anyResolverConfigParam]" value="[anyValue]"/>
    ...
  </resolver-config>
    
  <transformation name="[anyName]" transformer="[anyTransformer]">
    <config name="[anyTransformerConfigParam]" value="[anyValue]"/>
    <config name="[anyTransformerConfigParam]" value="[anyValue]"/>
    ...
    <data name="data" selector="[anyField]"/>
  </transformation>
    
  <transformation name="[anyName]" transformer="[anyTransformer]" iterator="[anyIteratorName]">
    <data name="[anyName]" selector="[anyField]"/>
  </transformation>
      
</transformation-spec>
        

Automatic loading of requested specs is not where this feature ends: DataSift is also able to load specs defined on a per-class basis, that is, specs defined to validate or transform objects of a specific class.

What this means is that, when called without specifying a spec name, DataSift will take the target object, examine its class, and look for a spec defined for that class in an XML file. For example, having the following code:

        
TransformationResult result = dataSift.transform(myObject);
        

...and myObject being an instance of class a.b.c.MyClass, DataSift will automatically look on its classpath for a file called a/b/c/MyClass.dst.xml and, if it exists, will parse it, register its spec and use it to transform myObject. The same applies to validation operations.
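
The path in this example follows from the fully qualified class name: dots become slashes and the spec extension is appended. The sketch below shows that mapping only; the names are hypothetical, and how DataSift treats nested classes or superclasses is not covered by this description.

```java
final class ClassSpecPathSketch {

    // Maps "a.b.c.MyClass" plus ".dst.xml" to "a/b/c/MyClass.dst.xml".
    static String specResourceFor(String className, String suffix) {
        return className.replace('.', '/') + suffix;
    }

    // Convenience for a target object, using its runtime class name.
    static String transformationSpecFor(Object target) {
        return specResourceFor(target.getClass().getName(), ".dst.xml");
    }

    public static void main(String[] args) {
        System.out.println(specResourceFor("a.b.c.MyClass", ".dst.xml"));
        System.out.println(transformationSpecFor("some text"));
    }
}
```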

Programmatically adding specs to the framework

Users can also add validation or transformation specs to a DataSift instance without configuration files, by creating a SpecConfig object and adding it to the DataSift instance:

        
SpecConfig specConfig = new SpecConfig();
specConfig.setName("mySpec");
specConfig.setResolver("BeanResolver");
        
SpecItemConfig specItemConfig = new SpecItemConfig();
specItemConfig.setName("uniqueItem");
specItemConfig.setEntityName("StringIsInteger");
specItemConfig.addDataParamConfig(
    EntityUtils.SINGLE_DATA_PARAMETER_NAME, "age");
        
specConfig.addSpecItemConfig(specItemConfig);

dataSift.addValidationSpecConfig(specConfig);
        

Resolvers

Predefined resolvers

DataSift provides a set of predefined resolvers that the developer can use without needing to define them in any configuration file. Examples are BeanResolver (to get data from JavaBeans) and StringResolver (to use String objects as targets).

All the predefined resolvers and their configuration can be found in the reference documentation in the Predefined resolvers page.

Default resolvers configuration file

At startup, DataSift will look in its classpath for a file named datasift-resolvers.xml, which may contain resolver declarations, and load these declarations, making them available to all DataSift instances created from then on.

This default configuration file would look like this:

        
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE datasift-entities PUBLIC
    "-//AuelProject Datasift//DTD Datasift Entities Configuration 1.0//EN"
    "http://www.datasift.org/dtds/datasift-entities_1_0.dtd">

<datasift-entities>

  <resolvers>
  
    <resolver name="[anyName]" className="[anyClassName]">
      <config name="[anyParameter]" value="[anyDefaultValue]"/>
      <config name="[anyParameter]" value="[anyDefaultValue]"/>
      ...
    </resolver>      
        
    <resolver name="[anyName]" className="[anyClassName]"/>
        
    <resolver name="[anyName]" className="[anyClassName]"/>
        
  </resolvers>

</datasift-entities>
        

Note that resolvers can receive configuration parameters both here in the resolvers declaration files and later in the specs declaration files. This allows some of the resolver parameters to be defined in a global manner in the resolver declaration files and some others be defined in a per-spec manner in the spec declaration files.
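
One way to picture the two levels of configuration is as two maps that are combined when a spec is used. The sketch below makes an assumption that the text does not confirm: when the same parameter is defined at both levels, the per-spec value wins. The class, method and parameter names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ResolverParamsSketch {

    // Starts from the global parameters and lets per-spec values override them.
    // The precedence shown here is an assumption of this sketch.
    static Map<String, String> effectiveParams(Map<String, String> global, Map<String, String> perSpec) {
        Map<String, String> merged = new LinkedHashMap<>(global);
        merged.putAll(perSpec);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> global = new LinkedHashMap<>();
        global.put("paramOne", "globalValue");
        global.put("paramTwo", "sharedValue");

        Map<String, String> perSpec = new LinkedHashMap<>();
        perSpec.put("paramOne", "specValue");

        System.out.println(effectiveParams(global, perSpec));
    }
}
```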

User resolvers configuration files

As with specs, the user may at any moment ask a DataSift instance to load a configuration file that has the same structure as the default one but a different name. The resolvers declared there are registered only in that instance of DataSift.

DataSift dataSift = new DataSift();
dataSift.configureResolversFromFile("myResolversFile.xml");
        

Programmatically adding resolvers

Resolvers can also be programmatically registered in a DataSift instance by creating an object of class org.auelproject.datasift.config.ResolverConfig, in a way similar to:


ResolverConfig resolverConfig = new ResolverConfig();
resolverConfig.setName("MyResolver");
resolverConfig.setClassName("mypackage.MyResolverClass");
resolverConfig.addConfigParam("paramOne","aValue");
resolverConfig.addConfigParam("paramTwo","anotherValue");

dataSift.addResolverConfig(resolverConfig);        
        

Validators and Transformers

Predefined validators and transformers

As with resolvers, the framework provides a set of predefined validators and transformers that can be used out of the box, without any configuration. Examples include validators such as StringIsInteger (checks whether a String contains an integer number), StringLength (checks whether the length of a String is between a minimum and a maximum) and StringNotEmpty, and transformers such as StringAsInteger (transforms a String object into an Integer).

All the predefined validators and transformers and their configurations can be found in the reference documentation, in the Predefined validators and Predefined transformers pages.

Default configuration files

Validators and transformers can also be declared in XML files, and there are default configuration files that are read (and their entities registered) at startup.

These default files are named datasift-validators.xml for validators and datasift-transformers.xml for transformers. The default validators configuration file would look like this:

        
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE datasift-entities PUBLIC
    "-//AuelProject Datasift//DTD Datasift Entities Configuration 1.0//EN"
    "http://www.datasift.org/dtds/datasift-entities_1_0.dtd">

<datasift-entities>

  <validators>
  
    <validator name="[anyName]" className="[anyClassName]">
      <config name="[anyParameter]" value="[anyDefaultValue]"/>
      <config name="[anyParameter]" value="[anyDefaultValue]"/>
      ...
    </validator>
      
    <validator name="[anyName]" className="[anyClassName]"/>
  
    <validator name="[anyName]" className="[anyClassName]"/>
        
  </validators>

</datasift-entities>
        

...and the default transformers file:

        
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE datasift-entities PUBLIC
    "-//AuelProject Datasift//DTD Datasift Entities Configuration 1.0//EN"
    "http://www.datasift.org/dtds/datasift-entities_1_0.dtd">

<datasift-entities>

  <transformers>
  
    <transformer name="[anyName]" className="[anyClassName]">
      <config name="[anyParameter]" value="[anyDefaultValue]"/>
      <config name="[anyParameter]" value="[anyDefaultValue]"/>
      ...
    </transformer>
  
    <transformer name="[anyName]" className="[anyClassName]"/>
  
    <transformer name="[anyName]" className="[anyClassName]"/>
        
  </transformers>

</datasift-entities>
        

User configuration files

At any moment the user can ask DataSift to load a file declaring new validators or transformers with:

        
DataSift dataSift = new DataSift();
dataSift.configureValidatorsFromFile("myValidatorsFile.xml");
dataSift.configureTransformersFromFile("myTransformersFile.xml");
        

These files that register validators and transformers (and, in fact, resolvers too) can also be mixed, so that the developer can use a single file to declare new resolvers, validators and transformers.

For example, we could create a file called myEntities.xml with the following contents:

        
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE datasift-entities PUBLIC
    "-//AuelProject Datasift//DTD Datasift Entities Configuration 1.0//EN"
    "http://www.datasift.org/dtds/datasift-entities_1_0.dtd">

<datasift-entities>

  <resolvers>
    <resolver name="[anyName]" className="[anyClassName]"/>
    <resolver name="[anyName]" className="[anyClassName]"/>
  </resolvers>

  <validators>
    <validator name="[anyName]" className="[anyClassName]"/>
    <validator name="[anyName]" className="[anyClassName]"/>
  </validators>

  <transformers>
    <transformer name="[anyName]" className="[anyClassName]"/>
    <transformer name="[anyName]" className="[anyClassName]"/>
  </transformers>
  
</datasift-entities>
        

...and then load its entities with:

        
DataSift dataSift = new DataSift();
dataSift.configureResolversFromFile("myEntities.xml");
dataSift.configureValidatorsFromFile("myEntities.xml");
dataSift.configureTransformersFromFile("myEntities.xml");
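
A mixed file can serve all three calls because each kind of entity lives in its own element, so a reader interested in one kind can ignore the others. The sketch below illustrates this with the JDK DOM parser; it is not how DataSift necessarily reads the file. The class and method names are hypothetical, and the sample XML omits the DOCTYPE so that the parser does not try to fetch the DTD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class EntitiesFileSketch {

    // Returns the "name" attribute of every element with the given tag name.
    static List<String> namesOf(String xml, String tagName) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = document.getElementsByTagName(tagName);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse entities XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<datasift-entities>"
                + "<resolvers><resolver name='MyResolver' className='mypackage.MyResolverClass'/></resolvers>"
                + "<validators><validator name='MyValidator' className='mypackage.MyValidatorClass'/></validators>"
                + "</datasift-entities>";
        System.out.println("resolvers: " + namesOf(xml, "resolver"));
        System.out.println("validators: " + namesOf(xml, "validator"));
        System.out.println("transformers: " + namesOf(xml, "transformer"));
    }
}
```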
        

Programmatically adding validators and transformers

Finally, validators and transformers can be added to a DataSift instance in the same way as resolvers. To do so, create and initialize org.auelproject.datasift.config.ValidatorConfig or org.auelproject.datasift.config.TransformerConfig objects and pass them to the DataSift instance:


ValidatorConfig validatorConfig = new ValidatorConfig();
validatorConfig.setName("MyValidator");
validatorConfig.setClassName("mypackage.MyValidatorClass");
validatorConfig.addConfigParam("paramOne","aValue");
validatorConfig.addConfigParam("paramTwo","anotherValue");

dataSift.addValidatorConfig(validatorConfig);
        

TransformerConfig transformerConfig = new TransformerConfig();
transformerConfig.setName("MyTransformer");
transformerConfig.setClassName("mypackage.MyTransformerClass");
transformerConfig.addConfigParam("paramOne","aValue");
transformerConfig.addConfigParam("paramTwo","anotherValue");

dataSift.addTransformerConfig(transformerConfig);