To use a third-party Tool for extracting additional features, a user needs an Adaptor for that Tool. The Adaptor is part of a Plugin for the veraPDF software. After the veraPDF software loads the Plugin, the Tool becomes available to the Features Reporter through the Extractor interface.
A Plugin is represented by a .jar file that contains the Extractor class definition. This Extractor class extends FeaturesExtractor, the base class that defines the interfaces for the Features Reporter.
The veraPDF software loads the Plugins on startup. It uses the plugins.xml file to enable and configure them. This config file should be placed next to the app config file, at .../verapdf/config/plugins.xml. The plugins config file should contain a root element pluginsConfig, which contains plugin child elements. Each plugin element refers to one plugin. It can contain five child elements and one attribute. The attribute is named enabled, and its value should be a boolean (true or false). If the value is true, the specified plugin is used in features collection. The child elements of the plugin element are listed in the following table:
|name||The name of the plugin|
|version||The version of the plugin|
|description||The description of the plugin|
|pluginJar||A URL that specifies the plugin’s .jar file|
|attributes||Contains any number of attribute child elements. Each attribute element should have two attributes: key and value. These key/value pairs are passed to the plugin's attributes map, which the plugin can use for additional purposes (for example, specifying the path to an external binary or additional configuration for the plugin's output)|
Example plugin configuration file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pluginsConfig>
  <plugin enabled="true">
    <name>Font Sample</name>
    <version>1.0</version>
    <description>The font sample plugin</description>
    <pluginJar>file:///home/userName/verapdf/plugins/fontSample.jar</pluginJar>
  </plugin>
  <plugin enabled="false">
    <name>iccdump</name>
    <version>1.0</version>
    <description>Collecting icc profile’s features with Argyll iccdump</description>
    <pluginJar>file:///home/userName/verapdf/plugins/iccdump.jar</pluginJar>
    <attributes>
      <attribute key="cliPath" value="/home/userName/verapdf/plugins/iccdump"/>
    </attributes>
  </plugin>
</pluginsConfig>
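To make the structure of the config file concrete, the following is a minimal sketch of how such a file could be read with the JDK's DOM parser. It is not the veraPDF loader: the class `PluginsConfigReader`, the holder `PluginEntry`, and the method `readEnabled` are hypothetical names introduced only for this illustration. The sketch returns the plugins whose enabled attribute is true, together with their key/value attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, NOT part of veraPDF: reads a plugins config and
// returns the enabled plugins together with their attribute maps.
class PluginsConfigReader {

    // Simple value holder for one <plugin> element.
    static final class PluginEntry {
        final String name;
        final String pluginJar;
        final Map<String, String> attributes;

        PluginEntry(String name, String pluginJar, Map<String, String> attributes) {
            this.name = name;
            this.pluginJar = pluginJar;
            this.attributes = attributes;
        }
    }

    static List<PluginEntry> readEnabled(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<PluginEntry> result = new ArrayList<>();
        NodeList plugins = doc.getDocumentElement().getElementsByTagName("plugin");
        for (int i = 0; i < plugins.getLength(); i++) {
            Element plugin = (Element) plugins.item(i);
            // Anything other than "true" (case-insensitive) counts as disabled.
            if (!Boolean.parseBoolean(plugin.getAttribute("enabled"))) {
                continue;
            }
            // Collect <attribute key="..." value="..."/> pairs in document order.
            Map<String, String> attrs = new LinkedHashMap<>();
            NodeList attrNodes = plugin.getElementsByTagName("attribute");
            for (int j = 0; j < attrNodes.getLength(); j++) {
                Element a = (Element) attrNodes.item(j);
                attrs.put(a.getAttribute("key"), a.getAttribute("value"));
            }
            result.add(new PluginEntry(text(plugin, "name"), text(plugin, "pluginJar"), attrs));
        }
        return result;
    }

    // Returns the trimmed text of the first descendant with the given tag, or "".
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<pluginsConfig>"
                + "<plugin enabled=\"true\"><name>iccdump</name>"
                + "<pluginJar>file:///home/userName/verapdf/plugins/iccdump.jar</pluginJar>"
                + "<attributes><attribute key=\"cliPath\""
                + " value=\"/home/userName/verapdf/plugins/iccdump\"/></attributes>"
                + "</plugin></pluginsConfig>";
        for (PluginEntry entry : readEnabled(xml)) {
            System.out.println(entry.name + " " + entry.pluginJar + " " + entry.attributes);
        }
    }
}
```

With the example configuration file shown earlier, only the Font Sample entry would be returned, because the iccdump entry has enabled set to false.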
The .jar file must contain a single Extractor implementation. If the .jar file contains more than one Extractor implementation, the Plugin is not loaded and an error is logged. The created Extractor is registered in the Features Reporter.
When the Features Reporter collects the features of a PDF object, it also checks whether any registered Extractors are available for the object type. If so, the Reporter creates a dataset describing the object and passes it to each registered Extractor as the argument of the method that returns the custom features. The Extractor transforms the dataset into a form that the Tool can understand and requests the Tool to process it. The Tool reports the processing results, which the Extractor converts into the custom features description. The returned custom features are added to the Features Report.
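The dispatch described above can be sketched as a registry keyed by object type. The types below are hypothetical stand-ins written for this illustration and are not the veraPDF classes: `ReporterSketch`, its `ObjectType` enum, and the simplified `Extractor` interface (which takes raw bytes and returns strings) only model the flow of "look up the Extractors for the object type, pass each one the dataset, merge the results".

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the veraPDF types.
class ReporterSketch {

    // ICCPROFILE is named in the text; FONT is added only to show a second type.
    enum ObjectType { ICCPROFILE, FONT }

    // Simplified Extractor: maps a dataset to custom feature strings.
    interface Extractor {
        List<String> extract(byte[] dataset);
    }

    private final Map<ObjectType, List<Extractor>> registry = new EnumMap<>(ObjectType.class);

    // Registers an Extractor for one object type.
    void register(ObjectType type, Extractor extractor) {
        registry.computeIfAbsent(type, t -> new ArrayList<>()).add(extractor);
    }

    // Called while collecting the features of one object: every Extractor
    // registered for the object's type receives the same dataset, and all
    // returned custom features are merged into one list.
    List<String> collectCustom(ObjectType type, byte[] dataset) {
        List<String> custom = new ArrayList<>();
        for (Extractor extractor : registry.getOrDefault(type, List.of())) {
            custom.addAll(extractor.extract(dataset));
        }
        return custom;
    }

    public static void main(String[] args) {
        ReporterSketch reporter = new ReporterSketch();
        reporter.register(ObjectType.ICCPROFILE, data -> List.of("size=" + data.length));
        System.out.println(reporter.collectCustom(ObjectType.ICCPROFILE, new byte[128]));
        System.out.println(reporter.collectCustom(ObjectType.FONT, new byte[128]));
    }
}
```

An object type without registered Extractors simply yields an empty list in this sketch, which mirrors the statement that custom features appear only when an Extractor is registered for the corresponding type.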
This section describes:
- The expected format of the features reported by a Plugin
- The details of the Extractor implementation provided by a Plugin
- The classes and interfaces available for a Plugin developer
Custom features report structure
The additional features reported by third-party Tools are listed in the PDF Features Report together with the default set of features reported by the veraPDF software. The customFeatures element is automatically added to the features list of a PDF object whenever an Extractor is registered in the Features Reporter for the corresponding object type.
The veraPDF software expects the Extractor to return a list of elements describing the custom features. They are added as child elements of the pluginFeatures element. This element is automatically added to the customFeatures element as soon as the Extractor returns the custom features list.
For example, a Plugin defines an Extractor that, for some specific ICC profile (an object with type ICCPROFILE), returns two elements describing the custom features. The element names are theCustomFeature1 and theCustomFeature2. The element values are theFeatureValue1 and theFeatureValue2 respectively. In this case, the iccProfile element for the ICC profile object in the Features Report will have an additional customFeatures element with the following content:
<iccProfile id="someID">
  ...
  <customFeatures>
    <pluginFeatures description="This plugin reports the features of the ICC profiles" name="plugin name" version="1.0">
      <theCustomFeature1>theFeatureValue1</theCustomFeature1>
      <theCustomFeature2>theFeatureValue2</theCustomFeature2>
    </pluginFeatures>
  </customFeatures>
</iccProfile>
The customFeatures element may contain several pluginFeatures elements if multiple Extractors are registered for this object type. The name and version attributes identify the Plugin that was used to generate the features list. These attributes contain the values obtained from the plugins config file.
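The case of several registered Extractors can be illustrated with a short sketch that builds the same report fragment with the JDK DOM API. This is only an illustration of the resulting structure: according to the text, veraPDF adds the customFeatures and pluginFeatures elements automatically. The class `CustomFeaturesSketch`, the second plugin named "another plugin", and its feature `anotherFeature` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustration only: models the report structure with the JDK DOM API.
// This is NOT how veraPDF builds its report internally.
class CustomFeaturesSketch {

    // Appends one <pluginFeatures> block, with one child element per
    // feature, to the given <customFeatures> element. Feature names must
    // be valid XML element names.
    static Element addPluginFeatures(Element customFeatures, String name, String version,
            String description, Map<String, String> features) {
        Document doc = customFeatures.getOwnerDocument();
        Element pluginFeatures = doc.createElement("pluginFeatures");
        pluginFeatures.setAttribute("name", name);
        pluginFeatures.setAttribute("version", version);
        pluginFeatures.setAttribute("description", description);
        for (Map.Entry<String, String> feature : features.entrySet()) {
            Element element = doc.createElement(feature.getKey());
            element.setTextContent(feature.getValue());
            pluginFeatures.appendChild(element);
        }
        customFeatures.appendChild(pluginFeatures);
        return pluginFeatures;
    }

    // Builds <iccProfile> with one <customFeatures> holding two plugin blocks.
    static Element buildExample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element iccProfile = doc.createElement("iccProfile");
        iccProfile.setAttribute("id", "someID");
        doc.appendChild(iccProfile);
        Element customFeatures = doc.createElement("customFeatures");
        iccProfile.appendChild(customFeatures);

        Map<String, String> first = new LinkedHashMap<>();
        first.put("theCustomFeature1", "theFeatureValue1");
        first.put("theCustomFeature2", "theFeatureValue2");
        addPluginFeatures(customFeatures, "plugin name", "1.0",
                "This plugin reports the features of the ICC profiles", first);

        // Hypothetical second plugin registered for the same object type.
        Map<String, String> second = new LinkedHashMap<>();
        second.put("anotherFeature", "anotherValue");
        addPluginFeatures(customFeatures, "another plugin", "1.0",
                "A second, hypothetical plugin", second);
        return iccProfile;
    }

    public static void main(String[] args) throws Exception {
        Element iccProfile = buildExample();
        System.out.println(iccProfile.getElementsByTagName("pluginFeatures").getLength()
                + " pluginFeatures elements");
    }
}
```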
The Extractor is created from the class definition provided by the Plugin. It extends the base FeaturesExtractor class.
The base FeaturesExtractor class is private. However, there are several Extractor class prototypes (abstract Extractor classes) that extend the base class. They are public, so the user shall extend them to integrate with the Features Reporter.
There may be existing Extractor classes available that already adapt some third-party Tools. If none of the existing Extractor classes can be used for a specific Tool, the user needs to create and compile an additional extension of some abstract or existing Extractor class.
Extractor classes must have an empty (no-argument) constructor. If an Extractor class uses other libraries or frameworks, the user needs to make sure they are available to the veraPDF software at the moment the Plugin is loaded.
The extractor can get all the necessary configuration attributes using the method that is implemented in FeaturesExtractor class:
|getAttributes()||Returns the map that contains all the attributes set in the plugins config file for the specified plugin|
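As an illustration of how an Extractor might use this map, the sketch below reads the cliPath attribute from the example configuration and falls back to a default when it is absent. Only the method name getAttributes() comes from the text. The `Map<String, String>` type, the stand-in base class `ExtractorBaseSketch`, its `initialize` method, and the fallback value are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Extractor that resolves the external binary from its attributes.
class IccDumpExtractorSketch extends ExtractorBaseSketch {

    // Assumed fallback: rely on the binary being on the PATH.
    static final String DEFAULT_CLI = "iccdump";

    String resolveCliPath() {
        String cliPath = getAttributes().get("cliPath");
        return (cliPath == null || cliPath.isEmpty()) ? DEFAULT_CLI : cliPath;
    }

    public static void main(String[] args) {
        IccDumpExtractorSketch extractor = new IccDumpExtractorSketch();
        extractor.initialize(Map.of("cliPath", "/home/userName/verapdf/plugins/iccdump"));
        System.out.println(extractor.resolveCliPath());
    }
}

// Stand-in for the base class. Only getAttributes() is taken from the text;
// the map type and initialize() are assumptions, not the veraPDF API.
abstract class ExtractorBaseSketch {
    private final Map<String, String> attributes = new HashMap<>();

    // Hypothetical hook through which the loader would hand over the
    // key/value pairs from the plugins config file.
    void initialize(Map<String, String> fromConfig) {
        attributes.clear();
        attributes.putAll(fromConfig);
    }

    protected Map<String, String> getAttributes() {
        return attributes;
    }
}
```

Keeping the path in the config file rather than in the class means the same .jar can be reused on machines where the Tool is installed in a different location.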
Each abstract Extractor defines the main method that the Features Reporter uses to get the custom features. This method must be implemented in any Extractor class definition provided by a Plugin. The name of the method depends on the PDF object type that the Extractor supports. A FeaturesData object is passed to this method as its argument. This object is the dataset that provides the information about the PDF object being processed.
The method shall return a list of FeaturesTreeNode objects. Each of these objects is the root of a tree describing the custom features.
The method normally implements the following steps:
- Transform FeaturesData object into the input data for some CLI, dynamic library, web service or other type of Tool (for example, save data to temporary files in a specific format)
- Trigger the Tool to process it (for example, start CLI with corresponding arguments that include paths to temporary files with the input data)
- Transform the output of the Tool into the list of FeaturesTreeNode objects (for example, parse the CLI output XML file and add the required information as child nodes of some root FeaturesTreeNode object)
- Return the generated list of FeaturesTreeNode objects to the Features Reporter
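The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the veraPDF API: `Node` is a stand-in for FeaturesTreeNode, the dataset is simplified to a byte array instead of a FeaturesData object, and the Tool is assumed to print `key: value` lines rather than XML. The `ToolRunner` interface is a hypothetical abstraction that lets the CLI call be replaced, for example in tests.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the four steps; none of these types are veraPDF classes.
class ExtractorStepsSketch {

    // Stand-in for FeaturesTreeNode: a named node with a value and children.
    static final class Node {
        final String name;
        final String value;
        final List<Node> children = new ArrayList<>();

        Node(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Abstraction over "trigger the Tool", so the CLI can be swapped out.
    interface ToolRunner {
        String run(Path inputFile) throws IOException, InterruptedException;
    }

    // One possible runner: starts a CLI with the input file path as its argument
    // and returns everything the process printed.
    static ToolRunner cli(String cliPath) {
        return input -> {
            Process process = new ProcessBuilder(cliPath, input.toString())
                    .redirectErrorStream(true)
                    .start();
            String output = new String(process.getInputStream().readAllBytes(),
                    StandardCharsets.UTF_8);
            process.waitFor();
            return output;
        };
    }

    static List<Node> extract(byte[] objectData, ToolRunner tool)
            throws IOException, InterruptedException {
        // Step 1: transform the dataset into Tool input (here: a temporary file).
        Path input = Files.createTempFile("features", ".bin");
        try {
            Files.write(input, objectData);
            // Step 2: trigger the Tool.
            String output = tool.run(input);
            // Step 3: turn "key: value" lines (assumed format) into a feature tree.
            Node root = new Node("toolOutput", null);
            for (String line : output.split("\\R")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    root.children.add(new Node(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim()));
                }
            }
            // Step 4: return the list of root nodes to the caller.
            List<Node> result = new ArrayList<>();
            result.add(root);
            return result;
        } finally {
            Files.deleteIfExists(input); // always clean up the temporary file
        }
    }

    public static void main(String[] args) throws Exception {
        // A fake Tool that reports the input size, used instead of a real CLI.
        List<Node> nodes = extract(new byte[] {1, 2, 3},
                in -> "size: " + Files.size(in) + "\n");
        for (Node child : nodes.get(0).children) {
            System.out.println(child.name + " = " + child.value);
        }
    }
}
```

In a real Plugin, the runner would typically be created from the configured path, for example `cli(cliPath)` with the value read from the plugin attributes.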