veraPDF CLI Validation
Choosing the right validation profile
Listing built in validation profiles
The veraPDF software comes with predefined sets of rules covering PDF/A and PDF/UA standards. These are known
as validation profiles and there’s one for each level and part of the PDF/A and PDF/UA
specification. You can list them by typing verapdf -l or verapdf.bat --list for Windows users. The -l
and --list
are interchangeable on all platforms.
You’ll be greeted with:
veraPDF supported PDF/A and PDF/UA profiles:
1a - PDF/A-1A validation profile
1b - PDF/A-1B validation profile
2a - PDF/A-2A validation profile
2b - PDF/A-2B validation profile
2u - PDF/A-2U validation profile
3a - PDF/A-3A validation profile
3b - PDF/A-3B validation profile
3u - PDF/A-3U validation profile
4 - PDF/A-4 validation profile
4e - PDF/A-4E validation profile
4f - PDF/A-4F validation profile
ua1 - PDF/UA-1 validation profile
ua2 - PDF/UA-2 validation profile
Choosing a built-in profile
You can specify a built in profile for validation using either the -f
or --flavour
options followed by the 2 character profile code.
For example, to validate a single PDF/A file from the corpus using the PDF/A-1B profile type
verapdf -f 1b corpus/veraPDF-corpus-staging/PDF_A-1b/6.6\ Actions/6.6.1\ General/veraPDF\ test\ suite\ 6-6-1-t02-pass-a.pdf
You should see something very similar to the following output:
<?xml version="1.0" encoding="utf-8"?>
<report>
<buildInformation>
<releaseDetails id="core" version="1.26.1" buildDate="2023-01-10T02:34:00Z"></releaseDetails>
<releaseDetails id="gui" version="1.26.1" buildDate="2023-01-13T11:30:00Z"></releaseDetails>
<releaseDetails id="validation-model" version="1.26.1" buildDate="2023-01-10T02:39:00Z"></releaseDetails>
</buildInformation>
<jobs>
<job>
<item size="10230">
<name>/home/cfw/verapdf/dev/corpus/veraPDF-corpus-staging/PDF_A-1b/6.6 Actions/6.6.1 General/veraPDF test suite 6-6-1-t02-pass-a.pdf</name>
</item>
<validationReport profileName="PDF/A-1B validation profile" statement="PDF file is compliant with Validation Profile requirements." isCompliant="true">
<details passedRules="102" failedRules="0" passedChecks="504" failedChecks="0"></details>
</validationReport>
<duration start="1485134290404" finish="1485134290797">00:00:00:393</duration>
</job>
</jobs>
<batchSummary totalJobs="1" failedToParse="0" encrypted="0" outOfMemory="0" veraExceptions="0">
<validationReports compliant="1" nonCompliant="0" failedJobs="0">1</validationReports>
<featureReports failedJobs="0">0</featureReports>
<repairReports failedJobs="0">0</repairReports>
<duration start="1485134290384" finish="1485134290846">00:00:00:462</duration>
</batchSummary>
</report>
This tells us that the file is valid through the <validationReport isCompliant="true">
attribute.
Letting veraPDF control the profile choice
Specifying a particular profile is useful if you’re expecting all of your PDF/A or PDF/UA documents to conform to a particular specification. In real word use you may not have the luxury of been able to decide on a single validation profile.
It’s possible to tell the veraPDF software to parse PDF files, examine the
metadata and select the appropriate PDF/A or PDF/UA profile for validation. This is
requested by specifying the special -f 0
or --flavour 0
option, or passing
no flavour option at all. There’s an invalid PDF/A file in the same corpus directory. you can validate it by typing:
verapdf -f 0 corpus/veraPDF-corpus-staging/PDF_A-1b/6.6\ Actions/6.6.1\ General/veraPDF\ test\ suite\ 6-6-1-t0 veraPDF test suite 6-6-1-t01-fail-a.pdf veraPDF test suite 6-6-1-t02-pass-a.pdf
or even
verapdf corpus/veraPDF-corpus-staging/PDF_A-1b/6.6\ Actions/6.6.1\ General/veraPDF\ test\ suite\ 6-6-1-t0 veraPDF test suite 6-6-1-t01-fail-a.pdf.
This time the output looks like:
<?xml version="1.0" encoding="utf-8"?>
<report>
<buildInformation>
<releaseDetails id="core" version="1.26.1" buildDate="2023-01-10T02:34:00Z"></releaseDetails>
<releaseDetails id="gui" version="1.26.1" buildDate="2023-01-13T11:30:00Z"></releaseDetails>
<releaseDetails id="validation-model" version="1.26.1" buildDate="2023-01-10T02:39:00Z"></releaseDetails>
</buildInformation>
<jobs>
<job>
<item size="6213">
<name>/home/cfw/verapdf/dev/corpus/veraPDF-corpus-staging/PDF_A-1b/6.6 Actions/6.6.1 General/veraPDF test suite 6-6-1-t01-fail-a.pdf</name>
</item>
<validationReport profileName="PDF/A-1B validation profile" statement="PDF file is not compliant with Validation Profile requirements." isCompliant="false">
<details passedRules="101" failedRules="1" passedChecks="358" failedChecks="1">
<rule specification="ISO 19005-1:2005" clause="6.6.1" testNumber="1" status="failed" passedChecks="0" failedChecks="1">
<description>The Launch, Sound, Movie, ResetForm, ImportData and JavaScript actions shall not be permitted.
Additionally, the deprecated set-state and no-op actions shall not be permitted. The Hide action shall not be permitted (Corrigendum 2)</description>
<object>PDAction</object>
<test>S == "GoTo" || S == "GoToR" || S == "Thread" || S == "URI" || S == "Named" || S == "SubmitForm"</test>
<check status="failed">
<context>root/document[0]/OpenAction[0](5 0 obj PDAction)</context>
</check>
</rule>
</details>
</validationReport>
<duration start="1485163106779" finish="1485163107279">00:00:00:500</duration>
</job>
</jobs>
<batchSummary totalJobs="1" failedToParse="0" encrypted="0" outOfMemory="0" veraExceptions="0">
<validationReports compliant="0" nonCompliant="1" failedJobs="0">1</validationReports>
<featureReports failedJobs="0">0</featureReports>
<repairReports failedJobs="0">0</repairReports>
<duration start="1485163106741" finish="1485163107379">00:00:00:638</duration>
</batchSummary>
</report>
This time the report tells us that the file is invalid through the <validationReport isCompliant="false">
attribute. It also shows the details
of the failed test.
Default flavour
Automatic flavour detection is based on the document conformance specified in the embedded XMP metadata. If this metadata is not available or invalid, the default validation flavour is applied.
It is PDF/A-1b by default, but the user can change it by using --defaultflavour
or -df
option, for example
verapdf --defaultflavour 2b test.pdf
Choosing custom profile
The user can validate PDF files against a custom validation profile by using option --profile
or -p
. For example
verapdf --profile profile.xml test.pdf
Defining validation report
Report format
By default, veraPDF generates report in xml format. The user can specify a different report format (text
, raw
, html
, json
) by using --format
option with argument text
, raw
, html
or json
accordingly.
Note. Before veraPDF release 1.24 raw
report was called xml
. Starting from veraPDF release 1.24 xml
and mrr
refer to the same report format, and mrr
report format is deprecated.
Don’t display all failed checks for a particular rule
Sometimes files fail validation checks many times for a particular
validation rule. This is particularly true for rules relating to fonts and
colour profiles. You can use the --maxfailuresdisplayed
option to control the
maximum number of failures reported for a particular rule.
The veraPDF software will continue to process all checks without terminating, it
just won’t report all the results for every rule. The default is to report one
hundred failed checks per rule. To change the limit to 10 add the --maxfailuresdisplayed 10
option.
Report successful checks as well as failures
By default veraPDF only reports failed checks. It is possible to report passed
checks by adding the --success
or --passed
option to the CLI. In order to
see the passed checks for one of the test files type:
verapdf --success corpus/veraPDF-corpus-staging/PDF_A-1b/6.6\ Actions/6.6.1\ General/veraPDF\ test\ suite\ 6-6-1-t02-pass-a.pdf
We won’t show the output here as it’s quite long. The lack of any -f
or
--flavour
option means that veraPDF will select the appropriate Validation
Profile, meaning it’s equivalent to -f 0
or --flavour 0
,
see automatic profile selection above.
Include console logs into the report
veraPDF CLI generates log messages into the standard error (stderr
) output. By default they will go into the same console output as the generated report and may corrupt it. To redirect these log messages to a file (for example, logs.txt
), use the command verapdf 2>logs.txt ...
.
The user can also include logs into the report by using option --addlogs
.
The level of displayed logs is specified in option --loglevel
. Available levels are:
- 0 - OFF,
- 1 - SEVERE,
- 2 - WARNING, SEVERE
- 3 - CONFIG, INFO, WARNING, SEVERE,
- 4 - ALL.
Profiles wiki
HTML report contains reference links to veraPDF validation rule wiki https://github.com/veraPDF/veraPDF-validation-profiles/wiki/. You are unlikely going to change this unless you intend to host your own local version of the veraPDF validation rule wiki by using --wikiPath
option.
Show validation errors in text report
Text report contains only failed rule numbers. If the numbers of passed rules are needed, --verbose
or -v
should be used.
Processing multiple PDF files
Process all PDF files in the folder
So far we’ve only validated single PDF/A documents. It’s easy to validate multiple PDF documents using the command line interface. You can do this by passing the name of a directory rather than a file. To validate both of the earlier examples in a single command type:
verapdf corpus/veraPDF-corpus-staging/PDF_A-1b/6.6\ Actions/6.6.1\ General/
You can also validate any files in sub-directories by passing the -r
or
--recurse
flag. If you want to validate all PDF files in the corpus directory
type:
verapdf --recurse corpus
This obviously generates a lot of output and takes a little time to run, the batch summary on the test machine is shown below for reference:
<batchSummary totalJobs="1526" failedToParse="0" encrypted="0" outOfMemory="0" veraExceptions="0">
<validationReports compliant="636" nonCompliant="890" failedJobs="0">1526</validationReports>
<featureReports failedJobs="0">0</featureReports>
<repairReports failedJobs="0">0</repairReports>
<duration start="1485171727902" finish="1485171827790">00:01:39:888</duration>
</batchSummary>
meaning the software took one minute and forty seconds to process one thousand and five hundred files.
ZIP archive validation
It is also possible to validate multiple PDF documents within a ZIP archive. If an input file has ZIP format, veraPDF recursively scans and validates PDF files in all subfolders within the archive. For example
verapdf test.zip
Non pdf extension
You can validate pdf files with non-pdf extension by passing the --nonpdfext
option. For example
verapdf somefile --nonpdfext.
Show file names
While doing batch processing you may see all processed file names in console by using option --debug
or -d
.
Optimizing validation processing
The test documents are deliberately quite small and there aren’t too many checks made during validation, five hundred or less in each case. Large PDF documents can mean that the software makes hundreds of thousands of tests, sometimes with thousands of failed checks. It’s possible to control various aspects of this process by using some of the CLI options.
Stop processing after a set number of failures
The --maxfailures
option tells veraPDF to halt processing after it encounters
a set number of failed checks, e.g --maxfailures 10
would mean stop after 10
failed checks. The default value is -1, meaning process all failures.
Disable error messages
By default, veraPDF contains detailed error messages for each error case in report. The user can disable these messages to speed up the validation by using option --disableerrormessages
.
Show progress
The user can see the current status of the validation job in console by using option --progress
.
Use several CPU processes in parallel
veraPDF can parallelize validation of multiple files in several processes. The number of processes can be specified by --processes
option (1 by default). Independent of the number of processes used a single report for the while batch job will be generated.
Other topics
Disabling validation
As demonstrated in the examples above veraPDF validation runs as a default
option. While convenient this is not always desirable. You can disable
validation by passing the -o
or --off
option. This is usually done during
feature-extraction, for example
verapdf --off --extract somefile.pdf.
Encrypted PDF
By default, verapdf is trying to decrypt encrypted PDF file using empty user password. You can validate encrypted pdf files with non-empty password by passing the --password
option.
For example verapdf --password "12345" encrypted.pdf.
Using config files in CLI
veraPDF CLI can reuse configuration files of GUI application by specifying --config
option. The set of configuration XML files is described in https://docs.verapdf.org/cli/config/. Note that any explicitly specified CLI parameters will override the corresponding parameters from the config files.