Developing with veraPDF

This is a quick start guide for developers wanting to work with veraPDF. You’ll need to know a little Java, Maven and git to follow the instructions. We’ve assumed you either want to:

  • integrate veraPDF into your own Java application; or
  • contribute to the veraPDF code base.

Whatever your destination, we’ll start the journey together. First you’ll need to decide which version of veraPDF you want to use and how you want to obtain it.

License

veraPDF is free software: you can redistribute it and/or modify it under the terms of either of its two open source licenses.

Getting veraPDF

There are two implementations of the veraPDF software library. The first uses a fork of the Apache PDFBox project as its PDF parser and validation model. Since releasing the PDFBox implementation, the veraPDF consortium has developed its own PDF parser and validation model, the greenfield implementation, which is available under the same dual open source licenses as the rest of veraPDF.

Maven for integrators

If you want to integrate veraPDF into your own Java application and you’re using Maven you can add the following to your POM:

<repositories>
  <repository>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
    <id>vera-dev</id>
    <name>Vera development</name>
    <url>http://artifactory.openpreservation.org/artifactory/vera-dev</url>
  </repository>
</repositories>

This gives you access to the veraPDF Maven repository. We’ll be on Maven Central soon.

Greenfield POM dependency

To include veraPDF’s greenfield parser and validation model add:

<dependency>
  <groupId>org.verapdf</groupId>
  <artifactId>validation-model</artifactId>
  <version>1.0.5</version>
</dependency>

You can change the version number if you desire.

PDFBox POM dependency

To include the PDFBox-based parser and validation model instead, add this Maven dependency:

<dependency>
  <groupId>org.verapdf</groupId>
  <artifactId>pdfbox-validation-model</artifactId>
  <version>1.0.5</version>
</dependency>

GitHub for source code

The up-to-date source repositories are on GitHub.

Greenfield GitHub project

To clone and build the veraPDF consortium’s greenfield implementation using git and Maven:

git clone https://github.com/veraPDF/veraPDF-validation.git
cd veraPDF-validation
mvn clean install

PDFBox version GitHub project

For the PDFBox implementation:

git clone https://github.com/veraPDF/veraPDF-pdfbox-validation.git
cd veraPDF-pdfbox-validation
mvn clean install

Validating a PDF

Validating a PDF file with the library takes two steps: initialise a foundry, then create a parser and a validator.

Initialising your chosen foundry

The veraPDF library is unaware of the implementations and needs to be initialised before first use. The process differs slightly depending on whether you’ve chosen the greenfield or PDFBox implementation.

Greenfield Foundry initialise

import org.verapdf.pdfa.VeraGreenfieldFoundryProvider;
import org.verapdf.pdfa.Foundries;
import org.verapdf.pdfa.PDFAParser;
import org.verapdf.pdfa.results.ValidationResult;
import org.verapdf.pdfa.PDFAValidator;
import org.verapdf.pdfa.flavours.PDFAFlavour;

VeraGreenfieldFoundryProvider.initialise();

PDFBox Foundry initialise

import org.verapdf.pdfa.PdfBoxFoundryProvider;
import org.verapdf.pdfa.Foundries;
import org.verapdf.pdfa.PDFAParser;
import org.verapdf.pdfa.results.ValidationResult;
import org.verapdf.pdfa.PDFAValidator;
import org.verapdf.pdfa.flavours.PDFAFlavour;

PdfBoxFoundryProvider.initialise();
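
Because initialisation has to happen before first use, an application with several entry points may want to funnel it through a single guard. The following is a minimal sketch of one way to do that, using only the Java standard library. FoundryInitGuard is a hypothetical helper of our own, not part of veraPDF, and the Runnable it wraps stands in for whichever provider's initialise() call you have chosen.

```java
/**
 * Hypothetical helper (not part of veraPDF): runs an initialiser the first
 * time ensure() is called and does nothing on later calls.
 */
final class FoundryInitGuard {
    private final Runnable initialiser;
    private boolean done;

    FoundryInitGuard(Runnable initialiser) {
        this.initialiser = initialiser;
    }

    /**
     * Runs the initialiser if it has not completed yet. The method is
     * synchronized, so concurrent callers wait until the first run finishes.
     * If the initialiser throws, the guard stays unset and a later call retries.
     *
     * @return true if this call ran the initialiser, false if it was a no-op
     */
    synchronized boolean ensure() {
        if (done) {
            return false;
        }
        initialiser.run();
        done = true;
        return true;
    }
}

class FoundryInitGuardDemo {
    public static void main(String[] args) {
        int[] runs = {0};
        // Stand-in initialiser that only counts how often it is invoked.
        FoundryInitGuard guard = new FoundryInitGuard(() -> runs[0]++);
        guard.ensure();
        guard.ensure();
        System.out.println("initialiser ran " + runs[0] + " time(s)"); // prints "initialiser ran 1 time(s)"
    }
}
```

In a real application you would pass the provider call instead of the counter, for example a lambda that calls VeraGreenfieldFoundryProvider.initialise(), assuming that method declares no checked exceptions.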

Validating a PDF File

You only need to initialise once, whichever version you’re using. After that, the code to validate a file called mydoc.pdf against the PDF/A-1b specification is:

// Requires java.io.FileInputStream in addition to the imports shown above.
PDFAFlavour flavour = PDFAFlavour.fromString("1b");
PDFAValidator validator = Foundries.defaultInstance().createValidator(flavour, false);
try (PDFAParser parser = Foundries.defaultInstance().createParser(new FileInputStream("mydoc.pdf"),
    flavour)) {
    ValidationResult result = validator.validate(parser);
    if (result.isCompliant()) {
      // File is a valid PDF/A-1b
    } else {
      // it isn't
    }
}

If you’re not sure which PDF/A specification to use, you can let the software decide:

try (PDFAParser parser = Foundries.defaultInstance().createParser(new FileInputStream("mydoc.pdf"))) {
    // Use the flavour the parser detected in the document
    PDFAValidator validator = Foundries.defaultInstance().createValidator(parser.getFlavour(), false);
    ValidationResult result = validator.validate(parser);
    if (result.isCompliant()) {
      // File is valid against the detected PDF/A flavour
    } else {
      // it isn't
    }
}
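
Integrators often need to check a whole folder rather than a single mydoc.pdf. The sketch below shows one way to collect the files and report the failures using only the Java standard library. PdfBatch and its methods are hypothetical names of our own, and the Predicate is a stand-in for a check you would write around the validation code above. No veraPDF calls are made here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of veraPDF) for running a check over many PDF files. */
final class PdfBatch {
    private PdfBatch() {
    }

    /** Returns every regular file under root whose name ends in .pdf, ignoring case. */
    static List<Path> findPdfs(Path root) throws IOException {
        // Files.walk holds directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Returns the files for which the supplied compliance check answers false. */
    static List<Path> failures(List<Path> pdfs, Predicate<Path> isCompliant) {
        return pdfs.stream().filter(isCompliant.negate()).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> pdfs = findPdfs(root);
        // Stand-in check that accepts every file; replace it with a veraPDF-backed one.
        List<Path> failed = failures(pdfs, p -> true);
        System.out.println(pdfs.size() + " PDF(s) found, " + failed.size() + " failed");
    }
}
```

Keeping the check behind a Predicate means the file walking can be tested without the library on the classpath, and the same code works with either foundry.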

The veraPDF Processor

There’s a higher-level processor API aimed at developers wanting to combine the low-level components. You can read more on the processor page.