Developing with veraPDF

This is a quick start guide for developers wanting to work with veraPDF. You’ll need to know a little Java, Maven and git to follow the instructions. We’ve assumed you either want to:

  • integrate veraPDF into your own Java application; or
  • contribute to the veraPDF code base.

Whatever your destination, we’ll start the journey together. First you’ll need to decide which version of veraPDF you want to use and how you want to obtain it.

License

VeraPDF free software: you can redistribute it and/or modify it under the terms of either:

Getting veraPDF

There are two implementations of the veraPDF software library, one that uses a fork of the Apache PDFBox project as a PDF parser and validation model.

Greenfield vs PDFBox

Since releasing the PDFBox implementation the veraPDF consortium have developed their own “Greenfield” PDF parsing and validation model that’s available under the same dual open source licenses as the rest of veraPDF. You can also use PDFBox under the APL license. We recommend using Greenfield parser, as the support for PDFBox parser is not guaranteed to be available long term.

Greenfield POM dependency

To include veraPDF’s greenfield parser and validation model add:

<dependency>
  <groupId>org.verapdf</groupId>
  <artifactId>validation-model</artifactId>
  <version>1.24.1</version>
</dependency>

You can change the version number if you desire.

PDFBox POM dependency

This can be included in your project with this Maven dependency:

<dependency>
  <groupId>org.verapdf</groupId>
  <artifactId>pdfbox-validation-model</artifactId>
  <version>1.24.1</version>
</dependency>

javax vs jakarta

Both implementations above depend on javax. If your project uses jakarta, you should use alternative dependencies:

<dependency>
  <groupId>org.verapdf</groupId>
  <artifactId>validation-model-jakarta</artifactId>
  <version>1.24.1</version>
</dependency>

for Greenfield implementation and

<dependency>
  <groupId>org.verapdf</groupId>
  <artifactId>pdfbox-validation-model-jakarta</artifactId>
  <version>1.24.1</version>
</dependency>

for PDFBox implementation.

GitHub for source code

The up to date source repos are on GitHub.

Greenfield GitHub project

The clone and build the veraPDF consortium’s greenfield implementation using git and Maven:

git clone https://github.com/veraPDF/veraPDF-validation.git
cd veraPDF-validation
mvn clean install

PDFBox version GitHub project

For the PDFBox implementation:

git clone https://github.com/veraPDF/veraPDF-pdfbox-validation.git
cd veraPDF-pdfbox-validation
mvn clean install

Validating a PDF

To use the library to validate a PDF file you can do the following:

Initialising your chosen foundry

The veraPDF library is unaware of the implementations and needs to be initialised before first use. This is a slightly different process, depending on whether you’ve chosen the greenfield or PDFBox implementation.

Greenfield Foundry initialise

import org.verapdf.core.EncryptedPdfException;
import org.verapdf.core.ModelParsingException;
import org.verapdf.core.ValidationException;
import org.verapdf.gf.foundry.VeraGreenfieldFoundryProvider;
import org.verapdf.pdfa.Foundries;
import org.verapdf.pdfa.PDFAParser;
import org.verapdf.pdfa.results.ValidationResult;
import org.verapdf.pdfa.PDFAValidator;
import org.verapdf.pdfa.flavours.PDFAFlavour;

VeraGreenfieldFoundryProvider.initialise();

PDFBox Foundry initialise

import org.verapdf.core.EncryptedPdfException;
import org.verapdf.core.ModelParsingException;
import org.verapdf.core.ValidationException;
import org.verapdf.pdfbox.foundry.PdfBoxFoundryProvider;
import org.verapdf.pdfa.Foundries;
import org.verapdf.pdfa.PDFAParser;
import org.verapdf.pdfa.results.ValidationResult;
import org.verapdf.pdfa.PDFAValidator;
import org.verapdf.pdfa.flavours.PDFAFlavour;

PdfBoxFoundryProvider.initialise();

Validating a PDF File

You only need to initialise once, whichever version you’re using, now the code to validate a file called mydoc.pdf against the PDF/A 1b specification is:

PDFAFlavour flavour = PDFAFlavour.fromString("1b");
try (PDFAParser parser = Foundries.defaultInstance().createParser(new FileInputStream("mydoc.pdf"), flavour)) {
    PDFAValidator validator = Foundries.defaultInstance().createValidator(flavour, false);
    ValidationResult result = validator.validate(parser);
    if (result.isCompliant()) {
      // File is a valid PDF/A 1b
    } else {
      // it isn't
    }
} catch (IOException | ValidationException | ModelParsingException | EncryptedPdfException exception) {
	// Exception during validation
}

If you’re not sure what specification to use you can let the software decide:

try (PDFAParser parser = Foundries.defaultInstance().createParser(new FileInputStream("mydoc.pdf"))) {
    PDFAValidator validator = Foundries.defaultInstance().createValidator(parser.getFlavour(), false);
    ValidationResult result = validator.validate(parser);
    if (result.isCompliant()) {
      // File is a valid PDF/A 1b
    } else {
      // it isn't
    }
} catch (IOException | ValidationException | ModelParsingException | EncryptedPdfException exception) {
	// Exception during validation
}

The veraPDF Processor

There’s a higher level processor API aimed at developers wanting to combine the low-level components. You can read more in on the processor page.