Thursday 26 July 2007

quickie: XML schema validation using dom4j

I believe I have spent too much time on XML schema validation with dom4j, considering that it should be a trivial operation.
I was unable to find a how-to on this subject, so I will provide my own, mostly as a note to myself for the next time I have to do schema validation.

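import java.io.File;
import java.io.StringReader;
import java.net.URL;
import org.dom4j.Document;
import org.dom4j.io.SAXReader;
import org.xml.sax.InputSource;

public Document parseXml(String xml) throws Exception {
    // my-schema.xsd must be a plain file on the classpath (see the comments below for jars)
    URL resource = getClass().getResource("/my-schema.xsd");
    SAXReader reader = new SAXReader(true); // true = validating reader
    reader.setFeature("http://apache.org/xml/features/validation/schema", true);
    reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
    reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", new File(resource.getFile()));
    // ErrorHandlerImpl is your own org.xml.sax.ErrorHandler implementation (not shown)
    reader.setErrorHandler(new ErrorHandlerImpl());
    // The input is a String (a character stream), so no encoding has to be set on the InputSource
    return reader.read(new InputSource(new StringReader(xml)));
}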
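
As far as I know, SAXReader passes these feature and property names straight through to the underlying JAXP SAX parser, and the post does not show ErrorHandlerImpl. The following is a rough, JDK-only sketch of what the configuration amounts to, with no dom4j involved; the class names ThrowingErrorHandler and SchemaCheck and the tiny sample schema are hypothetical. It applies the same schemaLanguage/schemaSource setup to a plain validating SAXParser, uses an error handler that turns validation errors into exceptions, and passes the schema as an InputStream, which the JAXP schemaSource property accepts alongside a File, a URI string or an InputSource:

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical stand-in for the post's ErrorHandlerImpl: rethrows errors so that
// an invalid document makes parsing fail instead of only being reported.
class ThrowingErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) {
        // warnings are ignored in this sketch
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e; // schema validation errors arrive here
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e; // well-formedness errors arrive here
    }
}

class SchemaCheck {
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Returns true if xml validates against xsd, false on a validation or parse error.
    static boolean isValid(String xsd, String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true); // the JAXP schema properties only take effect on a validating parser
            SAXParser parser = factory.newSAXParser();
            parser.setProperty(SCHEMA_LANGUAGE, W3C_XML_SCHEMA); // must be set before schemaSource
            parser.setProperty(SCHEMA_SOURCE,
                    new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
            XMLReader reader = parser.getXMLReader();
            reader.setErrorHandler(new ThrowingErrorHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // thrown by ThrowingErrorHandler or by the parser itself
        } catch (Exception e) {
            throw new IllegalStateException("parser setup failed", e);
        }
    }
}
```

With a schema declaring only `<xs:element name='note' type='xs:string'/>`, `SchemaCheck.isValid(xsd, "<note>hello</note>")` should return true, while a `<memo>` root element or child elements inside `<note>` should make it return false.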

3 comments:

Anonymous said...

Brilliant, thank you. I too spent way too long looking for the answer to this seemingly trivial question.

Jacob von Eyben said...

One other thing: if the schema you want to use is packaged in one of the jar files you depend on, you cannot just do:
reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", new File(resource.getFile()));
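
The reason is that a resource packed in a jar comes back from getResource as a jar: URL, and getFile() on such a URL returns something like file:/tmp/lib.jar!/my-schema.xsd, which is not a path java.io.File can open. A small sketch illustrating this (the class JarUrlDemo and the jar path are made up for illustration):

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical helper showing what new File(resource.getFile()) actually sees.
class JarUrlDemo {
    // The "file" part of a URL, as returned by URL.getFile().
    static String fileOf(String url) {
        try {
            return URI.create(url).toURL().getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(url, e);
        }
    }

    // Whether new File(resource.getFile()) names an existing regular file.
    static boolean pointsToRealFile(String url) {
        return new File(fileOf(url)).isFile();
    }
}
```

For file:/tmp/my-schema.xsd, fileOf returns the usable path /tmp/my-schema.xsd; for jar:file:/tmp/lib.jar!/my-schema.xsd it returns file:/tmp/lib.jar!/my-schema.xsd, which never names a real file. As an untested alternative, the JAXP schemaSource property also accepts a URI string or an InputStream, so passing resource.toString() or resource.openStream() instead of a File may avoid the problem without an entity resolver.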

In that case you have to create your own entity resolver.
Inspired by this thread: http://mail-archives.apache.org/mod_mbox/xerces-j-dev/200108.mbox/%3C001201c13163$e7bcadb0$36a4c8c3@SHAREDVALUE.COM%3E

I created my own entity resolver:

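// needs org.xml.sax.EntityResolver, org.xml.sax.InputSource and java.io.InputStream
private class CachedEntityResolver implements EntityResolver {

    private final String systemId;
    private final InputStream stream;

    public CachedEntityResolver(String systemId, InputStream stream) {
        this.systemId = systemId;
        this.stream = stream;
    }

    // Serves the schema from the given stream when the requested systemId contains
    // the configured name; returning null lets the parser resolve it as usual.
    // Note: an InputStream can only be read once, so use a new resolver for each parse.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemId.contains(this.systemId)) {
            return new InputSource(stream);
        }
        return null;
    }
}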


and used it like this:

InputStream schemaStream = ReportParserImpl.class.getResourceAsStream(ATLIGHT_REPORT_SCHEMA_SOURCE);
reader.setEntityResolver(new CachedEntityResolver("schema-name.xsd", schemaStream));


My document's root element then looks like this:
<rapport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema-name.xsd">

I'm not sure whether just checking that the systemId contains 'schema-name.xsd' is a hack, but it works, and it can be deployed in any environment because it doesn't depend on the domain.
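
To try the resolver approach end to end without dom4j, here is a JDK-only sketch; the class names, the sample schema and the calls counter are hypothetical. It validates a document carrying xsi:noNamespaceSchemaLocation="schema-name.xsd" and serves the schema from memory. Two deliberate changes from the resolver above: the schema is kept as a byte array, so every resolveEntity call gets a fresh stream (an InputStream can only be consumed once, so the original breaks if one resolver is reused across parses), and the match uses endsWith instead of contains. In my understanding the parser hands the resolver an expanded, absolute systemId (for example a file: URI built from the working directory), which is why a name-based check is needed at all:

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of CachedEntityResolver that keeps the schema as bytes,
// so it can serve the schema any number of times.
class BytesEntityResolver implements EntityResolver {
    private final String schemaName;
    private final byte[] schema;
    int calls; // how often the schema was served (demonstration only, not thread-safe)

    BytesEntityResolver(String schemaName, byte[] schema) {
        this.schemaName = schemaName;
        this.schema = schema.clone();
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(schemaName)) {
            return null; // let the parser resolve everything else itself
        }
        calls++;
        InputSource source = new InputSource(new ByteArrayInputStream(schema));
        source.setSystemId(systemId); // keep an identifier for the loaded schema
        return source;
    }
}

class HintValidation {
    // Validates against the schema named in xsi:noNamespaceSchemaLocation.
    static boolean isValid(String xml, EntityResolver resolver) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // No schemaSource here: the schema location comes from the document itself
            reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                    "http://www.w3.org/2001/XMLSchema");
            reader.setEntityResolver(resolver);
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // DefaultHandler ignores validation errors by default
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("parser setup failed", e);
        }
    }
}
```

If the resolver were not consulted, the parser would look for schema-name.xsd on disk, fail to find a declaration for rapport, and reject even a valid document, so a passing valid case also shows the resolver is being used.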

Anonymous said...

Thanks. After lots of time spent digging through conflicting documents and unclear APIs, I found yours was spot on. Thanks a million.