Friday 15 June 2007

The preferred CSV parser for Java?

My project needs to parse a CSV file.
Of course I could write "yet-another-csv-parser", but this time I looked a bit further, past my own code, to see whether someone else had already created the preferred CSV API for Java.

My requirements seemed fair:

  • Open source:
    • I would like to see the code and be able to extend it where needed
    • If the code is available from a Maven repository, it is easy to download and add to my favorite IDE (IntelliJ IDEA)
    • Distribution to ibiblio including the source code would be great, as it makes my work even easier.
  • Some kind of error reporting
    It would be nice if the parser could report a more specific error than just an IOException with no details. Ideally the error would tell me which line could not be parsed.
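To make the error-reporting requirement concrete, here is a minimal sketch of the kind of reporting I have in mind. The names `CsvParseException` and `ValidatingCsvReader` are hypothetical and not taken from any of the libraries below, and the field splitting is deliberately naive (no quote handling); the point is only that a failure carries the line number and a reason, and that records with the wrong number of fields are caught.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception type: carries the 1-based line number that failed.
// Extending IOException keeps it compatible with code that only expects IOException.
class CsvParseException extends IOException {
    private final int lineNumber;

    CsvParseException(int lineNumber, String message) {
        super("line " + lineNumber + ": " + message);
        this.lineNumber = lineNumber;
    }

    int getLineNumber() {
        return lineNumber;
    }
}

// Hypothetical reader: validates that every record has as many fields as the first line.
class ValidatingCsvReader {
    static List<String[]> readAll(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        List<String[]> records = new ArrayList<String[]>();
        int expectedFields = -1;
        int lineNumber = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            // Naive split (assumption: no quoted fields); -1 keeps trailing empty fields
            String[] fields = line.split(",", -1);
            if (expectedFields < 0) {
                expectedFields = fields.length; // the first line defines the width
            } else if (fields.length != expectedFields) {
                throw new CsvParseException(lineNumber,
                    "expected " + expectedFields + " fields but found " + fields.length);
            }
            records.add(fields);
        }
        return records;
    }
}
```

With the input lines `a,b,c`, `1,2,3` and `4,5`, the third line triggers a `CsvParseException` whose message reads `line 3: expected 3 fields but found 2`.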

I found a couple of APIs that I took a closer look at:

http://www.csvreader.com/
- Seemed easy to use, but not available from ibiblio (or any other repository I could find)
- Binary and source code available from SourceForge
- No error reporting besides IOException

http://www.mvnrepository.com/artifact/genjava/gj-csv
- Binary and source available from ibiblio.
- No error reporting besides IOException

http://www.mvnrepository.com/artifact/net.sf.opencsv/opencsv
- Only binary available from ibiblio.
- No error reporting besides IOException

I looked at some other APIs as well, but my general observation was that none of them focused on error reporting or validation of the document.

Why doesn't the Java community have a preferred API for CSV parsing?
One explanation could be that every project implements its own, since parsing a CSV file seems like a trivial operation.
Another explanation could be that this kind of work suffers from the Not Invented Here syndrome? :-)
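Parsing only looks trivial until a field contains the delimiter. A naive `line.split(",")` turns `1,"Smith, John"` into three fields instead of two. Below is a minimal sketch of a quote-aware splitter for a single line, following the double-quote convention of RFC 4180 (commas inside quotes are literal, and `""` inside a quoted field stands for one quote). The class name `QuoteAwareSplitter` is hypothetical, and records spanning several lines are deliberately not supported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits ONE line into fields, honouring double quotes.
// Assumption: a record never spans several lines (no embedded newlines).
class QuoteAwareSplitter {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // commas are literal inside quotes
                }
            } else if (c == '"') {
                inQuotes = true; // lenient: a quote opens a quoted section even mid-field
            } else if (c == ',') {
                fields.add(current.toString()); // delimiter ends the current field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quoted field");
        }
        fields.add(current.toString()); // last field (possibly empty)
        return fields;
    }
}
```

Here `split("1,\"Smith, John\"")` yields the two fields `1` and `Smith, John`. Treating a quote in the middle of an unquoted field as the start of a quoted section is more lenient than RFC 4180; a validating parser would report that as an error instead, ideally with the line number.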

Please let me know if I am missing the implementation; it simply must be out there.

7 comments:

Anonymous said...

A suggestion could be http://flatpack.sourceforge.net/

Jacob von Eyben said...

Flatpack looks promising, but I can't find it in any Maven repository, contrary to what this thread states.

Do you or anyone else know if it is available from a Maven repository?

Anonymous said...

Check out this one: http://jffp.sourceforge.net/

Anonymous said...

How about:

String[] fields = line.split(","); // split is an instance method, called on the String holding one line

Anonymous said...

Just to add to the list, I did a *very* basic one a couple of years ago: http://kasparov.skife.org/csv/ and I know Henri Yandell wrote one somewhere in http://www.osjava.org/ as well. I think another was donated to Jakarta at one point, but no one picked up the ball; that one was a commercial thing and is probably the most robust I know of in the face of bizarro input, but it was much slower than mine.

Unknown said...

Hi,

Regarding the comment about FlatPack, I'd like to say that FlatPack will be available from a Maven repository as soon as we complete this release.

In the meantime, feel free to grab a SNAPSHOT at:
http://objectlabkit.sf.net/m1-repo/
under net.sf.flatpack


Thanks
Benoit

Anonymous said...

Apache Commons has a fairly intelligent CSVParser (in the sandbox), including a CSVStrategy class that lets you set delimiters, encapsulators, escape characters, etc., and handle complex encapsulation.

http://commons.apache.org/sandbox/csv/apidocs/org/apache/commons/csv/package-summary.html