Wednesday, May 31, 2006

XML versus Binary Document Format Debates

Responding to Sean McGrath on his recent post about XML-vs-binary document formats:

Look, it's just like I was trying to say in #python@irc.freenode.net the other day: there is nothing wrong with a format being binary. There is no virtue in every byte of a file being interpreted as a textual character (or part of one) that in turn represents your real data. "There ain't no such thing as plain text," says Joel Spolsky! Interpreting the bytes as text is no different from interpreting them directly as your data.

You can easily have XML formats that are as undocumented and as inconsistent between versions as any binary format, but with XML you get the added benefit of extra processing overhead, bloated file sizes, and limitations on structure and performance (try keeping efficient on-disk indexes into an XML file up to date).
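To make the indexing point concrete, here is a minimal sketch of fixed-size binary records using Python's built-in struct module. The record layout (a 4-byte id plus an 8-byte float) and the helper names are made up for illustration, not taken from any real document format. Because every record has the same size, record n always lives at byte offset n * size, so reading or rewriting one record never requires parsing or shifting the rest of the file.

```python
import io
import struct

# Hypothetical record layout: unsigned 32-bit id + 64-bit float, little-endian.
# "<" disables padding, so every record is exactly 12 bytes.
RECORD = struct.Struct("<Id")

def write_records(f, rows):
    """Append (id, score) rows to a binary file object."""
    for rec_id, score in rows:
        f.write(RECORD.pack(rec_id, score))

def read_record(f, n):
    """Read record number n directly, without touching any other record."""
    f.seek(n * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

def update_record(f, n, rec_id, score):
    """Overwrite record number n in place; all other offsets stay valid."""
    f.seek(n * RECORD.size)
    f.write(RECORD.pack(rec_id, score))

# An in-memory buffer stands in for a file opened in binary mode.
buf = io.BytesIO()
write_records(buf, [(1, 0.5), (2, 1.5), (3, 2.5)])
update_record(buf, 1, 2, 9.0)
second = read_record(buf, 1)
third = read_record(buf, 2)
```

Doing the same in-place update on an XML file would generally change the length of the text, which moves every byte offset after it and invalidates any index that stored those offsets.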

I do believe text-based formats and XML have their place, but those places are limited. I would have much preferred that Microsoft open and standardize a relational-style format, closer to the way Word documents worked internally, before it was bullied into an XML-based format.

I'm wondering if I actually know Sean and didn't realize it, because his points are exactly those I was arguing against in #python the other day, and so I wonder if someone I was arguing with was McGrath under a different name. Knowing your strangers is a great gem of the Internet age, isn't it?

1 comment:

Anonymous said...

*Documents and/or Markup*

Step 1) Learn stream compression in Python via built-ins "zlib" or "bz2"
Step 2) Learn the subset of XML that is ElementTree, use "cElementTree"
Step 3) There is no step 3
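
The two steps above can be sketched roughly as follows. This is only an illustration: it uses xml.etree.ElementTree from the standard library as a stand-in for cElementTree (the separate C module is gone from recent Python versions), and the document structure and function names are invented for the example.

```python
import zlib
import xml.etree.ElementTree as ET

def save_document(root):
    """Serialize an element tree to XML bytes, then compress with zlib."""
    return zlib.compress(ET.tostring(root, encoding="utf-8"))

def load_document(blob):
    """Decompress the bytes and parse them back into an element tree."""
    return ET.fromstring(zlib.decompress(blob))

# Build a small, made-up document.
doc = ET.Element("document", title="notes")
for text in ["first paragraph", "second paragraph"]:
    ET.SubElement(doc, "para").text = text

blob = save_document(doc)
restored = load_document(blob)
paragraphs = [p.text for p in restored.findall("para")]
```

Swapping zlib for bz2 only changes the compress and decompress calls; the XML handling stays the same.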

*Asynchronous Messaging*

Step 1) Learn json, use "simplejson"
Step 2) There is no step 2
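
A minimal sketch of that one step, using the standard-library json module (simplejson exposes the same dumps and loads calls). The message shape with "kind" and "payload" keys is an assumption made up for this example, not part of any particular messaging protocol.

```python
import json

def encode_message(kind, payload):
    """Turn a message into UTF-8 JSON bytes ready to put on the wire."""
    body = {"kind": kind, "payload": payload}
    return json.dumps(body, sort_keys=True).encode("utf-8")

def decode_message(raw):
    """Parse bytes received from the wire back into (kind, payload)."""
    msg = json.loads(raw.decode("utf-8"))
    return msg["kind"], msg["payload"]

raw = encode_message("job.done", {"id": 42, "ok": True})
kind, payload = decode_message(raw)
```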

*Configuration and Customization*

Step 1) Use a Python module
Step 2) There is no step 2
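
As a rough sketch of what "use a Python module" can look like: the settings names below are hypothetical, and the temporary file only exists to keep the example self-contained. In a real project the settings file would normally sit next to the code and simply be imported.

```python
import os
import runpy
import tempfile

# Contents of a hypothetical settings.py; plain Python expressions are allowed.
CONFIG_SOURCE = '''
DEBUG = True
WORKERS = 2 * 4
DATA_DIR = "/tmp/" + "app"
'''

def load_config(path):
    """Execute a Python file and keep only its UPPERCASE names as settings."""
    namespace = runpy.run_path(path)
    return {k: v for k, v in namespace.items() if k.isupper()}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.py")
    with open(path, "w") as f:
        f.write(CONFIG_SOURCE)
    config = load_config(path)
```

The trade-off is that a config file like this is executable code, so it should only be loaded from sources you trust.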

I started programming in 1982, when I was 11 years old. I wrote my last binary file in 1988, when I was 17 years old. The tools have greatly improved since then.

There are people who think there is more to say about these topics. I had my last debate about binary files when I was 17.5 years old.

Cheers, and all the best with your current and future projects.

moe@manuelmgarcia.com

I write here about programming, how to program better, things I think are neat and are related to programming. I might write other things at my personal website.

I am happily employed by the excellent Caktus Group, located in beautiful and friendly Carrboro, NC, where I work with Python, Django, and Javascript.
