
XML versus Binary Document Format Debates

Responding to Sean McGrath on his recent post about XML-vs-binary document formats:

Look, it's just like I was trying to say in #python@irc.freenode.net the other day: there is nothing wrong with a format being binary. There is no virtue to be found in every byte of a file being interpreted as a textual character (or part of one) that represents your real data. "There ain't no such thing as plain text," says Joel Spolsky! Interpreting the binary as text is no different from interpreting it directly as your data.

You can easily have XML formats as undocumented and inconsistent between versions as any binary format, but you get the added benefit of extra processing overhead, bloated file sizes, and limitations on structure and performance (try keeping efficient on-disk indexes into an XML file up to date).
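
To make the size and indexing point concrete, here is a minimal sketch comparing the same rows stored as fixed-width binary records and as XML. The record layout (a 32-bit id plus a 64-bit float) and the element and attribute names are made up for illustration; nothing here reflects any real document format.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout: little-endian unsigned 32-bit id, then a 64-bit float.
RECORD = struct.Struct("<Id")


def pack_records(rows):
    """Concatenate fixed-width records into one bytes object."""
    return b"".join(RECORD.pack(i, v) for i, v in rows)


def read_record(buf, index):
    """Random access: the byte offset is simply index * record size."""
    return RECORD.unpack_from(buf, index * RECORD.size)


def update_record(buf, index, row):
    """Overwrite one record in place without touching its neighbours."""
    RECORD.pack_into(buf, index * RECORD.size, *row)


def to_xml(rows):
    """Serialize the same rows as XML (names are illustrative only)."""
    root = ET.Element("rows")
    for i, v in rows:
        ET.SubElement(root, "row", id=str(i), value=repr(v))
    return ET.tostring(root)


rows = [(n, n * 0.5) for n in range(1000)]
binary = bytearray(pack_records(rows))  # bytearray so it can be edited in place
xml_bytes = to_xml(rows)

update_record(binary, 10, (10, 99.0))
print(len(binary), len(xml_bytes))
```

Because every binary record has the same width, an index into the file is just arithmetic, and an update never shifts the bytes that follow it. In the XML version a changed value can change the length of its element, which moves every later offset.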

I do believe text-based formats and XML have their place, but those places are limited. I would have much preferred an opening and standardization of a relational-based format, the way Word documents worked internally before Microsoft was bullied into an XML-based format.

I'm wondering if I actually know Sean and didn't realize it, because his points are exactly those I was arguing against in #python the other day, and so I wonder if someone I was arguing with was McGrath under a different name. Knowing your strangers is a great gem of the internet age, isn't it?

Comments

Anonymous said…
*Documents and/or Markup*

Step 1) Learn stream compression in Python via built-ins "zlib" or "bz2"
Step 2) Learn the subset of XML that is ElementTree, use "cElementTree"
Step 3) There is no step 3
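
A minimal sketch of steps 1 and 2 together, assuming a current Python 3 where the standard `xml.etree.ElementTree` module stands in for "cElementTree". The `document` and `para` element names and the two helper functions are invented for the example.

```python
import zlib
import xml.etree.ElementTree as ET


def save_document(root):
    """Serialize an element tree to UTF-8 XML, then compress it."""
    return zlib.compress(ET.tostring(root, encoding="utf-8"))


def load_document(blob):
    """Decompress and parse back into an element tree."""
    return ET.fromstring(zlib.decompress(blob))


# Build a small, repetitive document (illustrative content only).
doc = ET.Element("document", title="Example")
for n in range(50):
    para = ET.SubElement(doc, "para")
    para.text = "Paragraph number %d" % n

blob = save_document(doc)
restored = load_document(blob)
print(len(ET.tostring(doc, encoding="utf-8")), "->", len(blob), "bytes")
```

Swapping `zlib` for `bz2` only changes the two compress and decompress calls. Note that the compressed blob is itself a binary file, so the result is no longer readable in a text editor.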

*Asynchronous Messaging*

Step 1) Learn json, use "simplejson"
Step 2) There is no step 2
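
A minimal sketch of that step, assuming a current Python 3 where the standard `json` module covers what "simplejson" provided. The message envelope with `kind` and `payload` keys is a made-up convention for the example, not part of any protocol.

```python
import json


def encode_message(kind, payload):
    """Wrap a payload in a small envelope and encode it as UTF-8 JSON bytes."""
    return json.dumps({"kind": kind, "payload": payload}).encode("utf-8")


def decode_message(data):
    """Decode bytes from the wire back into (kind, payload)."""
    msg = json.loads(data.decode("utf-8"))
    return msg["kind"], msg["payload"]


wire = encode_message("job.done", {"id": 7, "ok": True})
kind, payload = decode_message(wire)
print(kind, payload)
```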

*Configuration and Customization*

Step 1) Use a Python module
Step 2) There is no step 2
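
One way to read that step, sketched under assumptions: the settings live in an ordinary Python file and the program executes it to collect the values. The file name `settings.py`, the setting names, and the upper-case-only convention are all hypothetical. The sketch writes the file to a temporary directory only so that it runs on its own.

```python
import os
import runpy
import tempfile

# Contents of a hypothetical settings.py; plain Python, so values can be computed.
CONFIG_SOURCE = '''
DEBUG = True
WORKERS = 2 * 4
DATA_DIR = "/tmp/example"
'''


def load_config(path):
    """Execute the file and keep only its upper-case names as settings."""
    namespace = runpy.run_path(path)
    return {k: v for k, v in namespace.items() if k.isupper()}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.py")
    with open(path, "w") as f:
        f.write(CONFIG_SOURCE)
    config = load_config(path)

print(config)
```

Loading configuration this way runs whatever code the file contains, so it only suits files you trust as much as the program itself.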

I started programming in 1982, when I was 11 years old. I wrote my last binary file in 1988, when I was 17 years old. The tools have greatly improved since then.

There are people who think there is more to say about these topics. I had my last debate about binary files when I was 17.5 years old.

Cheers, and all the best with your current and future projects.

moe@manuelmgarcia.com
