XML-based formats
semantically "meaningless": the XML structure conveys only metadata, not what the content says
metadata: title, author, ... plus whatever the extensions define
the content itself is unstructured free text
but which is more interesting: the metadata or the content?
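A minimal Python sketch of this split, assuming a made-up document layout (the `<meta>`, `<title>`, `<author>` and `<body>` element names are hypothetical, not taken from any specific XML-based format). The standard-library parser hands back the metadata as named fields, while the content comes out as one opaque string.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the element names are illustrative only.
DOC = """<article>
  <meta>
    <title>Sample report</title>
    <author>J. Doe</author>
  </meta>
  <body>The results were better than expected. Sales rose in Q3,
mostly because of the new product line.</body>
</article>"""

root = ET.fromstring(DOC)

# Metadata: each field sits in its own element, so it can be read off directly.
metadata = {child.tag: child.text for child in root.find("meta")}

# Content: the markup stops at <body>; everything inside is a single string.
content = root.find("body").text

print(metadata)
print(type(content).__name__, len(content.split()))
```

Any question about what the body actually says (topics, entities, whether sales rose) has to be answered by processing `content` as text; the XML markup offers no help there.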