For the Future of Text, numerous discussions have started from the premise that PDF is an interesting format to bring to VR or AR.
This is the wrong question. It assumes one medium can be transcluded into another. It assumes that because VR or AR, or XR for short here, was named "The Ultimate Display" by Ivan Sutherland in 1965, it could somehow capture all past displays, and their formats, meaningfully.
Even though XR eventually could, we are not today watching movies that sequentially show the pages of books. Rather, we are getting a totally new experience that is shaped by the medium.
So yes, today, we can take a PDF and display it in XR, showing page after page as mere images at first, and try to somehow reproduce the experience of reading in a headset. It could open up a lot of new usages because, unlike with a television or screen, we can actually interact back. We can write back on the content being displayed. Yet, what is the very reason for a PDF to exist? A PDF, or Portable Document Format, exists to be the same on all devices. It is a format meant not to be interacted with but rather to be displayed untouched, verbatim. It has been somewhat extended recently to allow the bare minimum of interaction, i.e. signatures, while retaining the integrity of the rest of the document. This has tremendous value but raises the question: why would one want this in a spatial world? What is the value of a document keeping its shape, namely A4 or Letter pages, while the entire world around it can be freely reshaped? What is the value of a static document once interactive notebooks allow one not just to "consume" a document but to play with it, challenge it, and share it back modified?
PDF does provide value, but that value comes from a mindset of stasis, of permanence, of being closed.
The reality of most of our daily life, of our workflows, is not that static. A document might be read printed on A4 or Letter, yes, but it might just as well be read on a 6.1" portrait display, on an A4-ish e-ink device, or on a 32" 4K landscape monitor. Should the document itself remain the same, or should its content adapt to where and how one wants to consume, and eventually push back on, it?
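As a minimal sketch of what that adaptation already looks like on flat displays, the same HTML content can be restyled per device class through CSS media queries (the selector and the breakpoint values below are purely illustrative):

    /* One document, several reading contexts: the content never changes,
       only its presentation adapts to the device it lands on. */
    article { max-width: 70ch; margin: auto; font-size: 1rem; }

    /* Small portrait display, e.g. a phone: a single comfortable column. */
    @media (max-width: 480px) {
      article { font-size: 1.1rem; padding: 0 1em; }
    }

    /* Large landscape monitor: use the extra width for a wider measure. */
    @media (min-width: 1920px) {
      article { max-width: 90ch; font-size: 1.2rem; }
    }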
I would argue that any content that does not invite annotation, or better yet an actual attempt at existing in its target context, is stale. Beyond that, it does not promote hermeneutics, our own ability to make sense of it. Rather, it presents itself as the "truth" of the matter, and it may very well be, but unless it can be challenged and proven as such, it is a very poor object of study.
Consequently a PDF, like a 4.25 x 6.87" paperback, is but a relic of an outdated past. It is an outdated symbol of knowledge rather than a current vector of learning.
The very same content could, using HTML, provide the very same capabilities and more. An HTML page can be read on any device with a browser, but also much beyond. An HTML page with the right CSS, or Cascading Style Sheets, can be printed, either actually printed to paper or virtually to a document, including a PDF or an ePub, and thus become something static again. With the right stylesheets that document could look exactly like the author wants on whatever devices they believe it would be best consumed on, yet without preventing the reader from consuming it the way they want, because they have a device nobody else has.
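As a sketch of that round trip, assuming nothing beyond standard CSS, a print stylesheet lets the very same HTML page become a static, paginated artifact whenever the reader or the author wants one (the .interactive-widget class is a made-up placeholder):

    /* Screen: fluid, interactive presentation. */
    @media screen {
      nav, .interactive-widget { display: block; }
    }

    /* Print, including "print to PDF": fixed page size, no interactive chrome. */
    @media print {
      @page { size: A4; margin: 2cm; }
      nav, .interactive-widget { display: none; }
      a::after { content: " (" attr(href) ")"; } /* keep link targets on paper */
    }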
So even though HTML and PDF can both be brought within XR, one begs for skeuomorphism. The PDF is, again, by what it claims as its intrinsic value, trapped in a frame. Bringing that frame into XR works, of course, but limits how one can interact with it. Consequently, focusing on bringing PDF to XR means limiting the ability to work with text. HTML, especially when written properly, namely with tags that represent semantics rather than how to view the content, which is properly delegated to stylesheets, is not trapped in skeuomorphism. The content of an HTML document, in addition to being natively parseable by browsers that are already running on XR devices, can then be shaped to the usage. It can also be dynamic, from the most basic forms, to image maps, to 3D models that can in turn be manipulated in XR, to, last but not least, computational notebooks. While PDFs are static in both shape and execution model, namely none, an HTML document can also embed script tags that modify its behavior. That behavior allows the intertwining of story and interaction. Static content is then not just a passive description delegating interpretation to the reader, poorly, as argued before, given the minimal ability to modify it while reading it; it practically makes the exploration of complex systems impossible. An HTML document, in contrast, can present the content so that the system being studied is embedded and thus runs, not through the mind of the reader, but actually runs. The simulation becomes the content, letting the reader become an explorer of that content and thus able to try to understand much richer and more complex systems while confronting their understanding with the truth of that system.
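As a minimal, hypothetical sketch of that intertwining, semantic markup carries the prose while a script tag embeds the very system being described, so the reader can run it rather than merely read about it (the element ids and the decay model are invented for the example):

    <article>
      <h1>Exponential decay</h1>
      <p>Half of the remaining quantity disappears every period.</p>
      <section>
        <label>Periods: <input id="periods" type="number" value="5"></label>
        <button id="run">Run the model</button>
        <output id="result"></output>
      </section>
      <script>
        // The system under study is embedded and executable, not merely depicted.
        document.getElementById('run').addEventListener('click', () => {
          const n = Number(document.getElementById('periods').value);
          const remaining = 100 * Math.pow(0.5, n);
          document.getElementById('result').textContent =
            remaining.toFixed(2) + '% remaining after ' + n + ' periods';
        });
      </script>
    </article>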
Unfortunately, even though there exists today a solution for true responsiveness of 2D content, namely stylesheets, this is not true of 3D content, even less of spatial content that could be manipulated in VR or AR or both. True responsiveness remains challenging because the interactions are radically different and the space in which one has such interactions is also radically different. A 6.1" portrait display, an A4-ish e-ink device or a 32" 4K landscape monitor are still, in the end, flat surfaces one can point at, scroll within, etc. Reconsidering this and more in both a physical room and a virtual one, eventually with some scene understanding (e.g. flat-surface detection for floors and walls), leads to a vastly different richness of interactions. Consequently one must not just consider how to reflow a 2D document from one rectangle to another but rather into a partly filled volume. Currently there is no automated way to do so besides displaying the document skeuomorphically in the volume. This works but is not particularly interesting, the same way that one does not watch a movie showing the pages of a book, even a good book. Instead, being serious about picking a document format, be it PDF, HTML, ePub or another, means being serious about the interactions with that document and the novel interactions that truly novel interfaces, like VR and AR, do bring.
Assuming one still does want to bring 2D documents to a volume, the traditional question of provenance remains. As we bring a document in, how does the system know what the document is: its format, in order to display it correctly, but also its origin and other metadata? The Web did solve most of that problem through URIs, and more commonly URLs, or DOIs being looked up to become URLs pointing to a document, either a live one or the archive of one. The Web already provides a solution for how the content itself can move, e.g. redirection, and browsers are able to follow such redirections, providing a pragmatic approach to a digital world that changes over time.
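As a sketch of how a reading system might lean on that machinery, assuming only the standard fetch API, the URL identifies the document, the Content-Type header tells the system which format it is dealing with, and redirections are followed transparently to wherever the content now lives (describeDocument is a hypothetical helper and the DOI below is a placeholder):

    // Resolve a document and report its format and current location.
    async function describeDocument(url) {
      const response = await fetch(url);       // redirections are followed by default
      return {
        requested: url,                         // what was asked for
        resolved: response.url,                 // where the content actually lives now
        moved: response.redirected,             // true if a redirection was followed
        format: response.headers.get('content-type') // e.g. application/pdf, text/html
      };
    }

    // describeDocument('https://doi.org/10.xxxx/placeholder').then(console.log);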
The question then often becomes: if formats already exist, if provenance can be solved, is there not a risk of pointing only to live documents that can become inaccessible? That is true, but unfortunately death is a part of life. Archiving content is a perpetual challenge, but it should not come at the cost of the present. Even for that, though, mechanisms are already in place, namely local caching and mirroring. Local caching means that once a document is successfully accessed, the reading system can fetch a complete or partial copy and then rely on it in the future if the original document is not available. PWAs, or Progressive Web Applications, feature such a mechanism, where the browser acts as a reader of documents but also as a database of visited pages, proxying connections and providing a fallback so that even while offline, content that is already on the device remains accessible. Finally, mirroring, centralised or not, ensures that documents do remain accessible if the original source is not available for whatever reason. The fact that most websites do not provide either PWAs or downloadable archives for efficient mirroring is in no way testimony that the Web lacks the capacity for resilience, only that good practices for providing documents over time are not yet seen as valuable enough. Luckily, efforts like the Internet Archive do mirror content even when the original owner has made no effort to make their content more resilient. Finally, technical solutions like IPFS, or the InterPlanetary File System, make replication across machines more convenient and thus more reliable, again despite most authors not putting the necessary care into keeping their work available beyond handing it to a third party that will archive it without necessarily facilitating access.
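As a minimal sketch of that local-caching fallback, a service worker can serve the live document when the network is there and fall back to the last copy it saw otherwise (the cache name and the network-first policy are illustrative; a real PWA would version and refresh its cache):

    // service-worker.js: keep a local copy of visited documents, fall back to it offline.
    const CACHE = 'visited-documents-v1';

    self.addEventListener('fetch', (event) => {
      if (event.request.method !== 'GET') return; // only cache plain document fetches
      event.respondWith(
        fetch(event.request)
          .then((response) => {
            // Online: serve the live document and keep a copy for later.
            const copy = response.clone();
            caches.open(CACHE).then((cache) => cache.put(event.request, copy));
            return response;
          })
          // Offline, or the source is gone: fall back to the last copy we saw.
          .catch(() => caches.match(event.request))
      );
    });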
Finally, be it PDF, HTML, ePub or another format, the focus hitherto has been on bringing text, thus 2D, even arguably 1D if seen as a single string, to a volume, thus a 3D space with (i.e. AR) or without (i.e. VR) context. Even though this provides a powerful way to explore a new interface, XR, we must remain aware that this is still a form of transclusion. We are trying to force old media into a new one, and that will remain a limited endeavor. Yes, it would surely be interesting to bring the entirety of Humanity's knowledge to XR, but is it genuinely a worthwhile pursuit? Past media still exist alongside XR and thus allow use, either while using XR (e.g. using a phone or desktop screen while using AR, or a collaborative experience with one person in VR and another video calling from a museum) or before and after it (e.g. using a desktop to prepare a VR space then share it after) ... or even through our memory of it. Consequently, even without any effort to bring the content into XR, it does remain accessible somehow. The question rather could become: what format native to 3D could better help create novel usages, based or not on older formats? For this there are already countless solutions, as 3D software long predates XR. That said, two recent formats did emerge, namely glTF and USD, the Graphics Language Transmission Format and Universal Scene Description. Both are roughly equivalent, but glTF, beside relying on the most popular Web format for data, namely JSON, already provides community extensions. This, I believe, is the most interesting aspect. glTF does not try to be all-encompassing but rather provides the minimum feature set, on which one can then build for their own usage. That means there is an escape valve: a file remains readable by all other software, but if one does find the base insufficient, one can build on it and adapt it to one's needs. This means glTF could become a format not just to exchange 3D models to display manipulable objects in XR, but one in which such objects could finally address the points touched on before, namely text as a primitive, with its provenance explicit.
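As a hypothetical illustration of that escape valve, a glTF asset is plain JSON, so a node carrying a piece of text could attach its source through the standard extras field, or through a dedicated extension once the need is shared; the text and provenance fields below are invented for the example, not part of any existing extension:

    {
      "asset": { "version": "2.0" },
      "scenes": [ { "nodes": [0] } ],
      "nodes": [
        {
          "name": "text-panel",
          "extras": {
            "text": "Any paragraph of the source document",
            "provenance": {
              "source": "https://example.org/original-document.html"
            }
          }
        }
      ]
    }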