Web3 for data preservation? (Or is it just another expensive P2P?)

A heap of data in Steampunk style, as rendered by DALL-E. Slightly modified by Marcel Waldvogel

Drew Austin raises an important question in Wired: How should we deal with our accumulated personal data? How can we get from random hoarding to selection and preservation? And why does his proposed solution of Web3 not work out? A few analytical thoughts.

More thoughts on blockchain and Web3 can be found here.

Current situation

We as individuals, as organizations, and as a society are creating ever-increasing amounts of data, but are then stuck on what to do with them.

Look at your email workflow: What do you do after processing the emails in your inbox? Delete them? Archive them? For how long?

How about your photo collection? Will you ever look at it again? How do you find that impromptu family picture, taken some five or six years ago, where your brother looked so cute?

Some organizations have institutionalized this, but how do we, as individuals, handle this?

Structuring the problem

Archivists have collected best practices for this over the centuries. A typical workflow includes:

  1. Acquisition: Obtain the material
  2. Arrangement: Structure the material into categories or topics
  3. Selection: Choose what you want to keep (and maybe decide for how long)
  4. Annotation: Add cataloging metadata, describing the material, so it can easily be found again
  5. Preservation: Ensure that the data will not deteriorate and can still be accessed/read later (think decades or centuries!)

This workflow includes subject matter experts, archivists and conservators, to name just some of the jobs and skills involved. If it is being done by trained personnel on time allocated for that purpose (and, typically, also paid), the process has been shown to work well for physical objects over centuries. Variations of this process are in use for digital data, but are still being fine-tuned.

But this requires time and money, and also commitment. How can this be ensured for individuals or small organizations without the resources?

1. Acquisition

For individuals, this is typically trivial: You already have the data. Also, for small organizations, this might be simple to achieve.

2. Arrangement

Grouping is somewhat more complex, but assuming we later have enough metadata and search, this step might be skipped.

3. Selection

Individuals creating large amounts of data is a new phenomenon. Previously, selection was left to experts (scholars citing, librarians curating, bookkeepers/engineers classifying, …). The goal and its value to society/the company were clear.

As individuals, we need to learn how to perform selection on our own data. Few of us are willing to spend time on it, even though it might help us organize both our data and our thoughts.

Some things might be dear (or important) enough to select manually. For other parts, it might be helpful to use automated tools; e.g., train some AI based on your preferences and accept that some decisions might go wrong.

4. Annotation

In a first approximation, annotation can also be simplified: We have full-text search for text, which will definitely improve over time. For images, we have classifiers as well. Both have their drawbacks, but hey, you can’t have your cake and eat it, too.
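To illustrate why full-text search can stand in for manual annotation, here is a minimal sketch of an inverted index. The file names and texts are hypothetical sample data, and real search engines add stemming, ranking, and much more; this only shows the basic idea.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map every token to the set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain all words of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = [index.get(token, set()) for token in tokens]
    return set.intersection(*results)


# Hypothetical sample data, for illustration only
documents = {
    "mail-001.txt": "Family picnic photos attached, my brother looks cute",
    "mail-002.txt": "Invoice for the photo printing service",
    "notes.txt": "Ideas for the family archive",
}
index = build_index(documents)
print(search(index, "family photos"))
```

Note one of the drawbacks mentioned above: without stemming, a query for “photos” does not match a document that only says “photo”.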

5. Preservation

Again, as a society/organization/individual, we need to state how much we value preservation.

Archivists have been doing that for physical representations, typically being paid for their jobs. For digital data, libraries and online archives have started archiving as part of their overall duty and budget.
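One small, technical part of digital preservation is noticing when stored data has silently changed or gone missing. A minimal sketch, assuming a plain directory of files and a checksum manifest kept outside that directory; the function names and the JSON manifest format are illustrative choices, not an established standard.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record a checksum for every file below root."""
    checksums = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(checksums, indent=2))


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Return the recorded files that are missing or whose content changed."""
    checksums = json.loads(manifest.read_text())
    damaged = []
    for name, expected in checksums.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(name)
    return damaged
```

Running `verify_manifest` from time to time only detects damage; repairing it still requires an intact copy somewhere else, which is exactly where the question of incentives comes in.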

This does not scale yet to individuals or organizations. Yes, we need a general, simple solution. However, I shudder when reading in the article that Web3 will solve the problem. If the monetary incentives from the underlying business model (“line goes up”) ever dwindle, so will the number of nodes keeping a replica.

And the overall blockchain ecosystem does not need to die before your data becomes extinct. It suffices for the particular blockchain project your data is on to die. And how would you know which, if any, of the hundreds of blockchains currently out there will survive for decades or even centuries?
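The effect of a shrinking number of replicas can be illustrated with a toy model. It assumes that each replica independently drops the data with the same probability over some period; both the model and the numbers are hypothetical and only show the trend. Independence is an optimistic assumption here, as a vanishing incentive would hit all replicas at the same time.

```python
def survival_probability(replicas: int, p_drop: float) -> float:
    """Probability that at least one replica still holds the data.

    Toy model: every replica independently drops the data with
    probability p_drop over the period considered.
    """
    return 1.0 - p_drop ** replicas


# Hypothetical numbers, chosen only to show the trend
for replicas in (100, 10, 3, 1):
    print(replicas, round(survival_probability(replicas, 0.5), 4))
```

With many replicas, losing everything is practically impossible in this model; with only a handful left, it becomes a realistic outcome.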

Just remember that 15 years ago, before the dawn of the iPhone, we did not think that just a few years later everyone would be able to take pictures even in remote areas and share them with the world in just a few seconds. Or that we could obtain a local map, an encyclopedia entry, or a restaurant reservation while being far away from home.

Also, with an entire blockchain as your archival unit, you lose the ability to select, which is what the article started out with. Retaining everything forever cannot be the goal.

Having to retain everyone else’s data just to keep yours may also not be your first choice, especially as that data may be millions or billions of times larger than your own. Having smaller, more flexible collections is probably the way to go (git repositories, which are supported by several web publishing systems, could be one such approach).

Whatever you choose: Make sure there is an incentive, monetary or otherwise, to keep that data. And commitment from your side as well.

Some 20 years ago, many ideas were floating around: Tit-for-tat, LOCKSS, … in addition to monetary ones. We should reconsider them.

Thinking clearly about the goals we want to achieve and how to achieve them is more important than just randomly stating buzzwords. Because Web3 is just expensive P2P.


This article was prompted by a post by Matthias Bürcher and incorporates some of his feedback. Thank you!

The article’s teaser image was rendered by DALL•E 2 using the prompt “A heap of data. Steampunk style.” and extended/modified using generation frame and eraser. Plus some final touches by me (moving the orange ink bottle into the selection; otherwise, that area would have resembled sky instead of something more table-ish).
