The Custodial Dilemma
In physical archaeology, when a site is excavated, there's a clear ethical framework:
- The excavator is responsible for proper documentation
- Artifacts go to museums with preservation mandates
- Findings are published for scholarly record
- Destruction of a site without preservation is considered unethical
Digital archaeology has no equivalent framework. When GeoCities shut down in 2009, Yahoo had no custodial obligation to the pages its users had built. When platforms die or pivot, users are left to rescue their own artifacts, if they even know to try.
The Three Layers of Custody
Layer 1: File Preservation (The Minimal Standard)
The most basic form of custody: keeping the bits intact.
Current Actors
- Internet Archive: Wayback Machine, Archive Team rescues
- Individual Archivists: Users who download their own data
- Fan Communities: ROM preservationists, game archive sites
- Academic Projects: Digital humanities initiatives
Limitations
Files alone create Umbrabytes. A downloaded GeoCities site is technically preserved, but without:
- The webring context it was part of
- The guestbook interactions
- The social network of linked homepages
- The cultural norms of early web design
...it becomes a "fly in amber"—visible but incomprehensible.
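Keeping the bits intact is at least checkable. A minimal sketch, assuming the archive is an ordinary directory tree on disk: record a SHA-256 digest for every file, then compare manifests later to detect silent corruption or loss. The function names here are illustrative, not taken from any particular preservation tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents differ from the old manifest."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

A manifest like this proves only that the files survived. It says nothing about the webring, the guestbook, or the norms around them, which is exactly the gap Layer 2 addresses.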
Layer 2: Context Preservation (The Scholarly Standard)
Documenting the ecosystem, not just the artifacts.
What Must Be Preserved
- Social Norms: How did people communicate? What were unwritten rules?
- Technical Constraints: Why did designs look this way? What were the limits?
- Cultural Practices: What rituals existed? (Forum signatures, blogrolls, etc.)
- Network Structure: How were sites connected? What was the "town square"?
- Temporal Evolution: How did practices change over time?
Methods
- Oral History: Interview participants while they're still accessible
- Metadata Collection: Save dates, user counts, interaction patterns
- FAQs and Guides: Archive tutorials, rules, community documents
- Comparative Analysis: Study multiple platforms to identify patterns
Example: Flash Games
Preserving a Flash game's .swf file (Layer 1) isn't enough. Layer 2 requires:
- Screenshots of the portal where it was hosted (Newgrounds, Miniclip)
- Comment threads showing how players discussed strategies
- Developer interviews about design constraints
- The medal/achievement system that motivated replays
Without this, the .swf is a Petribyte—playable, but culturally orphaned.
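One way to keep Layer 2 material attached to a Layer 1 file is a sidecar record stored next to it. The sketch below is a hypothetical schema, not an established standard: the class name, field names, and example path are all assumptions chosen to mirror the Flash checklist above.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ContextRecord:
    """Hypothetical sidecar describing the ecosystem around one preserved file."""
    artifact: str                                            # path to the Layer 1 file
    hosted_on: list[str] = field(default_factory=list)       # portals that carried it
    screenshots: list[str] = field(default_factory=list)     # captures of the hosting page
    comment_threads: list[str] = field(default_factory=list) # saved player discussions
    interviews: list[str] = field(default_factory=list)      # developer interviews
    notes: str = ""

    def missing_context(self) -> list[str]:
        """Name the Layer 2 fields that are still empty."""
        checked = ["hosted_on", "screenshots", "comment_threads", "interviews"]
        return [name for name in checked if not getattr(self, name)]

    def to_json(self) -> str:
        """Serialize the record so it can be saved beside the artifact."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Calling `missing_context()` on a record that lists only a hosting portal returns the remaining empty fields, which gives an archivist a concrete to-do list rather than a vague sense that context is lacking.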
Layer 3: Functional Restoration (The Gold Standard)
Making the artifact work in its original ecosystem—or a faithful emulation.
Approaches
- Emulation: Run old software in simulated environments (Flash emulators, browser VMs)
- Recreation: Rebuild the ecosystem (restore a phpBB forum with original posts)
- Documentation: If restoration is impossible, document how it worked
Example: MMO Preservation
When an MMO shuts down (as Club Penguin did in 2017), full preservation means:
- Layer 1: Save client files, assets, databases
- Layer 2: Document social rituals (raid strategies, guild dynamics, memes)
- Layer 3: Emulate servers so the game remains playable (as the fan-run Club Penguin Rewritten did until it was itself taken offline in 2022)
Who Bears the Responsibility?
Platform Creators
Current Reality
Most platforms have zero custodial obligation:
- Terms of Service often disclaim responsibility for data loss
- Shutdowns happen with minimal notice (Google Reader: about three and a half months)
- User data exports are minimal (Facebook gives you a .zip, no context)
- No funding for post-mortem preservation
Proposed Ethics
Platforms that host user-generated content should:
- Advance Notice: 1+ year warning before shutdown
- Full Export: Users get complete data dumps (posts, metadata, connections)
- Archive Partnership: Donate platform snapshot to Internet Archive
- Emulation Toolkit: Provide tools to run a local version (like phpBB backups)
- Context Documentation: Publish a "cultural guide" to the platform's norms
Precedent: LiveJournal
LiveJournal allowed users to export their journals as XML, including:
- All posts and comments
- Friend lists and community memberships
- Metadata (timestamps, privacy settings)
This enabled services like Dreamwidth (built on a fork of the LiveJournal codebase) to import entire journals, preserving the social network. This is Layer 2 custody: not just files, but connections.
Users
Individual Responsibility
Users can practice personal custodianship:
- Download Your Data: Use platform export tools (Google Takeout, Facebook data export)
- Self-Host Copies: Maintain a personal archive site
- Diversify Platforms: Don't rely on a single host (see "Rented Land")
- Use Open Formats: Markdown, HTML, plain text over proprietary formats
Collective Responsibility
User communities can preserve ecosystems:
- Archive Teams: Coordinate rescues when platforms announce shutdown
- Wiki Projects: Document platform culture (TV Tropes for internet phenomena)
- Emulation Projects: Rebuild dead platforms (Neopets server emulators)
Example: The Homestuck Archive Project
When the Flash-based webcomic Homestuck faced obsolescence (Flash reached end of life in December 2020), fans:
- Created the Unofficial Homestuck Collection (a downloadable offline version)
- Converted Flash animations to video
- Preserved forum discussions and fan theories
- Documented the cultural impact and memes
This is Layer 3 custody: full functional restoration with cultural context.
Institutional Archivists
Internet Archive
The Internet Archive practices Layer 1 and partial Layer 2 custody:
- Wayback Machine: Snapshots of web pages over time
- Archive Team (an independent volunteer group whose captures are hosted by the Internet Archive): Targeted rescues of dying platforms (GeoCities, Yahoo Groups)
- Software Archive: Preservation of obsolete programs
Limitations
- No funding for Layer 3 (functional restoration)
- Legal gray areas (copyright, DMCA takedowns)
- Scale: can't archive everything
- Context Collapse: archives the file, not the ecosystem
Academic Institutions
Universities and libraries are beginning to adopt custodial roles:
- Digital Humanities Projects: Study and preserve born-digital culture
- Oral History Initiatives: Interview early internet pioneers
- Teaching Archives: Use preserved platforms for historical study
Example: Rhizome's Net Art Anthology
Rhizome (affiliated with the New Museum) preserves net art by:
- Emulating old browsers and plugins
- Interviewing artists about intent and context
- Documenting technical specs and dependencies
- Maintaining playable versions
This is Layer 3 custody: functional restoration with artistic intent preserved.
The Ethical Framework
The Archaeologist's Oath (Proposed)
Borrowed from physical archaeology, adapted for digital:
Do No Harm
- Don't "excavate" (copy/archive) without considering consent
- Respect privacy (don't archive private messages without permission)
- Avoid destroying context in the act of preservation
Document Thoroughly
- Files alone are insufficient—save the cultural context
- Note what's missing (broken links, lost dependencies)
- Explain your preservation methods for future archaeologists
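Noting what is missing can be partly automated. A minimal sketch, assuming the archive is a folder of saved HTML pages with relative links: collect each page's `href` and `src` values and report those whose targets are absent from disk. The class and function names are illustrative, and root-relative links are skipped because the sketch does not know where the archive root is.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse, unquote


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def missing_local_links(page: Path) -> list[str]:
    """Return relative links in `page` whose targets are absent from the archive."""
    collector = LinkCollector()
    collector.feed(page.read_text(encoding="utf-8", errors="replace"))
    missing = []
    for link in collector.links:
        parsed = urlparse(link)
        # Skip absolute URLs (http:, mailto:, ...), in-page anchors,
        # and root-relative paths that cannot be resolved without the site root.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        if parsed.path.startswith("/"):
            continue
        target = page.parent / unquote(parsed.path)
        if not target.exists():
            missing.append(link)
    return missing
```

The resulting list belongs in the archive's own documentation, so a future archaeologist knows which gaps are original loss and which are damage introduced later.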
Make It Accessible
- Archives should be public, not hoarded
- Use open formats and standards
- Provide documentation for future access
Attribute Properly
- Credit original creators
- Preserve authorship metadata
- Acknowledge community contributions
The Consent Problem
Archiving Without Permission
Internet Archive archives sites without explicit consent. Is this ethical?
Arguments For
- Public web content implies consent to be crawled
- Historical record has public value
- Opt-out mechanisms exist (robots.txt)
Arguments Against
- People post thinking content is ephemeral
- Cultural norms change (old forum posts may embarrass users)
- "Right to be forgotten" conflicts with preservation
Proposed Balance
- Archive public content by default (cultural record argument)
- Honor takedown requests for personal content (privacy argument)
- Anonymize sensitive data (e.g., private forum discussions)
- Add context warnings ("This is a historical snapshot, norms have changed")
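The anonymization step can be sketched with a keyed hash, so that one author keeps one alias across the archive (preserving conversation structure) while the alias cannot be reversed without the key. This is an illustration under assumptions: posts are dictionaries with an `author` field, and the key is kept secret by the archivist. It does not scrub names that appear inside post bodies.

```python
import hashlib
import hmac


def pseudonym(username: str, secret_key: bytes) -> str:
    """Derive a stable alias; the same name and key always yield the same alias."""
    digest = hmac.new(secret_key, username.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def anonymize_posts(posts: list[dict], secret_key: bytes) -> list[dict]:
    """Return copies of posts with the author replaced by a pseudonym."""
    return [
        {**post, "author": pseudonym(post["author"], secret_key)}
        for post in posts
    ]
```

Pseudonymization is weaker than true anonymization: writing style, timestamps, and quoted text can still identify someone, so it complements takedown requests rather than replacing them.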
The Cost of Neglect
What We've Already Lost
GeoCities (1994-2009; owned by Yahoo from 1999)
- 38 million user-created pages
- Only ~15% archived by Internet Archive before shutdown
- The "Web 1.0 aesthetic" and homestead culture largely lost
Flash Content (1996-2020)
- Millions of games, animations, and interactive art
- Browser support ended December 2020
- Flashpoint project has preserved ~100,000 pieces, but millions remain lost
Yahoo Groups (2001-2020)
- Shut down with minimal notice
- Decades of community discussions lost
- Archive Team saved some, but much is gone
The Cultural Cost
Each loss is a "burned library":
- We lose primary sources for studying early internet culture
- Future historians have gaps in the digital record
- Marginalized communities' histories are disproportionately lost (less likely to be "important enough" to archive)
Tools for Custodians
For Individual Users
- wget / httrack: Download entire websites locally
- ArchiveBox: Personal archiving tool for bookmarks/links
- Google Takeout / Facebook Download: Export your platform data
- Static Site Generators: Convert dynamic content to static HTML (Jekyll, Hugo)
For Communities
- ArchiveTeam Warrior: Volunteer archiving tool
- Heritrix: Web crawler for archiving
- Webrecorder: Capture interactive web experiences
- OldWeb.today: Emulate old browsers to view archived sites authentically
For Institutions
- Emulation Frameworks: Maintain playable versions (EaaS - Emulation as a Service)
- Digital Preservation Systems: LOCKSS, DSpace for long-term storage
- Metadata Standards: Dublin Core, METS for context documentation
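As a rough illustration of what context documentation looks like in practice, the sketch below serializes a few Dublin Core elements to XML using only the Python standard library. The namespace URI is the standard Dublin Core elements namespace. The `record` wrapper element and the function name are arbitrary choices for this example, not part of any standard, and the caller is assumed to pass valid element names such as `title`, `creator`, `date`, or `format`.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize simple Dublin Core elements under a plain wrapper element."""
    root = ET.Element("record")  # wrapper name is an arbitrary choice for this sketch
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")
```

A record like this covers the descriptive basics. Richer structural and provenance information is what container formats such as METS are meant to carry.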
The Future of Custodianship
Legal Recognition
Potential future frameworks:
- Platform Shutdown Mandates: Require advance notice and user data exports
- Cultural Heritage Protection: Extend archaeological site protection to digital ecosystems
- Archival Exemptions: Safe harbor for non-commercial preservation (like library exemptions)
Technical Solutions
- Decentralization: ActivityPub, IPFS reduce single-point-of-failure
- Self-Hosting Revival: Make it easier for users to own their data
- Interoperability: Standards that let users migrate between platforms
Cultural Shift
The shift required is from "platforms own your data" to "users are custodians of culture."
Field Notes
The Myspace Music Massacre: In 2019, Myspace admitted it lost 12 years of music uploads (2003-2015)—50 million songs—in a "server migration error." No backups existed. For many indie artists, this was their only archive. This is custodial negligence at industrial scale. Who should have been responsible? Myspace, clearly—but legally, they had no obligation.
Tumblr's NSFW Purge (2018): Tumblr banned adult content in December 2018, giving users about two weeks' notice before millions of posts were flagged and hidden from public view. That left little time to back anything up. Many LGBTQ+ communities lost their primary gathering space. The content wasn't "important enough" for Internet Archive to have fully captured. This is selective archiving: marginalized voices are lost first. Custodial responsibility must include equity: whose cultures deserve preservation?
Mastodon's Answer: Mastodon lets users download an archive of their posts and media, export the lists of accounts they follow, and move their followers to an account on another instance. Migration has to be started from the old account, so it works best when a shutdown is announced in advance. This is Layer 2 custody: portable identity and connections, not just files. It's a template for ethical platform design.
Conclusion: Umbrabytes Demand Custodians
Umbrabytes exist because of custodial failure. When platforms die without preservation, we get:
- Type 1 Umbrabytes: No one archived the ecosystem before shutdown
- Type 2 Umbrabytes: No one documented dependencies before APIs broke
- Type 3 Umbrabytes: No one preserved the cultural context before it collapsed
Custodial Responsibility is the antidote. It asks: Who will remember what this meant?