Narrative Provenance
In the physical art world, an Egyptian vase is worthless to a museum if its "provenance" cannot be established. Without knowing which tomb it came from and whose hands it passed through, it is just a pot: it has no context and, therefore, no historical authority.
The same is true in Digital Archaeology. A folder of .jpg files named "MySpace 2005" found on a torrent site has almost no provenance. A Chain of Custody record gives the artifact legitimacy: it documents *who* scraped it, *when*, with *what tool* (e.g., wget vs. high-fidelity browser automation), and which redactions were made.
The Four Pillars of Digital Provenance
Every valid archaeological record must answer four questions:
1. Origin (Where?)
Did this come from the live site? An API? A Google Cache snapshot? A user’s personal hard drive backup? The method of extraction defines the artifact’s fidelity.
2. Timeline (When?)
The exact timestamp of capture. In a digital world where content changes by the second (edit buttons, stealth deletions), knowing the precise moment of excavation is critical.
3. Agency (Who?)
Who performed the excavation? Was it an institutional crawler, such as the Internet Archive's? A rogue archivist? The platform owner themselves? The excavator's biases and capabilities shape the result.
4. Modification (What changed?)
Digital preservation often requires intervention. Did you convert the format? Did you redact PII (Personally Identifiable Information)? Did you strip the metadata? All modifications must be declared.
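The four questions map naturally onto a small data structure. The sketch below is a minimal Python illustration built around a hypothetical `ProvenanceRecord` class; its field names are illustrative, not a published schema. It rejects timestamps without a time zone, because a naive timestamp cannot pin down the moment of capture. Modifications can only be appended through `declare`, since the class offers no method for removing them.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical record answering the four pillars (field names are illustrative)."""
    origin: str            # 1. Where: live site, API, cache snapshot, personal backup...
    method: str            # extraction method, e.g. "wget" or "browser automation"
    captured_at: datetime  # 2. When: must be timezone-aware
    agent: str             # 3. Who: person, institution, or bot that performed the capture
    modifications: list[str] = field(default_factory=list)  # 4. What changed

    def __post_init__(self) -> None:
        # A naive datetime is ambiguous about its time zone, so reject it outright.
        if self.captured_at.tzinfo is None:
            raise ValueError("captured_at must be timezone-aware")

    def declare(self, change: str) -> None:
        # Record a modification; the class exposes no way to remove entries.
        self.modifications.append(change)

    def to_json(self) -> str:
        data = asdict(self)
        # Normalise the capture time to UTC and serialise it as ISO 8601.
        data["captured_at"] = self.captured_at.astimezone(timezone.utc).isoformat()
        return json.dumps(data, indent=2)


# Example usage with made-up values.
record = ProvenanceRecord(
    origin="https://example.com/profile (live site)",
    method="high-fidelity browser automation",
    captured_at=datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc),
    agent="independent archivist",
)
record.declare("redacted email addresses (PII)")
print(record.to_json())
```

Forcing every change through a single method makes the Modification pillar harder to skip: a converted or redacted artifact with an empty `modifications` list is immediately suspicious.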
Field Notes
The Credibility Gap: Without provenance, digital artifacts are just "data." With provenance, they become "evidence." In an era of deepfakes and AI-generated content, the chain of custody is one of the few reliable defenses against historical revisionism.
Standard Practice: We use the "Provenance Record" template for all excavations. It documents the Method, Capturer, Source State, and Checksums, so that any later copy can be verified as bit-for-bit identical to the original capture.
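Checksum verification can be sketched with Python's standard `hashlib` module. The choice of SHA-256 and the helper names `sha256_file`, `build_manifest`, and `verify_manifest` are assumptions made for illustration, since the template does not prescribe an algorithm or tooling. The idea is to hash every file at capture time, store the resulting manifest with the record, and later rehash the files to detect anything that was altered, lost, or added.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the sorted paths that are missing, altered, or absent from the manifest."""
    current = build_manifest(root)
    # Files listed at capture time whose digest changed or which no longer exist.
    problems = [name for name, expected in manifest.items() if current.get(name) != expected]
    # Files that appeared after capture and were never recorded.
    problems += [name for name in current if name not in manifest]
    return sorted(problems)
```

An empty list from `verify_manifest` means every file still matches its recorded digest; any path it returns marks a break in the chain of custody that should be explained as a declared modification.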