so far we’ve got “everyone duplicates the data to their own repo and internally references it”, “sanitize and hash that shit in a way that’s hard to replicate, but at least it’s mutually reconcilable (even if not robustly reproducible)”, and “be parasitic on a different identification system”
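
for the second option, here's a rough sketch of what "sanitize and hash" could look like, just to make it concrete. everything in it is an assumption, not something anyone has agreed on: the `sanitize` and `content_id` names are made up, and the normalization steps (NFKC, casefolding, collapsing whitespace) are one arbitrary choice. that arbitrariness is exactly why it's hard to replicate unless everyone copies the same rules.

```python
import hashlib
import unicodedata


def sanitize(text: str) -> str:
    """Hypothetical normalization: the exact rules here are an assumption."""
    # Unicode-normalize and casefold so visually equivalent strings match.
    normalized = unicodedata.normalize("NFKC", text).casefold()
    # Collapse every run of whitespace to a single space and trim the ends.
    return " ".join(normalized.split())


def content_id(text: str) -> str:
    """Derive an identifier by hashing the sanitized text with SHA-256."""
    return hashlib.sha256(sanitize(text).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Two differently formatted copies of the same record get the same id.
    print(content_id("  Some   Dataset Title "))
    print(content_id("some dataset title"))
```

two people who run the same `sanitize` rules over their own copies end up with the same id, so they can reconcile with each other. anyone who normalizes slightly differently gets a different hash, which is the "not robustly reproducible" part.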