Organizing thoughts and data

Phase 1: Deleting/archiving presence elsewhere

A big part of this thrust to centralize … myself, for lack of a better word—is getting all the bits and pieces together in one place.

There have been a variety of platforms over the years. Some have died—I don't think I stored anything important on orkut—but on those that still exist, you can see the entropy.

I've been convinced of the value of holding onto your data in formats that will resist the test of time for years. Nevertheless, it's amazing to me how much entropy has struck in places where there has been platform continuity for over ten years. Embedded tags (nominally HTML, but obviously parsed somehow by the internal tooling) and other metadata has bit-rotted until everything is nearly unreadable.

After some effort, I have archives in local storage of almost everything I've ever posted online, and the old versions are "deleted"1 off those platforms.

With some exceptions. Facebook is a tough nut to crack; I needed to register as a developer to even begin to download my post graph entries, and after toying with it for a bit I'm not sure that I'm ready to go all the way down that rabbit-hole. It looks like I'd have to run a local server in order to scrape an effective copy of my data from them, since their "archive" tool is god-awful for someone who shared as many links as me.

Phase 2: Triage

A lot of the stuff I've written down can go back up in due time. A lot of it should never see the light of day.

This leads to a question of, what's the point of a blog? I know that a lot of what I write now will seem embarrassing by the standards of future me, but this process seems asymptotic. Meanwhile, angsty blog posts from when I was 15 contain … embarrassing turns of phrase, among other things, that don't shed light on who I am now.

Broadly speaking, I don't believe in deleting things. On reddit2, if I was blatantly wrong, and someone pointed it out, I'd always leave my posts up as a matter of principle, and to provide context for people who came by after.

But does that mean I have to put everything back up? I doubt it.

What's interesting about this place is that (my intent is) it is a place to focus and refine my thoughts. The ideas I've gotten the most mileage out of are the ones I write down in a place I also read. So, some turn of phrase or tiny stub in a page of a notebook I constantly flip through worms its way into memory via spaced repetition, essentially.

This suggests that the real goal is to find the kernels that reflect deeper truths… and then consolidate them into something wiki-like.

Further, blog entries outside my journals should be considered transient. This suggests another layer of metadata, for posts that don't hold interest because they've been superseded by something more refined or more correct. Not just for currently outdated posts, but for stuff that's fresh now that'll seem stupid in two or ten years.

Do I need to own up to everything I've ever said? That way lies madness, surely. But reading and consolidating is probably in the cards, which suggests something wiki-like is on the medium term road map.

Phase 3: Editing

For the stuff that's worth posting, editing is necessary. I may end up going through and doing this cleanup on everything, but the first-order changes involve more or less the following:

  • Remove useless metadata (e.g. date_gmt, added by Wordpress)
  • Clean up remaining metadata
  • Change formatting to markdown
  • Resolve obvious typos
  • Add appropriate tags and categories

The point of this blog, again, is organizing the data, so having effective cross-links is the first part of that. As mentioned before, I'll probably need some sort of "archival" tag, to indicate to readers that what they're reading is historical reference3.

This is a highly manual task. An automated tool might sound good at first blush, but the data sources are heterogeneous and the sorts of formatting that each demanded differ that re-establishing the correct context in markdown syntax demands I re-read and hand-tweak everything.

In theory I'm not against this, because I want to re-read and consolidate as many of my old thoughts as I can, but 1) it's a lot of content and 2) it's going to screw up my plan to get permalinks running. I don't think date+slug is an effective permalink schema for content where date might not be the most important aspect of what I'm doing, but without a database there aren't really any unique primary keys to work against. I'll have to chew on this for a while.

Conclusion

What's the point of all this effort? Well, I have bits and pieces all over, and it's part of a process to more well-define my identity and make my thinking more coherent.

Journaling is more of a "in the moment" process, that reflects where I am in the day and time. Organizing, as an umbrella, is about finding a kernel of myself, and building on it logically.

  1. Who knows if these platforms actually delete old content, though. 

  2. … speaking of sites I wrote content for but never archived… However, so much of what I wrote on reddit only makes sense in the context of the thread it's in, so maybe it's okay that I don't have that. 

  3. And perhaps add an obvious note, and cross-links to newer versions?