Own Your Data

Captured from Twitter, here is Tom Henrich’s partial reconstruction of my conversation with Tantek Çelik, Glenda Bautista, Andy Rutledge and others on the merits of self-hosting social content and publishing to various sites rather than aggregating locally from external sources.

via Own Your Data / technophilia

45 thoughts on “Own Your Data”

  1. For five years, I’ve been using services’ APIs (Flickr, Twitter, Facebook, Delicious, etc.) to pull my social content into a local database, and publishing most (but not all) of it at jeffcroft.com. This is the opposite of what Tantek does (publishing locally, and then using the services’ APIs to push content to them). I think it’s really smart to have a local copy of your social data (Tantek’s right, it’s awesome for searching), but I’m not really sure it matters which way you go about it.

    But I can tell you why I chose to do it the way I do it: tooling.

    Let’s just take Twitter, for example. If I want to post a tweet, and I do it Tantek’s way, I have to build some local interface from which to tweet. That takes time and effort. And, it’s likely to be feature incomplete (for example, when Tantek posts a reply, it doesn’t seem to use reply-to-id when it gets posted to Twitter, and therefore doesn’t preserve the conversation thread). On the other hand, doing it my way, I don’t have to create any interface: I just download one of the bazillion great Twitter clients already available for any platform, and bam, I’m tweeting.
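To make the reply-to-id point concrete, here is a minimal sketch of how a push-style publisher might carry the reply reference along when syndicating a locally written note. The helper `build_syndication_params` is hypothetical, and the field names are modelled on the `status` and `in_reply_to_status_id` parameters of Twitter's status-update call; nothing here is taken from Tantek's actual code.

```python
from typing import Optional


def build_syndication_params(text: str, reply_to_id: Optional[str] = None) -> dict:
    """Build the parameters for pushing a locally published note to a service.

    Hypothetical helper: only the two field names are modelled on a real API.
    """
    params = {"status": text}
    if reply_to_id is not None:
        # Without this field the service treats the post as a fresh tweet,
        # so the reply is detached from the conversation it answers.
        params["in_reply_to_status_id"] = reply_to_id
    return params


standalone = build_syndication_params("Own your data.")
reply = build_syndication_params("@zeldman agreed.", reply_to_id="12345")
```

The local interface has to remember which remote post is being answered; that bookkeeping is part of the extra work a home-built posting tool takes on.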

    Same goes for delicious (or pinboard, or whatever), Facebook, and Flickr — these services already have a ton of great tools built on top of them. For another example, when I’m reading RSS feeds on my iPad using the Reeder app, there’s a built-in function to “Post this to Pinboard.” If I went Tantek’s way, I wouldn’t have a function built into Reeder.

    I am glad that people are finally coming around to the idea that it’s good to have a local backup of your social content, but I think the way Jeremy and Tantek are suggesting we do it is the hard way.

  2. Tantek is looking at the problem of “who owns your data?” It’s a very real and very serious subject that needs to be addressed and will be addressed in the coming years. Tantek is on the leading edge of the inquiry; I applaud that but find his solution ungainly—and I think Twitter is the wrong battleground, because tweets are ephemeral by their nature. Worrying about their impermanence is like worrying about the impermanence of snowflakes.

    A few of his specific concerns are:

    1.) What if Twitter goes away (a la Gnolia, Delicious, etc.)? He will lose access to all his Twitter conversations. This is a legitimate concern to be sure. When you have conversations or publish thoughts (i.e. microblogging) on someone else’s service, you are at the mercy of that service. I have similar concerns, but my solution is the same as it has been since 1995: I post important things at zeldman.com, a site I own and control.

    2.) Twitter has a character limit and some thoughts demand to exceed its limitations. Again, my answer to his problem is, so, blog. Tantek in fact *is* blogging, he’s just doing it in a screwy way that involves Twitter.

    3.) Twitter’s search is poor and data basically can’t be found after a few days. That is a serious limitation of Twitter. I’ve bumped up against it myself, as when Twitter lost the ability to uncover the hash tags on which our Blue Beanie Day haiku contest depended. On those grounds, and assuming that some of the conversations on Twitter are meaningful to their participants, it may make sense to try to work outside Twitter to provide a more durable search of its data, or at least the part of its data that is yours. This is a strong motivation for Tantek. I agree with it and would probably be more on board if Tantek’s UX were better considered. Not only is he doing it the hard way (I agree with Jeff Croft’s point above), he is also doing it the ugly, confusing, off-putting way. As a designer, I can’t get behind anything that makes the web harder to use and understand. The title is “Don’t make me think,” not “Please confuse me while making a simple service like Twitter harder to use.”

    Yet even if Tantek’s solution were perfectly designed, what good is searching YOUR Twitter data if you can’t search other people’s? Tantek’s project wouldn’t have solved my missing hash tag problem.

    Twitter needs to do a better job of storing its data. It probably needs to open source and share the data across multiple servers not connected to Twitter.com. LOCKSS: Lots Of Copies Keep Stuff Safe. If Twitter would do this, Tantek could stop making its API jump thru its anus in a futile bid to make Twitter into something it isn’t.

    Twitter isn’t WordPress. It probably won’t open source its data. The Library of Congress’s capture of all Twitter data may, however, do the trick.

    Twitter is only one social media service. They are all impermanent. We are impermanent, the planet is impermanent, although the planet is less impermanent than, say, Flickr. “The universe is expanding” aside, it makes sense to acknowledge that while we can preserve our data at least until digital formats change (I can store my photos on my hard drive, in the cloud, on Flickr, and a dozen other places, but 50 years from now, JPG may be a historical curiosity), we cannot preserve the social relationships connected to our data.

    Let me say that again: we cannot preserve the social relationships connected to our data.

  3. Totally agree that Twitter needs to do a better job with search, although I think it’s a separate problem from “Owning Your Data”: owning your data is relevant to ALL social content, not just tweets.

  4. owning your data is relevant to ALL social content, not just tweets.

    Agreed, I said the same. Leaving aside the problem that digital formats inevitably become obsolete, and assuming texts, photos, and so on can somehow be preserved indefinitely (at least until I die and take my hard drives with me), the fact remains that we cannot preserve the *social* aspect. I can store my photos but not the nice things you said about them.

  5. I’ve been using Backupify for a while now. Just the free plan. This doesn’t address data ownership concerns; in fact perhaps it plays into them.

    I like to see Mr. Croft and Mr. Zeldman agree on stuff.

  6. Pownce went away, taking with it long posts and attachments.
    Where did all that content go, I wonder?

    Anyway, just wanted to chime in as another person who is following all of this chatter with keen interest. I certainly have my own opinions on how it should work, but for now I am just observing carefully.

  7. Where did all that content go, I wonder?

    Where did my aunt Lolly’s mind go when she died? I’ve been asking myself that question since I was eight. (We weren’t a religious family; there was no presentation of a consoling afterlife.)

  8. I store the nice things people say about them:

    I take my hat off to you, sir. Sincerely. That is awesome. (It’s also a bit beyond most people’s reach. I wouldn’t attempt it here. All the more reason to admire not only that you’ve done it, but that you’ve done it so well.)

    But, in general, you’re right: it’s much more difficult, and sometimes impossible, to store the complete social context of a piece of content.

    That’s why @t’s Rube Goldberg a la Twitter perplexes me. It misses the core thing, the social thing, while fighting what is great about Twitter–its simplicity.

  9. That’s why @t’s Rube Goldberg a la Twitter perplexes me. It misses the core thing, the social thing, while fighting what is great about Twitter–its simplicity.

    Totally agree.

    I bet he uses Android, too. (HEY-O!)

    :)

  10. Another point is how reliable the local storage you speak of really is. Normally any service that is about to shut down gives a notice period and allows users to take away whatever they can in terms of raw data (not the social layer).
    Also, while the chances of a website failing or shutting down are higher, local data gets lost too: servers get hacked and databases get screwed with.

    Also, wanting to keep the social conversation/layer is like asking for something that was never yours to begin with. It was facilitated and initiated by the site in question, and hence it would be naive to ask how to save it for posterity.

    Trying to replicate the data layer of every service with self-developed tools seems like a naive attempt. What about Gmail, or the enterprise version? What if Google shuts it down?

    I totally understand that the points I make might be highly ridiculous, but step back and think about the probabilities of all the events and things that can go wrong. One site where you shared the photos of your cats and 347 people liked it and 1643 commented on it is the least of your concerns.

    Also, I am sure all content except for updates and tweets normally has local storage in the form of raw files, so is it really that big a problem?

  11. I’ve been using Backupify for a while now. Just the free plan. It doesn’t address the data ownership question; in fact perhaps it just adds to it. But it feels good.

    Not as good as seeing Mr. Croft and Mr. Zeldman agreeing to agree.

  12. Potentially you can store the nice things others say about your work, if you can get at them in the first place. Propagating comments from one place to another is something several companies already offer, and the Salmon Protocol is an effort to formalize and standardize this so that you can have LOCKSS in this domain too.
    John Naughton wrote about how the author Salman Rushdie’s work is now archived:

    Emory’s Rushdie archive included not only the writer’s papers, but also his old computers and hard drives. And there, on the slide, was the symbol for an old Apple Macintosh computer and in its directory listing was a folder entitled, simply, “My Money”. And at that moment, if you will forgive the pun, the penny dropped.

    I’ve always associated creative writers with paper, which is ridiculous given that virtually all of them have been using word-processors for decades. I’m accustomed to newsreel footage of police squads descending on the homes of suspected criminals and leaving with every electronic device on the premises. But the idea of university archivists turning up at Rushdie’s apartment and taking away every computer, hard drive, CD-Rom disk and USB stick in his possession had never crossed my mind. And yet that’s what’s involved if you buy somebody’s “papers” these days.

    Clearly this is a transitional phase; just as preserving a writer’s correspondence meant hoping they kept carbons of their sent letters as well as received ones, currently you need to capture the entire working environment, and presumably set up emulators for the obsolete operating systems to keep it available.

  13. Not as good as seeing Mr. Croft and Mr. Zeldman agreeing to agree.

    We probably agree on lots of things — those just usually aren’t as interesting to talk about. :)

  14. I suspect that my response to where your Aunt Lolly’s mind went will be much different than yours – the last thing I want to do is derail a healthy topic on the permanence of social networking with religion. Someday we may be able to back up, subscribe to, hack, and even scrape content from minds as if they were a database… which will be a terrifying day indeed.

    In the meantime, can we classify people who collect and keep their content in this manner as “social hoarders”? As much as I like the idea of keeping everything, I imagine this virtual house of stuff becoming more and more full over the years until it drives you completely insane.

    At some point, we have to know when enough is enough, and start throwing out the old sh*t that doesn’t matter any more. At least so that guests can walk through the living room again.

  15. This issue becomes more and more important as security tightens in the US and as networked social services become more and more important. I am hoping diaspora or a similar solution becomes viable in the near future.

    If smaller organizations can run their own social network platforms and if those platforms can be linked between organisations, that would be a WIN. Groups or super individuals could control their own data and share it in the ways they find most useful.

  16. Anton Peck:

    BINGO. I agree with that, too. Letting go is beautiful. For a while I was keeping all my email (so “future historians” could study The Web Standards Project’s beginnings), transferring it from machine to machine, preserving a version of Eudora that ran in Classic (otherwise the mails and their socially connected chains—who replied to whom, when—would be lost). One day Apple discontinued Classic. One day I started doing mail from the cloud. One day I wiped a bunch of files. It felt great.

    A healthy spirit is not tied to possessions or relics. Currently we are unable to preserve all these little structures that matter to us now. And maybe that is a good thing.

  17. At some point, we have to know when enough is enough, and start throwing out the old sh*t that doesn’t matter any more. At least so that guests can walk through the living room again.

    It’s a really interesting perspective (which was also brought up by Zeldman, when he said, “This isn’t Chopin we’re dealing with”). Perhaps not all of this is meant to be stored forever, and we should be okay with that.

    It’s kind of weird that I’m a “social hoarder” (and I absolutely am). In the physical world, I’m the opposite. I don’t have more square footage than I need, I throw things out regularly, I don’t keep clothes around if I haven’t worn them in a year, etc. But somehow, because I can keep everything digitally, store it cleanly and elegantly, and it only takes up the space of a single hard drive, I do it.

    Who knows, maybe I should just lose the mentality that all of this is actually important. :)

  18. I back up my tweets not because I’m concerned that the historians of the future will be denied the benefits of access to them, but so that I can more easily refer back to them if there was something I wanted to look up. You noted Twitter’s search has some pretty extreme limitations, and that’s not acceptable to me.

    Even so, if I’m in a conversation with someone on Twitter, that conversation will be half-missing if Twitter ever dies. And that’s a risk I’m willing to accept, because the alternatives are (1) not backing anything up, or (2) backing everything up, from every user ever.

    I’m more concerned with my actual blog posts. I started using Tumblr & Disqus because they were relatively easy to set up and allowed me to focus more on the content than the platform. I’m starting to reconsider that and look for ways to bring that content back under my own control, though that no doubt means giving up some more advanced functionality that I won’t want to recreate myself. Again, it’s a trade-off, and I’m okay with that.

  19. When you really think about it, all social sites are ephemeral. I’m wondering about the value of keeping all of that data. I mean, I have boxes full of college notebooks and papers in storage that I thought would be valuable to keep. Now, years later, I wonder if I was right to keep them at all. I have photo albums. Even with the explosion of digital photos, I love my photos but I have to ask what tiny fraction of them I really want to keep. It’s sort of an existential question — what are we hanging onto and why?

    Maybe that’s arching over the original problem … but I do that. :)

  20. This is an interesting article with some great comments and insight into this concept of “our data.” My first encounter with Twitter’s lack of history reared its ugly head when I was building the website for Designing Social Interfaces. They wanted to have an active stream of tweets with the #dsi hashtag, but no matter what I tried I could not get Twitter search to look beyond a day or two at most. It was frustrating for all of us and short of immediately placing these tweets into our own local database, we were never able to develop a winning solution.

    I tend to look at things like tweets and their ilk to be fleeting glimpses into our daily lives. While it might be nice to have the means to save these things for posterity, perhaps their impermanence should empower us to hold onto things that truly matter instead? I like the analogy of “saving snowflakes,” as it seems befitting. But even talented photographers can take lasting snapshots of these beautiful, delicate natural works of art to share with the rest of the world.

    I can relate to both sides of this debate equally well. Maybe this is a situation where, much like Tantek has done, we need to scratch our own itch?

  21. I tried to comment here but WordPress hates hypertext.

    And I suppose in the spirit of “owning your data” it’s more appropriate for me to post a lengthy comment on my own site anyway.

    On Owning Your Data: Follow-up to @Zeldman and the #indieweb

    conclusion from that:

    Your site should be the source and hub for everything you post online. This doesn’t exist yet, it’s a forward looking vision, and I and others are hard at work building it. It’s the future of the indie web.

  22. The question whether everything must be saved is important.

    The logical answer is that the question should be mine (as a creator) to answer, regardless of what that answer may be.

  23. Billee D. said:

    Maybe this is a situation where, much like Tantek has done, we need to scratch our own itch?

    Scratching your own itch is exactly the right attitude to take on this and in fact was the very title of my brief talk at the all-participatory Federated Social Web Summit last year:

    Itches & Scratches

    (lots more links/rants/hopes on these topics in that talk outline)

  24. Your site should be the source and hub for everything you post online.

    But why? Surely its just as valid a choice to pull your social content down from the services as it is to push it up? Maybe one works better for you, or fits your socio-political ideals better, but surely it’s up to the individual to choose what works best for them.

    I like that you’re trying to solve the problem in a way that works for you, and I’d encourage you to continue doing so. But to suggest that others “should” also do the same thing is a bit heavy-handed. There are many solutions to this problem, and there are those who don’t even really see it as a problem. To each their own.

  25. While I confess I still have my original paper diaries from when I was 9 years old in a box in the closet that I’ve hauled from MI to SF to various apartments and back again, I don’t have much attachment to conversations that happen on Twitter or similar services. Digital content and identity do, though, have different expiration dates depending on how important they are to me.

    I.e., tweet conversations, I feel, need only a week-long expiration date at most before they’re stale. If they’re more important, I’d copy/paste and repost them with more detailed thoughts on my own blog, owning the content that way.

    Emails typically have an expiration date of between 5 minutes and 90 days, depending on source and relative action item.

    Social Network profiles and related content expire within a year and either need a refresh or to be relegated to the elephant graveyard.

    Music/Video/Photo files have no expiration date by comparison and are archived and shared locally, in the cloud, and on appropriate sites.

    Blogs and digital legacy are their own beast. This most recent article that recalls Leslie Harpold brings this to top of mind as well.

    I used to be obsessed with the notion of keeping records, backups and archives of relevant conversations that I’ve had online but it’s … well, kinda insane. I mean, do you go so far as to find or build a tool to export out all pertinent SMS conversations too?

    And for the record, this conversation in blogposts/trackbacks, & comments on your blog, Mr Z, is way more interesting and thoughtful than what was passing through on Twitter anyway and warrants more socially rich and contextual documentation.

  26. As an aside, I find it interesting that you use the term “earned” to reflect interest/social currency in the form of comments.

  27. @Jeff Croft: aren’t other people’s comments that you’re archiving copyrighted to them? Or to Flickr itself?

    Also what if someone edits or deletes a comment? Will the changes be reflected on your site?

    @Jeffrey Zeldman: this is a fascinating topic. But even saving your own data on your site isn’t enough. All of us need to think of the long term too – can our data survive for centuries? Or will everything we do online simply disappear when our hosting runs out (and no one has the passwords we used)?

    This timely article on Slashdot concerning a piece by Dave Winer is a must-read: Are You Ready For the Digital Afterlife?

    Lastly, what was playing while I read the comments here? Only ‘Tears In Rain’, from the Blade Runner soundtrack by Vangelis, which aptly goes:

    “…All those moments, will be lost in time like tears in rain”

  28. @Jeff Croft: aren’t other people’s comments that you’re archiving copyrighted to them? Or to Flickr itself?

    Flickr’s API adheres to all privacy settings. That is, I can’t display the comments of people who have specified their comments to not be available via the API, and I can’t display photos that don’t have a license that allows for it (I also show my favorite photos on my site, when I can).

  29. Jeff Croft said on 10 January 2011 at 3:38 pm:

    But why? Surely its [sic] just as valid a choice to pull your social content down from the services as it is to push it up?

    It is not the same, not even close.

    This is not a matter of opinion/taste, this is a matter of experience and data.

    Please read: http://tantek.com/2011/010/b1/owning-your-data

    In particular:

    “Simply copying from these shared social services still leaves you vulnerable to their flakiness, poor auto-shortening of links, unscalability, downtime, maintenance, database failures, and acquisitions.”

    And note the hyperlinked examples of those problems (in blog post, not here, because WordPress hates hypertext).

    So I suppose, yes, if you’re ok with those problems, then sure, sharecrop and copy away. But in that case why bother with a personal site at all? Since when any of those services goes down, that data will not show up in your personal site (via widgets or feeds or whatever).

  30. So I suppose, yes, if you’re ok with those problems, then sure, sharecrop and copy away. But in that case why bother with a personal site at all? Since when any of those services goes down, that data will not show up in your personal site (via widgets or feeds or whatever).

    I’m not using “widgets or feeds or whatever.” I’m using the service’s API to pull a local copy of the content, as well as any social context information that is available and I deem useful, and storing it in my local database.

    So, my site still works fine, even when the sites stop working, or go down (see, for example, all my ma.gnolia links that are still going strong on jeffcroft.com).
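The pull approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Jeff's implementation: `fake_fetch` stands in for whatever API call a given service offers, and the `items` table layout is invented for the example. The one idea it tries to show is that once a row is in the local database, it no longer depends on the service staying up.

```python
import sqlite3
from typing import Callable, Iterable


def pull_into_archive(
    conn: sqlite3.Connection,
    service: str,
    fetch_items: Callable[[], Iterable[dict]],
) -> int:
    """Copy items from a service into a local table; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " service TEXT NOT NULL, remote_id TEXT NOT NULL,"
        " body TEXT NOT NULL, created_at TEXT NOT NULL,"
        " PRIMARY KEY (service, remote_id))"
    )
    count_sql = "SELECT COUNT(*) FROM items"
    before = conn.execute(count_sql).fetchone()[0]
    for item in fetch_items():
        # The primary key makes repeated pulls idempotent: items already
        # archived are skipped rather than duplicated.
        conn.execute(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?)",
            (service, item["id"], item["body"], item["created_at"]),
        )
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before


def fake_fetch() -> list:
    """Hypothetical stand-in for a real API client call."""
    return [
        {"id": "1", "body": "First bookmark", "created_at": "2011-01-10T15:38:00Z"},
        {"id": "2", "body": "Second bookmark", "created_at": "2011-01-11T09:00:00Z"},
    ]


conn = sqlite3.connect(":memory:")
first_run = pull_into_archive(conn, "bookmarks", fake_fetch)
second_run = pull_into_archive(conn, "bookmarks", fake_fetch)
```

Running the pull on a schedule keeps the archive current, and the site renders from the local table rather than from the service.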

    Your way (making your site the source and pushing to services) has its pros (can still post when the service goes down) and its cons (can’t use existing tools to post to the services, must build your own). My way (making the service the source and pulling it via the APIs) has its pros and cons, too.

    Surely it’s up to an individual to decide which pros and cons resonate with them, and make their own choice? I like your way. I think it’s cool. I like what you’re doing. But for me, being able to use the existing tools is paramount. I don’t want to have to build my own system for posting when great ones already exist.

  31. Jeff writes:

    But for me, being able to use the existing tools is paramount. I don’t want to have to build my own system for posting when great ones already exist.

    Thanks for pointing this out.

    Yes, this is a huge deal, not something to be overlooked, and has been raised before – definitely aware of it.

    There is a *theoretical* (as yet unbuilt, but buildable) workaround (for those of us that want to publish from our own sites) to allow for the use of any of those existing tools, and that, in short, is to use a 2nd private account on those sites as a “publishing conduit” to your own site. Specifically:

    1. create a 2nd private account for such tool use on 3rd party site (like Twitter)
    2. “hook up” your own site to pull from that 2nd private account like an inbox, deleting items from the conduit when they’ve been copied to your site.
    3. push from your site to your primary public account on 3rd party site(s) (as I and others do today).

    Then use any/all of your favorite site-specific tools/apps with your secondary private accounts etc. for the content publishing process/experience, while still owning your own data, including permalinks to originals on your own site.
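Since the outline above is explicitly unbuilt, the following is only a sketch of how steps 2 and 3 might fit together, using in-memory stand-ins instead of real service accounts. `Account`, `OwnSite`, and `drain_conduit` are all hypothetical names, and step 1 (creating the second private account) is assumed to have been done by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Account:
    """In-memory stand-in for an account on a third-party service."""
    posts: Dict[str, str] = field(default_factory=dict)  # post id -> text


@dataclass
class OwnSite:
    """In-memory stand-in for the personal site that owns the originals."""
    base_url: str
    posts: List[dict] = field(default_factory=list)

    def publish(self, text: str) -> str:
        permalink = f"{self.base_url}/{len(self.posts) + 1}"
        self.posts.append({"permalink": permalink, "text": text})
        return permalink


def drain_conduit(conduit: Account, site: OwnSite, public: Account) -> int:
    """Treat the private conduit account as an inbox and empty it."""
    moved = 0
    for post_id in sorted(conduit.posts):
        text = conduit.posts[post_id]
        permalink = site.publish(text)   # step 2: copy to your own site
        del conduit.posts[post_id]       # step 2: delete from the conduit
        public.posts[permalink] = text   # step 3: push to the public account
        moved += 1
    return moved


conduit = Account(posts={"a1": "Posted from a phone client"})
site = OwnSite(base_url="https://example.com/notes")
public = Account()
moved = drain_conduit(conduit, site, public)
```

The public copy is keyed by the permalink on the personal site, which reflects the stated goal of keeping permalinks to the originals at home.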

    But as stated, this is an outline for an implementation – I don’t know anyone who has done this yet. And since it hasn’t been built yet, it is certainly a *harder* solution than what you’ve already gotten working, and that I very much sympathize with.

    Hopefully we’ll get there (perhaps sometime this year) with support for using private 3rd party accounts as publishing conduits. Even that outline seems like it takes more “setup” than should be required, so once built, I would use such an implementation as a working prototype to explore the actual UX issues, and then iterate from there. It’s by no means a “best” answer, just the next possible answer from which we can build/iterate upon.

  32. Do you go home and write down every conversation you had with everyone that day? Do you log all your phone calls? Unless you’re running a customer service biz, I highly doubt it.

    As Jeff has said, usually only what is pertinent and relevant to something life-affecting/changing/promoting (outside of work) will be kept. It’s so easy to keep everything digitally that we’ve just done it without much thought.

    I’ve done it with my work, clients and projects, mostly for portfolios and references, but some of it may need to go at some point…maybe at gunpoint! :)

  33. Your site should be the source and hub for everything you post online.

    This isn’t the future of the web, it’s what it used to be, back before there were any third-party sites that aggregated our stuff. Remember?

    When I have some spare time I’ll have to figure out how to move my digital photos from Flickr to my own site, but other than that I could care less about whether or not Twitter survives. Everyone has their own threshold for how much (and which) data they’re willing to lose. This applies to hard drives and websites alike. We all get to decide for ourselves, which is the power of this Internet thing we all love so much.

  34. Just imagine if Shakespeare, Newton, Beethoven or any other pre-Internet famous person had used Facebook and Twitter, and kept a regular blog. Imagine Einstein’s blog posts!! Or Dickens’ tweets as he tempted us with details of each new book…

    Now imagine all that data was kept intact. Imagine if anyone could go through Google and read any part of it.

    Wouldn’t that be cool?

    So it would be great if future generations could do the same with our data. Even unknown people’s data would be useful to historians wishing to know how folk lived in the past. (Roman blog posts anyone?)

    Now if only there was a way to download every post from a blog in one go. I once resorted to saving one blog post by post. A good job I did as the author then deleted several of the posts. But I’d like to read all his new blog posts offline, without having to click through each new entry one by one. Is there a way?

  35. Hello. This post is RIGHT ON. I found out about this site and also tantek’s only yesterday. Inspiring! Last year I came to the same conclusion about microblogging data ownership and started building my own tool in Drupal:
    http://stevenread.com/points
    A shameless plug but likely relevant. Haven’t figured out how to push the data other than RSS yet. I very much enjoy having my own methodology, even if temporarily a hermetic one.
    Steve

  36. What a rich thread! This conversation resonates with me because I’ve secretly wished for Twitter to decentralize (or be decentralized). This single point of failure for our tweets (real-time and archived) seems unsustainable and unwise. I applaud Tantek’s efforts to originate his tweets from his domain. I’m also keenly aware that he is doing it the hard way.

    After all his effort, Tantek’s tweets and the means of interacting with them are “owned” by Twitter. Tweeps can only really follow him at twitter.com/t. Is Twitter holding the reins too tight? Can they open up so Tantek can choose to tweet “for real” at tantek.com/tweet? Would decentralization hurt Twitter or help them grow? Would it further tweeting as a platform or fragment it beyond utility? I don’t presume to know the answers. But this conversation around owning your data is healthy. Next up, Facebook.

  37. Anton Peck:

    “At some point, we have to know when enough is enough, and start throwing out the old sh*t that doesn’t matter any more. At least so that guests can walk through the living room again.”

    I keep the data on my day-to-day system free from clutter. Ninety days is a good window for most files and emails. Everything else can go into deep storage.

    Increasingly small and affordable storage simplifies digital hoarding (archiving). You can be a pack-rat without ending up on a reality show.

    Maybe some day I’ll have the luxury of time to browse those archives. Forty years from now my opinion on what should be thrown out will likely be very different. A seemingly trivial email from 1993 might just reconnect me with an old friend or help me access otherwise lost memories in this fragile storage system I call a brain. Easier to have it and not want it, than to want it, and not have it.

  38. I’m a digital packrat myself. MJ said, “I mean, do you go so far as to find or build a tool to export out all pertinent SMS conversations too?” With SMS Backup+ for Android, yes, I do — they’re sent to Gmail as emails with the label “SMS.” I look forward to reading the fleeting thoughts I shared about major life events with my friends in the future (after making local copies with Thunderbird, of course).

    For years I saved AIM logs (AIM logs!) with no real intended purpose. Finally a couple years ago I made a 3′ x 5′ poster using wordle.net comprising (almost) all the AIM conversations I’d had with my best friend over about six years, filtered down to the 1,000 most common words (maybe more, I forget). It’s amazing to pore over it and see references to people and places and reminisce about the conversations and experiences we’d had with them.

    You never know what you might wish you had kept.
