From Del.icio.us to Pinboard.in with Python

The sad news that Yahoo plans to shut down del.icio.us reached me this week (although there's still hope). I use del.icio.us pretty much every day and was a little traumatized on hearing this. Once I had finished wailing and gnashing my teeth, I set out looking for somewhere to go.

There are many bookmarking sites and services out there, but I fear change, and pinboard.in seemed like the closest thing to a drop-in replacement. It even supports the same API as del.icio.us. There's a small one-off charge for signing up, but no recurring fee, so I broke out the credit card and joined up.

The next step was to figure out how to migrate my bookmarks. del.icio.us provides an export-to-HTML feature in its settings area, but a quick look at the export revealed some data was missing (mostly extended descriptions). Rabid googling revealed a lesser-known XML export mechanism: visit https://api.del.icio.us/v1/posts/all, enter your username and password, and save the resulting XML file.
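For reference, each bookmark in that XML export is a single post element whose attributes carry the full record, including the extended description that the HTML export tends to lose. A quick sketch of reading one with minidom (the sample data below is made up for illustration):

```python
from xml.dom import minidom

# made-up sample in the shape of the del.icio.us XML export
sample = """<posts user="example" update="2010-12-20T12:00:00Z">
  <post href="http://example.com/" description="Example site"
        extended="A longer note that the HTML export loses"
        tag="example demo" time="2010-12-01T09:30:00Z" shared="yes"/>
</posts>"""

doc = minidom.parseString(sample).documentElement
# skip the whitespace text nodes between elements
posts = [e for e in doc.childNodes if e.nodeType == e.ELEMENT_NODE]
for p in posts:
    print(p.getAttribute("href"), p.getAttribute("extended"))
```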

Now to get my bookmarks into pinboard.in. I broke out my trusty text editor and battered together the script below, which works just fine. A few hours later all my bookmarks were in pinboard.in, their bookmarklets were installed in my browser, and I was loving their read-later features. Sean is a happy geek again.

You can download my migration script. To use it:

python delmigrate.py backup.xml username password

Here's the source for the curious.

from xml.dom import minidom
import sys

import urllib
import urllib2
import time

user = sys.argv[2]
password = sys.argv[3]

endpoint = "https://api.pinboard.in"
url = "/v1/posts/add?"

#open the xml file to import from and parse it
f = open(sys.argv[1], "r")
doc = minidom.parse(f).documentElement

#keep count of how many urls have been imported
urlcount = 0
count = 0
ellength = len(doc.childNodes)
failcount = 0
while count < ellength:
    e = doc.childNodes[count]

    if e.nodeType == e.ELEMENT_NODE:
        print "import url %s" % urlcount

        #get the attributes from the xml
        href = e.getAttribute("href")
        description = e.getAttribute("description")
        extended = e.getAttribute("extended")
        tags = e.getAttribute("tag")

        dt = e.getAttribute("time")
        rargs = dict(url=href, description=description, extended=extended,
                        tags=tags, dt=dt)
        shared = e.getAttribute("shared")

        if shared.strip() == 'no':
            rargs['shared'] = 'no'

        #encode the values as UTF-8 bytes so urlencode can handle them
        rargs = dict((k, v.encode('utf-8')) for k, v in rargs.items())

        print rargs
        #build the request to send
        #set up http auth for pinboard.in
        #doing this for every request may seem wasteful, but urllib2
        #seems to forget the auth details after a half dozen requests
        #if you don't
        password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_manager.add_password(None, endpoint, user, password)

        auth_handler = urllib2.HTTPBasicAuthHandler(password_manager)
        opener = urllib2.build_opener(auth_handler)

        urllib2.install_opener(opener)

        request = urllib2.Request(endpoint + url + urllib.urlencode(rargs))

        #set the user agent
        request.add_header('User-Agent', 'SeansDeliciousMigrater')
        try:
            #send the request and parse the response for the status code
            r = opener.open(request)
            response = minidom.parse(r).documentElement.getAttribute("code")

        except Exception, e:
            response = str(e)

        #if we get an invalid response, back off and retry; probably throttled
        if response != "done":
            failcount += 1

            print "Failure: Invalid response: %s" % response
            if failcount > 4:
                print "Aborting: Invalid response %s" % response
                break
            else:
                print "waiting for 30 seconds and retrying"
                time.sleep(30)
        else:
            failcount = 0
            count += 1
            urlcount += 1
            #put in a delay between requests to reduce odds of throttling
            time.sleep(1)
    else:
        count += 1

print "%s urls imported" % urlcount
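The script above is Python 2 (urllib2, print statements). For anyone on Python 3, where urllib and urllib2 were reorganised into urllib.parse and urllib.request, the request construction might be sketched roughly like this. This is my own untested sketch, not part of the migration script: build_add_request is a name I made up, and sending basic auth preemptively in a header replaces the opener dance above.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

def build_add_request(rargs, user, password,
                      endpoint="https://api.pinboard.in",
                      path="/v1/posts/add?"):
    """Build a posts/add request; rargs is the same dict of query args."""
    # urlencode in Python 3 encodes str values as UTF-8 itself, so the
    # manual .encode('utf-8') pass from the Python 2 script is not needed
    request = Request(endpoint + path + urlencode(rargs))
    # send basic auth up front instead of installing an auth-handling opener
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("User-Agent", "SeansDeliciousMigrater")
    return request

req = build_add_request({"url": "http://example.com/",
                         "description": "Example"}, "user", "secret")
print(req.full_url)
```

To actually send it you would pass the request to urllib.request.urlopen and read the response as bytes, decoding before handing it to minidom.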

Comments

Aside from the fun in using python to do this, why not simply follow the instructions at http://pinboard.in/howto#import ? No scripting needed! :-)

Tim 17:12 Monday the 20th of December 2010 #

Hi Tim, take a look at the output from the HTML backup vs the XML backup; not as much information is preserved and carried across.

Sean O'Donnell 02:21 Tuesday the 21st of December 2010 #

Ah, cool - that answers that :)

Tim 12:26 Tuesday the 21st of December 2010 #

Thanks for this post! I've got a question: do you know how to make it work with Python 3? With approximately the same code (and poster modified with 2to3 and a few fixes for Python 3) I get this error: "TypeError: 'generator' does not support the buffer interface" on the line "urllib.request.urlopen(request).read()", and I haven't found a solution... Regards, Sam.

Samuel 04:02 Sunday the 17th of November 2013 #
