Twitter Wordle

I came across Ankit Ahuja’s blog today and saw his Twitter Wordle blog post. For those of you who don’t know, Wordle is a simple online application that lets you create word clouds from text files or websites, emphasizing the words that are used most often. I wanted to make my own but couldn’t find the right resources on the site, so I ended up doing a Google search for a script that would let me download all of my tweets into a text file. I found a Python script by Zach Seifts, which I ran after downloading BeautifulSoup, a Python HTML/XML parser that the script requires. Nick helped me tweak it a little so that it worked. Here is the code that I used:

import time
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

# Replace USERNAME with your twitter username
url = u'http://twitter.com/USERNAME?page=%s'
tweets_file = open('tweets', 'w')

for x in range(10*10000):
    # Fetch one page of the profile
    f = urlopen(url % x)
    text = f.read()
    f.close()
    # Undo the "sc'+'ript" split in the page's inline JavaScript before parsing
    text = text.replace("sc'+'ript", "script")
    soup = BeautifulSoup(text)
    # Each tweet's text lives in a <span class="entry-content">
    tweets = soup.findAll('span', {'class': 'entry-content'})
    # An empty page means there are no more tweets to fetch
    if len(tweets) == 0:
        break
    for t in tweets:
        tweets_file.write(t.renderContents() + '\n')
    # being nice to twitter's servers
    time.sleep(5)
    print "working...Page", x

tweets_file.close()

This exported all of my tweets into a text file that included all @ replies and HTML tags. Since Wordle would easily pick them up, I had to get rid of all HTML tags and @’s so that they wouldn’t dominate the word cloud. To do so, I used Emacs to create macros that automatically found and deleted them, leaving only raw text that I could plug into Wordle. The result is this glorious Twitter Wordle cloud à la @chloester:

chloester's Twitter Wordle
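
I did the cleanup with Emacs macros, but the same step could probably be scripted. Below is a minimal sketch of that idea, not what I actually ran: it is written for Python 3 (unlike the download script above), the names clean_tweet and clean_file are made up for illustration, and it assumes you want to drop the whole @username rather than just the @ sign.

```python
import re
from html import unescape

# Anything that looks like an HTML tag, e.g. <a href="..."> or </a>
TAG_RE = re.compile(r"<[^>]+>")
# An @ followed by a username (assumption: remove the whole mention)
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(line):
    """Return one tweet as plain text, without HTML tags or @mentions."""
    text = TAG_RE.sub("", line)        # drop the tags, keep the text inside them
    text = unescape(text)              # turn entities like &amp; back into &
    text = MENTION_RE.sub("", text)    # drop @username mentions
    return " ".join(text.split())      # collapse leftover whitespace


def clean_file(in_path, out_path):
    """Clean every line of in_path and write the non-empty results to out_path."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            cleaned = clean_tweet(line)
            if cleaned:
                dst.write(cleaned + "\n")
```

Something like clean_file('tweets', 'tweets_clean.txt') would then produce a file that can be pasted into Wordle; the output filename is just an example.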
