Spinn3r

Spinn3r provides high volumes of fresh data, tapping you into worldwide conversations.

The company was founded by web crawler and RSS expert Kevin Burton after he sold his previous company, Rojo, to Six Apart. Spinn3r was originally built to power Tailrank, a real-time blog analysis and topical relevance index which launched in early 2006. The architecture behind Spinn3r was influenced by two large projects. One was Rojo, which had a 500GB-1TB search index. The other was NewsMonster, one of the first and most advanced client-side aggregators, with a high-performance crawler integrated at its core. The service was launched in August 2007.

API Features

Simple to Use

We ship a standard reference client that integrates directly with your pipeline. If you're running Java, you can get up and running in less than an hour. If you're using another language, you only need to parse a few XML files every few seconds, so the setup is still pretty straightforward.
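For readers working outside Java, the parsing step can be sketched in a few lines of Python. This is only an illustration: the element names (feed, item, title, link, lang) and the sample document are assumptions made up for this example, not the actual Spinn3r response format, and the fetching and polling loop is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body. The real element names may differ.
SAMPLE_XML = """<feed>
  <item>
    <title>First post</title>
    <link>http://example.com/first</link>
    <lang>en</lang>
  </item>
  <item>
    <title>Second post</title>
    <link>http://example.com/second</link>
    <lang>de</lang>
  </item>
</feed>"""


def parse_items(xml_text):
    """Return one dict per <item>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({child.tag: (child.text or "").strip() for child in item})
    return items


items = parse_items(SAMPLE_XML)
for entry in items:
    print(entry["title"], entry["link"])
```

In a real client this function would be called on each newly fetched file, with the results handed to the rest of your pipeline.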

Real Time Indexing

Spinn3r is tied into the blog ping network provided by Google, Blogger, Ping-o-Matic, WordPress, FeedBurner, and many other content management systems.

When a new blog post is published, we receive direct notification and add the weblog to the top of our queue. A crawler is then sent out to index the weblog's RSS and HTML, and the result is immediately published to our customers.
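The idea of moving a pinged weblog to the top of the queue can be sketched as follows. This is a minimal illustration under our own assumptions: the CrawlQueue class and its method names are hypothetical, and the production scheduler is certainly more involved than a single in-memory deque.

```python
from collections import deque


class CrawlQueue:
    """Toy crawl queue: pinged weblogs jump to the front."""

    def __init__(self, urls=()):
        self._queue = deque(urls)

    def on_ping(self, url):
        # If the weblog is already waiting, drop the old entry first
        # so it is not crawled twice.
        try:
            self._queue.remove(url)
        except ValueError:
            pass
        self._queue.appendleft(url)

    def next_url(self):
        """Return the next weblog to crawl, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None


queue = CrawlQueue(["http://a.example/", "http://b.example/"])
queue.on_ping("http://fresh.example/")
print(queue.next_url())
```

The freshly pinged weblog is returned first, ahead of weblogs that were already scheduled.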

Microformats

Spinn3r also supports the latest Microformats, including rel-tag and hAtom.

Tags have been heavily requested by our customer base and are included directly in our API.
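To show what rel-tag data looks like in practice, here is a small sketch that pulls tags out of a page using only the Python standard library. In the rel-tag microformat, the tag is the last path segment of the link's href. The class and function names below are our own illustration, not part of the Spinn3r API.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "tag" in rel_values and href:
            # The tag is the last path segment of the href.
            path = urlparse(href).path.rstrip("/")
            last_segment = path.rsplit("/", 1)[-1]
            if last_segment:
                self.tags.append(unquote(last_segment))


def extract_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


page = (
    '<a rel="tag" href="http://example.com/tags/web-crawler">crawler</a> '
    '<a href="http://example.com/about">about</a> '
    '<a rel="nofollow tag" href="/tag/rss/">RSS</a>'
)
print(extract_tags(page))
```

Links without rel="tag" are ignored, so ordinary navigation links do not show up as tags.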

Full HTML

Along with RSS we also index the full HTML of every blog post. We have an additional set of crawlers that index every URL we discover.

Weblog Ranking

Every weblog in our index is assigned a ranking based on the number of inbound links from other weblogs in our index. We also have our Social Media Rank, which provides more advanced ranking information, including authority and actual rank position.

This information is published within our API so you can make informed decisions about the content you index.
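A ranking based on inbound links can be approximated with a simple count. The sketch below is an assumption about the general idea only: it counts distinct linking weblogs and ignores self-links, which may or may not match how the real ranking is computed, and it does not attempt to model Social Media Rank.

```python
from collections import defaultdict


def rank_by_inbound_links(links):
    """Rank weblogs by how many distinct other weblogs link to them.

    links is an iterable of (source_weblog, target_weblog) pairs.
    Returns (weblog, inbound_count) pairs, highest count first.
    """
    inbound = defaultdict(set)
    for source, target in links:
        if source != target:  # a weblog linking to itself does not count
            inbound[target].add(source)
    counts = {blog: len(sources) for blog, sources in inbound.items()}
    # Sort by count descending, then by name for a stable order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


links = [("a", "c"), ("b", "c"), ("a", "b"), ("a", "c"), ("c", "c")]
print(rank_by_inbound_links(links))
```

Repeated links from the same weblog are counted once, so a single prolific linker cannot inflate a ranking on its own.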

Content Extraction

Many A-list bloggers use summary RSS, which doesn't include the full content of their blog posts.

We're smart enough to adapt. Spinn3r has developed content extraction technology that pinpoints the actual text on a page while excluding sidebar and navigation content.
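One common way to separate body text from navigation is a link-density heuristic: blocks that are short or made up mostly of link text are treated as chrome and dropped. The sketch below illustrates that general technique only. It is not Spinn3r's extraction method, and the thresholds, tag list, and names are assumptions chosen for the example.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "li", "td", "ul", "ol", "h1", "h2", "h3",
              "article", "section", "nav"}


class BlockCollector(HTMLParser):
    """Split a page into text blocks and track how much of each is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # (text, link_chars) pairs
        self._text = []
        self._link_chars = 0
        self._in_link = 0

    def flush(self):
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_main_text(html, min_chars=40, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by links."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    kept = [text for text, link_chars in collector.blocks
            if len(text) >= min_chars
            and link_chars / len(text) <= max_link_density]
    return "\n".join(kept)


page = """
<div class="nav"><ul><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul></div>
<div class="post"><p>Summary feeds often omit the body of a post, so the crawler reads the page itself.</p></div>
<div class="sidebar"><a href="/archive">Archive of every post published on this weblog since 2004</a></div>
"""
print(extract_main_text(page))
```

On this sample, the navigation items are dropped for being too short and the sidebar is dropped for being entirely link text, leaving only the post paragraph. Real pages need more care, for example skipping script and style content.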

Spam Prevention

We've developed complex spam prevention technology to prevent spam from being added to our index. Due to the nature of spam, we're never able to completely eliminate the problem, but we're able to minimize it significantly. At any point in time, we believe there's less than 1% spam within Spinn3r.

Ultra Reliable Infrastructure

Spinn3r is hosted in a world-class data center. We have over 30 servers and store more than 21TB of content. Every piece of our infrastructure is redundant, with additional hardware on standby in case of a failure.

Spinn3r is monitored 24/7 for any potential error in the system, and we back our service with an SLA.

Language Classification

Every post indexed by Spinn3r is classified by language. We compute this from the raw text of the post based on our own proprietary mathematical modeling. It's highly effective. With 200 bytes of content we can predict the correct language with 98% accuracy.
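Our model is proprietary, but the general idea of classifying language from raw text can be illustrated with character trigrams, a widely used technique. The sketch below is a toy stand-in and not the Spinn3r classifier: the two tiny training samples, the scoring rule, and the function names are all assumptions, and a profile this small will not come close to the accuracy quoted above.

```python
def trigrams(text):
    """Return the set of character trigrams in lowercased, space-normalized text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


# Toy training samples. A real profile would be built from far more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog "
          "and the cat is in the house with the people",
    "es": "el rapido zorro marron salta sobre el perro perezoso "
          "y el gato esta en la casa con la gente",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def classify(text):
    """Pick the language whose profile covers the largest share of the text's trigrams."""
    grams = trigrams(text)
    if not grams:
        return None

    def score(lang):
        return len(grams & PROFILES[lang]) / len(grams)

    return max(PROFILES, key=score)


print(classify("the cat is in the house"))
print(classify("el gato esta en la casa"))
```

Because the score is computed from character sequences rather than a dictionary, the same approach works on short snippets and does not need word boundaries to be known in advance.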

Technical Details