Spinn3r provides high volumes of fresh data, tapping you into worldwide conversations.
The company was founded by web crawler and RSS expert Kevin Burton after selling his previous company, Rojo, to Six Apart. Spinn3r was originally built to power Tailrank, a real-time blog analysis and topical relevance index that launched in early 2006. The architecture behind Spinn3r was influenced by two large projects. One was Rojo, which had a 500GB-1TB search index. The other was NewsMonster, one of the first and most advanced client-side aggregators, with a high-performance crawler integrated at its core. The service was launched in August 2007.
Simple to Use
We ship a standard reference client that integrates directly with your pipeline. If you're running Java, you can get up and running in less than an hour. If you're using another language, you only need to parse a few XML files every few seconds, and the setup is still pretty straightforward.
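For clients written without the reference client, the work amounts to fetching an XML document and pulling out a few fields. The following is a minimal sketch of the parsing step only, using the JDK's built-in DOM parser. The real Spinn3r wire format is not described here, so the RSS-like payload and the `item`, `title`, and `link` element names are assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedParserSketch {

    /** One parsed entry; the fields mirror the hypothetical payload, not a documented schema. */
    static final class Item {
        final String title;
        final String link;

        Item(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    /** Parses an RSS-like XML string and returns its items in document order. */
    static List<Item> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item"); // assumed element name
            List<Item> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                items.add(new Item(text(e, "title"), text(e, "link")));
            }
            return items;
        } catch (Exception ex) {
            // Malformed XML or parser setup failure: surface it as a single unchecked error.
            throw new IllegalArgumentException("Could not parse feed", ex);
        }
    }

    /** Returns the trimmed text of the first descendant element with the given tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<rss><channel>"
            + "<item><title>First post</title><link>http://example.com/1</link></item>"
            + "<item><title>Second post</title><link>http://example.com/2</link></item>"
            + "</channel></rss>";
        for (Item item : parse(xml)) {
            System.out.println(item.title + " -> " + item.link);
        }
    }
}
```

In a real client, a step like this would run inside a polling loop that fetches the next XML document every few seconds.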
Real Time Indexing
Spinn3r is tied into the blog ping network provided by Google, Blogger, Ping-o-Matic, WordPress, FeedBurner, and many other content management systems.
When a new blog post is published, we receive direct notification and add this weblog to the top of our queue. A crawler is then sent out to index the RSS and HTML of this weblog, and the result is immediately published to our customers.
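The "top of our queue" behavior can be pictured with a double-ended queue: routine recrawls join the back, while a ping moves a weblog to the front. This is only a sketch of the idea. The class and method names are hypothetical, and Spinn3r's actual scheduler is not described in this document.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PingQueueSketch {

    private final Deque<String> queue = new ArrayDeque<>();

    /** Routine recrawl: the weblog joins the back of the queue unless it is already waiting. */
    void schedule(String weblogUrl) {
        if (!queue.contains(weblogUrl)) {
            queue.addLast(weblogUrl);
        }
    }

    /** Ping received: the weblog jumps to the front, replacing any earlier position. */
    void onPing(String weblogUrl) {
        queue.remove(weblogUrl);
        queue.addFirst(weblogUrl);
    }

    /** Returns the next weblog to crawl, or null if nothing is waiting. */
    String next() {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        PingQueueSketch q = new PingQueueSketch();
        q.schedule("http://a.example.com/");
        q.schedule("http://b.example.com/");
        q.onPing("http://c.example.com/"); // freshly pinged, crawled first
        String url;
        while ((url = q.next()) != null) {
            System.out.println("crawl " + url);
        }
    }
}
```

The linear `contains` and `remove` calls keep the sketch short; a production queue of any size would pair the deque with a set or use a priority structure instead.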
Tags have been heavily requested by our customer base and are included directly in our API.
Along with RSS we also index the full HTML of every blog post. We have an additional set of crawlers that index every URL we discover.
Every weblog in our index is assigned a ranking based on the number of inbound links from other weblogs in our index. We also offer our Social Media Rank, which provides more advanced ranking information, including authority and actual rank position.
This information is published within our API so you can make informed decisions about the content you index.
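The basic inbound-link ranking can be illustrated with a small counting exercise. The sketch below assumes the link graph is already available as a map from each weblog to the weblogs it links to. It ignores self-links and links to weblogs outside the index. Those rules, the tie-breaking by name, and all identifiers are assumptions for illustration; this is not the Social Media Rank computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class InboundLinkRankSketch {

    /** Counts, for every indexed weblog, how many other indexed weblogs link to it. */
    static Map<String, Integer> inboundCounts(Map<String, Set<String>> links) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String source : links.keySet()) {
            counts.put(source, 0); // every indexed weblog starts at zero
        }
        for (Map.Entry<String, Set<String>> e : links.entrySet()) {
            for (String target : e.getValue()) {
                boolean selfLink = target.equals(e.getKey());
                boolean indexed = links.containsKey(target);
                if (!selfLink && indexed) {
                    counts.merge(target, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Orders weblogs by inbound count, highest first; ties are broken alphabetically. */
    static List<String> ranked(Map<String, Integer> counts) {
        List<String> names = new ArrayList<>(counts.keySet());
        names.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return names;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> links = new HashMap<>();
        links.put("a", new HashSet<>(Arrays.asList("b", "c")));
        links.put("b", new HashSet<>(Arrays.asList("c")));
        links.put("c", new HashSet<>(Arrays.asList("a", "c"))); // the self-link is ignored
        Map<String, Integer> counts = inboundCounts(links);
        for (String name : ranked(counts)) {
            System.out.println(name + " " + counts.get(name));
        }
    }
}
```

Because each source's targets are held in a set, several links from one weblog to another count once, which matches "inbound links from other weblogs" read as a count of distinct linking weblogs.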
Many A-list bloggers use summary RSS, which doesn't include the full content of their blog posts.
We're smart enough to adapt. Spinn3r has developed content extraction technology that pinpoints the actual text on a page while excluding sidebar and navigation content.
We've developed complex spam prevention technology to keep spam out of our index. Due to the nature of spam, we're never able to completely eliminate the problem, but we're able to minimize it significantly. At any point in time, we believe less than 1% of the content within Spinn3r is spam.
Ultra Reliable Infrastructure
Spinn3r is hosted in a world-class data center. We have over 30 servers and store more than 21TB of content. Every piece of our infrastructure is redundant, with additional hardware on standby in case of a failure.
Spinn3r is monitored 24/7 for any potential error in the system, and we back our service with an SLA.
Every post indexed by Spinn3r is classified by language. We compute this from the raw text of the post based on our own proprietary mathematical modeling. It's highly effective. With 200 bytes of content we can predict the correct language with 98% accuracy.
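To make the idea of classifying language from raw text concrete, here is a deliberately simple stand-in that counts common stopwords per language. It is not Spinn3r's proprietary model and makes no claim to its accuracy. The three tiny word lists, the language codes, and the class name are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class LanguageGuessSketch {

    // Tiny illustrative stopword lists; a real classifier would use far richer statistics.
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();

    static {
        STOPWORDS.put("en", words("the", "and", "is", "of", "to", "in", "it", "on"));
        STOPWORDS.put("fr", words("le", "la", "et", "est", "les", "des", "un", "sur"));
        STOPWORDS.put("de", words("der", "die", "und", "ist", "das", "nicht", "ein", "dem"));
    }

    private static Set<String> words(String... w) {
        return new HashSet<>(Arrays.asList(w));
    }

    /** Returns the language whose stopwords occur most often, or "unknown" if none match. */
    static String guess(String text) {
        // Lowercase, then split on any run of non-letter characters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> e : STOPWORDS.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (e.getValue().contains(token)) {
                    score++;
                }
            }
            if (score > bestScore) { // ties keep the earlier language
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("The cat is on the table and it is happy"));
        System.out.println(guess("Le chat est sur la table et il est content"));
        System.out.println(guess("Der Hund ist in dem Haus und die Katze nicht"));
    }
}
```

Short inputs give a stopword counter very little to work with, which is why statistical models built on character sequences are the usual choice for this task.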
The best way to get started with Spinn3r is our Open Source reference client. It is implemented in Java and ships with implementations for all of our APIs, so you can be up and running with Spinn3r in less than an hour!
Besides allowing you to view real-time statistics about Spinn3r, our admin console allows you to interact with and debug the API without writing any code. Want to see if a source is indexed within Spinn3r? How about the most recent URLs it published? Not a problem.
All APIs are fully documented from the ground up: the fields returned, the raw wire protocol, all the way up to the Javadoc for the API.
Every version of Spinn3r ships with full API documentation automatically generated after each build.
The Spinn3r API allows you to deal with Spinn3r directly without having to worry about our wire protocol, parallel downloads, XML parsing, or any of the heavy lifting required to write a crawler.
The Spinn3r reference client is Open Source, standards-based, and ultra-high-performance. The source code is hosted on Google Code, which allows for quick feedback from the community.