Spinn3r

Need to index the blogosphere?

Spinn3r is a web service for indexing the blogosphere. We provide raw access to every blog post as it is published, in real time. We supply the data so you can focus on building your application, mashup, or search engine. We discover weblogs and their RSS feeds, index their content, fetch their links, and index their comments.

How does it work?

Developers at your company call our API every few seconds to retrieve the freshest content, syndicated to you in real time. With our Open Source reference client, you can be up and running in less than an hour, all with massive cost savings. Spinn3r can save you up to $45K per month compared to running your own crawler!
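The polling model described above can be sketched as a loop that repeatedly asks for content newer than a cursor and then advances that cursor. This is a minimal sketch under stated assumptions: `FeedSource`, `fetchSince`, and the integer cursor are hypothetical stand-ins, not the reference client's actual API, and the in-memory `FakeSource` exists only so the loop runs without network access.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the remote API; not Spinn3r's real interface.
interface FeedSource {
    /** Returns items published after the given cursor (a simple index here). */
    List<String> fetchSince(int cursor);
}

// Hypothetical in-memory stand-in so the sketch runs without network access.
class FakeSource implements FeedSource {
    private final List<String> published = new ArrayList<>();

    void publish(String item) { published.add(item); }

    @Override
    public List<String> fetchSince(int cursor) {
        return new ArrayList<>(published.subList(cursor, published.size()));
    }
}

class Poller {
    private final FeedSource source;
    private int cursor = 0; // position just after the last item already seen

    Poller(FeedSource source) { this.source = source; }

    /** One polling round: fetch only items newer than the cursor, then advance it. */
    List<String> pollOnce() {
        List<String> fresh = source.fetchSince(cursor);
        cursor += fresh.size();
        return fresh;
    }

    /** Polls repeatedly, sleeping between rounds (e.g. a few seconds). */
    void run(int rounds, long intervalMillis) throws InterruptedException {
        for (int i = 0; i < rounds; i++) {
            for (String item : pollOnce()) {
                System.out.println("new item: " + item);
            }
            Thread.sleep(intervalMillis);
        }
    }
}
```

Keeping the cursor on the client side means each round returns only content the application has not yet seen, so a short polling interval does not reprocess old posts.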

Maximum Throughput

Forty million (40M) blogs and counting, ten thousand (10K) mainstream news sources, and more than thirty million (30M) social media sources: more than one million (1M) posts per hour. Not only do we index every A-list blog out there, we also watch every mainstream news source and social media site.

High Availability Architecture

Spinn3r is built on a fault-tolerant infrastructure and is monitored 24/7 to ensure 99.9% availability. Every component in our system has three redundant copies, with standby hardware already online in case of failure.

We're also hosted in a state-of-the-art data center with redundant power and three standby generators.

Advanced Metadata

We can provide you with the top 10K or the top 10M weblogs, and can filter posts by language, by site, or by author. Tags, language, spam probability, rank, raw inbound link count, ETag, and HTTP status are all included in our API.

We go beyond user-specified metadata, because some of it is often incorrect. Language is a good example. We use a mathematical language-modeling technique that has been in production for more than two years. With only 200 bytes of text, we can identify the language of a post with 98% accuracy.
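To give a feel for how language identification from a short snippet can work, here is a minimal sketch using character trigram profiles compared by cosine similarity. This is a common textbook technique swapped in for illustration; it is not necessarily the model Spinn3r uses, the class and method names are hypothetical, and a toy sketch like this makes no claim to the 98% accuracy quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: character-trigram language identification.
class TrigramLanguageId {
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Counts overlapping 3-character sequences, with a space pad at each end. */
    static Map<String, Integer> trigrams(String text) {
        String padded = " " + text.toLowerCase().replaceAll("\\s+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Adds sample text to the trigram profile for a language. */
    void train(String language, String sample) {
        Map<String, Integer> profile = profiles.computeIfAbsent(language, k -> new HashMap<>());
        trigrams(sample).forEach((gram, n) -> profile.merge(gram, n, Integer::sum));
    }

    /** Cosine similarity between two trigram count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (double) e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += (double) e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += (double) v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the trained language whose profile is most similar to the text. */
    String identify(String text) {
        Map<String, Integer> query = trigrams(text);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In practice each profile would be trained on a large corpus per language. Trigrams work on short snippets because frequent sequences such as "the" or "und" are strong language signals even in a sentence or two.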

Indexed in Real Time

Spinn3r is updated in real time. When we receive a ping from any of the major ping providers, we fetch the pinged content, index it, and make it available via our API.

We don't stop there. Our hybrid indexing technology lets us run our crawlers every thirty minutes against sites that don't send pings.

When we see the content, so do you. We push the content to your application as soon as we see it so you are always current.

Full Crawler

Spinn3r goes beyond indexing raw RSS: we index the full HTML of each post. We then extract the body of the post, excluding sidebars and page chrome, and provide this content through a content extraction API.

Open Source and Standards Based

The Spinn3r reference client is fully Open Source and standards-based.

Our Java API is fully documented. You can integrate it directly into your application, or run it as a standalone daemon that indexes new content and saves it to disk. Because the saved content can be read with any XML parser, this allows easy integration with Perl, Python, JavaScript, or any other language that has one.
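Reading the daemon's saved output needs nothing more than a standard XML parser. The sketch below uses the JDK's built-in DOM parser (`javax.xml.parsers`) and assumes, purely for illustration, an RSS-style layout with `item`, `title`, and `link` elements; the actual file format and element names written by the reference client may differ, and `SavedPostReader` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for saved content; assumes RSS-style <item> elements.
class SavedPostReader {
    /** Returns the trimmed text of the first descendant with the given tag, or "". */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    /** Parses an XML document and returns "title | link" for each <item>. */
    static List<String> readItems(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(childText(item, "title") + " | " + childText(item, "link"));
            }
            return result;
        } catch (Exception e) {
            // Wrap parser/configuration errors so callers need not handle checked exceptions.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

To read a file written by the daemon instead of a string, `DocumentBuilder.parse(java.io.File)` accepts a file directly. The same few lines map onto the standard XML libraries in Perl, Python, or JavaScript.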

Evaluation

What are you waiting for? You can be using Spinn3r in a few minutes. Request an evaluation and we'll give you a full one-week trial period to try out Spinn3r. We also offer a thirty-day money-back guarantee: if, for whatever reason, Spinn3r doesn't work out for you, we'll give you a full refund on your first month of service.

So request an evaluation today and we'll have you up and running in no time!