
Prevent Crawlers On Your Data Service

by KodefuGuru 29. June 2009 17:01

At ConvergeSC 2009, an attendee asked me how to prevent crawlers from trawling an ADO.NET Data Service. I explained as best I could, but I felt a blog post would make it clearer.

To a bot, your data service looks like any other site on the web. Sure, it's reading Atom, POX, JSON, or some other bizarre format you've concocted, but it's still data coming down over HTTP and discoverable via links. There are ways to keep well-behaved bots out, but any information on the web that doesn't require authentication can still be crawled.

The first way to keep legitimate bots from crawling your service is to put a robots.txt file in the root of the site. Inside the file, put the following lines:

User-agent: *
Disallow: /

This keeps well-behaved bots from crawling anything on the site. If your service coexists with a site you do want crawled, change the Disallow line to /MyService.svc/ instead. Be sure to include the closing slash: robots.txt rules match by path prefix, so without it other pages whose paths merely begin with /MyService.svc could be matched accidentally.
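To see how a well-behaved crawler might read these two options, here is a minimal sketch using Python's standard urllib.robotparser module, which applies the same prefix matching. The host www.example.com and the pages /default.aspx and /MyService.svcHelp.aspx are hypothetical, and MyService.svc is the placeholder name from above. Each search engine uses its own parser, so treat this as an approximation of crawler behavior rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical host, used only to build absolute URLs for the checks below.
HOST = "http://www.example.com"


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Option 1: keep bots off the entire site.
whole_site = parse_robots("User-agent: *\nDisallow: /\n")

# Option 2: keep bots off the data service only (note the closing slash).
service_only = parse_robots("User-agent: *\nDisallow: /MyService.svc/\n")

# For comparison: the same rule without the closing slash.
no_slash = parse_robots("User-agent: *\nDisallow: /MyService.svc\n")

# /MyService.svcHelp.aspx is a hypothetical page that only shares the prefix.
paths = ["/default.aspx", "/MyService.svc/Products", "/MyService.svcHelp.aspx"]

for path in paths:
    url = HOST + path
    # "Googlebot" is just a label; the "*" record applies to every user agent.
    print(path,
          "whole site:", whole_site.can_fetch("Googlebot", url),
          "service only:", service_only.can_fetch("Googlebot", url),
          "no slash:", no_slash.can_fetch("Googlebot", url))
```

With the closing slash, /MyService.svcHelp.aspx stays crawlable while /MyService.svc/Products is blocked; without it, both are blocked. One side effect of prefix matching: the trailing-slash rule does not cover the bare service root, /MyService.svc, itself.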

The conference attendee seemed to be concerned specifically about anchor tags and AJAX. If the URI only appears in an anchor's onclick handler, most spiders will not follow it. However, if the URI is in an href attribute, a crawler will pick it up. Bing and Google honor a rel attribute with the "nofollow" value and will not follow that link, although the target page can still be indexed if it is linked from somewhere else. Yahoo and Ask, however, will still follow and index a link with that attribute.
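The difference between onclick and href is easiest to see from the crawler's side. The sketch below is a toy link extractor built on Python's html.parser, not any real search engine's code: it only collects URIs found in href attributes and never runs script. The honor_nofollow flag is a hypothetical parameter modeling the engines described above as respecting rel="nofollow"; the page markup, the loadData function, and the /MyService.svc/... paths are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Toy crawler link extractor: gathers <a href> values, never executes script."""

    def __init__(self, honor_nofollow: bool) -> None:
        super().__init__()
        self.honor_nofollow = honor_nofollow  # hypothetical switch for nofollow-aware bots
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            # A URI that only appears inside onclick script never reaches this point.
            return
        rel = (attributes.get("rel") or "").lower().split()
        if self.honor_nofollow and "nofollow" in rel:
            return
        self.links.append(href)


def collect_links(html: str, honor_nofollow: bool) -> list[str]:
    collector = LinkCollector(honor_nofollow)
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page: one AJAX-only link, one plain link, one nofollow link.
PAGE = """
<a onclick="loadData('/MyService.svc/Products')">Products</a>
<a href="/MyService.svc/Customers">Customers</a>
<a href="/MyService.svc/Orders" rel="nofollow">Orders</a>
"""

print(collect_links(PAGE, honor_nofollow=True))   # ['/MyService.svc/Customers']
print(collect_links(PAGE, honor_nofollow=False))  # ['/MyService.svc/Customers', '/MyService.svc/Orders']
```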

If your service is publicly available, a robots.txt file is the way to go. If it's not publicly available, the service should already be locked down through authentication.


ADO.NET Data Services CTP

by KodefuGuru 10. December 2007 16:11

ADO.NET Data Services is the new name for Project "Astoria", and it has been built from the ground up using the lessons learned during the "Astoria" prototyping phases. You can download the new CTP as part of the ASP.NET 3.5 Extensions.

Here's what you get as part of this CTP:

  • Support to create ADO.NET Data Services backed by:
    • A relational database by leveraging the Entity Framework
    • Any data source (file, web service, custom store, application logic layer, etc)
  • Serialization Formats:
    • Industry standard AtomPub serialization
    • JSON serialization
  • Business Logic & Validation:
    • Insert custom business/validation logic into the request/response processing pipeline
    • Simple infrastructure to build custom access policies
  • Access Control:
    • Easily control the resources viewable from a data service
  • Simple HTTP interface:
    • Any platform with an HTTP stack can easily consume a data service
    • Designed to leverage HTTP semantics and infrastructure that are already widely deployed
  • Client libraries:
    • .NET Framework
    • ASP.NET AJAX

Project Astoria September 2007 CTP Released

by KodefuGuru 18. September 2007 22:22

The Project Astoria September 2007 CTP has been released. It's a refresh of the May CTP, but it works with Visual Studio 2008 Beta 2.

The Astoria September 2007 CTP is now available for download. This CTP is for the most part a refresh of the May CTP bits recompiled so they run with Visual Studio 2008/Entity Framework Beta 2. We did tweak here or there, but no new features were added.

You can download it here.

Chris Eargle
C# MVP, INETA Community Champion

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010