by KodefuGuru
29. June 2009 17:01
At ConvergeSC 2009, an attendee asked me to describe how to prevent crawlers from trolling your ADO.NET Data Service. I explained as best I could, but I felt a blog post might make it clearer.
To a bot, your data service looks like any other site on the web. Sure, it’s reading Atom, POX, JSON, or some other bizarre format you’ve concocted, but it’s still data coming down over HTTP and discoverable via links. There are ways to discourage a bot from crawling, but any information on the web that doesn’t require authentication can be crawled.
The first way to keep your service from being crawled by a legitimate bot is to put a robots.txt file in the root of the site. Inside the file, put the following lines:
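To make the "discoverable via links" point concrete, here is a minimal sketch of how a bot might pull further URLs out of an Atom response. The feed below is a simplified, hypothetical payload (not the exact output of ADO.NET Data Services), and the base address and the discover_links helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace used by Atom elements.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical, simplified feed; a real service response carries more detail.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Customers</title>
  <link rel="self" href="Customers"/>
  <entry>
    <title>Customer 1</title>
    <link rel="edit" href="Customers(1)"/>
    <link rel="related" href="Customers(1)/Orders"/>
  </entry>
</feed>"""


def discover_links(feed_xml, base_url):
    """Return every link href in the feed, resolved against base_url."""
    root = ET.fromstring(feed_xml)
    return [
        urljoin(base_url, link.get("href"))
        for link in root.iter(ATOM_NS + "link")
        if link.get("href")
    ]


if __name__ == "__main__":
    # Each printed URL is another resource a crawler could request next.
    for url in discover_links(SAMPLE_FEED, "http://example.com/MyService.svc/"):
        print(url)
```

Nothing here is specific to a browser or a search engine: any client that can parse the response can follow the links it contains.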
User-agent: *
Disallow: /
This tells compliant bots not to crawl any part of the site. If your service coexists with a site you want to be crawled, you can change the Disallow option to /MyService.svc/. Be sure to include the closing slash so other pages aren’t accidentally matched.
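If you want to check a rule before deploying it, Python's standard urllib.robotparser module can evaluate robots.txt lines the way a compliant bot would. This is only a sketch: the example.com host, the "ExampleBot" agent name, the page names, and the allowed helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="ExampleBot"):
    """Return True if a compliant bot may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# The two variants described above.
SITE_WIDE = ["User-agent: *", "Disallow: /"]
SERVICE_ONLY = ["User-agent: *", "Disallow: /MyService.svc/"]
# Same rule without the closing slash, for comparison.
NO_CLOSING_SLASH = ["User-agent: *", "Disallow: /MyService.svc"]

if __name__ == "__main__":
    service_url = "http://example.com/MyService.svc/Customers"
    other_page = "http://example.com/MyService.svc-notes.html"  # hypothetical page

    print(allowed(SITE_WIDE, service_url))       # whole site is blocked
    print(allowed(SERVICE_ONLY, service_url))    # service is blocked
    print(allowed(SERVICE_ONLY, other_page))     # unrelated page stays crawlable
    print(allowed(NO_CLOSING_SLASH, other_page)) # prefix match catches it too
```

Rules are matched as path prefixes, which is why dropping the closing slash also blocks any other page whose path happens to start with /MyService.svc. For the same reason, the rule with the closing slash does not cover the bare /MyService.svc address itself under this parser.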
The conference attendee seemed to be concerned specifically about anchor tags and AJAX. If you’re using the OnClick event of the anchor tag, most spiders will not follow it. However, if the URI is in an href, a crawler will pick it up. Bing and Google will honor a rel attribute with the “nofollow” value to prevent indexing the page. However, Yahoo and Ask will still follow and index a link with that attribute.
If your service is publicly available, using robots.txt is the way to go. If it’s not publicly available, the service should already be locked down through authentication techniques.
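The difference between the three kinds of anchors can be sketched with a toy link extractor built on Python's standard html.parser module. It is an assumption about how a simple, well-behaved spider might work, not a description of any real search engine; the sample markup, the loadData function name, and the LinkCollector class are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects anchor hrefs, separating those marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followable = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # Script-only anchors expose no real URL for the spider to request.
        if not href or href == "#" or href.startswith("javascript:"):
            return
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followable.append(href)


# Hypothetical page with one anchor of each kind.
SAMPLE_PAGE = """
<a href="/MyService.svc/Customers">Customers</a>
<a href="/MyService.svc/Orders" rel="nofollow">Orders</a>
<a href="#" onclick="loadData('/MyService.svc/Products')">Products</a>
"""


def collect_links(html):
    """Return (followable, nofollow) href lists found in the markup."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.followable, collector.nofollow


if __name__ == "__main__":
    followable, nofollow = collect_links(SAMPLE_PAGE)
    print("would follow:", followable)
    print("marked nofollow:", nofollow)
```

The onclick-only anchor never shows up in either list because the extractor does not run script. A crawler that ignores the rel attribute would simply treat both lists as followable.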
by KodefuGuru
10. December 2007 16:11
ADO.NET Data Services is the new name for project "Astoria", and it's been built from the ground up using the lessons learned from the "Astoria" prototyping phases. You can download the new CTP as part of the ASP.NET 3.5 extensions.
Here's what you get as part of this CTP:
Support to create ADO.NET Data Services backed by:
- A relational database by leveraging the Entity Framework.
- Any data source (file, web service, custom store, application logic layer, etc.)
Serialization Formats:
- Industry standard AtomPub serialization
- JSON serialization
Business Logic & Validation
- Insert custom business/validation logic into the request/response processing pipeline
- Simple infrastructure to build custom access policy
Access Control
- Easily control the resources viewable from a data service
Simple HTTP interface
- Any platform with an HTTP stack can easily consume a data service
- Designed to leverage HTTP semantics and infrastructure already deployed at large
Client libraries:
- .NET Framework
- ASP.NET AJAX
by KodefuGuru
18. September 2007 22:22
The Project Astoria September 2007 CTP has been released. It's a refresh of the May CTP, but it works with Visual Studio 2008 Beta 2.
The Astoria September 2007 CTP is now available for download. This CTP is for the most part a refresh of the May CTP bits recompiled so they run with Visual Studio 2008/Entity Framework Beta 2. We did tweak here or there, but no new features were added.
You can download it here.