I recently wrote an article for DevelopMentor’s Developments newsletter entitled Azure Storage. Read it at the DevelopMentor website here:
http://www.develop.com/content/newsletters/aprilazure
I’ve republished it here for my readers. Enjoy!
Developments: Azure Storage
by Michael Kennedy
[Listen to this article as a podcast: Azure-Storage-Article-Kennedy.mp3]
October 27th 2008, Los Angeles CA – It’s 9 AM and Microsoft is hosting PDC (their most forward-looking developer conference). Ray Ozzie and company are introducing Windows Azure: a new platform that marks their first foray into the nascent world of large-scale utility computing. This scalable and reliable platform-as-a-service offering is commonly referred to as “cloud computing” because it runs somewhere out there on the Internet.
Computing platforms that rival the reliability of the utility grids (e.g. electric and gas) we take for granted every day have long been the stuff of dreams.
A few companies have realized this dream – Google and Amazon come to mind as rare exceptions. These companies’ web properties seem to handle unbounded amounts of traffic with zero downtime. The data centers, redundancy, software engineering, and operations know-how required to make this happen are exceedingly expensive: some reports have Google spending over $2.4 billion (that’s 2,400 million dollars) on data centers in 2007 alone.
Prior to large-scale cloud computing efforts (circa 2005), most of us could only dream of such scalability and reliability. Today we have at least three highly reputable companies offering some kind of pay-as-you-go cloud computing platform: Microsoft, Amazon, and Google.
Microsoft’s Azure is a newcomer to the industry, but for .NET developers it is not to be ignored. Azure allows you to use your existing skills to build essentially the same .NET applications you are familiar with and “deploy them to the cloud.”
These scalable, reliable, geographically-replicated applications that run on Azure depend on data, of course – virtually every application we write is nothing without its underlying data. But if we simply use tried-and-true methods of data storage, such as the file system or a (single) database server, our data is not nearly as scalable or reliable as the application built on top of it. Because we cannot have a scalable and reliable application without equally scalable and reliable data, we need a new mechanism for storing and accessing data from our Azure applications.
Enter Azure Storage
Azure storage is the storage component of the Azure platform. It is actually three data services in one:
- Blob Storage – stores unstructured data essentially as a file, limited to 50 GB of data per blob.
- Table Storage – stores structured data, somewhat like a database. For full relational database capabilities there is a higher-level service called SQL Data Services (SDS).
- Queues – provide interprocess communication between the various web and worker roles in your hosted services, or even applications running outside of Azure. Queues pass small XML or binary messages – less than 64 KB per message.
In this article, we will cover just the basics of the three storage services of Windows Azure. I want to give you a sense for what it’s like to program against Azure Storage. At the base level, all access to Azure Storage goes through pure REST APIs, which means you can access it from any HTTP-enabled platform or language. For example, to download the blob named “config.xml” in the container called “settings” for the Azure project “kennedy”, you would simply issue a GET to the URI:
http://kennedy.blob.core.windows.net/settings/config.xml
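To make that concrete, here is a minimal sketch of that GET in C# using only the standard System.Net types – no Azure-specific library at all. (The URI is the hypothetical one above; a private container would also need a signed Authorization header, which the StorageClient library discussed below handles for you.)

// Requires: using System; using System.IO; using System.Net;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
    "http://kennedy.blob.core.windows.net/settings/config.xml");
request.Method = "GET";

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string configXml = reader.ReadToEnd();
    Console.WriteLine(configXml);
}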
To save data in a blob you do HTTP POSTs and PUTs in a similar fashion. However, real life is full of edge cases, error handling, security, and serialization concerns, which make the pure HTTP model error-prone. Thus, a sample library that ships with the Azure SDK serves as the de facto .NET API to Azure Storage. It is called StorageClient and can be found in default installs here:
C:\Program Files\Windows Azure SDK\v1.0\samples\StorageClient
We will examine working with each of the storage services from the perspective of the StorageClient library – but keep in mind that ultimately this library is a wrapper around a basic and open RESTful API.
Setting The Stage: The Sample Application
To explore Azure Storage I have written a simple photo sharing distributed application. This set of applications allows users to upload photos to a photo sharing site. These photos must be reviewed and approved by moderators of the site. Once approved, the general public can view and interact with the photos. For a concrete example, you could imagine writing a distributed version of the wallpaper sharing site InterfaceLift and deploying it on Azure in this fashion.
You can download the sample application and follow along if you want to see the full source code and try it out yourself. Just be sure to start the Development Storage utility that comes with the Azure SDK before running the application.
Our distributed application consists of three parts.
- The Uploader: A Windows Forms application that lets contributors upload images to the site.
- The Reviewer: A Windows Forms application that lets moderators view image submissions and either approve or reject them.
- The Website: An ASP.NET website for viewing the photos – this is our public facing application.
A typical use case might be as follows (see diagram below).
- We upload a photo submission with our uploader application. The photo is uploaded to Azure blob storage and a message is sent via an Azure Message Queue to all available reviewer applications. Additional information about the submitter is associated with the photo in Azure table storage.
- The reviewer application watches the message queue for new messages. When one arrives, the photo is added to a list of pending submissions. The reviewer can either reject (delete) the submission or approve it – move it to a permanent blob storage location where it will be publicly viewable.
- Users visit our website and can view all approved photos. This list will change in real-time because it is driven by the reviewer application. The web application simply pulls all photos from the approved photo container in Azure blob storage.
Saving Data: Creating Azure Blobs
To save data to Azure Blob Storage, you first need to know that blob storage follows the ACE pattern (Authority, Container, Entity) to describe a blob. The authority is simply your Azure solution name, containers are analogous to folders, and entities are analogous to files.
The listing below is essentially the code that runs when the uploader application uploads a pending image submission to blob storage.
Listing 1.
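The heart of that listing looks roughly like the sketch below. The type and method names (StorageAccountInfo, BlobContents, CreateBlob) reflect my recollection of the CTP-era StorageClient sample library and may differ slightly between SDK builds; localImagePath stands in for the file the contributor picked in the uploader UI.

// Pull the account name, key, and endpoint from the service configuration.
StorageAccountInfo account =
    StorageAccountInfo.GetDefaultBlobStorageAccountFromConfiguration();
BlobStorage blobStorage = BlobStorage.Create(account);

// Authority = solution name, container = "folder", entity = "file".
BlobContainer pendingImages = blobStorage.GetBlobContainer("pendingimages");
pendingImages.CreateContainer(); // create on first use

// Name the blob after the local file and upload its bytes.
byte[] imageBytes = File.ReadAllBytes(localImagePath);
BlobProperties properties = new BlobProperties(Path.GetFileName(localImagePath));
properties.ContentType = "image/jpeg";

pendingImages.CreateBlob(properties, new BlobContents(imageBytes), true /* overwrite */);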
Sending Notifications: Azure Queuing
In addition to uploading the image to the pending images container in blob storage, we will send a message to a message queue to notify any active or future reviewers of the new submission.
Listing 2.
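In sketch form, again assuming the CTP StorageClient surface (QueueStorage, MessageQueue, PutMessage), the send side amounts to this; blobName is the name of the blob we just uploaded.

// Queue access mirrors blob access: account info, service object, queue.
StorageAccountInfo account =
    StorageAccountInfo.GetDefaultQueueStorageAccountFromConfiguration();
QueueStorage queueStorage = QueueStorage.Create(account);

MessageQueue newSubmissions = queueStorage.GetQueue("newsubmissions");
newSubmissions.CreateQueue(); // create on first use

// Keep the payload small: just the blob name. Reviewers use it to pull
// the full image from the pendingimages blob container.
newSubmissions.PutMessage(new Message(blobName));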
Saving (More) Data: Structured Storage and Azure Tables
Finally, for the upload application, we must also save some information about the contributor. In Azure Storage we have two reasonable places to store this information.
First, we could save this information directly in blob storage as meta-data associated with the blob itself. This is straightforward and easy, but there is a big limitation: the meta-data is not queryable. Suppose I want to get all images associated with a single contributor. There is no way to ask Azure Blob Storage to return all the blobs matching a filter on their meta-data. You would have to pull the properties of every blob and do the comparison client-side. That’s tantamount to filling a DataSet with “SELECT * FROM PendingImages” and filtering in memory – a bad idea.
Instead we will use the third type of Azure Storage: Azure Table Storage. Table Storage allows us to store entities with up to 256 properties each and query this data as if it were a database. It is exactly what we need for the contributor information. However, you must realize this is not a database. A better mental picture is a durable collection of Dictionary objects (as in Dictionary<TKey, TValue> from System.Collections.Generic) with querying built on top. I say this because there are no schemas or relational constructs in Azure Table Storage. If you need those, then you’ll want SQL Data Services – a service that sits on top of the core Azure platform.
The code to add an entry to Azure Table Storage does not fit into a single method as it’s driven through the interaction of several classes we must define. Azure Table Storage can be accessed via ADO.NET Data Services (client-side) and this is the method we will use.
First we’ll define a client-side schema for our entry by creating a class called Contributor which derives from the class TableStorageEntity (from the StorageClient library).
Listing 3.
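A minimal version of such a class might look like this; Name and Email are illustrative properties, while PartitionKey, RowKey, and Timestamp come from the TableStorageEntity base class.

public class Contributor : TableStorageEntity
{
    // ADO.NET Data Services needs a parameterless constructor
    // so it can materialize query results.
    public Contributor() { }

    public Contributor(string partitionKey, string rowKey)
        : base(partitionKey, rowKey) { }

    // Every public read/write property becomes a stored property
    // on the entity; no schema is required.
    public string Name { get; set; }
    public string Email { get; set; }
}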
Additionally, we must define the tables and queries available to ADO.NET Data Services by creating a class derived from TableStorageDataServiceContext, which we do below. We have just one table, called Contributors.
Listing 4.
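A sketch of that context class follows. PhotoDataServiceContext is a name I’ve picked for illustration; the Contributors property name doubles as the table name handed to CreateQuery.

public class PhotoDataServiceContext : TableStorageDataServiceContext
{
    public PhotoDataServiceContext(StorageAccountInfo account)
        : base(account) { }

    // Exposing the table as IQueryable lets callers write LINQ queries;
    // CreateQuery is inherited from the ADO.NET Data Services context.
    public IQueryable<Contributor> Contributors
    {
        get { return this.CreateQuery<Contributor>("Contributors"); }
    }
}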
With those two items in place, we can insert a “row” into Azure Table Storage as follows:
Listing 5.
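A sketch of that insert code: AddObject and SaveChanges are standard ADO.NET Data Services calls, while the partition key and row key choices here are purely illustrative.

StorageAccountInfo account =
    StorageAccountInfo.GetDefaultTableStorageAccountFromConfiguration();
PhotoDataServiceContext svc = new PhotoDataServiceContext(account);

Contributor contributor =
    new Contributor("contributors", Guid.NewGuid().ToString())
    {
        Name = "Michael Kennedy",
        Email = "michael@example.com"
    };

// AddObject records the pending insert; SaveChanges issues the
// underlying REST request that actually writes the entity.
svc.AddObject("Contributors", contributor);
svc.SaveChanges();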
Querying Azure Table Storage is very straightforward. Because we are using ADO.NET Data Services, queries can be written in LINQ, as in “from c in svc.Contributors select c.Name”. Ultimately, ADO.NET Data Services is itself built on a RESTful API, so the LINQ expression translates into the underlying HTTP REST calls. Alternatively, you can use that REST API directly from .NET or any other platform.
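For example, a filtered query against the svc context from the Listing 5 sketch might look like this; the where clause becomes a filter on the HTTP request rather than client-side work.

// Find every contributor with a given name.
var byName =
    from c in svc.Contributors
    where c.Name == "Michael Kennedy"
    select c;

foreach (Contributor c in byName)
{
    Console.WriteLine("{0} <{1}>", c.Name, c.Email);
}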
Waiting on Queues: The Reviewer Application’s Code
Next, let’s look at how we monitor and pull messages from Azure Queuing. Ultimately we must poll the queue using RESTful HTTP requests, but the StorageClient library surfaces this to us as simple events.
Listing 6.
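Wired up, it looks roughly like the sketch below; queueStorage is created exactly as in the Listing 2 sketch. The event-based members (PollInterval, MessageReceived, StartReceiving, ContentAsString) are my best recollection of the CTP sample library, and AddPendingSubmission is a hypothetical method on the reviewer form.

MessageQueue newSubmissions = queueStorage.GetQueue("newsubmissions");

// Under the covers the library polls the queue over REST on a timer
// and raises an event for each message it retrieves.
newSubmissions.PollInterval = 5000; // milliseconds between polls
newSubmissions.MessageReceived += (sender, e) =>
{
    string blobName = e.Message.ContentAsString();
    AddPendingSubmission(blobName); // hypothetical: add to the pending list
};
newSubmissions.StartReceiving();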
We won’t cover how we move a blob from the pendingImages blob container to the approvedImages blob container which happens when a reviewer approves an image.
You can look at the sample to see how that is done.
Ultimately It’s About the Website
Finally, let’s look at the web application that actually displays the approved images. We don’t do anything fancy such as the paging or error handling you’d see in a real application, but this will give you a good idea of how to work with blob data as a collection.
Here we’ll create a BlobStorage object and access the BlobContainer approvedImageContainer as we have in most of the listings. But then, instead of saving or reading blobs, we use the ListBlobs method to simply list all the approved images in that container. To show the images on our webpage, we just take BlobProperties.Uri and reference it directly in our HTML. Our ASP.NET application never touches the image data itself; rather, the consumers of the HTML (IE, Firefox, Chrome, etc.) pull the image data directly from blob storage as they would from any web server.
Listing 7.
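A sketch of that page code follows. The ListBlobs call returning BlobProperties objects matches the description above, though the exact signature is from my recollection of the CTP library; imageRepeater stands in for whatever data-bound control the page uses.

BlobStorage blobStorage = BlobStorage.Create(
    StorageAccountInfo.GetDefaultBlobStorageAccountFromConfiguration());
BlobContainer approvedImageContainer =
    blobStorage.GetBlobContainer("approvedimages");

// ListBlobs returns blob metadata, not blob contents. We hand the
// browser only the URIs; it downloads the bytes straight from storage.
List<string> imageUrls = new List<string>();
foreach (object item in approvedImageContainer.ListBlobs(string.Empty, false))
{
    BlobProperties props = item as BlobProperties;
    if (props != null)
    {
        imageUrls.Add(props.Uri.ToString());
    }
}

imageRepeater.DataSource = imageUrls; // e.g. an <asp:Repeater> of <img> tags
imageRepeater.DataBind();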
Now you have a good idea of the concepts and motivation behind Azure Storage. You have seen some typical usages of each of the three storage features: blob storage, table storage, and queuing. Our samples made use of the sample storage API library called StorageClient, and underneath that library Azure Storage is accessed entirely via RESTful APIs.
Want to get started? Visit http://www.azure.com and choose “Try It Now” to register for a CTP Azure account. You’ll need to download the various SDKs listed on that same page. They will install the Visual Studio projects required for working with Azure, as well as the Development Storage and Development Fabric, so you can develop and debug your applications before deploying them to the cloud.
If you want some intensive, expert-led training on Azure and associated .NET 4.0 topics, be sure to contact DevelopMentor, or call 800.699.1932 to find out what classes we have available today.