
Windows Azure Blobs: Improved HTTP Headers for Resume on Download and a Change in If-Match Conditions


In the new 2011-08-18 version of the Windows Azure Blob service, we have made some changes to improve browser download and streaming for some media players. We have also provided an extension to the Blob Service settings to allow anonymous and un-versioned requests to benefit from these changes. The motivations for providing these features are:

  1. Allow browsers to resume download if interrupted. Some browsers require the following:
    • ETag returned as part of the response must be quoted to conform to the HTTP spec.
    • Return Accept-Ranges in the response header to indicate that range requests are accepted by the service. Though this is not mandatory according to the spec, some browsers still require this.
  2. Support more range formats for range requests. Certain media players request a range using the format "Range: bytes=0-". The Windows Azure Blob service used to ignore this header format. Now, with the new 2011-08-18 version, we will return the entire blob in the format of a range response. This allows such media players to resume playing as soon as response packets arrive rather than waiting for the entire blob to download.
  3. Allow un-versioned requests to be processed using the semantics of the 2011-08-18 version. Since the above two changes affect un-versioned browser/media player requests and the changes are versioned, we need to allow such requests to take advantage of them. To do so, "Set Blob Service Properties" now takes an extra property that defines the default version the blob service should use for un-versioned requests to your account.

In addition, another change for the blob service is that we now return "Precondition Failed" (HTTP status 412) for a PUT request with a conditional If-Match header when the blob does not exist. Previously, we would have recreated the blob. This change is effective for all versions starting with the 2009-09-19 version.

We will now cover the changes in more detail.

Header Related Changes

In this section we will cover the header related changes that we have done in the Windows Azure Blob service for 2011-08-18 version.

Quoted ETags

ETags returned in response headers for all APIs are now quoted to conform to the RFC 2616 specification. ETags returned as part of the XML response body in listing operations remain as-is. As mentioned above, this allows browsers to resume a download using the ETag: unquoted ETags were ignored by certain browsers, while all standards-compliant browsers honor quoted ETags. A browser needs the ETag when issuing a conditional Range GET to resume a download, so it can ensure that the partial content it is requesting has not been modified.

With version 2011-08-18, ETags are returned quoted, as shown in the sample response below.

Sample GET Blob Response
HTTP/1.1 200 OK
x-ms-blob-type: BlockBlob
x-ms-lease-status: unlocked
x-ms-meta-m1: v1
x-ms-meta-m2: v2
Content-Length: 11
Content-Type: text/plain; charset=UTF-8
Date: Sun, 25 Sep 2011 22:49:18 GMT
ETag: "0x8CB171DBEAD6A6B"
Last-Modified: Sun, 25 Sep 2011 22:48:29 GMT
x-ms-version: 2011-08-18
Accept-Ranges: bytes
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
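For illustration, a browser that needs to resume an interrupted download can then send a conditional range request such as the following (the blob URL and byte offset are placeholders; the ETag is the quoted value from the response above). If the blob has been modified since the ETag was obtained, the precondition fails with a 412 and the client can restart the download from the beginning.

GET http://myaccount.blob.core.windows.net/mycontainer/myblob HTTP/1.1
If-Match: "0x8CB171DBEAD6A6B"
Range: bytes=1024-
Host: myaccount.blob.core.windows.net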

 

Return Accept-Ranges Header

"Get Blob" requests now return "Accept-Ranges" in the response. Though clients should not rely on this header to determine whether range requests are supported, certain browsers still expect it. For those browsers, if this header is missing, an interrupted download will restart from the beginning rather than resuming from where it was interrupted.

With version 2011-08-18, we now return this header. The sample GET Blob response above shows the presence of this new header.

Additional Range Format

Certain media players issue a range request for the entire blob using the format:

Range: bytes=0-

The player expects a status code of 206 (Partial Content) with the entire content returned and the Content-Range header set to:

Content-Range: bytes 0-10240779/10240780 (assuming the blob was of length 10240780).

On receiving the Content-Range header, the media player starts streaming the blob rather than waiting for the entire blob to be downloaded first.

With version 2011-08-18, we now support this header format.

Sample Range GET Blob Request
GET http://cohowinery.blob.core.windows.net/videos/build.wmv?timeout=60 HTTP/1.1
User-Agent: WA-Storage/6.0.6002.18312
Range: bytes=100-
Host: 10.200.30.18

Sample Range GET Blob Response
HTTP/1.1 206 Partial Content
Content-Length: 1048476
Content-Type: application/octet-stream
Content-Range: bytes 100-1048575/1048576
Last-Modified: Thu, 08 Sep 2011 23:39:47 GMT
Accept-Ranges: bytes
ETag: "0x8CE4217E34E31F0"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 387a38ae-fa0c-4fe2-8e60-d6afa2373e56
x-ms-version: 2011-08-18
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Date: Thu, 08 Sep 2011 23:39:46 GMT

<content …>

If-Match Condition on Non-Existent Blob

A PUT Blob request with an If-Match precondition set to a value other than "*" would previously have succeeded even when the blob did not exist. This should not have succeeded, since it violates the HTTP specification. Therefore, we changed this to return "Precondition Failed" (HTTP status 412). This breaking change was made to prevent users from inadvertently recreating a deleted blob. It should not impact your service since:

  1. If the application really intends to create the blob, then it will send a PUT request without an ETag, since providing the ETag shows that the caller expects the blob to exist.
  2. If an application sends an ETag, then the intent to just update is made explicit, so if the blob does not exist the request must fail.
  3. The previous behavior was unexpected, since we recreated the blob when the intent was just to update it. Because of these semantics, no application should be expecting the blob to be recreated.

We have made this change effective in all versions starting with 2009-09-19.
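To illustrate what this means for client code, here is a minimal sketch using the .NET Storage Client Library (the 1.x StorageClient API); the blob name, content, and saved ETag value are placeholders. With the new behavior, a conditional upload against a blob that has since been deleted throws a 412 instead of silently recreating the blob:

CloudStorageAccount account = CloudStorageAccount.Parse(ConnectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudBlob blob = blobClient.GetBlobReference("mycontainer/myblob.txt");

// ETag captured from an earlier download of the blob (placeholder value)
string previouslySavedETag = "\"0x8CB171DBEAD6A6B\"";

try
{
    // Conditional update: If-Match asserts that the blob still exists with this exact version
    BlobRequestOptions options = new BlobRequestOptions
    {
        AccessCondition = AccessCondition.IfMatch(previouslySavedETag)
    };
    blob.UploadText("updated content", Encoding.UTF8, options);
}
catch (StorageClientException e)
{
    // Starting with version 2009-09-19, the deleted blob is no longer recreated;
    // the request fails with HttpStatusCode.PreconditionFailed (412)
    Console.WriteLine("Conditional PUT failed: {0}", e.StatusCode);
}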

Blob Service Settings and DefaultServiceVersion

Before we get into the changes to the blob service settings, let us understand how versioning works in Windows Azure Storage. The Windows Azure Storage service accepts the version that should be used to process the request in an "x-ms-version" header. The list of supported versions is explained here. However, in certain cases, the version can be omitted:

  1. Anonymous browser requests do not send a version header since there is no way to add this custom header
  2. The PDC 2008 version of the request also did not require a version header

When requests do not have a version header associated with them, we call them un-versioned requests. However, the service still needs to associate a version with these requests, and the rule was as follows:

  1. If a request is versioned, then we use the version specified in the request header
  2. If the version header is not set, then if the ACL for the blob container was set using version 2009-09-19 or later, we will use the 2009-09-19 version to execute the API
  3. Otherwise we will use the PDC 2008 version of the API (which will be deprecated in the future)

Because of the above rules, the changes described above (quoted ETags, the Accept-Ranges header, etc.) would not have taken effect for the intended scenarios (e.g., anonymous requests), since those requests are un-versioned. Hence, we now allow a DefaultServiceVersion property to be set for the blob service on your storage account. It is used only for un-versioned requests, and the new version precedence rules for requests (sketched in code after the list) are:

  1. If a request is versioned, then we use the version specified in the request header
  2. If a version header is not present and the user has set DefaultServiceVersion in "Set Blob Service Properties" to a valid version (2009-09-19 or 2011-08-18), then we will use that default version for the request.
  3. If the version header is not set (explicitly or via the DefaultServiceVersion property), then if the ACL for the container was set using version 2009-09-19 or later, we will use the 2009-09-19 version to execute the API
  4. Otherwise, we will use the PDC 2008 version of the API, which will be deprecated in the future.
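The following is a minimal sketch, in C# rather than actual service code, of that resolution order for a single request:

// A sketch of the version resolution order described above; the names and signature are illustrative only.
static string ResolveEffectiveVersion(
    string requestVersionHeader,            // value of the x-ms-version header, if any
    string defaultServiceVersion,           // DefaultServiceVersion set via "Set Blob Service Properties", if any
    bool containerAclSetWith20090919OrLater)
{
    if (!string.IsNullOrEmpty(requestVersionHeader))
        return requestVersionHeader;        // 1. an explicit version on the request always wins
    if (!string.IsNullOrEmpty(defaultServiceVersion))
        return defaultServiceVersion;       // 2. the account-level default for un-versioned requests
    if (containerAclSetWith20090919OrLater)
        return "2009-09-19";                // 3. inferred from how the container ACL was set
    return "PDC 2008";                      // 4. legacy PDC 2008 behavior (to be deprecated)
}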

For users whose blobs are meant to be downloaded via browsers or media players, we recommend setting this default service version to 2011-08-18 so that the improvements take effect. We also recommend setting it for your service in general, since we will be deprecating the PDC 2008 version at some point in the future.

Set DefaultServiceVersion property

The existing "Set Blob Service Properties" API has been extended in the 2011-08-18 version to include a new DefaultServiceVersion property. This is an optional property and is accepted only if it is set to a valid version value. It applies only to the Windows Azure Blob service. The possible values are:

  • 2009-09-19
  • 2011-08-18

When set, this version is used for all un-versioned requests. Please note that the “Set Blob Service Properties” request to set DefaultServiceVersion must be made with version 2011-08-18, regardless of which version you are setting DefaultServiceVersion to. An example REST request looks like the following:

Sample REST Request
PUT http://cohowinery.blob.core.windows.net/?restype=service&comp=properties HTTP/1.1
x-ms-version: 2011-08-18
x-ms-date: Sat, 10 Sep 2011 04:28:19 GMT
Authorization: SharedKey cohowinery:Z1lTLDwtq5o1UYQluucdsXk6/iB7YxEu0m6VofAEkUE=
Host: cohowinery.blob.core.windows.net
Content-Length: 200

<?xml version="1.0" encoding="utf-8"?>
<StorageServiceProperties>
    <Logging>
        <Version>1.0</Version>
        <Delete>true</Delete>
        <Read>false</Read>
        <Write>true</Write>
        <RetentionPolicy>
            <Enabled>true</Enabled>
            <Days>7</Days>
        </RetentionPolicy>
    </Logging>
    <Metrics>
        <Version>1.0</Version>
        <Enabled>true</Enabled>
        <IncludeAPIs>false</IncludeAPIs>
        <RetentionPolicy>
            <Enabled>true</Enabled>
            <Days>7</Days>
        </RetentionPolicy>
    </Metrics>
    <DefaultServiceVersion>2011-08-18</DefaultServiceVersion>
</StorageServiceProperties>

Get Storage Service Properties

Using the 2011-08-18 version, this API will now return the DefaultServiceVersion if it has been set.

Sample REST Request
GET http://cohowinery.blob.core.windows.net/?restype=service&comp=properties HTTP/1.1
x-ms-version: 2011-08-18
x-ms-date: Sat, 10 Sep 2011 04:28:19 GMT
Authorization: SharedKey cohowinery:Z1lTLDwtq5o1UYQluucdsXk6/iB7YxEu0m6VofAEkUE=
Host: cohowinery.blob.core.windows.net
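For reference, a successful response to this request looks similar to the following (headers abbreviated and the Logging and Metrics sections elided; the body mirrors whatever was last set via "Set Blob Service Properties"):

HTTP/1.1 200 OK
Content-Type: application/xml
x-ms-version: 2011-08-18
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<StorageServiceProperties>
    <Logging>...</Logging>
    <Metrics>...</Metrics>
    <DefaultServiceVersion>2011-08-18</DefaultServiceVersion>
</StorageServiceProperties>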

Sample Library and Usage

We provide sample code that can be used to set these service settings. It is very similar to the example provided in the Analytics blog, but it uses the new DefaultServiceVersion property, and we have renamed some classes and methods, using "ServiceSettings" in place of "AnalyticsSettings".

  • Class SettingsSerializerHelper handles serialization and deserialization of settings.
  • Class ServiceSettings represents the service settings. It also contains the DefaultServiceVersion property, which should be set only for the blob service; the Windows Azure Queue and Table services will return an HTTP 400 ("Bad Request") status code if it is set.
  • Class ServiceSettingsExtension implements extension methods that can be used to set/get service settings.

The way to use the code is still the same except for the new DefaultServiceVersion property:

CloudStorageAccount account = CloudStorageAccount.Parse(ConnectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
ServiceSettings settings = new ServiceSettings()
        {
            LogType = LoggingLevel.Delete | LoggingLevel.Read | LoggingLevel.Write,
            IsLogRetentionPolicyEnabled = false,
            LogRetentionInDays = 7,
            IsMetricsRetentionPolicyEnabled = true,
            MetricsRetentionInDays = 3,
            MetricsType = MetricsType.All,
            DefaultServiceVersion = "2011-08-18"
        };

blobClient.SetServiceSettings(settings);
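The settings can then be read back with the corresponding GetServiceSettings extension method (defined in the utility classes below) to confirm the new default version:

ServiceSettings currentSettings = blobClient.GetServiceSettings();
Console.WriteLine("DefaultServiceVersion = {0}", currentSettings.DefaultServiceVersion);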

Here are the rest of the utility classes.

using System;
using System.Globalization;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

/// <summary>
/// This class handles the serialization and deserialization of service settings
/// </summary>
public static class SettingsSerializerHelper
{
    private const string RootPropertiesElementName = "StorageServiceProperties";
    private const string VersionElementName = "Version";
    private const string RetentionPolicyElementName = "RetentionPolicy";
    private const string RetentionPolicyEnabledElementName = "Enabled";
    private const string RetentionPolicyDaysElementName = "Days";
    private const string DefaultServiceVersionElementName = "DefaultServiceVersion";

    private const string LoggingElementName = "Logging";
    private const string ApiTypeDeleteElementName = "Delete";
    private const string ApiTypeReadElementName = "Read";
    private const string ApiTypeWriteElementName = "Write";

    private const string MetricsElementName = "Metrics";
    private const string IncludeApiSummaryElementName = "IncludeAPIs";
    private const string MetricsEnabledElementName = "Enabled";

    private const int MaximumRetentionDays = 365;

    /// <summary>
    /// Reads the settings provided from stream
    /// </summary>
    /// <param name="xmlReader"></param>
    /// <returns></returns>
    internal static ServiceSettings DeserializeServiceSettings(XmlReader xmlReader)
    {
        // Read the root and check if it is empty or invalid
        xmlReader.Read();
        xmlReader.ReadStartElement(SettingsSerializerHelper.RootPropertiesElementName);

        ServiceSettings settings = new ServiceSettings();

        while (true)
        {
            if (xmlReader.IsStartElement(SettingsSerializerHelper.LoggingElementName))
            {
                DeserializeLoggingElement(xmlReader, settings);
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.MetricsElementName))
            {
                DeserializeMetricsElement(xmlReader, settings);
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.DefaultServiceVersionElementName))
            {
                settings.DefaultServiceVersion = xmlReader.ReadElementString(SettingsSerializerHelper.DefaultServiceVersionElementName);
            }
            else
            {
                break;
            }
        }

        xmlReader.ReadEndElement();

        return settings;
    }


    /// <summary>
    /// Write the settings provided to stream
    /// </summary>
    /// <param name="xmlWriter"></param>
    /// <param name="settings"></param>
    /// <returns></returns>
    internal static void SerializeServiceSettings(XmlWriter xmlWriter, ServiceSettings settings)
    {
        xmlWriter.WriteStartDocument();
        xmlWriter.WriteStartElement(SettingsSerializerHelper.RootPropertiesElementName);

        //LOGGING STARTS HERE
        xmlWriter.WriteStartElement(SettingsSerializerHelper.LoggingElementName);

        xmlWriter.WriteStartElement(SettingsSerializerHelper.VersionElementName);
        xmlWriter.WriteValue(settings.LogVersion);
        xmlWriter.WriteEndElement();

        bool isReadEnabled = (settings.LogType & LoggingLevel.Read) != LoggingLevel.None;
        xmlWriter.WriteStartElement(SettingsSerializerHelper.ApiTypeReadElementName);
        xmlWriter.WriteValue(isReadEnabled);
        xmlWriter.WriteEndElement();

        bool isWriteEnabled = (settings.LogType & LoggingLevel.Write) != LoggingLevel.None;
        xmlWriter.WriteStartElement(SettingsSerializerHelper.ApiTypeWriteElementName);
        xmlWriter.WriteValue(isWriteEnabled);
        xmlWriter.WriteEndElement();

        bool isDeleteEnabled = (settings.LogType & LoggingLevel.Delete) != LoggingLevel.None;
        xmlWriter.WriteStartElement(SettingsSerializerHelper.ApiTypeDeleteElementName);
        xmlWriter.WriteValue(isDeleteEnabled);
        xmlWriter.WriteEndElement();

        SerializeRetentionPolicy(xmlWriter, settings.IsLogRetentionPolicyEnabled, settings.LogRetentionInDays);
        xmlWriter.WriteEndElement(); // logging element

        //METRICS STARTS HERE
        xmlWriter.WriteStartElement(SettingsSerializerHelper.MetricsElementName);

        xmlWriter.WriteStartElement(SettingsSerializerHelper.VersionElementName);
        xmlWriter.WriteValue(settings.MetricsVersion);
        xmlWriter.WriteEndElement();

        bool isServiceSummaryEnabled = (settings.MetricsType & MetricsType.ServiceSummary) != MetricsType.None;
        xmlWriter.WriteStartElement(SettingsSerializerHelper.MetricsEnabledElementName);
        xmlWriter.WriteValue(isServiceSummaryEnabled);
        xmlWriter.WriteEndElement();

        if (isServiceSummaryEnabled)
        {
            bool isApiSummaryEnabled = (settings.MetricsType & MetricsType.ApiSummary) != MetricsType.None;
            xmlWriter.WriteStartElement(SettingsSerializerHelper.IncludeApiSummaryElementName);
            xmlWriter.WriteValue(isApiSummaryEnabled);
            xmlWriter.WriteEndElement();
        }

        SerializeRetentionPolicy(
            xmlWriter,
            settings.IsMetricsRetentionPolicyEnabled,
            settings.MetricsRetentionInDays);
        xmlWriter.WriteEndElement(); // metrics 

        // Save default service version if provided. NOTE - this should be set only for blob service
        if (!string.IsNullOrEmpty(settings.DefaultServiceVersion))
        {
            xmlWriter.WriteStartElement(SettingsSerializerHelper.DefaultServiceVersionElementName);
            xmlWriter.WriteValue(settings.DefaultServiceVersion);
            xmlWriter.WriteEndElement();
        }

        xmlWriter.WriteEndElement(); // root element
        xmlWriter.WriteEndDocument();
    }

    private static void SerializeRetentionPolicy(XmlWriter xmlWriter, bool isRetentionEnabled, int days)
    {
        xmlWriter.WriteStartElement(SettingsSerializerHelper.RetentionPolicyElementName);

        xmlWriter.WriteStartElement(SettingsSerializerHelper.RetentionPolicyEnabledElementName);
        xmlWriter.WriteValue(isRetentionEnabled);
        xmlWriter.WriteEndElement();

        if (isRetentionEnabled)
        {
            xmlWriter.WriteStartElement(SettingsSerializerHelper.RetentionPolicyDaysElementName);
            xmlWriter.WriteValue(days);
            xmlWriter.WriteEndElement();
        }

        xmlWriter.WriteEndElement(); // Retention policy for logs
    }

    /// <summary>
    /// Reads the logging element and fills in the values in settings instance
    /// </summary>
    /// <param name="xmlReader"></param>
    /// <param name="settings"></param>
    private static void DeserializeLoggingElement(
        XmlReader xmlReader,
        ServiceSettings settings)
    {
        // Read logging element
        xmlReader.ReadStartElement(SettingsSerializerHelper.LoggingElementName);

        while (true)
        {
            if (xmlReader.IsStartElement(SettingsSerializerHelper.VersionElementName))
            {
                settings.LogVersion = xmlReader.ReadElementString(SettingsSerializerHelper.VersionElementName);
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.ApiTypeReadElementName))
            {
                if (DeserializeBooleanElementValue(
                    xmlReader,
                    SettingsSerializerHelper.ApiTypeReadElementName))
                {
                    settings.LogType = settings.LogType | LoggingLevel.Read;
                }
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.ApiTypeWriteElementName))
            {
                if (DeserializeBooleanElementValue(
                    xmlReader,
                    SettingsSerializerHelper.ApiTypeWriteElementName))
                {
                    settings.LogType = settings.LogType | LoggingLevel.Write;
                }
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.ApiTypeDeleteElementName))
            {
                if (DeserializeBooleanElementValue(
                    xmlReader,
                    SettingsSerializerHelper.ApiTypeDeleteElementName))
                {
                    settings.LogType = settings.LogType | LoggingLevel.Delete;
                }
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.RetentionPolicyElementName))
            {
                // read retention policy for logging
                bool isRetentionEnabled = false;
                int retentionDays = 0;
                DeserializeRetentionPolicy(xmlReader, ref isRetentionEnabled, ref retentionDays);
                settings.IsLogRetentionPolicyEnabled = isRetentionEnabled;
                settings.LogRetentionInDays = retentionDays;
            }
            else
            {
                break;
            }
        }

        xmlReader.ReadEndElement();// end Logging element
    }

    /// <summary>
    /// Reads the metrics element and fills in the values in settings instance
    /// </summary>
    /// <param name="xmlReader"></param>
    /// <param name="settings"></param>
    private static void DeserializeMetricsElement(
        XmlReader xmlReader,
        ServiceSettings settings)
    {
        bool includeAPIs = false;

        // read the next element - it should be metrics. 
        xmlReader.ReadStartElement(SettingsSerializerHelper.MetricsElementName);

        while (true)
        {
            if (xmlReader.IsStartElement(SettingsSerializerHelper.VersionElementName))
            {
                settings.MetricsVersion = xmlReader.ReadElementString(SettingsSerializerHelper.VersionElementName);

            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.MetricsEnabledElementName))
            {
                if (DeserializeBooleanElementValue(
                    xmlReader,
                    SettingsSerializerHelper.MetricsEnabledElementName))
                {
                    // only if metrics is enabled will we read include API
                    settings.MetricsType = settings.MetricsType | MetricsType.ServiceSummary;
                }
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.IncludeApiSummaryElementName))
            {
                if (DeserializeBooleanElementValue(
                    xmlReader,
                    SettingsSerializerHelper.IncludeApiSummaryElementName))
                {
                    includeAPIs = true;
                }
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.RetentionPolicyElementName))
            {
                // read retention policy for metrics
                bool isRetentionEnabled = false;
                int retentionDays = 0;
                DeserializeRetentionPolicy(xmlReader, ref isRetentionEnabled, ref retentionDays);
                settings.IsMetricsRetentionPolicyEnabled = isRetentionEnabled;
                settings.MetricsRetentionInDays = retentionDays;
            }
            else
            {
                break;
            }
        }

        if ((settings.MetricsType & MetricsType.ServiceSummary) != MetricsType.None)
        {
            // If Metrics is enabled, IncludeAPIs must be included.
            if (includeAPIs)
            {
                settings.MetricsType = settings.MetricsType | MetricsType.ApiSummary;
            }
        }

        xmlReader.ReadEndElement();// end metrics element
    }


    /// <summary>
    /// Reads the retention policy in logging and metrics elements 
    /// and fills in the values in settings instance.
    /// </summary>
    /// <param name="xmlReader"></param>
    /// <param name="isRetentionEnabled"></param>
    /// <param name="retentionDays"></param>
    private static void DeserializeRetentionPolicy(
        XmlReader xmlReader,
        ref bool isRetentionEnabled,
        ref int retentionDays)
    {
        xmlReader.ReadStartElement(SettingsSerializerHelper.RetentionPolicyElementName);

        while (true)
        {
            if (xmlReader.IsStartElement(SettingsSerializerHelper.RetentionPolicyEnabledElementName))
            {
                isRetentionEnabled = DeserializeBooleanElementValue(
                    xmlReader,
                    SettingsSerializerHelper.RetentionPolicyEnabledElementName);
            }
            else if (xmlReader.IsStartElement(SettingsSerializerHelper.RetentionPolicyDaysElementName))
            {
                string intValue = xmlReader.ReadElementString(
                    SettingsSerializerHelper.RetentionPolicyDaysElementName);
                retentionDays = int.Parse(intValue);
            }
            else
            {
                break;
            }
        }

        xmlReader.ReadEndElement(); // end reading retention policy
    }

    /// <summary>
    /// Read a boolean value for xml element
    /// </summary>
    /// <param name="xmlReader"></param>
    /// <param name="elementToRead"></param>
    /// <returns></returns>
    private static bool DeserializeBooleanElementValue(
        XmlReader xmlReader,
        string elementToRead)
    {
        string boolValue = xmlReader.ReadElementString(elementToRead);
        return bool.Parse(boolValue);
    }
}

[Flags]
public enum LoggingLevel
{
    None = 0,
    Delete = 2,
    Write = 4,
    Read = 8,
}

[Flags]
public enum MetricsType
{
    None = 0x0,
    ServiceSummary = 0x1,
    ApiSummary = 0x2,
    All = ServiceSummary | ApiSummary,
}

/// <summary>
/// The service settings that can be set or retrieved
/// </summary>
public class ServiceSettings
{
    public static string Version = "1.0";

    public ServiceSettings()
    {
        this.LogType = LoggingLevel.None;
        this.LogVersion = ServiceSettings.Version;
        this.IsLogRetentionPolicyEnabled = false;
        this.LogRetentionInDays = 0;

        this.MetricsType = MetricsType.None;
        this.MetricsVersion = ServiceSettings.Version;
        this.IsMetricsRetentionPolicyEnabled = false;
        this.MetricsRetentionInDays = 0;
    }

    /// <summary>
    /// The default service version to use for un-versioned requests
    /// NOTE: This can be set only for blob service. 
    /// Possible values: 2009-09-19 or 2011-08-18. 
    /// </summary>
    public string DefaultServiceVersion { get; set; }

    /// <summary>
    /// The type of logs subscribed for
    /// </summary>
    public LoggingLevel LogType { get; set; }

    /// <summary>
    /// The version of the logs
    /// </summary>
    public string LogVersion { get; set; }

    /// <summary>
    /// Flag indicating if retention policy is set for logs in $logs
    /// </summary>
    public bool IsLogRetentionPolicyEnabled { get; set; }

    /// <summary>
    /// The number of days to retain logs for under $logs container
    /// </summary>
    public int LogRetentionInDays { get; set; }

    /// <summary>
    /// The metrics version
    /// </summary>
    public string MetricsVersion { get; set; }

    /// <summary>
    /// A flag indicating if retention policy is enabled for metrics
    /// </summary>
    public bool IsMetricsRetentionPolicyEnabled { get; set; }

    /// <summary>
    /// The number of days to retain metrics data
    /// </summary>
    public int MetricsRetentionInDays { get; set; }

    private MetricsType metricsType = MetricsType.None;

    /// <summary>
    /// The type of metrics subscribed for
    /// </summary>
    public MetricsType MetricsType
    {
        get
        {
            return metricsType;
        }

        set
        {
            if (value == MetricsType.ApiSummary)
            {
                throw new ArgumentException("Including just ApiSummary is invalid.");
            }

            this.metricsType = value;
        }
    }
}


/// <summary>
/// Extension methods for setting service settings
/// </summary>
public static class ServiceSettingsExtension
{
    static string RequestIdHeaderName = "x-ms-request-id";
    static string VersionHeaderName = "x-ms-version";
    static string VersionToUse = "2011-08-18";
    static TimeSpan DefaultTimeout = TimeSpan.FromSeconds(30);

    #region ServiceSettings
    /// <summary>
    /// Set blob service settings
    /// </summary>
    /// <param name="client"></param>
    /// <param name="settings"></param>
    public static void SetServiceSettings(this CloudBlobClient client, ServiceSettings settings)
    {
        SetSettings(client.BaseUri, client.Credentials, settings, false /* useSharedKeyLite */);
    }

    /// <summary>
    /// Set queue service settings
    /// </summary>
    /// <param name="client"></param>
    /// <param name="baseUri"></param>
    /// <param name="settings"></param>
    public static void SetServiceSettings(this CloudQueueClient client, Uri baseUri, ServiceSettings settings)
    {
        SetSettings(baseUri, client.Credentials, settings, false /* useSharedKeyLite */);
    }

    /// <summary>
    /// Set table service settings
    /// </summary>
    /// <param name="client"></param>
    /// <param name="settings"></param>
    public static void SetServiceSettings(this CloudTableClient client, ServiceSettings settings)
    {
        SetSettings(client.BaseUri, client.Credentials, settings, true /* useSharedKeyLite */);
    }

    /// <summary>
    /// Set service settings
    /// </summary>
    /// <param name="baseUri"></param>
    /// <param name="credentials"></param>
    /// <param name="settings"></param>
    /// <param name="useSharedKeyLite"></param>
    internal static void SetSettings(
        Uri baseUri, 
        StorageCredentials credentials, 
        ServiceSettings settings, 
        bool useSharedKeyLite)
    {
        UriBuilder builder = new UriBuilder(baseUri);
        builder.Query = string.Format(
            CultureInfo.InvariantCulture,
            "comp=properties&restype=service&timeout={0}",
            DefaultTimeout.TotalSeconds);

        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(builder.Uri);
        request.Headers.Add(VersionHeaderName, VersionToUse);
        request.Method = "PUT";

        StorageCredentialsAccountAndKey accountAndKey = credentials as StorageCredentialsAccountAndKey;
        using (MemoryStream buffer = new MemoryStream())
        {
            XmlTextWriter writer = new XmlTextWriter(buffer, Encoding.UTF8);
            SettingsSerializerHelper.SerializeServiceSettings(writer, settings);
            writer.Flush();
            buffer.Seek(0, SeekOrigin.Begin);
            request.ContentLength = buffer.Length;

            if (useSharedKeyLite)
            {
                credentials.SignRequestLite(request);
            }
            else
            {
                credentials.SignRequest(request);
            }

            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(buffer.GetBuffer(), 0, (int)buffer.Length);
            }

            try
            {
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    Console.WriteLine("Response Request Id = {0} Status={1}", response.Headers[RequestIdHeaderName], response.StatusCode);
                    if (HttpStatusCode.Accepted != response.StatusCode)
                    {
                        throw new Exception("Request failed with incorrect response status.");
                    }
                }
            }
            catch (WebException e)
            {
                Console.WriteLine(
                    "Response Request Id={0} Status={1}",
                    e.Response != null ? e.Response.Headers[RequestIdHeaderName] : "Response is null",
                    e.Status);
                throw;
            }

        }
    }

    /// <summary>
    /// Get blob service settings
    /// </summary>
    /// <param name="client"></param>
    /// <returns></returns>
    public static ServiceSettings GetServiceSettings(this CloudBlobClient client)
    {
        return GetSettings(client.BaseUri, client.Credentials, false /* useSharedKeyLite */);
    }

    /// <summary>
    /// Get queue service settings
    /// </summary>
    /// <param name="client"></param>
    /// <param name="baseUri"></param>
    /// <returns></returns>
    public static ServiceSettings GetServiceSettings(this CloudQueueClient client, Uri baseUri)
    {
        return GetSettings(baseUri, client.Credentials, false /* useSharedKeyLite */);
    }

    /// <summary>
    /// Get table service settings
    /// </summary>
    /// <param name="client"></param>
    /// <returns></returns>
    public static ServiceSettings GetServiceSettings(this CloudTableClient client)
    {
        return GetSettings(client.BaseUri, client.Credentials, true /* useSharedKeyLite */);
    }

    /// <summary>
    /// Get service settings
    /// </summary>
    /// <param name="baseUri"></param>
    /// <param name="credentials"></param>
    /// <param name="useSharedKeyLite"></param>
    /// <returns></returns>
    public static ServiceSettings GetSettings(
        Uri baseUri, 
        StorageCredentials credentials, 
        bool useSharedKeyLite)
    {
        UriBuilder builder = new UriBuilder(baseUri);
        builder.Query = string.Format(
            CultureInfo.InvariantCulture,
            "comp=properties&restype=service&timeout={0}",
            DefaultTimeout.TotalSeconds);

        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(builder.Uri);
        request.Headers.Add(VersionHeaderName, VersionToUse);
        request.Method = "GET";

        StorageCredentialsAccountAndKey accountAndKey = credentials as StorageCredentialsAccountAndKey;

        if (useSharedKeyLite)
        {
            credentials.SignRequestLite(request);
        }
        else
        {
            credentials.SignRequest(request);
        }

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("Response Request Id={0} Status={1}", response.Headers[RequestIdHeaderName], response.StatusCode);

                if (HttpStatusCode.OK != response.StatusCode)
                {
                    throw new Exception("expected HttpStatusCode.OK");
                }

                using (Stream stream = response.GetResponseStream())
                {
                    using (StreamReader streamReader = new StreamReader(stream))
                    {
                        string responseString = streamReader.ReadToEnd();
                        Console.WriteLine(responseString);

                        XmlReader reader = XmlReader.Create(new MemoryStream(ASCIIEncoding.UTF8.GetBytes(responseString)));
                        return SettingsSerializerHelper.DeserializeServiceSettings(reader);
                    }
                }
            }
        }
        catch (WebException e)
        {
            Console.WriteLine(
                "Response Request Id={0} Status={1}",
                e.Response != null ? e.Response.Headers[RequestIdHeaderName] : "Response is null",
                e.Status);
            throw;
        }
    }
    #endregion
}

Jai Haridas


Windows Azure Tables: Introducing Upsert and Query Projection


As part of the "2011-08-18" version, two Windows Azure Table features were introduced: Upsert, represented by the InsertOrReplace Entity and InsertOrMerge Entity APIs, and Query Projection.

In this section, we will first provide an overview of these two features, by defining them and providing use case scenarios. Then, we will illustrate how the Storage Client Library and WCF Data Services can be used to invoke the new APIs by providing sample code based on a usage scenario that highlights both newly introduced features.

Upsert (InsertOrReplace Entity and InsertOrMerge Entity) Feature

The term Upsert is a combination of update and insert. It refers to inserting an entity if the entity does not exist or updating the existing entity if it already exists.

Upsert requests save an extra call to the storage system when an application wants to insert or update an entity without needing to know whether the entity already exists. Without Upsert, the application could send up to two requests to get the same behavior, i.e. an Update Entity request followed by an Insert Entity request if the update failed because the entity did not exist. As you can imagine, Upsert can significantly help performance and decrease latencies in certain scenarios (see the next section for examples).

The two Upsert APIs provided by Windows Azure Table are InsertOrReplace Entity and InsertOrMerge Entity which are defined as follows:

  •  InsertOrReplace Entity: as the API name implies, InsertOrReplace Entity will insert the entity if the entity does not exist, or if the entity exists, replace the existing one. This means that once the operation successfully completes the table will contain the new entity with properties as defined in the InsertOrReplace Entity request, replacing the prior entity and its properties if it had previously existed.
  • InsertOrMerge Entity: InsertOrMerge Entity will insert the entity if the entity does not exist or, if the entity exists, merges the provided entity properties with the already existing ones. Once the operation successfully completes, the table will contain the provided entity with updated properties provided in the request. In other words, if the entity exists, the API would have the same effect as Merge Entity, where the resultant entity is a union of properties between the existing and the updated ones.

As you can infer from the above, the difference between the two Upsert commands lies in the update behavior when the entity already exists. For example, if an existing entity has properties A and B and an Upsert request is sent with properties B and C, InsertOrReplace leaves the entity with only B and C, while InsertOrMerge leaves it with A, B, and C.

Note: Both Upsert APIs can be used in batch operations (Entity Group Transactions).

Usage Examples

Here are some possible scenarios for using Upsert.

InsertOrReplace Usage Example

An example scenario for use of the InsertOrReplace API would be when a component is responsible for creating and updating entities and ensuring that none of the old entity properties are retained in case the entity already exists. In this case, that component will issue an InsertOrReplace request.

As a concrete example, consider an engine as part of an Azure application that constantly updates its data set by pulling the newest version from a data feed that it is subscribed to. That data feed could be a movie listing, a real estate listing, weather updates, etc., provided as an internet service. At a regular interval, the Azure application would pull the latest data from the feed and update its view that is stored in Windows Azure Table. The application could already have an outdated view of the dataset; in this case a replacement for these entities is needed if they already exist. The data feed could also be providing new information, in which case the application would insert those as new entities. Without Upsert capabilities, the Azure application would first attempt to send an unconditional Update Entity request, and on failure, which means the entity does not exist, the application would then issue an Insert Entity request. This means that the application could end up making two requests for every entity in the data set that does not already exist.

This scenario is greatly simplified by the Upsert capability, where the application issues a single InsertOrReplace Entity request for every entity instead of two. Upsert in this scenario halves the number of requests when the entity did not already exist. In addition, the application could boost its performance further by batching InsertOrReplace requests together as part of an entity group transaction.

InsertOrMerge Usage Example

The InsertOrMerge API appeals in situations where two distinct components need to insert or update an entity in a table, with each component responsible for managing its own properties on the entity. In addition, neither component wants to reset or delete the other component's properties if the entity already exists.

As an example, consider a customer information table that is updated by multiple components of a service; component A is a mobile application and component B is the website of the service accessible through the browser. Based on some user input, assume that component A can detect the mobile phone number of a user called John Smith and wants to insert that information into the table. In this situation, the mobile device will issue an InsertOrMerge command and expects the customer entity to be created with only John Smith's mobile phone number if the entity does not exist, or the properties sent to be merged with the existing ones in case John Smith's entity already exists. Similarly, component B, which runs as part of the website and can collect more information from the user such as his address and email address, would do the same when it wants to update John Smith's information without removing any previously recorded information.

Without Upsert, both components would have first issued an unconditional Merge Entity request and, on failure, they would have sent an Insert Entity request. They would also have to deal with the edge case where both of their Insert requests collide and one of the components would have to re-issue an Update Entity request. Upsert, as you can see, greatly simplifies the code and can provide a significant performance gain. We will highlight this example through code in the subsequent sections.

Query Projection Feature

Projection refers to querying a subset of an entity or entities’ properties. This is analogous to selecting a subset of the columns/properties of a certain table when querying in LINQ. It is a mechanism that would allow an application to reduce the amount of data returned by a query by specifying that only certain properties are returned in the response. For more information, you can also refer to Windows Azure Table: Query Entities, Windows Azure Table: Writing LINQ Queries, WCF Data Services: Query Projections, and OData: Select System Query Option ($select).

Usage Example

The obvious benefit of Query Projection is to reduce the latency of retrieving data from a Windows Azure Table. There are many usage scenarios where you only want to retrieve the needed properties in a table. For example, consider the scenario where we want to retrieve only a few properties in a table for display on a web page for entities that contain hundreds of properties. In this scenario we would save both on bandwidth usage and improve performance by using Projection.

Another usage is a recurring job that acts on only a few properties and updates them on a regular basis. In this case, Projection is handy in retrieving only the entity properties that need updating. This latter usage scenario is highlighted in the sample code provided in the subsequent sections.

In addition, the projection feature is useful for counting the number of entities in a table in a more efficient manner. We are working on providing Count() in the future, but until that is available Projection is useful since it allows the transfer of a single property back to the client instead of the full row, which makes the counting job more efficient. The code that highlights this usage is demonstrated in a later section.

Using Storage Client Library and WCF Data Services to invoke Upsert commands and Query Projection

In this section, we will walk you through sample code that highlights how an application can use the Storage Client Library and WCF Data Services to send Upsert commands such as InsertOrReplace Entity or InsertOrMerge Entity and to perform Query Projection.

We will use the customer scenario described in the InsertOrMerge Usage Example section above for the sample code below, where a system inserts and updates customer information through two different means: a component running as a mobile application and a component running as part of a website service.

Sending new storage version using the Storage Client Library

To unlock the new Windows Azure Storage Table Upsert and Query Projection features, all REST/OData requests should be tagged with the "2011-08-18" storage version as described in the MSDN documents for InsertOrReplace Entity, InsertOrMerge Entity and Query Entities. If your application is using the .NET storage client library, you can leverage the following code, which lets you keep all current capabilities of the TableServiceContext class while being able to invoke the new Upsert and projection APIs.

The code below is necessary because the released library sends an older storage version, "2009-09-19", which does not support these newly released features. Once a new version of the library is released, this code will no longer be needed.

public class TableServiceContextV2 : TableServiceContext
{
    private const string StorageVersionHeader = "x-ms-version";
    private const string August2011Version = "2011-08-18";

    public TableServiceContextV2(string baseAddress, StorageCredentials credentials):
        base (baseAddress, credentials)
    {
        this.SendingRequest += SendingRequestWithNewVersion;
    }

    private void SendingRequestWithNewVersion(object sender, SendingRequestEventArgs e)
    {
        HttpWebRequest request = e.Request as HttpWebRequest;

        // Apply the new storage version as a header value
        request.Headers[StorageVersionHeader] = August2011Version;
    }
}

We will be using TableServiceContextV2 throughout the subsequent sections in order to make use of the new features.

Sample Setup Code

In this section we will define the schema of the “Customers” table that will be used throughout the subsequent sections and the sample code needed to fill this sample table with some initial entities.

Assume that the “Customers” table entity is defined as follows:

[DataServiceKey("PartitionKey", "RowKey")]
public class CustomerEntity
{
    public CustomerEntity(string partitionKey, string rowKey)
    {
        this.PartitionKey = partitionKey;
        this.RowKey = rowKey;
    }

    public CustomerEntity() {}

    /// <summary>
    /// Customer First Name
    /// </summary>
    public string PartitionKey { get; set; }
    
    /// <summary>
    /// Customer Last Name
    /// </summary>
    public string RowKey { get; set; }

    public DateTime Timestamp { get; set; }

    public string Address { get; set; }

    public string Email { get; set; }

    public string PhoneNumber { get; set; }

    // The below 2 properties are declared as nullable since
    // they are considered optional fields in this example
    public DateTime? CustomerSince { get; set; }

    public int? Rating { get; set; }
}

We will initialize the “Customers” table using the following code:

string accountName = "someaccountname";
string accountKey = "SOMEKEY";
string customersTableName = "Customers";

CloudStorageAccount account = CloudStorageAccount.Parse(string.Format("TableEndpoint=http://{0}.table.core.windows.net;AccountName={0};AccountKey={1}", accountName, accountKey));

CloudTableClient tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist(customersTableName);

// Bootstrap the Customers table with a set of sample customer entries
TableServiceContext bootstrapContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

BootstrapTable(customersTableName, bootstrapContext);

And the BootstrapTable method is defined as follows:

static void BootstrapTable(string tableName, TableServiceContext serviceContext)
{
    CustomerEntity customer1 = new CustomerEntity("Walter", "Harp");
    customer1.Address = "1345 Fictitious St, St Buffalo, NY 98052";
    customer1.CustomerSince = DateTime.Parse("01/05/2010");
    customer1.Email = "Walter@contoso.com";
    customer1.PhoneNumber = "425-555-0101";
    customer1.Rating = 4;

    serviceContext.AddObject(tableName, customer1);

    CustomerEntity customer2 = new CustomerEntity("Jonathan", "Foster");
    customer2.Address = "1234 SomeStreet St, Bellevue, WA 75001";
    customer2.CustomerSince = DateTime.Parse("01/05/2005");
    customer2.Email = "Jonathan@fourthcoffee.com";
    customer2.Rating = 3;

    serviceContext.AddObject(tableName, customer2);

    CustomerEntity customer3 = new CustomerEntity("Lisa", "Miller");
    customer3.Address = "4567 NiceStreet St, Seattle, WA 54332";
    customer3.CustomerSince = DateTime.Parse("01/05/2003");
    customer3.Email = "Lisa@northwindtraders.com";
    customer3.Rating = 2;

    serviceContext.AddObject(tableName, customer3);

    serviceContext.SaveChanges();
}

 

InsertOrMerge Entity API Sample Code

As per the sample code scenario, the mobile app would want to InsertOrMerge an entity for John Smith with his phone number; similarly, the website engine would want to do the same but for different and distinct properties. Assume that the mobileServiceContext defined below represents the DataServiceContext that is running as part of the mobile app and the websiteServiceContext represents the DataServiceContext that is running as part of the website engine.

// The mobileServiceContext will represent the app running on a mobile phone that is responsible in updating/inserting the customer phone number
TableServiceContext mobileServiceContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

// The websiteServiceContext will represent the instance of the system that is able to insert and update other information about the customer 
TableServiceContext websiteServiceContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

The WCF DataServiceContext class does not natively support an InsertOrMerge or InsertOrReplace API. As mentioned in the MSDN documents for InsertOrMerge Entity and InsertOrReplace Entity, the OData requests as they appear on the wire are very similar in nature to a Merge Entity and Update Entity respectively with the main difference being that the If-Match header, represented by the Etag parameter in code, is omitted. In WCF Data Services API terms, this would be analogous to first attaching (AttachTo) an entity object to the DataServiceContext without any Etag specified and then invoking the UpdateObject method. This means the DataServiceContext is not initially tracking that object and therefore will omit sending any If-Match header when the SaveChanges method is called. The fact that an entity is being updated without an If-Match header signals to Windows Azure Tables that this is an Upsert request.

Note: Prior to version "2011-08-18", Windows Azure Table rejects any MERGE or PUT request made against it where the If-Match header is not specified, as those earlier versions do not support Upsert commands.

The code as part of the mobile application which Upserts John Smith’s phone number would therefore look as follows:

// The mobile app collects the customer's phone number
CustomerEntity mobileCustomer = new CustomerEntity("John", "Smith");
mobileCustomer.PhoneNumber = "505-555-0122";

// Notice how the AttachTo method is called with a null Etag which indicates that this is an Upsert Command
mobileServiceContext.AttachTo(customersTableName, mobileCustomer, null);

mobileServiceContext.UpdateObject(mobileCustomer);

// No SaveChangeOptions is used, which indicates that a MERGE verb will be used. This set of steps will result in an InsertOrMerge command to be sent to Windows Azure Table
mobileServiceContext.SaveChanges();

Similarly, the website engine may not be aware whether any data already exists for John Smith, and would want to InsertOrMerge its collected data so that it does not overwrite any existing data.

CustomerEntity websiteCustomer = new CustomerEntity("John", "Smith");
websiteCustomer.Address = "6789 Main St, Albuquerque, VA 98004";
websiteCustomer.Email = "John@cohowinery.com";

// Since the website system might not know if the customer entry already exists, it will also issue an InsertOrMerge command as follows
websiteServiceContext.AttachTo(customersTableName, websiteCustomer);
websiteServiceContext.UpdateObject(websiteCustomer);
websiteServiceContext.SaveChanges();

At this point, if both components have Upserted John Smith’s information using the InsertOrMerge API as described above, the “Customers” table will now contain a John Smith entry with his phone number, address and email recorded.

InsertOrReplace Entity API Sample Code

Assume that there is an option on the website to sync all of the customer's data from their mail service, and the semantics the website wants are to replace the customer's entire entity with this new data if it already exists, or otherwise insert the entity. In this case, any information that was already in the system needs to be replaced; otherwise, we will insert the information as provided. Therefore, the best option for the website would be to use the InsertOrReplace Entity API.

Since we wish to send an InsertOrReplace Entity request, we would first need to AttachTo the websiteServiceContext without providing any Etag value before calling the UpdateObject method. In addition, the SaveChanges method would need to be invoked with the SaveChangesOptions.ReplaceOnUpdate parameter to indicate that the Upsert operation should replace the existing entity in case it exists. The code for this example for a customer called David would look like this:

CustomerEntity mailServiceCustomer = new CustomerEntity("David", "Alexander");
mailServiceCustomer.PhoneNumber = "333-555-0155";
mailServiceCustomer.Address = "234 Main St, Anaheim, TX, 65000";
mailServiceCustomer.Email = "David@wideworldimporters.com";

// Note how SaveChanges is called with ReplaceOnUpdate which is the differentiation
// factor between an InsertOrMerge Entity API and InsertOrReplace Entity Api
websiteServiceContext.AttachTo(customersTableName, mailServiceCustomer);
websiteServiceContext.UpdateObject(mailServiceCustomer);
websiteServiceContext.SaveChanges(SaveChangesOptions.ReplaceOnUpdate);

Note: In the rare case where the application wants to Upsert an entity that is already tracked by the DataServiceContext, you should first detach the entity from the context and then attach it again in order to clear out any Etag tracking before performing the rest of the above steps.
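A minimal sketch of that sequence, reusing the mailServiceCustomer entity from the example above:

// Detach clears the tracked ETag so that the subsequent SaveChanges is sent without an If-Match header,
// which is what signals an Upsert to Windows Azure Table
websiteServiceContext.Detach(mailServiceCustomer);
websiteServiceContext.AttachTo(customersTableName, mailServiceCustomer);
websiteServiceContext.UpdateObject(mailServiceCustomer);
websiteServiceContext.SaveChanges(SaveChangesOptions.ReplaceOnUpdate);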

Query Projection Sample Code

To demonstrate the query projection feature, we will expand on the sample code provided in the previous sections. Assume that there is an offline job that needs to update the rating for all the customers: it needs to increment the rating by one for everyone who has been a customer since before 2006. To accomplish this, it would not be efficient to read all entity properties; rather, it would be more efficient to retrieve only the Rating property for all entities that match the CustomerSince criteria using query projection and then update just that property as needed. Since projection provides a partial view of the CustomerEntity properties, the best practice is to create a new data service entity type that holds only the properties we are interested in.

[DataServiceEntity]
public class CustomerRating
{
    public DateTime? CustomerSince { get; set; }

    public int? Rating { get; set; }
}

Although Rating is the only property needed to perform the job, we will also retrieve CustomerSince, which the job could use for debugging and logging purposes.

The scan job code that uses projection and update is as follows:

TableServiceContext ratingServiceContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);

var query = from entity in ratingServiceContext.CreateQuery<CustomerRating>(customersTableName)
            where entity.CustomerSince < DateTime.Parse("01/01/2006")
            select new CustomerRating
            {
                CustomerSince =  entity.CustomerSince,
                Rating = entity.Rating
            };

// Iterate over all the entities that match the query criteria and increment the rating by 1
foreach (CustomerRating customerRating in query)
{
    if (customerRating.Rating.HasValue)
    {
        ++customerRating.Rating;
    }
    else
    {
        // in case no rating was already set
        customerRating.Rating = 1;
    }
    ratingServiceContext.UpdateObject(customerRating);
}

ratingServiceContext.SaveChanges();

Even though you might not have explicitly projected PartitionKey and RowKey, the server will return them as part of the OData entity resource path, known as a link in WCF Data Services terms. The ETag (or entity timestamp) is also returned as part of the response. These three properties are tracked by the DataServiceContext, which uses them whenever an entity update is subsequently performed. Therefore, optimistic concurrency is also guaranteed on an entity with a partial view, as is the case with CustomerRating.

The above code could be written differently if you wanted to re-use the CustomerEntity class instead of the CustomerRating class to project on. However, we highly recommend using a partial-view class such as CustomerRating. This avoids accidentally resetting value-type properties (int, double, DateTime, etc.) that were not defined as nullable types. Furthermore, if you use the original CustomerEntity class, updates will generate a request with all of the entity’s properties serialized, which consumes unnecessary bandwidth when only the Rating property is intended to be updated.

Note: When projecting onto an entity type that defines PartitionKey and RowKey, such as CustomerEntity, you must project those keys if the entities will be updated later on, as shown in the sketch below.

Note: You can project on up to 250 properties including the partition key, row key and timestamp.
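
For example, to re-use CustomerEntity for the projection and still be able to update the returned entities later, the keys have to be included in the projection. A minimal sketch, assuming CustomerEntity derives from TableServiceEntity and therefore exposes settable PartitionKey and RowKey properties and a parameterless constructor:

var updatableQuery = from entity in ratingServiceContext.CreateQuery<CustomerEntity>(customersTableName)
                     where entity.CustomerSince < DateTime.Parse("01/01/2006")
                     select new CustomerEntity
                     {
                         // project the keys so the context can address the entity on update
                         PartitionKey = entity.PartitionKey,
                         RowKey = entity.RowKey,
                         Rating = entity.Rating
                     };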

Entity Counting Using Projection

Until we provide Count(), you can use projection as follows. You can select any column you wish, though a common approach is to project the PartitionKey, since it is returned as part of the OData query projection response anyway.

/// <summary>
/// Counts the number of entities in an Azure table
/// </summary>
/// <param name="dataServiceContext">The TableServiceContextV2 to use in order to issue the requests. TableServiceContextV2 ensures that the 2011-08-18 version is transmitted on the wire</param>
/// <param name="tableName">The name of the table whose entities we wish to count</param>
/// <returns>The number of entities in the table</returns>
public long GetEntityCount(TableServiceContext dataServiceContext, string tableName)
{
    long count = 0;
            
    var query = from entity in dataServiceContext.CreateQuery<TableServiceEntity>(tableName)
                select new
                 {
                     entity.PartitionKey
                 };

    foreach (var row in query.AsTableServiceQuery())
    {
        ++count;
    }

    return count;
}
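
For example, the helper above could be invoked as follows (tableClient and customersTableName are the same objects used in the earlier samples):

// Count all entities in the customers table
TableServiceContext countContext = new TableServiceContextV2(tableClient.BaseUri.ToString(), tableClient.Credentials);
long entityCount = GetEntityCount(countContext, customersTableName);
Console.WriteLine("Table {0} currently contains {1} entities", customersTableName, entityCount);
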
Other Projection Usage and Considerations

If you want to project on a single property, the following WCF code is not supported and the query will be rejected by the server since it will result in an unsupported OData request format:

// The below single-property projection is not supported and will be rejected by the server
var query = from entity in ratingServiceContext.CreateQuery<CustomerEntity>(customersTableName)
            where entity.CustomerSince < DateTime.Parse("01/01/2006")
            select entity.Rating;

Instead, the following code needs to be used. Note that the below code also demonstrates how you can project using an anonymous non-entity type object.

var query = from entity in ratingServiceContext.CreateQuery<CustomerEntity>(customersTableName)
            where entity.CustomerSince < DateTime.Parse("01/01/2006")
            select new
            {
                entity.Rating
            };

// The below code demonstrates how the projected Rating property could be accessed
foreach (var partialData in query)
{
    Console.WriteLine("Rating: {0}", partialData.Rating);
}

Jean Ghanem

Windows Azure Queues: Improved Leases, Progress Tracking, and Scheduling of Future Work


As part of the “2011-08-18” version, we have introduced several commonly requested features to the Windows Azure Queue service. The benefits of these new features are:

  1. Allow applications to store larger messages
  2. Allow applications to schedule work to be processed at a later time
  3. Allow efficient processing for long running tasks, by adding:
    • Leasing: Processing applications can now extend the visibility timeout on a message they have dequeued and hence maintain a lease on the message
    • Progress Tracking: Processing applications can update the message content of a message they have dequeued to save progress state so that a new worker can continue from that state if the prior worker crashed.

That was then

To better understand these features, let us quickly summarize the messaging semantics in Windows Azure Queue. The Windows Azure Queue service provides a scalable message delivery system that can be used to build workflows and decouple components that need to communicate. With the 2009-09-19 version of the service, users could add messages of up to 8KB to the queue. When adding a message, users specify a time to live (< 7 days) after which the message is automatically deleted if it still exists in the queue. Once added to the queue, a message is visible and a candidate to be dequeued and processed by workers. Workers use a 2-phase dequeue/delete pattern. These semantics required workers to estimate, at the time the message is retrieved, how long it would take to process the message; this non-renewable lease period is called the “visibility timeout” and had a limit of 2 hours. When the message is retrieved, a unique token called a pop receipt is associated with the message and must be used for subsequent operations on the message. Once the message is retrieved from the queue, it becomes invisible in the queue. When a message is completely processed, the worker issues a request to delete the message using the pop receipt. This 2-phase process ensures that a message is available to another worker if the initial worker crashes while processing the message.
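
For example, a minimal sketch of this 2-phase pattern with the existing client library might look like the following (account is an assumed CloudStorageAccount, and Process stands in for the actual work):

CloudQueueClient queueClient = account.CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("videoprocessing");

// The visibility timeout must be estimated up front; with the 2009-09-19
// semantics this lease cannot be extended once the message is dequeued.
CloudQueueMessage message = queue.GetMessage(TimeSpan.FromHours(1));
if (message != null)
{
    Process(message.AsString);    // phase 1: process while the message is invisible
    queue.DeleteMessage(message); // phase 2: delete using the pop receipt
}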

This is now

With the 2011-08-18 version, we focused on streamlining the use of Windows Azure Queues to make them simpler and more efficient. First, we made it extremely simple for workers to process long running jobs efficiently – this required the ability to extend the lease on the message by providing a new visibility timeout. Without this ability, workers would have had to provide a generous lease period to the “Get Messages” API since the lease period is set before the message is inspected.

To further improve efficiency, we now allow workers to update the contents of messages they have dequeued. This can be used to store progress information and intermediate state so that if the worker crashes, a new worker can resume the work rather than starting from scratch. Finally, we targeted scenarios that allow support for larger messages and allow scheduling of work when adding messages to the queue. To reiterate, the following features in the 2011-08-18 version make working with Windows Azure Queues simpler and more efficient:

  1. The maximum message size has been increased to 64KB which will allow more applications to store the full message in the queue, instead of storing the actual message contents in blobs, and to now keep progress information in the message.
  2. A message can be added to the queue with a visibility timeout so that it becomes visible to workers at a later time.
  3. A lease on the message can be extended by the worker that did the original dequeue so that it can continue processing the message.
  4. The maximum visibility timeout for scheduling future work, dequeueing a message, and updating a message to extend its lease has been extended to 7 days.
  5. The message content can now be updated to save the progress state, which allows other workers to resume processing the message without the need to start over from the beginning.

NOTE: The current storage client library (version 1.5) uses the 2009-09-19 version and hence these new features are not available. We will be releasing an update with these new features in a future release of the SDK. Until that time we have provided some extension methods later in this posting that allow you to start using these new features today.

We will now go over the changes to the Windows Azure Queue service APIs in detail.

PUT Message

The “PUT Message” REST API is used to add messages to the queue. It now allows the message content to be up to 64KB and also provides an optional visibility timeout parameter. For example, you can now put a message into the queue with a visibilitytimeout of 24 hours, and the message will sit in the queue invisible until that time. Then at that time it will become visible for workers to process (along with the other messages in that queue).

By default, the visibilitytimeout used is 0 which implies that a message becomes visible for processing as soon as it is added to the queue. The visibilitytimeout is specified in seconds and must be >= 0 and < 604,800 (7 days). It also should be less than the “time to live”. Time to live has a default value of 7 days after which a message is automatically removed from the queue if it still exists. A message will be deleted from the queue after its time to live has been reached, regardless of whether it has become visible or not.

REST Examples

Here is a REST example showing how to add a message that will become visible in 10 minutes. The visibility timeout is provided in seconds as a query parameter to the URI called “visibilitytimeout”. The optional expiry time is provided as the messagettl query parameter, also in seconds; in this example it is set to 2 days (172,800 seconds).

Request:

POST http://cohowinery.queue.core.windows.net/videoprocessing/messages?visibilitytimeout=600&messagettl=172800&timeout=30 HTTP/1.1
x-ms-version: 2011-08-18
x-ms-date: Fri, 02 Sep 2011 05:03:21 GMT
Authorization: SharedKey cohowinery:sr8rIheJmCd6npMSx7DfAY3L//V3uWvSXOzUBCV9Ank=
Content-Length: 100

<QueueMessage>
<MessageText>PHNhbXBsZT5zYW1wbGUgbWVzc2FnZTwvc2FtcGxlPg==</MessageText>
</QueueMessage>
Storage Client Library Example

We will use the extension methods provided at the end of this blog to show how to add messages that are made visible at a later time.

Let us look at the scenario of a video processing workflow for Coho Winery. Videos are uploaded by the Marketing team at Coho Winery. Once these videos are uploaded, they need to be processed before they can be displayed on the Coho Winery web site – the workflow is:

  1. Scan for virus
  2. Encode the video in multiple formats
  3. Compress the video for efficiency and copy the compressed output to the new location from which the website picks it up.

When uploading the videos initially, the component adds a message to the queue after each video is uploaded. However, 1 day is allowed before the video is processed, to give a period of time for changes to be made to the video in the workflow. The message is added to the queue with delayed visibility to allow this 1-day grace period. A set of instructions goes into the message, including the format, encoder to use, compression to use, scanners to use, etc. The idea is that in addition to this information required for processing the message, we also save the current state in the message. The format used is as follows: the first 2 characters represent the processing state, followed by the actual content.

/// <summary>
/// Add a message for each blob in the input directory. 
/// After uploading a blob, add a message to the queue that stays invisible for 1 day 
/// to allow time for changes before processing starts.
/// </summary>
private static void UploadVideos()
{
    CloudQueueClient queueClient = Account.CreateCloudQueueClient();
    CloudQueue queue = queueClient.GetQueueReference(QueueName);
    queue.EncodeMessage = false;

    string[] content = GetMessageContent();
    for (int i = 0; i < content.Length; i++)
    {
        // upload the blob (not provided for brevity…)

        // Call the extension method provided at the end of this post
        queue.PutMessage(
            Account.Credentials,
            EncodeMessage(content[i], ProcessingState.VirusScan),
            StartVisibilityTimeout, // set to 1 day
            MessageTtl,             // set to 3 days
            ServerRequestTimeout);
    }
}

/// <summary>
/// The processing stages for a message
/// </summary>
public enum ProcessingState : int
{
    VirusScan = 1,
    Encoder = 2,
    Compress = 3,
    Completed  = 4
}
/// <summary>
/// Form of the queue message is: [2 digits for state][Actual Message content]
/// </summary>
/// <param name="content"></param>
/// <param name="state"></param>
/// <returns></returns>
private static string EncodeMessage(string content, ProcessingState state)
{
    return string.Format("{0:D2}{1}", (int)state, content);
}
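
The processing code later in this post calls a DecodeMessage helper that is not shown; a minimal sketch that simply reverses EncodeMessage could look like this:

/// <summary>
/// Split the 2-digit processing state prefix from the actual message content.
/// Counterpart of EncodeMessage above (a sketch; not part of the original sample).
/// </summary>
/// <param name="message">The raw queue message text</param>
/// <param name="content">The actual message content</param>
/// <param name="state">The processing state encoded in the prefix</param>
private static void DecodeMessage(string message, out string content, out ProcessingState state)
{
    state = (ProcessingState)int.Parse(message.Substring(0, 2));
    content = message.Substring(2);
}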

Update Message

The “Update Message” REST API is used to extend the lease period (aka visibility timeout) and/or update the message content. A worker that is processing a message can now determine the extra processing time it needs based on the content of the message. The lease period, specified in seconds, must be >= 0 and is relative to the current time; a value of 0 makes the message immediately visible in the queue as a candidate for processing. The maximum value for the lease period is 7 days. Note that when updating the visibilitytimeout, it can go beyond the expiry time (or time to live) that was defined when the message was added to the queue, but the expiry time takes precedence and the message will be deleted from the queue at that time.

Update Message can also be used by workers to store the processing state in the message. This processing state can then be used by another worker to resume processing if the former worker crashed or got interrupted and the message has not yet expired.

When getting a message, the worker gets back a pop-receipt. A valid pop-receipt is needed to perform any action on the message while it is invisible in the queue. The Update Message requires the pop receipt returned during the “Get Messages” request or a previous Update Message. The pop receipt is invalid (400 HTTP status code) if:

  • The message has expired.
  • The message has been deleted using the last pop receipt received either from “Get Messages” or “Update Message”.
  • The invisibility time has elapsed and the message has been retrieved by another “Get Messages” call.
  • The message has been updated with a new visibility timeout and hence a new pop receipt is returned. Each time the message is updated, it gets a new pop-receipt which is returned with the UpdateMessage call.

NOTE: When a worker goes to renew the lease (extend the visibility timeout), if for some reason the pop receipt is not received by the client (e.g., network error), the client can retry the request with the pop receipt it currently has. But if that retry fails with “Message not found” then the client should give up processing the message, and get a new message to process.  This is because the prior message did have its visibility timeout extended, but it now has a new pop receipt, and that message will become visible again after the timeout elapses at which time a worker can dequeue it again and continue processing it.

The pop receipt returned in the response should be used for subsequent “Delete Message” and “Update Message” APIs. The new next visibility timeout is also returned in the response header.
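
The following is a sketch of the renewal retry behavior described in the note above, using the UpdateMessage extension method provided at the end of this post. TryRenewLease is an illustrative helper; Account, VisibilityTimeout, and ServerRequestTimeout are the same sample constants used in the workflow code below.

// Returns false if the lease should be abandoned so another worker can take over
private static bool TryRenewLease(
    CloudQueue queue,
    string messageId,
    ref string popReceipt,
    string encodedContent,
    out DateTime nextVisibleTime)
{
    try
    {
        string newPopReceipt;
        queue.UpdateMessage(
            Account.Credentials,
            messageId,
            popReceipt,           // retry with the pop receipt we currently hold
            VisibilityTimeout,
            encodedContent,
            ServerRequestTimeout,
            out newPopReceipt,
            out nextVisibleTime);
        popReceipt = newPopReceipt;
        return true;
    }
    catch (WebException e)
    {
        HttpWebResponse response = e.Response as HttpWebResponse;
        if (response != null && response.StatusCode == HttpStatusCode.NotFound)
        {
            // The earlier update succeeded on the server and issued a new pop
            // receipt that we never received; give up and let another worker
            // dequeue the message once it becomes visible again.
            nextVisibleTime = DateTime.MinValue;
            return false;
        }
        throw;
    }
}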

REST Examples

Update a message to set the visibility timeout to 1 minute.

PUT http://cohowinery.queue.core.windows.net/videoprocessing/messages/663d89aa-d1d9-42a2-9a6a-fcf822a97d2c?popreceipt=AgAAAAEAAAApAAAAGIw6Q29bzAE%3d&visibilitytimeout=60&timeout=30 HTTP/1.1
x-ms-version: 2011-08-18
x-ms-date: Fri, 02 Sep 2011 05:03:21 GMT
Authorization: SharedKey cohowinery:batcrWZ35InGCZeTUFWMdIQiOZPCW7UEyeGdDOg7WW4=
Host: 10.200.21.10
Content-Length: 75

<QueueMessage><MessageText>new-message-content</MessageText></QueueMessage>
Storage Client Library Example

Continuing with the example of the video processing workflow for Coho Winery, we will now go over the processing part of the workflow. The video processing task is a long running task, and we would like to divide the work into stages defined by the ProcessingState enumeration mentioned above. The workflow is to retrieve a message and then decode its content to get the processing state and the actual content. To retrieve, we use the new extension method, since the September 2009 version of the GetMessage API blocks visibility timeouts longer than 2 hours on the client side and therefore cannot support this workflow. ProcessMessages starts a timer that iterates through all the currently retrieved messages and renews the lease or deletes the message based on the processing state and when the message will become visible again. ProcessMessages converts the retrieved QueueMessage into a MessageInfo and adds it to the list of messages that need to be renewed. The MessageInfo class exists because the QueueMessage class does not allow updating the pop receipt, which needs to be set after every Update Message call.

public class MessageInfo
{
    /// <summary>
    /// Message info constructor
    /// </summary>
    /// <param name="queue">The queue to which the message belongs</param>
    /// <param name="messageId">The message id</param>
    /// <param name="popReceipt">The pop receipt to use for update and delete</param>
    /// <param name="content">The content of the message</param>
    /// <param name="state">The processing state the message is in</param>
    public MessageInfo(
        CloudQueue queue, 
        string messageId, 
        string popReceipt, 
        string content, 
        ProcessingState state)
    {
        this.Queue = queue;
        this.MessageId = messageId;
        this.PopReceipt = popReceipt;
        this.State = state;
        this.Content = content;
    }
 
    /// <summary>
    /// The queue to which the message belongs
    /// </summary>
    public CloudQueue Queue { get; private set; }
 
    /// <summary>
    /// The message id  for the message
    /// </summary>
    public string MessageId { get; private set; }
 
    /// <summary>
    /// The pop receipt to use for update and delete
    /// </summary>
    public string PopReceipt { get; set; }
 
    /// <summary>
    /// The content of the message
    /// </summary>
    public string Content { get; set; }
 
    /// <summary>
    /// Next visibility time
    /// </summary>
    public DateTime NextVisibility { get; set; }
 
    /// <summary>
    /// The processing state the message is in. If completed, it will be 
    /// deleted from the queue
    /// </summary>
    public ProcessingState State { get;  set; }
}

/// <summary>
/// Called every minute to renew the lease
/// </summary>
private static void OnRenewLeaseTimer(object state)
{
    // Exception handling hidden for brevity...
 
    // traversing from last to allow deleting the message
    // from the list
    for ( int i = MessageList.Count-1; i >= 0; i--)
    {
        MessageInfo message = MessageList[i];
 
        // if the message is completed - let us delete it
        if(message.State == ProcessingState.Completed)
        {
            message.Queue.DeleteMessage(message.MessageId, message.PopReceipt);
            Console.WriteLine(
                "Deleted Message Id {0} at stage {1}",
                message.MessageId,
                (int)message.State);
            MessageList.RemoveAt(i);
        }
        else if (
            message.NextVisibility.Subtract(DateTime.UtcNow).TotalSeconds < RenewalTime)
        {
            // if next visibility is < renewal time then let us renew it again
            DateTime nextVisibilityTime;
            string newPopReceipt;
 
            // based on whether we need to stop or not and the state, we will 
            // update the visibility
            // NOTE: we always update content but we can be smart about it and update only 
            // if state changes
            message.Queue.UpdateMessage(
                Account.Credentials,
                message.MessageId,
                message.PopReceipt,
                VisibilityTimeout,
                EncodeMessage(message.Content, message.State),
                ServerRequestTimeout,
                out newPopReceipt,
                out nextVisibilityTime);
            message.PopReceipt = newPopReceipt;
            message.NextVisibility = nextVisibilityTime;
 
            Console.WriteLine(
                "Updated Message Id {0} to stage {1} Next visible at {2}", 
                message.MessageId, 
                (int)message.State, 
                nextVisibilityTime);
        }
    }
}
 

// NOTE: Exception handling is excluded here for brevity 
/// <summary>
/// Processes a given number of messages. It iterates through stages and extends 
/// visibility and saves state if it should continue processing.
/// </summary>
private static void ProcessMessages()
{
    CloudQueueClient queueClient = Account.CreateCloudQueueClient();

    CloudQueue queue = queueClient.GetQueueReference(QueueName);
    queue.EncodeMessage = false;

    Timer timer = new Timer(new TimerCallback(OnRenewLeaseTimer), null, 0, TimerInterval);
 
    while (true)
    {
        QueueMessage message = queue.GetMessages(
            Account.Credentials,
            VisibilityTimeout,
            1 /* message count */,
            ServerRequestTimeout).FirstOrDefault();
 
        if (message == null)
        {
            Thread.Sleep(PollingTime);
            continue;
        }
 
 
        string messageContent = message.Text;
        Console.WriteLine(
            "\n\nGot message Content={0} Length={1} Id={2} InsertedAt={3} Visibility={4}",
            messageContent, 
            messageContent.Length,
            message.Id, 
            message.InsertionTime, 
            message.TimeNextVisible);
 
        string content;
        ProcessingState state;
        DecodeMessage(messageContent, out content, out state);
        
        MessageInfo msgInfo = new MessageInfo(
            queue, 
            message.Id, 
            message.PopReceipt, 
            content, 
            state);
        // Seed the lease expiry from the visibility timeout requested above;
        // the renewal timer keeps NextVisibility and PopReceipt up to date from then on
        msgInfo.NextVisibility = DateTime.UtcNow.AddSeconds(VisibilityTimeout);
        MessageList.Add(msgInfo);
 
        Console.WriteLine("Message Id {0} is in stage {1}", message.Id, (int)state);
 
        // keep processing until we complete all stages of processing or 
        // the lease is lost, i.e. next visibility < UtcNow
        while (state != ProcessingState.Completed 
            && msgInfo.NextVisibility >= DateTime.UtcNow)
        {
            // do some work..
            ProcessStage(msgInfo.MessageId, msgInfo.Content, ref state);
            msgInfo.State = state;
        }
    }
}
 

Get Messages

The “Get Messages” REST API is used to retrieve messages. The only change in 2011-08-18 version is that the visibility timeout has been extended from 2 hours to 7 days.

REST Examples

Get messages with visibility timeout set to 4 hours (provided in seconds).

GET http://cohowinery.queue.core.windows.net/videoprocessing/messages?visibilitytimeout=14400&timeout=30 HTTP/1.1
x-ms-version: 2011-08-18
x-ms-date: Fri, 02 Sep 2011 05:03:21 GMT
Authorization: SharedKey cohowinery:batcrWZ35InGCZeTUFWMdIQiOZPCW7UEyeGdDOg7WW4=
Host: 10.200.21.10
Storage Client Library Example

The example in Update Message covers the invocation of GetMessages extension.

Storage Client Library Extensions

As we mentioned above, the existing Storage Client library released in SDK version 1.5 does not support the new version, therefore we have provided sample extension methods described in this blog post so you can start using these new features today. These extension methods can help you issue such requests. Please test this thoroughly before using it in production to ensure it meets your needs.

We have provided the following extension methods:

  1. PutMessage: adds a message to the queue with an optional visibility timeout and time to live.
  2. UpdateMessage: updates a message (content and/or visibility timeout). It returns the new pop receipt and the next visibility time. It does not change the CloudQueueMessage type, as the pop receipt and next visibility time are not publicly accessible.
  3. GetMessages: retrieves messages; provided because the current client library caps the visibility timeout at 2 hours on the client side.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using Microsoft.WindowsAzure.StorageClient.Protocol;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// NOTE: Please test these before using in production
public static class QueueExtensions
{
    /// <summary>
    /// Add a message to the queue. The visibility timeout param can be used to optionally 
    /// make the message visible at a future time
    /// </summary>
    /// <param name="queue">
    /// The queue to add message to
    /// </param>
    /// <param name="credentials">
    /// The storage credentials used for signing
    /// </param>
    /// <param name="message">
    /// The message content
    /// </param>
    /// <param name="visibilityTimeout">
    /// value in seconds and should be greater than or equal to 0 and less than 604800 (7 days). 
    /// It should also be less than messageTimeToLive
    /// </param>
    /// <param name="messageTimeToLive">
    /// (Optional) Time after which the message expires if it is not deleted from the queue.
    /// It can be a maximum time of 7 days.
    /// </param>
    /// <param name="timeout">
    /// Server timeout value
    /// </param>
    public static void PutMessage(
        this CloudQueue queue, 
        StorageCredentials credentials, 
        string message, 
        int? visibilityTimeout, 
        int? messageTimeToLive,
        int timeout)
    {
        StringBuilder builder = new StringBuilder(queue.Uri.AbsoluteUri);

        builder.AppendFormat("/messages?timeout={0}", timeout);

        if (messageTimeToLive != null)
        {
            builder.AppendFormat("&messagettl={0}", messageTimeToLive.ToString());
        }

        if (visibilityTimeout != null)
        {
            builder.AppendFormat("&visibilitytimeout={0}", visibilityTimeout);
        }

        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(builder.ToString());
        request.Method = "POST";
        request.Headers.Add("x-ms-version", "2011-08-18");

        byte[] buffer = QueueRequest.GenerateMessageRequestBody(message);

        request.ContentLength = buffer.Length;
        credentials.SignRequest(request); 
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(buffer, 0, buffer.Length);
        }

        try
        {
            using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                // we expect 201 for Put Message
                if (response.StatusCode != HttpStatusCode.Created)
                {
                    throw new InvalidOperationException("Unexpected response code.");
                }
            }
        }
        catch (WebException e)
        {
            // Log any exceptions for debugging
            LogWebException(e);
            throw;
        }
    }

    /// <summary>
    /// Update the message to extend visibility timeout and optionally 
    /// the message contents 
    /// </summary>
    /// <param name="queue">
    /// The queue to operate on
    /// </param>
    /// <param name="credentials">
    /// The storage credentials used for signing
    /// </param>
    /// <param name="messageId">
    /// The ID of message to extend the lease on
    /// </param>
    /// <param name="popReceipt">
    /// pop receipt to use
    /// </param>
    /// <param name="visibilityTimeout">
    /// Value in seconds; must be greater than or equal to 0 and at most 604,800 (7 days). 
    /// </param>
    /// <param name="messageBody">
    /// (optional) The message content
    /// </param>
    /// <param name="timeout">
    /// Server timeout value
    /// </param>
    /// <param name="newPopReceiptID">
    /// Return the new pop receipt that should be used for subsequent requests when 
    /// the lease is held
    /// </param>
    /// <param name="nextVisibilityTime">
    /// Return the next visibility time for the message. This is time until which the lease is held
    /// </param>
    public static void UpdateMessage(
        this CloudQueue queue, 
        StorageCredentials credentials, 
        string messageId, 
        string popReceipt,
        int visibilityTimeout, 
        string messageBody,
        int timeout, 
        out string newPopReceiptID,
        out DateTime nextVisibilityTime)
    {
        StringBuilder builder = new StringBuilder(queue.Uri.AbsoluteUri);

        builder.AppendFormat(
            "/messages/{0}?timeout={1}&popreceipt={2}&visibilitytimeout={3}", 
            messageId, 
            timeout, 
            Uri.EscapeDataString(popReceipt),
            visibilityTimeout);

        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(builder.ToString());
        request.Method = "PUT";
        request.Headers.Add("x-ms-version", "2011-08-18");

        if (messageBody != null)
        {
            byte[] buffer = QueueRequest.GenerateMessageRequestBody(messageBody);

            request.ContentLength = buffer.Length;
            credentials.SignRequest(request);
            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(buffer, 0, buffer.Length);
            }
        }
        else
        {
            request.ContentLength = 0;
            credentials.SignRequest(request);
        }

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode != HttpStatusCode.NoContent)
                {
                    throw new InvalidOperationException("Unexpected response code.");
                }

                newPopReceiptID = response.Headers["x-ms-popreceipt"];
                nextVisibilityTime = DateTime.Parse(response.Headers["x-ms-time-next-visible"]);
            }
        }
        catch (WebException e)
        {
            // Log any exceptions for debugging
            LogWebException(e);
            throw;
        }
    }


    /// <summary>
    /// Get messages from the queue. Provided only because the current storage client 
    /// library does not allow the invisibility timeout to exceed 2 hours
    /// </summary>
    /// <param name="queue">
    /// The queue to operate on
    /// </param>
    /// <param name="credentials">
    /// The storage credentials used for signing
    /// </param>
    /// <param name="visibilityTimeout">
    /// (Optional) Value in seconds; must be greater than or equal to 0 and at most 604,800 (7 days)
    /// </param>
    /// <param name="messageCount">
    /// (Optional) The number of messages to retrieve
    /// </param>
    /// <param name="timeout">
    /// Server timeout value
    /// </param>
    /// <returns>The messages retrieved from the queue</returns>
    public static IEnumerable<QueueMessage> GetMessages(
        this CloudQueue queue,
        StorageCredentials credentials,
        int? visibilityTimeout,
        int? messageCount,
        int timeout)
    {
        StringBuilder builder = new StringBuilder(queue.Uri.AbsoluteUri);

        builder.AppendFormat(
            "/messages?timeout={0}",
            timeout);

        if (messageCount != null)
        {
            builder.AppendFormat("&numofmessages={0}", messageCount);
        }

        if (visibilityTimeout != null)
        {
            builder.AppendFormat("&visibilitytimeout={0}", visibilityTimeout);
        }

        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(builder.ToString());
        request.Method = "GET";
        request.Headers.Add("x-ms-version", "2011-08-18");
        credentials.SignRequest(request);

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode != HttpStatusCode.OK)
                {
                    throw new InvalidOperationException("Unexpected response code.");
                }

                GetMessagesResponse msgResponses = QueueResponse.GetMessages(response);
                
                // force it to be parsed right away else the response will be closed
                // since QueueResponse.GetMessages parses responses lazily. 
                QueueMessage[] messages = msgResponses.Messages.ToArray<QueueMessage>();
                return messages.AsEnumerable<QueueMessage>();
            }
        }
        catch (WebException e)
        {
            // Log any exceptions for debugging
            LogWebException(e);
            throw;
        }
    }

    /// <summary>
    /// Log the exception in your preferred logging system
    /// </summary>
    /// <param name="e">
    /// The exception to log
    /// </param>
    private static void LogWebException(WebException e)
    {
        HttpWebResponse response = e.Response as HttpWebResponse;
        Console.WriteLine(string.Format(
            "Request failed with '{0}'. Status={1} RequestId={2} Exception={3}",
            e.Message,
            response != null ? response.StatusCode.ToString() : "<NULL>",
            response != null ? response.Headers["x-ms-request-id"] : "<NULL>",
            e.ToString()));

    // Log to your favorite location…
    
    }
}

Jai Haridas

Introducing Geo-replication for Windows Azure Storage


We are excited to announce that we are now geo-replicating customer’s Windows Azure Blob and Table data, at no additional cost, between two locations hundreds of miles apart within the same region (i.e., between North and South US, between North and West Europe, and between East and Southeast Asia).  Geo-replication is provided for additional data durability in case of a major data center disaster.

Storing Data in Two Locations for Durability

With geo-replication, Windows Azure Storage now keeps your data durable in two locations. In both locations, Windows Azure Storage constantly maintains multiple healthy replicas of your data.

The location where you read, create, update, or delete data is referred to as the ‘primary’ location. The primary location exists in the region you choose at the time you create an account via the Azure Portal (e.g., North Central US). The location where your data is geo-replicated is referred to as the secondary location. The secondary location is automatically determined based on the location of the primary; it is in the other data center that is in the same region as the primary. In this example, the secondary would be located in South Central US (see table below for full listing). The primary location is currently displayed in the Azure Portal, as shown below. In the future, the Azure Portal will be updated to show both the primary and secondary locations. To view the primary location for your storage account in the Azure Portal, click on the account of interest; the primary region will be displayed on the lower right side under Country/Region, as highlighted below.

(Screenshot: Azure Portal storage account view, with the primary region highlighted under Country/Region)

The following table shows the primary and secondary location pairings:

Primary             Secondary
North Central US    South Central US
South Central US    North Central US
East US             West US
West US             East US
North Europe        West Europe
West Europe         North Europe
South East Asia     East Asia
East Asia           South East Asia

Geo-Replication Costs and Disabling Geo-Replication

Geo-replication is included in current pricing for Azure Storage.  This is called Geo Redundant Storage.

If you do not want your data geo-replicated you can disable geo-replication for your account. This is called Locally Redundant Storage, and it is priced at a 23% to 34% discount (depending on how much data is stored) relative to geo-replicated storage. See here for more details on Locally Redundant Storage (LRS).

When you turn geo-replication off, the data will be deleted from the secondary location. If you decide to turn geo-replication on again after you have turned it off, there is a re-bootstrap egress bandwidth charge (based on the data transfer rates) for copying your existing data from the primary to the secondary location to kick start geo-replication for the storage account. This charge will be applied only when you turn geo-replication on after you have turned it off. There is no additional charge for continuing geo-replication after the re-bootstrap is done.

Currently all storage accounts are bootstrapped and in geo-replication mode between primary and secondary storage locations.

How Geo-Replication Works

When you create, update, or delete data in your storage account, the transaction is fully replicated on three different storage nodes across three fault domains and upgrade domains inside the primary location; only then is success returned to the client. Then, in the background, the primary location asynchronously replicates the recently committed transaction to the secondary location. That transaction is then made durable by fully replicating it across three different storage nodes in different fault and upgrade domains at the secondary location. Because the updates are asynchronously geo-replicated, there is no change in existing performance for your storage account.

Our goal is to keep the data durable at both the primary and secondary location. This means we keep enough replicas in both locations to ensure that each location can recover by itself from common failures (e.g., disk, node, rack, TOR failing), without having to talk to the other location. The two locations only have to talk to each other to geo-replicate the recent updates to storage accounts. They do not have to talk to each other to recover data due to common failures. This is important, because it means that if we had to failover a storage account from the primary to the secondary, then all the data that had been committed to the secondary location via geo-replication will already be durable there.

With this first release of geo-replication, we do not provide an SLA for how long it will take to asynchronously geo-replicate the data, though transactions are typically geo-replicated within a few minutes after they have been committed in the primary location.

How Geo-Failover Works

In the event of a major disaster that affects the primary location, we will first try to restore the primary location. Depending on the nature of the disaster and its impact, in some rare occasions we may not be able to restore the primary location, and we would need to perform a geo-failover. When this happens, affected customers will be notified via their subscription contact information (we are investigating more programmatic ways to perform this notification). As part of the failover, the customer’s “account.service.core.windows.net” DNS entry would be updated to point from the primary location to the secondary location. Once this DNS change is propagated, the existing Blob and Table URIs will work. This means that you do not need to change your application’s URIs – all existing URIs will work the same before and after a geo-failover.

For example, if the primary location for a storage account “myaccount” was North Central US, then the DNS entry for myaccount.<service>.core.windows.net would direct traffic to North Central US. If a geo-failover became necessary, the DNS entry for myaccount.<service>.core.windows.net would be updated so that it would then direct all traffic for the storage account to South Central US.

After the failover occurs, the location that is accepting traffic is considered the new primary location for the storage account. This location will remain as the primary location unless another geo-failover was to occur. Once the new primary is up and accepting traffic, we will bootstrap a new secondary, which will also be in the same region, for the failed over storage accounts. In the future we plan to support the ability for customers to choose their secondary location (when we have more than two data centers in a given region), as well as the ability to swap their primary and secondary locations for a storage account.

Order of Geo-Replication and Transaction Consistency

Geo-replication ensures that all the data within a PartitionKey is committed in the same order at the secondary location as at the primary location. That said, it is also important to note that there are no geo-replication ordering guarantees across partitions, which means that different partitions can be geo-replicating at different speeds. Once all the updates have been geo-replicated and committed at the secondary location, the secondary location will have the exact same state as the primary location. However, because geo-replication is asynchronous, recent updates can be lost in the event of a major disaster.

For example, consider the case where we have two blobs, foo and bar, in our storage account (for blobs, the complete blob name is the PartitionKey). Now say we execute transactions A and B on blob foo, and then execute transactions X and Y against blob bar. It is guaranteed that transaction A will be geo-replicated before transaction B, and that transaction X will be geo-replicated before transaction Y. However, no other guarantees are made about the respective timings of geo-replication between the transactions against foo and the transactions against bar. If a disaster happened and caused recent transactions to not get geo-replicated, that would make it possible for transactions A and X to be geo-replicated while transactions B and Y are lost. Or transactions A and B could have been geo-replicated, but neither X nor Y made it. The same holds true for operations involving Tables, except that the partitions are determined by the application-defined PartitionKey of the entity instead of the blob name. For more information on partition keys, please see Windows Azure Storage Abstractions and their Scalability Targets.

Because of this, to best leverage geo-replication, one best practice is to avoid cross-PartitionKey relationships whenever possible. This means you should try to restrict relationships for Tables to entities that have the same PartitionKey value. Since all transactions within a single partition are geo-replicated in order, this guarantees those relationships will be committed in order on the secondary.

The only multiple object transaction supported by Windows Azure Storage is Entity Group Transactions for Windows Azure Tables, which allow clients to commit a batch of entities together as a single atomic transaction. Geo-replication also treats this batch as an atomic operation. Therefore, the whole batch transaction is committed atomically on the secondary.
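
For example, here is a minimal sketch of an Entity Group Transaction issued with the WCF Data Services client; the OrderEntity type, the "Orders" table name, and tableContext are illustrative, and both entities share the same PartitionKey so the whole batch is geo-replicated as one atomic unit:

public class OrderEntity : TableServiceEntity
{
    public string Status { get; set; }
}

// Both entities use PartitionKey "order-1001", so the batch below is a single
// Entity Group Transaction: committed atomically on the primary and then
// geo-replicated to the secondary atomically as well.
OrderEntity header = new OrderEntity { PartitionKey = "order-1001", RowKey = "header", Status = "Submitted" };
OrderEntity line1 = new OrderEntity { PartitionKey = "order-1001", RowKey = "line-1", Status = "Submitted" };

tableContext.AddObject("Orders", header);
tableContext.AddObject("Orders", line1);

// SaveChangesOptions.Batch sends both inserts in one request as an Entity Group Transaction
tableContext.SaveChanges(SaveChangesOptions.Batch);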

Summary

This is our first step in geo-replication, where we are now providing additional durability in case of a major data center disaster. The next steps involve developing features needed to help applications recover after a failover, which is an area we are investigating further.

Brad Calder and Monilee Atkinson

Windows Azure Storage at BUILD 2011: Geo-Replication and new Blob, Table and Queue features


We are excited to release geo-replication and a new version of the REST API to enable functionality improvements for Windows Azure Blobs, Tables, and Queues. At this time we are now geo-replicating all Windows Azure Blob and Table data, at no additional cost, between two data centers for additional data durability in case of a major disaster.

Geo-Replication

Geo-replication replicates your Windows Azure Blob and Table data between two locations that are hundreds of miles apart and within the same region (i.e., between North Central and South Central US, between North Europe and West Europe, and between East and South East Asia). We do not replicate data across different regions. Note that there is no change in existing performance as updates are asynchronously geo-replicated.

New Blob, Tables and Queue features

For REST API improvements, we have just released the new version (“2011-08-18”), which contains:

  • Table Upsert – allows a single request to be sent to Windows Azure Tables to either insert an entity (if it doesn’t exist) or update/replace the entity (if it exists).
  • Table Projection (Select) – allows a client to retrieve a subset of an entity’s properties. This improves performance by reducing the serialization/deserialization cost and bandwidth used for retrieving entities.
  • Improved Blob HTTP header support – improves experience for streaming applications and browser downloads.
  • Queue UpdateMessage – allows clients to have a lease on a message and renew the lease while it processes it, as well as update the contents of the message to track the progress of the processing.
  • Queue InsertMessage with visibility timeout – allows clients to queue up future work items. It allows a newly inserted message to stay invisible on the queue until the timeout expires.

Table Upsert

The Table Upsert allows a client to send a single request to either update or insert an entity; the appropriate action is taken based on if the entity already exists or not. This saves a call in the scenario where an application would want to insert the entity if it doesn’t exist or update it if it does exist. This feature is exposed via the InsertOrReplace Entity and InsertOrMerge Entity APIs.

  • InsertOrReplace Entity – inserts the entity if it does not exist or replaces the existing entity if it does exist.
  • InsertOrMerge Entity – inserts the entity if it does not exist or merges with the existing one if it does exist.

Table Projection (Select)

Table Projection allows you to retrieve a subset of the properties of one or more entities, and only returns those properties/columns from Azure Tables. Projection improves performance by reducing latency when retrieving data from a Windows Azure Table. It also saves bandwidth by returning only the properties of interest.

Improved Blob download experience

We have added additional HTTP header support to Windows Azure Blobs to improve the experience for streaming applications and resuming download. Without this support, some browsers would have to restart reading a blob from the beginning if there was an interruption in the download.

Queue UpdateMessage

With the current Queue API, once a worker retrieves a message from the queue, it has to specify a long enough visibility timeout so that it can finish processing the message before the timeout expires. In many scenarios, the worker may want to extend the visibility timeout if it needs more time to process the message. This new UpdateMessage API enables such scenarios. It allows the worker to use the visibility timeout as a lease on the message, so that it can periodically extend the lease and maintain the ownership of the message until the processing completes.

The UpdateMessage API also supports updating the content of the message. This allows the worker to update the message in the Queue to record progress information. Then if the worker crashes, this allows the next worker to continue processing the message from where the prior worker left off.

This functionality enables worker roles to take on longer running tasks than before. It also allows faster failover time, since the leases can be set at fairly small intervals (e.g. 1 minute) so that if a worker role fails, the message will become visible within a minute for another worker role to pick up.

Queue InsertMessage with visibility timeout

We have added support in the InsertMessage API to allow you to specify the initial visibility timeout value for a message. This allows a newly inserted message to stay invisible on the queue until the timeout expires. This allows scheduling of future work by adding messages that become visible at a later time.

For more information see our BUILD talk or one of the following blog posts

Brad Calder

Blob Download Bug in Windows Azure SDK 1.5


Update: We have now released a fix for this issue.  The download blob methods in this version throw an IOException if the connection is closed while downloading the blob, which is the same behavior seen in versions 1.4 and earlier of the StorageClient library. 

We strongly recommend that users using SDK version 1.5.20830.1814 upgrade their applications immediately to this new version 1.5.20928.1904.  You can determine if you have the affected version of the SDK by going to Programs and Features in the Control Panel and verify the version of the Windows Azure SDK.  If version 1.5.20830.1814 is installed, please follow these steps to upgrade:

  1. Click “Get Tools & SDK” on the  Windows Azure SDK download page.  You do not need to uninstall the previous version first.
  2. Update your projects to use the copy of Microsoft.WindowsAzure.StorageClient.dll found in C:\Program Files\Windows Azure SDK\v1.5\bin\

 

We found a bug in the StorageClient library in Windows Azure SDK 1.5 that impacts the DownloadToStream, DownloadToFile, DownloadText, and DownloadByteArray methods for Windows Azure Blobs. 

If a client is doing a synchronous blob download using the SDK 1.5 and its connection is closed, then the client can get a partial download of the blob. The problem is that the client does not get an exception when the connection is closed, so it thinks the full blob was downloaded.  For example, if the blob was 15MB, and the client downloaded just 1MB and the connection was closed, then the client would only have 1MB (instead of 15MB) and think that it had the whole blob.  Instead, the client should have gotten an exception. The problem only occurs when the connection to the client is closed, and only for synchronous downloads, but not asynchronous downloads. 

The issue was introduced in version 1.5 of the Azure SDK when we changed the synchronous download methods to call the synchronous Read API on the web response stream. We see that once response headers have been received, the synchronous read method on the .NET response stream does not throw an exception when a connection is lost and the blob content has not been fully received yet. Since an exception is not thrown, this results in the Download method behaving as if the entire download has completed and it returns successfully when only partial content has been downloaded.

The problem only occurs when all of the following are true:

  • A synchronous download method is used
  • At least the response headers are received by the client after which the connection to the client is closed before the entire content is received by the client

Notably, one scenario where this can occur is if the request timeout happens after the headers have been received, but before all of the content can be transferred. For example, if the client set the timeout to 30 seconds for download of a 100GB blob, then it’s likely that this problem would occur, because 30 seconds is long enough for the response headers to be received along with part of the blob content, but is not long enough to transfer the full 100GB of content.

This does not impact asynchronous downloads, because asynchronous reads from a response stream throw an IOException when the connection is closed.  In addition, calls to OpenRead() are not affected as they also use the asynchronous read methods.

We will be releasing an SDK hotfix for this soon and apologize for any inconvenience this may have caused. Until then we recommend that customers use SDK 1.4 or the async methods to download blobs in SDK 1.5. Additionally, customers who have already started using SDK 1.5, can work around this issue by doing the following: Replace your DownloadToStream, DownloadToFile, DownloadText, and DownloadByteArray methods with BeginDownloadToStream/EndDownloadToStream. This will ensure that an IOException is thrown if the connection is closed, similar to SDK 1.4. The following is an example showing you how to do that:

CloudBlob blob = new CloudBlob(uri);
blob.DownloadToStream(myFileStream); // WARNING: Can result in partial successful downloads

// NOTE: Use async method to ensure an exception is thrown if connection is 
// closed after partial download
blob.EndDownloadToStream(
blob.BeginDownloadToStream(myFileStream, null /* callback */, null /* state */));

If you rely on the text/file/byte array versions of download, we have provided the extension methods below for your convenience, which wrap a stream to work around this problem.

using System.IO;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class CloudBlobExtensions
{
    /// <summary>
    /// Downloads the contents of a blob to a stream.
    /// </summary>
    /// <param name="target">The target stream.</param>
    public static void DownloadToStreamSync(this CloudBlob blob, Stream target)
    {
        blob.DownloadToStreamSync(target, null);
    }

    /// <summary>
    /// Downloads the contents of a blob to a stream.
    /// </summary>
    /// <param name="target">The target stream.</param>
    /// <param name="options">An object that specifies any additional options for the 
    /// request.</param>
    public static void DownloadToStreamSync(this CloudBlob blob, Stream target, 
        BlobRequestOptions options)
    {
        blob.EndDownloadToStream(blob.BeginDownloadToStream(target, null, null));
    }

    /// <summary>
    /// Downloads the blob's contents.
    /// </summary>
    /// <returns>The contents of the blob, as a string.</returns>
    public static string DownloadTextSync(this CloudBlob blob)
    {
        return blob.DownloadTextSync(null);
    }

    /// <summary>
    /// Downloads the blob's contents.
    /// </summary>
    /// <param name="options">An object that specifies any additional options for the 
    /// request.</param>
    /// <returns>The contents of the blob, as a string.</returns>
    public static string DownloadTextSync(this CloudBlob blob, BlobRequestOptions options)
    {
        Encoding encoding = Encoding.UTF8;

        byte[] array = blob.DownloadByteArraySync(options);

        return encoding.GetString(array);
    }

    /// <summary>
    /// Downloads the blob's contents to a file.
    /// </summary>
    /// <param name="fileName">The path and file name of the target file.</param>
    public static void DownloadToFileSync(this CloudBlob blob, string fileName)
    {
        blob.DownloadToFileSync(fileName, null);
    }

    /// <summary>
    /// Downloads the blob's contents to a file.
    /// </summary>
    /// <param name="fileName">The path and file name of the target file.</param>
    /// <param name="options">An object that specifies any additional options for the 
    /// request.</param>
    public static void DownloadToFileSync(this CloudBlob blob, string fileName, 
        BlobRequestOptions options)
    {
        using (var fileStream = File.Create(fileName))
        {
            blob.DownloadToStreamSync(fileStream, options);
        }
    }

    /// <summary>
    /// Downloads the blob's contents as an array of bytes.
    /// </summary>
    /// <returns>The contents of the blob, as an array of bytes.</returns>
    public static byte[] DownloadByteArraySync(this CloudBlob blob)
    {
        return blob.DownloadByteArraySync(null);
    }

    /// <summary>
    /// Downloads the blob's contents as an array of bytes. 
    /// </summary>
    /// <param name="options">An object that specifies any additional options for the 
    /// request.</param>
    /// <returns>The contents of the blob, as an array of bytes.</returns>
    public static byte[] DownloadByteArraySync(this CloudBlob blob, 
        BlobRequestOptions options)
    {
        using (var memoryStream = new MemoryStream())
        {
            blob.DownloadToStreamSync(memoryStream, options);

            return memoryStream.ToArray();
        }
    }
}

Usage Examples:

blob.DownloadTextSync();
blob.DownloadByteArraySync();
blob.DownloadToFileSync(fileName);

Joe Giardino

SOSP Paper - Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency


We recently published a paper describing the internal details of Windows Azure Storage at the 23rd ACM Symposium on Operating Systems Principles (SOSP).

The paper can be found here. The conference also posted a video of the talk here, and the slides can be found here.

The paper describes how we provision and scale out capacity within and across data centers via storage stamps, and how the storage location service is used to manage our stamps and storage accounts. Then it focuses on the details for the three different layers of our architecture within a stamp (front-end layer, partition layer and stream layer), why we have these layers, what their functionality is, how they work, and the two replication engines (intra-stamp and inter-stamp). In addition, the paper summarizes some of the design decisions/tradeoffs we have made as well as lessons learned from building this large scale distributed system.

A key design goal for Windows Azure Storage is to provide Consistency, Availability, and Partition Tolerance (CAP) (all 3 of these together, instead of just 2) for the types of network partitioning we expect to see for our architecture. This is achieved by co-designing the partition layer and stream layer to provide strong consistency and high availability while being partition tolerant for the common types of partitioning/failures that occur within a stamp, such as node level and rack level network partitioning.

In this short conference talk we try to touch on the key details of how the partition layer provides an automatically load balanced object index that is scalable to 100s of billions of objects per storage stamp, how the stream layer performs its intra-stamp replication and deals with failures, and how the two layers are co-designed to provide consistency, availability, and partition tolerance for node and rack level network partitioning and failures.

Brad Calder

Windows Azure Storage Client for Java Blob Features


We have released the Storage Client for Java with support for Windows Azure Blobs, Queues, and Tables. Our goal is to continue to improve the development experience when writing cloud applications using Windows Azure Storage. As such, we have incorporated feedback from customers and forums for the current .NET libraries to help create a more seamless API that is both powerful and simple to use. This blog post serves as an overview of a few new features for Blobs that are currently unique to the Storage Client for Java, which are designed to address common scenarios when working with Cloud workloads.

MD5

One of the key pieces of feedback we get is to make working with MD5 easier and more seamless. For Java we have simplified this scenario to provide consistent behavior and simple configuration.

There are two different ways to use Content-MD5 in the Blob service: a transactional MD5, which provides data integrity during transport of blocks or pages of a blob and is not stored with the blob, and an MD5 of the entire blob, which is stored with the blob and returned on subsequent GET operations (see the blog post here for information on what the server provides).

To make this easy, we have designed high level controls for common cross-cutting scenarios that are respected by every API. For example, no matter which API a user chooses to upload a blob (page or block), the MD5 settings will be honored. Additionally, we have decoupled transactional MD5 (which is useful to ensure transport integrity of individual blocks and pages) from blob-level MD5, which sets the MD5 value on the entire blob and is then returned on subsequent GETs.

The following example illustrates how to use BlobRequestOptions to validate uploads and downloads with transactional Content-MD5. Note: transactional MD5 is not needed when using HTTPS, as HTTPS provides its own integrity mechanism. Both the transactional MD5 and the full blob-level MD5 are off (set to false) by default. The following shows how to turn both of them on.

// Define BlobRequestOptions to use transactional MD5
BlobRequestOptions options = new BlobRequestOptions();
options.setUseTransactionalContentMD5(true);
options.setStoreBlobContentMD5(true); // Set full blob level MD5


blob.upload(sourceStream, blobLength,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

blob.download(outStream,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

 

Sparse Page Blob

The most common use for page blobs among cloud applications is to back a VHD (Virtual Hard Drive) image.  When a page blob is first created it exists as a range of zero filled bytes. The Windows Azure Blob service provides the ability to write in increments of 512 byte pages and keep track of which pages have been written to. As such it is possible for a client to know which pages still contain zero filled data and which ones contain valid data. 

We are introducing a new feature in this release of the Storage Client for Java which can omit 512 byte aligned ranges of zeros when uploading a page blob, and subsequently intelligently download only the non-zero data. During a download, when the library detects that the current data being read exists in a zeroed region, the client simply generates these zeroed bytes without making additional requests to the server. Once the read continues on into a valid range of bytes, the library resumes making requests to the server for the non-zero data.

The following example illustrates how to use BlobRequestOptions to use the sparse page blob feature.

// Define BlobRequestOptions to use sparse page blob
BlobRequestOptions options = new BlobRequestOptions();
options.setUseSparsePageBlob(true);
blob.create(length);

// Alternatively could use blob.openOutputStream
blob.upload(sourceStream, blobLength,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

// Alternatively could use blob.openInputStream
blob.download(outStream,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

Please note this optimization works in chunked read and commit sizes (configurable via CloudBlobClient.setStreamMinimumReadSizeInBytes and CloudBlobClient.setPageBlobStreamWriteSizeInBytes, respectively). If a given read or commit consists entirely of zeros, the operation is skipped altogether. Alternatively, if a given read or commit chunk consists of only a subset of non-zero data, the library may “shrink” the read or commit chunk by ignoring any beginning or ending pages which consist entirely of zeros. This allows us to optimize both cost (fewer transactions) and speed (less data) in a predictable manner.
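
As a rough illustration, these chunk sizes can be tuned on the client before uploading or downloading. The values below are purely illustrative (not recommendations) and assume an existing CloudBlobClient named blobClient:

// Illustrative values only; both setters are on CloudBlobClient.
blobClient.setStreamMinimumReadSizeInBytes(4 * 1024 * 1024);   // 4 MB read (download) chunks
blobClient.setPageBlobStreamWriteSizeInBytes(2 * 1024 * 1024); // 2 MB page blob commit chunks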

Download Resume

Another new feature in this release of the Storage Client for Java is the ability for full downloads to resume themselves in the event of a disconnect or exception. The most cost-efficient way for a client to download a given blob is in a single REST GET call. However, if you are downloading a large blob, say several GB, an issue arises in how to handle disconnects and errors without having to pre-buffer data or re-download the entire blob.

To solve this issue, the blob download functionality will now check the retry policy specified by the user and determine whether the operation should be retried. If the operation should not be retried it will simply throw as expected; however, if the retry policy indicates the operation should be retried, the download will revert to using a BlobInputStream positioned at the current location of the download with an ETag check. This allows the user to simply “resume” the download in a performant and fault-tolerant way. This feature is enabled for all downloads via the CloudBlob.download method.
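
As a minimal sketch, a large download that benefits from this behavior looks no different from a normal download. The sketch assumes an existing CloudBlob reference named blob, a single-argument download overload, and a hypothetical local file name:

// You will need the following imports
import java.io.FileOutputStream;
import java.io.OutputStream;

// Download a large blob; if the connection drops and the retry policy allows a retry,
// the library resumes from the current position (with an ETag check) instead of
// restarting the transfer from the beginning.
OutputStream outputStream = new FileOutputStream("myLargeBlob.bin");
try {
    blob.download(outputStream);
} finally {
    outputStream.close();
}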

Best Practices

We’d also like to share some best practices for using blobs with the Storage Client for Java:

  • Always provide the length of the data being uploaded if it is available; alternatively, a user may specify -1 if the length is not known. The length is needed for authentication. Uploads that specify -1 will cause the Storage Client to pre-read the data to determine its length (and potentially to calculate MD5 if enabled). If the InputStream provided is not markable, a BlobOutputStream is used instead.
  • Use markable streams (i.e. BufferedInputStream) when uploading blobs. In order to support retries and to avoid having to pre-buffer data in memory, a stream must be markable so that it can be rewound and the operation retried in the case of an exception. When the stream provided does not support mark, the Storage Client will use a BlobOutputStream, which internally buffers individual blocks until they are committed. Note: uploads that are over CloudBlobClient.getSingleBlobPutThresholdInBytes() (default is 32 MB, but can be set up to 64 MB) will also be uploaded using the BlobOutputStream.
  • If you already have the MD5 for a given blob you can set it directly via CloudBlob.getProperties().setContentMd5 and it will be sent on a subsequent Blob upload or by calling CloudBlob.uploadProperties(). This can potentially increase performance by avoiding a duplicate calculation of MD5.
  • Please note MD5 is disabled by default, see the MD5 section above regarding how to utilize MD5.
  • BlobOutputStream's commit size is configurable via CloudBlobClient.setWriteBlockSizeInBytes() for block blobs and CloudBlobClient.setPageBlobStreamWriteSizeInBytes() for page blobs.
  • BlobInputStream's minimum read size is configurable via CloudBlobClient.setStreamMinimumReadSizeInBytes().
  • For lower latency uploads BlobOutputStream can execute multiple parallel requests. The concurrent request count defaults to 1 (no concurrency) and is configurable via CloudBlobClient.setConcurrentRequestCount(). BlobOutputStream is accessible via Cloud[Block|Page]Blob.openOutputStream or by uploading a stream that is greater than CloudBlobClient.getSingleBlobPutThresholdInBytes() for BlockBlob or 4 MB for PageBlob. A short configuration sketch follows this list.
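
The following sketch pulls a few of these knobs together. It assumes an existing CloudBlobClient named blobClient, a CloudBlockBlob named blockBlob, a byte array named data, a no-argument openOutputStream overload, and that BlobOutputStream behaves as a standard java.io.OutputStream; the import package location and the MD5 value are assumptions as well:

// You will need the following import (package location assumed from the client library layout)
import com.microsoft.windowsazure.services.blob.client.BlobOutputStream;

// Allow up to 4 concurrent requests when writing through BlobOutputStream.
blobClient.setConcurrentRequestCount(4);

// Commit 1 MB blocks for block blobs written through BlobOutputStream (illustrative value).
blobClient.setWriteBlockSizeInBytes(1 * 1024 * 1024);

// Write through BlobOutputStream explicitly; close() commits any remaining data.
BlobOutputStream blobOutputStream = blockBlob.openOutputStream();
blobOutputStream.write(data);
blobOutputStream.close();

// If the full blob MD5 is already known, set it directly and persist it.
blockBlob.getProperties().setContentMd5("[Base64 encoded MD5]"); // placeholder value
blockBlob.uploadProperties();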

Summary

This post has covered a few interesting features in the recently released Windows Azure Storage Client for Java. We very much appreciate all the feedback we have gotten from customers and through the forums; please keep it coming. Feel free to leave comments below.

Joe Giardino
Developer
Windows Azure Storage

Resources

Get the Windows Azure SDK for Java

Learn more about the Windows Azure Storage Client for Java

Learn more about Windows Azure Storage


Windows Azure Storage Client for Java Tables Deep Dive


This blog post serves as an overview to the recently released Windows Azure Storage Client for Java which includes support for the Azure Table Service. Azure Tables is a NoSQL datastore. For detailed information on the Azure Tables data model, see the resources section below.

Design

There are three key areas we emphasized in the design of the Table client: usability, extensibility, and performance. The basic scenarios are simple and “just work”; in addition, we have also provided three distinct extension points to allow developers to customize the client behaviors to their specific scenario. We have also maintained a degree of consistency with the other storage clients (Blob and Queue) so that moving between them feels seamless. There are also some features and requirements that make the table service unique.

For more on the overall design philosophy and guidelines of the Windows Azure Storage Client for Java see the related blog post in the Links section below.

Packages

The Storage Client for Java is distributed in the Windows Azure SDK for Java jar (see below for locations). The Windows Azure SDK for Java jar also includes a “service layer” implementation for several Azure services, including storage, which is intended to provide a low level interface for users to access various services in a common way. In contrast, the client layer provides a much higher level API surface that is more approachable and has many conveniences that are frequently required when developing scalable Windows Azure Storage applications. For the optimal development experience avoid importing the base package directly and instead import the client sub package (com.microsoft.windowsazure.services.table.client). This blog post refers to this client layer.

Common

com.microsoft.windowsazure.services.core.storage – This package contains all storage primitives such as CloudStorageAccount, StorageCredentials, Retry Policies, etc.

Tables

com.microsoft.windowsazure.services.table.client – This package contains all the functionality for working with the Windows Azure Table service, including CloudTableClient, TableServiceEntity, etc.

Object Model

A diagram of the table object model is provided below. The core flow of the client is that a user defines an action (TableOperation, TableBatchOperation, or TableQuery) over entities in the Table service and executes these actions via the CloudTableClient. For usability, these classes provide static factory methods to assist in the definition of actions.

For example, the code below inserts a single entity:

tableClient.execute([Table Name], TableOperation.insert(entity));


Figure 1: Table client object model

Execution

CloudTableClient

Similar to the other Azure storage clients, the table client provides a logical service client, CloudTableClient, which is responsible for service wide operations and enables execution of other operations. The CloudTableClient class can update the Storage Analytics settings for the Table service, list all the tables in the account, and execute operations against a given table, among other operations.

TableRequestOptions

The TableRequestOptions class defines additional parameters which govern how a given operation is executed, specifically the timeout and RetryPolicy that are applied to each request. The CloudTableClient provides default timeout and RetryPolicy settings; TableRequestOptions can override them for a particular operation.
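
As a rough sketch (not taken from this post), a per-operation override could look like the following; the setter names, the RetryLinearRetry policy type, and the four-argument execute overload are assumptions based on the library's conventions, and someOperation stands in for any TableOperation:

// You will need the following imports (retry policies are assumed to live in the core storage package)
import com.microsoft.windowsazure.services.core.storage.RetryLinearRetry;
import com.microsoft.windowsazure.services.table.client.TableRequestOptions;

TableRequestOptions requestOptions = new TableRequestOptions();
requestOptions.setTimeoutIntervalInMs(10 * 1000);             // assumed setter name
requestOptions.setRetryPolicyFactory(new RetryLinearRetry()); // assumed policy type and setter

// Assumed overload accepting options and an OperationContext.
tableClient.execute("people", someOperation, requestOptions, null /* OperationContext */);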

TableResult

The TableResult class encapsulates the result of a single TableOperation. This object includes the HTTP status code, the ETag, and a weakly typed reference to the associated entity.

Actions

TableOperation

The TableOperation class encapsulates a single operation to be performed against a table. Static factory methods are provided to create a TableOperation that will perform an insert, delete, merge, replace, retrieve, insertOrReplace, or insertOrMerge operation on the given entity. TableOperations can be reused so long as the associated entity is updated. As an example, a client wishing to use table storage as a heartbeat mechanism could define a merge operation on an entity and execute it periodically to update the entity's state on the server.
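
As a rough sketch of that heartbeat idea, the operation object below is created once and re-executed on a timer. HeartbeatEntity, its setLastSeen setter, and the table name are hypothetical; insertOrMerge is used so the first execution also creates the entity:

// HeartbeatEntity is a hypothetical TableServiceEntity subclass with a lastSeen property.
HeartbeatEntity heartbeat = new HeartbeatEntity("WorkerRole", "Instance0");
TableOperation heartbeatOperation = TableOperation.insertOrMerge(heartbeat);

// Run periodically (for example from a timer): update the entity, then reuse the same operation.
heartbeat.setLastSeen(new java.util.Date());
tableClient.execute("heartbeats", heartbeatOperation);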

Sample – Inserting an Entity into a Table

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
    
tableClient.createTableIfNotExists("people");

// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.setEmail("Walter@contoso.com");
customer1.setPhoneNumber("425-555-0101");

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.insert(customer1);

// Submit the operation to the table service.
tableClient.execute("people", insertCustomer1);

TableBatchOperation

The TableBatchOperation class represents multiple TableOperation objects which are executed as a single atomic action within the table service. There are a few restrictions on batch operations that should be noted:

  • You can perform batch update, delete, insert, merge, and replace operations.
  • A batch operation can have a retrieve operation, if it is the only operation in the batch.
  • A single batch operation can include up to 100 table operations.
  • All entities in a single batch operation must have the same partition key.
  • A batch operation is limited to a 4MB data payload.

The CloudTableClient.execute overload which takes a TableBatchOperation as input returns an ArrayList of TableResults that corresponds, in order, to the entries in the batch itself. For example, the result of a merge operation that is first in the batch will be the first entry in the returned ArrayList of TableResults. If an error occurs, the server may return a numerical id as part of the error message that corresponds to the sequence number of the failed operation in the batch; if the failure is not associated with a specific operation (for example, ServerBusy), -1 is returned. TableBatchOperations, or Entity Group Transactions, are executed atomically, meaning that either all operations succeed or, if one of the individual operations causes an error, the entire batch fails.
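
As a minimal sketch of inspecting batch results, the loop below prints the status code and ETag of each entry; the getter names on TableResult are assumptions, and batchOperation refers to a batch such as the one built in the sample that follows:

// You will need the following import
import java.util.ArrayList;

ArrayList<TableResult> results = tableClient.execute("people", batchOperation);
for (int i = 0; i < results.size(); i++) {
    TableResult result = results.get(i);
    // Assumed accessor names for the status code and ETag described above.
    System.out.println(String.format("Operation %d returned HTTP %d with ETag %s",
            i, result.getHttpStatusCode(), result.getEtag()));
}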

Sample – Insert two entities in a single atomic Batch Operation

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableBatchOperation;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
    
tableClient.createTableIfNotExists("people");

// Define a batch operation.
TableBatchOperation batchOperation = new TableBatchOperation();

// Create a customer entity and add to the table
CustomerEntity customer = new CustomerEntity("Smith", "Jeff");
customer.setEmail("Jeff@contoso.com");
customer.setPhoneNumber("425-555-0104");
batchOperation.insert(customer);

// Create another customer entity and add to the table
CustomerEntity customer2 = new CustomerEntity("Smith", "Ben");
customer2.setEmail("Ben@contoso.com");
customer2.setPhoneNumber("425-555-0102");
batchOperation.insert(customer2);        

// Submit the operation to the table service.
tableClient.execute("people", batchOperation);

TableQuery

The TableQuery class is a lightweight query mechanism used to define queries to be executed against the table service. See “Querying” below.

Entities

TableEntity interface

The TableEntity interface is used to define an object that can be serialized and deserialized with the table client. It contains getters and setters for the PartitionKey, RowKey, Timestamp, Etag, as well as methods to read and write the entity. This interface is implemented by the TableServiceEntity and subsequently the DynamicTableEntity that are included in the library; a client may implement this interface directly to persist different types of objects or objects from 3rd-party libraries. By overriding the readEntity or writeEntity methods a client may customize the serialization logic for a given entity type.

TableServiceEntity

The TableServiceEntity class is an implementation of the TableEntity interface and contains the RowKey, PartitionKey, and Timestamp properties. The default serialization logic TableServiceEntity uses is based on reflection: an entity “property” is defined by a pair of corresponding get and set methods on the class, where the return type of the getter is the same as the input parameter type of the setter. This will be discussed in greater detail in the extension points section below. This class is not final and may be extended to add additional properties to an entity type.

Sample – Define a POJO that extends TableServiceEntity

// This class defines one additional String property; since it extends
// TableServiceEntity it will be automatically serialized and deserialized.
public class SampleEntity extends TableServiceEntity {
    private String sampleProperty;

    public String getSampleProperty() {
        return this.sampleProperty;
    }

    public void setSampleProperty(String sampleProperty) {
        this.sampleProperty = sampleProperty;
    }
}

DynamicTableEntity

The DynamicTableEntity class allows clients to update heterogeneous entity types without the need to define base classes or special types. The DynamicTableEntity class defines the required properties for RowKey, PartitionKey, Timestamp, and Etag; all other properties are stored in a HashMap form. Aside from the convenience of not having to define concrete POJO types, this can also provide increased performance by not having to perform serialization or deserialization tasks. We have also provided sample code that demonstrates this.

Sample – Retrieve a single property on a collection of heterogeneous entities

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableQuery;

// Define the query to retrieve the entities, notice in this case we
// only need to retrieve the Count property.
TableQuery<DynamicTableEntity> query = TableQuery.from(tableName, DynamicTableEntity.class).select(new String[] { "Count" });

// Note the TableQuery is actually executed when we iterate over the
// results. Also, this sample uses the DynamicTableEntity to avoid
// having to worry about various types, as well as avoiding any
// serialization processing.
for (DynamicTableEntity ent : tableClient.execute(query)) {
    EntityProperty countProp = ent.getProperties().get("Count");

    // Users should always assume the property is not there in case another
    // client removed it.
    if (countProp == null) {
        throw new IllegalArgumentException("Invalid entity, Count property not found!");
    }

    // Display the Count property; you could also modify it here and persist it back to the service.
    System.out.println(countProp.getValueAsInteger());
}

EntityProperty

The EntityProperty class encapsulates a single property of an entity for the purposes of serialization and deserialization. The only time the client has to work directly with EntityProperties is when using DynamicTableEntity or implementing the TableEntity.readEntity and TableEntity.writeEntity methods. The EntityProperty stores the given value in its serialized string form and deserializes it on each subsequent get.

Please note, when using a non-String type property in a tight loop or performance critical scenario, it is best practice to not update an EntityProperty directly, as there will be a performance implication in doing so. Instead, a client should deserialize the entity into an object, update that object directly, and then persist that object back to the table service (See POJO Sample below).

The samples below show two approaches to updating a player's Score property. The first approach uses DynamicTableEntity to avoid having to declare a client-side object and updates the property directly, whereas the second deserializes the entity into a POJO and updates that object directly.

Sample – Update of entity property using EntityProperty

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableResult;

// Retrieve entity
TableResult res = tableClient.execute("gamers", TableOperation.retrieve("Smith", "Jeff", DynamicTableEntity.class));
DynamicTableEntity player = res.getResultAsType();

// Retrieve Score property
EntityProperty scoreProp = player.getProperties().get("Score");
    
if (scoreProp == null) {
    throw new IllegalArgumentException("Invalid entity, Score property not found!");
}
    
scoreProp.setValue(scoreProp.getValueAsInteger() + 1);

// Store the updated score
tableClient.execute("gamers", TableOperation.merge(player));

 

Sample – Update of entity property using POJO

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableResult;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// Entity type with a score property
public class GamerEntity extends TableServiceEntity {
    private int score;

    public int getScore() {
        return this.score;
    }

    public void setScore(int score) {
        this.score = score;
    }
}

// Retrieve entity
TableResult res = tableClient.execute("gamers", TableOperation.retrieve("Smith", "Jeff", GamerEntity.class));
GamerEntity player = res.getResultAsType();

// Update Score
player.setScore(player.getScore() + 1);


// Store the updated score
tableClient.execute("gamers", TableOperation.merge(player));
Serialization

There are three main extension points in the table client that allow a user to customize serialization and deserialization of entities. Although completely optional, these extension points enable a number of use-case-specific and NoSQL scenarios.

EntityResolver

The EntityResolver interface defines a single method (resolve) and allows client-side projection and processing for each entity during serialization and deserialization. This interface is designed to be implemented by an anonymous inner class to provide custom client side projections, query-specific filtering, and so forth. This enables key scenarios such as deserializing a collection of heterogeneous entities from a single query.

Sample – Use EntityResolver to perform client side projection

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.EntityResolver;
import com.microsoft.windowsazure.services.table.client.TableQuery;

// Define the query to retrieve the entities, notice in this case we
// only need to retrieve the Email property.
TableQuery<Customer> query = TableQuery.from(tableName, Customer.class).select(new String[] { "Email" });

// Define an EntityResolver to mutate the entity payload upon retrieval.
// In this case we will simply return a String representing the customer's Email
// address.
EntityResolver<String> emailResolver = new EntityResolver<String>() {
    @Override
    public String resolve(String partitionKey, String rowKey, Date timeStamp, HashMap<String, EntityProperty> props, String etag) {
        return props.get("Email").getValueAsString();
    }
};

// Display the results of the query, note that the query now returns
// Strings instead of entity types since this is the type of
// EntityResolver we created.
for (String projectedString : tableClient.execute(query, emailResolver)) {
    System.out.println(projectedString);
}

Annotations

@StoreAs

The @StoreAs annotation is used by a client to customize the serialized property name for a given property. If @StoreAs is not used, the property name will be used in table storage. The @StoreAs annotation cannot be used to store PartitionKey, RowKey, or Timestamp; if a property is annotated as such, it will be ignored by the serializer. Two common scenarios are to reduce the length of the property name for performance reasons, or to override the default name the property would otherwise have.

Sample – Alter a property name via the @StoreAs Annotation

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.StoreAs;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// This entity will store the CustomerPlaceOfResidence property as "cpor" on the service.
public class StoreAsEntity extends TableServiceEntity {
    private String cpor;

    @StoreAs(name = "cpor")
    public String getCustomerPlaceOfResidence() {
        return this.cpor;
    }

    @StoreAs(name = "cpor")
    public void setCustomerPlaceOfResidence(String customerPlaceOfResidence) {
        this.cpor = customerPlaceOfResidence;
    }
}

@Ignore

The @Ignore annotation is used on a getter or setter to indicate to the default reflection-based serializer that it should ignore the property during serialization and deserialization.

Sample – Use @Ignore annotation to expose friendly client side property that is backed by PartitionKey

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// In this sample, the Customer ID is used as the PartitionKey.  A property 
// CustomerID is exposed on the client side to allow friendly access, but
// is annotated with @Ignore to prevent it from being duplicated in the
// table entity.
public class OnlineStoreBaseEntity extends TableServiceEntity {
    @Ignore
    public String getCustomerID() {
        return this.getPartitionKey();
    }

    @Ignore
    public void setCustomerID(String customerID) {
        this.setPartitionKey(customerID);
    }
}

TableEntity.readEntity and TableEntity.writeEntity methods

While they are part of the TableEntity interface, the TableEntity.readEntity and TableEntity.writeEntity methods provide the third major extension point for serialization. By implementing or overriding these methods in an object, a client can customize how entities are stored, and potentially improve performance compared to the default reflection-based serializer. See the javadoc for the respective method for more information.

For more on the overall design object model of the Windows Azure Storage Client for Java see the related blog post in the Links section below.

Querying

There are two query constructs in the table client: a retrieve TableOperation which addresses a single unique entity, and a TableQuery which is a standard query mechanism used against multiple entities in a table. Both querying constructs need to be used in conjunction with either a class type that implements the TableEntity interface or with an EntityResolver which will provide custom deserialization logic.

Retrieve

A retrieve operation is a query which addresses a single entity in the table by specifying both its PartitionKey and RowKey. This is exposed via TableOperation.retrieve and TableBatchOperation.retrieve and executed like a typical operation via the CloudTableClient.

Sample – Retrieve a single entity

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();

// Retrieve the entity with partition key of "Smith" and row key of "Jeff"
TableOperation retrieveSmithJeff = TableOperation.retrieve("Smith", "Jeff", CustomerEntity.class);

// Submit the operation to the table service and get the specific entity.
CustomerEntity specificEntity = tableClient.execute("people", retrieveSmithJeff).getResultAsType();

TableQuery

Unlike TableOperation and TableBatchOperation, a TableQuery requires a source table name as part of its definition. TableQuery contains a static factory method, from, used to create a new query, and provides methods for fluent query construction. The code below produces a query to take the top 5 results from the customers table which have a RowKey greater than or equal to 5.

Sample – Query top 5 entities with RowKey greater than or equal to 5

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.TableQuery;
import com.microsoft.windowsazure.services.table.client.TableQuery.QueryComparisons;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

TableQuery<TableServiceEntity> query =
    TableQuery.from("customers", TableServiceEntity.class)
        .where(TableQuery.generateFilterCondition("RowKey", QueryComparisons.GREATER_THAN_OR_EQUAL, "5"))
        .take(5);

The TableQuery is strong typed and must be instantiated with a class type that is accessible and contains a nullary constructor; otherwise an exception will be thrown. The class type must also implement the TableEntity interface. If the client wishes to use a resolver to deserialize entities they may specify one via execute on CloudTableClient and specify the TableServiceEntity class type as demonstrated above.

The TableQuery object provides methods for take, select, where, and the source table name. Static methods such as generateFilterCondition and combineFilters are provided to construct filter strings. Also note, generateFilterCondition provides several overloads that can handle all supported types; some examples are listed below:

// 1. Filter on String
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, "foo");

// 2. Filter on UUID
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, uuid));

// 3. Filter on Long
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50L);

// 4. Filter on Double
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50.50);

// 5. Filter on Integer
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50);

// 6. Filter on Date
TableQuery.generateFilterCondition("Prop", QueryComparisons.LESS_THAN, new Date());

// 7. Filter on Boolean
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, true);

// 8. Filter on Binary
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, new byte[] { 0x01, 0x02, 0x03 });

Sample – Query all entities with a PartitionKey=”SamplePK” and RowKey greater than or equal to “5”

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.TableConstants;
import com.microsoft.windowsazure.services.table.client.TableQuery;
import com.microsoft.windowsazure.services.table.client.TableQuery.Operators;
import com.microsoft.windowsazure.services.table.client.TableQuery.QueryComparisons;

String pkFilter = TableQuery.generateFilterCondition(TableConstants.PARTITION_KEY, QueryComparisons.EQUAL,"samplePK");

String rkFilter = TableQuery.generateFilterCondition(TableConstants.ROW_KEY, QueryComparisons.GREATER_THAN_OR_EQUAL, "5");

String combinedFilter = TableQuery.combineFilters(pkFilter, Operators.AND, rkFilter);

TableQuery<SampleEntity> query = TableQuery.from(tableName, SampleEntity.class).where(combinedFilter);

Note: There is no logical expression tree provided in the current release, and as a result repeated calls to the fluent methods on TableQuery overwrite the relevant aspect of the query.

Scenarios

NoSQL

A common pattern in a NoSQL datastore is to store related entities with different schemas in the same table. A frequent example relates to customers and orders which are stored in the same table. In our case, the PartitionKey for both Customer and Order will be a unique CustomerID which will allow us to retrieve and alter a customer and their respective orders together. The challenge becomes how to work with these heterogeneous entities on the client side in an efficient and usable manner. We discuss this here, and you can also download sample code.

The table client provides an EntityResolver interface which allows client side logic to execute during deserialization. In the scenario detailed above, let’s use a base entity class named OnlineStoreEntity which extends TableServiceEntity.

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

public abstract class OnlineStoreEntity extends TableServiceEntity {
    @Ignore
    public String getCustomerID() {
       return this.getPartitionKey();
    }

    @Ignore
    public void setCustomerID(String customerID) {
        this.setPartitionKey(customerID);
    }
}

Let’s also define two additional entity types, Customer and Order which derive from OnlineStoreEntity and prepend their RowKey with an entity type enumeration, “0001” for customers and “0002” for Orders. This will allow us to query for just a customer, their orders, or both—while also providing a persisted definition as to what client side type is used to interact with the object. Given this, let’s define a class that implements the EntityResolver interface to assist in deserializing the heterogeneous types.

Sample – Using EntityResolver to deserialize heterogeneous entities

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.EntityResolver;

EntityResolver<OnlineStoreEntity> webStoreResolver = new EntityResolver<OnlineStoreEntity>() {
@Override
public OnlineStoreEntity resolve(String partitionKey, String rowKey, Date timeStamp, HashMap<String, EntityProperty> properties, String etag) throws StorageException {
     OnlineStoreEntity ref = null;
     
     if (rowKey.startsWith("0001")) {
          // Customer
          ref = new Customer();
     }
     else if (rowKey.startsWith("0002")) {
         // Order
         ref = new Order();
     }
     else {
         throw new IllegalArgumentException(String.format("Unknown entity type detected! RowKey: %s", rowKey));
     }

     ref.setPartitionKey(partitionKey);
     ref.setRowKey(rowKey);
     ref.setTimestamp(timeStamp);
     ref.setEtag(etag);
     ref.readEntity(properties, null);
     return ref;
     }
};

Now, on iterating through the results with the following code:

for (OnlineStoreEntity entity : tableClient.execute(customerAndOrderQuery, webStoreResolver)) {
     System.out.println(entity.getClass());
}

It will output:

class tablesamples.NoSQL$Customer
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
….

For the complete OnlineStoreSample sample please see the Samples section below.

Heterogeneous update

In some cases it may be required to update entities regardless of their type or other properties. Let’s say we have a table named “employees”. This table contains entity types for developers, secretaries, contractors, and so forth. The example below shows how to query all entities in a given partition (in our example the state the employee works in is used as the PartitionKey) and update their salaries regardless of job position. Since we are using merge, the only property that is going to be updated is the Salary property, and all other information regarding the employee will remain unchanged.

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableBatchOperation;
import com.microsoft.windowsazure.services.table.client.TableQuery;

TableQuery<DynamicTableEntity> query = TableQuery.from("employees", DynamicTableEntity.class).where("PartitionKey eq 'Washington'").select(new String[] { "Salary" });

// Note: for brevity's sake this sample assumes there are 100 or fewer employees; in general, the client should ensure batches are kept to 100 operations or fewer.

TableBatchOperation mergeBatch = new TableBatchOperation();
for (DynamicTableEntity ent : tableClient.execute(query)) {
    EntityProperty salaryProp = ent.getProperties().get("Salary");

    // Check to see if salary property is present
    if (salaryProp != null) {
        double currentSalary = salaryProp.getValueAsDouble();

        if (currentSalary < 50000) {
            // Give a 10% raise
            salaryProp.setValue(currentSalary * 1.1);
        } else if (currentSalary < 100000) {
            // Give a 5% raise
            salaryProp.setValue(currentSalary * 1.05);
        }

        mergeBatch.merge(ent);
    }
    else {
         throw new IllegalArgumentException("Entity does not contain salary!");
    }
}

// Execute batch to save changes back to the table service
tableClient.execute("employees", mergeBatch);

Complex Properties

The Windows Azure Table service provides two indexed columns that together provide the key for a given entity (PartitionKey and RowKey). A common best practice is to include multiple aspects of an entity in these keys since they can be queried efficiently. Using the @Ignore annotation, it is possible to define friendly client-side properties that are part of this complex key without persisting them individually.

Let’s say that we are creating a directory of all the people in America. By creating a complex key such as [STATE];[CITY] we can enable efficient queries for all people in a given state or city using a lexical comparison, while utilizing only one indexed column. This optimization is exposed in a convenient way by providing friendly client properties on an object that mutate the key appropriately but are not actually persisted to the service.

Note: Take care when choosing to provide setters on properties backed by keys, since you are effectively changing the identity of the entity, which can cause failures for some operations (delete, merge, replace).

The sample below illustrates how to provide friendly accessors to complex keys; a query sketch that exploits this key layout follows the sample. When providing only getters, the @Ignore annotation is optional, since the serializer will not use properties that do not expose a corresponding setter.

Sample – Complex Properties on a POJO using the @Ignore Annotation

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

public class Person extends TableServiceEntity {
    @Ignore
    public String getState() {
        return this.getPartitionKey().substring(0, this.getPartitionKey().indexOf(";"));
    }

    @Ignore
    public String getCity() {
        return this.getPartitionKey().substring(this.getPartitionKey().indexOf(";") + 1);
    }
}
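
Building on the Person class above, the sketch below queries everyone in a single state by bounding the PartitionKey prefix lexically; the table name and state value are hypothetical:

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.TableConstants;
import com.microsoft.windowsazure.services.table.client.TableQuery;
import com.microsoft.windowsazure.services.table.client.TableQuery.Operators;
import com.microsoft.windowsazure.services.table.client.TableQuery.QueryComparisons;

// ';' (0x3B) is immediately followed by '<' (0x3C) in ASCII, so these two filters
// bound every PartitionKey that starts with "Washington;".
String lowerBound = TableQuery.generateFilterCondition(TableConstants.PARTITION_KEY,
        QueryComparisons.GREATER_THAN_OR_EQUAL, "Washington;");
String upperBound = TableQuery.generateFilterCondition(TableConstants.PARTITION_KEY,
        QueryComparisons.LESS_THAN, "Washington<");

TableQuery<Person> stateQuery = TableQuery.from("people", Person.class)
        .where(TableQuery.combineFilters(lowerBound, Operators.AND, upperBound));

for (Person person : tableClient.execute(stateQuery)) {
    System.out.println(person.getCity());
}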

 

Persisting 3rd party objects

In some cases we may need to persist objects exposed by 3rd party libraries, or those which do not fit the requirements of a TableEntity and cannot be modified to do so. In such cases, the recommended best practice is to encapsulate the 3rd party object in a new client object that implements the TableEntity interface, and provide the custom serialization logic needed to persist the object to the table service via TableEntity.readEntity and TableEntity.writeEntity.

Note: when implementing readEntity/writeEntity, TableServiceEntity provides two static helper methods (readEntityWithReflection and writeEntityWithReflection) that expose the default reflection-based serialization, which uses the same rules as previously discussed.
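
The sketch below wraps a hypothetical third-party type (ThirdPartyPoint, with getX/getY/setX/setY methods we cannot change). The readEntity and writeEntity signatures, the typed EntityProperty constructor, and the import package locations are assumptions inferred from the resolver sample earlier in this post rather than confirmed API details:

// You will need the following imports
import java.util.HashMap;
import com.microsoft.windowsazure.services.core.storage.OperationContext;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

public class PointEntity extends TableServiceEntity {
    private ThirdPartyPoint point = new ThirdPartyPoint(); // hypothetical third-party type

    public ThirdPartyPoint getPoint() {
        return this.point;
    }

    @Override
    public void readEntity(HashMap<String, EntityProperty> properties, OperationContext opContext) {
        // Rebuild the wrapped object from its stored properties; always check for existence.
        this.point = new ThirdPartyPoint();
        if (properties.containsKey("X")) {
            this.point.setX(properties.get("X").getValueAsInteger());
        }
        if (properties.containsKey("Y")) {
            this.point.setY(properties.get("Y").getValueAsInteger());
        }
    }

    @Override
    public HashMap<String, EntityProperty> writeEntity(OperationContext opContext) {
        // Flatten the wrapped object into entity properties.
        HashMap<String, EntityProperty> properties = new HashMap<String, EntityProperty>();
        properties.put("X", new EntityProperty(this.point.getX())); // assumed int constructor
        properties.put("Y", new EntityProperty(this.point.getY()));
        return properties;
    }
}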

Best Practices

  • When persisting inner classes they must be marked static and provide a nullary constructor to enable deserialization.
  • Consider batch restrictions when developing your application. While a single entity may be up to 1 MB and a batch can contain 100 operations, the 4 MB payload limit on a batch operation may decrease the total number of operations allowed in a single batch. All operations in a given batch must address entities that have identical PartitionKey values.
  • Class types should initialize property values to null / default. The Table service does not send null / removed properties to the client, so these properties will not be overwritten on the client side. As such, it is possible to perceive stale data in this scenario, since client-side properties may retain values that do not exist in the received entity.
  • The take count on a TableQuery is applied to each request and is not rewritten between requests. If used in conjunction with the non-segmented execute method, this effectively alters the page size and not the maximum results. For example, if we define a TableQuery with take(5) and execute it via executeSegmented, we will receive 5 results (potentially fewer if there is a continuation token involved). However, if we enumerate results via the Iterator returned by the execute method, we will eventually receive all results in the table, 5 at a time. Please be aware of this distinction.
  • When implementing readEntity or working with DynamicTableEntity the user should always assume a given property does not exist in the HashMap as it may have been removed by another client or not selected via a projected query. Therefore, it is considered best practice to check for the existence of a property in the HashMap prior to retrieving it.
  • The EntityProperty class is utilized during serialization to encapsulate a given property for an entity and stores data in its serialized String form. Subsequently, each call to a get method will deserialize the data and each call to a setter / constructor will serialize it. Avoid repeated updates directly on an EntityProperty wherever possible. If your application needs to make repeated updates / reads to a property on a persisted type, use a POJO object directly.
  • The @StoreAs annotation is provided to customize serialization which can be utilized to provide friendly client side property names and potentially increase performance by decreasing payload size. For example, if there is an entity with many long named properties such as customerEmailAddress we could utilize the @StoreAs annotation to persist this property under the name “cea” which would decrease every payload by 17 bytes for this single property alone. For large entities with numerous properties the latency and bandwidth savings can become significant. Note: the @StoreAs annotation cannot be used to write the PartitionKey, RowKey, or Timestamp as these properties are written separately: attempting to do so will cause the annotated property to be skipped during serialization. To accomplish this scenario provide a friendly client side property annotated with the @Ignore annotation and set the PartitionKey, RowKey, or Timestamp property internally.

Table Samples

As part of the release of the Windows Azure Storage Client for Java we have provided a series of samples that address some common scenarios that users may encounter when developing cloud applications.

Setup
  1. Download the samples jar
  2. Configure the classpath to include the Windows Azure Storage Client for Java, which can be downloaded here.
  3. Edit the Utility.java file to specify your connection string in storageConnectionString. Alternatively, if you want to use the local storage emulator that ships as part of the Windows Azure SDK, you can uncomment the specified key in Utility.java.
  4. Execute each sample via Eclipse or the command line. Some blob samples require command line arguments.
Samples
  • TableBasics - This sample illustrates basic use of the Table primitives provided. Scenarios covered are:
    • How to create a table client
    • Insert an entity and retrieve it
    • Insert a batch of entities and query against them
    • Projection (server and client side)
    • DynamicUpdate – update entities regardless of types using DynamicTableEntity and projection to optimize performance.
  • OnlineStoreSample – This sample illustrates a common scenario when using a schema-less datastore. In this example we define both customers and orders which are stored in the same table. By utilizing the EntityResolver we can query against the table and retrieve the heterogeneous entity collection in a type safe way.

Summary

This blog post has provided an in-depth overview of the table client in the recently released Windows Azure Storage Client for Java. We continue to maintain and evolve the libraries we provide based on upcoming features and customer feedback. Feel free to leave comments below.

Joe Giardino
Developer
Windows Azure Storage

Resources

Get the Windows Azure SDK for Java

Learn more about the Windows Azure Storage Client for Java

Learn more about Windows Azure Storage

Windows Azure Storage Client for Java Overview


We released the Storage Client for Java with support for Windows Azure Blobs, Queues, and Tables. Our goal is to continue to improve the development experience when writing cloud applications using Windows Azure Storage. This release is a Community Technology Preview (CTP) and will be supported by Microsoft. As such, we have incorporated feedback from customers and forums for the current .NET libraries to help create a more seamless API that is both powerful and simple to use. This blog post serves as an overview of the library and covers some of the implementation details that will be helpful to understand when developing cloud applications in Java. Additionally, we’ve provided two additional blog posts that cover some of the unique features and programming models for the blob and table service.

Packages

The Storage Client for Java is distributed in the Windows Azure SDK for Java jar (see below for locations). For the optimal development experience import the client sub package directly (com.microsoft.windowsazure.services.[blob|queue|table].client). This blog post refers to this client layer.

The relevant packages are broken up by service:

Common

com.microsoft.windowsazure.services.core.storage – This package contains all storage primitives such as CloudStorageAccount, StorageCredentials, Retry Policies, etc.

Services

com.microsoft.windowsazure.services.blob.client – This package contains all the functionality for working with the Windows Azure Blob service, including CloudBlobClient, CloudBlob, etc.

com.microsoft.windowsazure.services.queue.client – This package contains all the functionality for working with the Windows Azure Queue service, including CloudQueueClient, CloudQueue, etc.

com.microsoft.windowsazure.services.table.client – This package contains all the functionality for working with the Windows Azure Table service, including CloudTableClient, TableServiceEntity, etc.

Services

While this document describes the common concepts for all of the above packages, it’s worth briefly summarizing the capabilities of each client library. Blob and Table each have some interesting features that warrant further discussion. For those, we’ve provided additional blog posts linked below. The client API surface has been designed to be easy to use and approachable, however to accommodate more advanced scenarios we have provided optional extension points when necessary.

Blob

The Blob API supports all of the normal Blob Operations (upload, download, snapshot, set/get metadata, and list), as well as the normal container operations (create, delete, list blobs). However, we have gone a step further and also provided some additional conveniences such as Download Resume, Sparse Page Blob support, simplified MD5 scenarios, and simplified access conditions.

To better explain these unique features of the Blob API, we have published an additional blog post which discusses these features in detail. You can also see additional samples in our article How to Use the Blob Storage Service from Java.

Sample – Upload a File to a Block Blob

// You will need these imports
import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.blob.client.CloudBlobContainer;
import com.microsoft.windowsazure.services.blob.client.CloudBlockBlob;
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;

// Initialize Account
CloudStorageAccount account = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the blob client
CloudBlobClient blobClient = account.createCloudBlobClient();

// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.getContainerReference("mycontainer");

// Create or overwrite the "myimage.jpg" blob with contents from a local
// file
CloudBlockBlob blob = container.getBlockBlobReference("myimage.jpg");
File source = new File("c:\\myimages\\myimage.jpg");
blob.upload(new FileInputStream(source), source.length());

(Note: It is best practice to always provide the length of the data being uploaded if it is available; alternatively a user may specify -1 if the length is not known)

Table

The Table API provides a minimal client surface that is incredibly simple to use but still exposes enough extension points to allow for more advanced “NoSQL” scenarios. These include built in support for POJO, HashMap based “property bag” entities, and projections. Additionally, we have provided optional extension points to allow clients to customize the serialization and deserialization of entities which will enable more advanced scenarios such as creating composite keys from various properties etc.

Due to some of the unique scenarios listed above the Table service has some requirements and capabilities that differ from the Blob and Queue services. To better explain these capabilities and to provide a more comprehensive overview of the Table API we have published an in depth blog post which includes the overall design of Tables, the relevant best practices, and code samples for common scenarios. You can also see more samples in our article How to Use the Table Storage Service from Java.

Sample – Upload an Entity to a Table

// You will need these imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;

// Retrieve storage account from connection-string
CloudStorageAccount storageAccount = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
         
// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.setEmail("Walter@contoso.com");
customer1.setPhoneNumber("425-555-0101");

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.insert(customer1);

// Submit the operation to the table service.
tableClient.execute("people", insertCustomer1);

Queue

The Queue API includes convenience methods for all of the functionality available through REST. Namely creating, modifying and deleting queues, adding, peeking, getting, deleting, and updating messages, and also getting the message count. Here is a sample of creating a queue and adding a message, and you can also read How to Use the Queue Storage Service from Java.

Sample – Create a Queue and Add a Message to it

// You will need these imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.queue.client.CloudQueue;
import com.microsoft.windowsazure.services.queue.client.CloudQueueClient;
import com.microsoft.windowsazure.services.queue.client.CloudQueueMessage;
// Retrieve storage account from connection-string
CloudStorageAccount storageAccount = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the queue client
CloudQueueClient queueClient = storageAccount.createCloudQueueClient();

// Retrieve a reference to a queue
CloudQueue queue = queueClient.getQueueReference("myqueue");

// Create the queue if it doesn't already exist
queue.createIfNotExist();

// Create a message and add it to the queue
CloudQueueMessage message = new CloudQueueMessage("Hello, World");
queue.addMessage(message);

 

Design

When designing the Storage Client for Java, we set up a series of design guidelines to follow throughout the development process. In order to reflect our commitment to the Java community working with Azure, we decided to design an entirely new library from the ground up that would feel familiar to Java developers. While the basic object model is somewhat similar to our .NET Storage Client Library there have been many improvements in functionality, consistency, and ease of use which will address the needs of both advanced users and those using the service for the first time.

Guidelines

  • Convenient and performant – The default implementation is simple to use; however, we still support performance-critical scenarios. For example, Blob upload APIs require the length of the data for authentication purposes. If this is unknown a user may pass -1, and the library will calculate this on the fly. However, for performance-critical applications it is best to pass in the correct number of bytes.
  • Users own their requests – We have provided mechanisms that will allow users to determine the exact number of REST calls being made, the associated request ids, HTTP status codes, etc. (See OperationContext in the Object Model discussion below for more). We have also annotated every method that will potentially make a REST request to the service with the @DoesServiceRequest annotation. This all ensures that you, the developer, are able to easily understand and control the requests made by your application, even in scenarios like Retry, where the Java Storage Client library may make multiple calls before succeeding.
  • Look and feel –
    • Naming is consistent. Logical antonyms are used for complementary actions (e.g. upload and download, create and delete, acquire and release)
    • get/set prefixes follow Java conventions and are reserved for local client-side “properties”
    • Minimal overloads per method: one with the minimum set of required parameters and one including all optional parameters, which may be null. The one exception is the listing methods, which have two minimal overloads to accommodate the common scenario of listing with a prefix.
  • Minimal API Surface – In order to keep the API surface smaller we have reduced the number of extraneous helper methods. For example, Blob contains a single upload and a single download method, each of which uses Input/OutputStreams. If a user wishes to handle data in text or byte form, they can simply pass in the relevant stream.
  • Provide advanced features in a discoverable way – In order to keep the core API simple and understandable, advanced features are exposed via either the RequestOptions or optional parameters.
  • Consistent Exception Handling - The library will immediately throw any exception encountered prior to making the request to the server. Any exception that occurs during the execution of the request will subsequently be wrapped inside a StorageException.
  • Consistency – objects are consistent in their exposed API surface and functionality. For example, a Blob, Container, or Queue all expose an exists() method.
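
For instance, the consistency guideline above means the same existence check has the same shape across services. A minimal sketch, using a container and a queue obtained as shown in the samples later in this post:

// The call shape is identical across client objects
boolean containerExists = container.exists();
boolean queueExists = queue.exists();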

Object Model

The Storage Client for Java uses local client side objects to interact with objects that reside on the server. There are additional features provided to help determine if an operation should execute, how it should execute, as well as provide information about what occurred when it was executed. (See Configuration and Execution below)

Objects

StorageAccount

The logical starting point is a CloudStorageAccount which contains the endpoint and credential information for a given storage account. This account then creates logical service clients for each appropriate service: CloudBlobClient, CloudQueueClient, and CloudTableClient. CloudStorageAccount also provides a static factory method to easily configure your application to use the local storage emulator that ships with the Windows Azure SDK.

A CloudStorageAccount can be created by parsing an account string which is in the format of:

"DefaultEndpointsProtocol=http[s];AccountName=<account name>;AccountKey=<account key>"

Optionally, if you wish to specify a non-default DNS endpoint for a given service you may include one or more of the following in the connection string.

“BlobEndpoint=<endpoint>”, “QueueEndpoint=<endpoint>”, “TableEndpoint=<endpoint>”

Sample – Creating a CloudStorageAccount from an account string

// Initialize Account
CloudStorageAccount account = CloudStorageAccount.parse([ACCOUNT_STRING]);
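
The static factory method mentioned above for targeting the local storage emulator can be used instead of parsing a connection string. A minimal sketch, assuming the factory method is named getDevelopmentStorageAccount() (verify against the SDK version you are using):

// Initialize an account that points at the local storage emulator
CloudStorageAccount devAccount = CloudStorageAccount.getDevelopmentStorageAccount();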

ServiceClients

Any service wide operation resides on the service client. Default configuration options, such as timeout, retry policy, and other service-specific settings that objects associated with the client will reference, are stored here as well.

For example:

  • To turn on Storage Analytics for the blob service a user would call CloudBlobClient.uploadServiceProperties(properties)
  • To list all queues a user would call CloudQueueClient.listQueues()
  • To set the default timeout to 30 seconds for objects associated with a given client a user would call Cloud[Blob|Queue|Table]Client.setTimeoutInMs(30 * 1000)
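
Putting the calls listed above together, here is a brief sketch of configuring and using service clients (names taken from the samples in this post):

// Create the service clients from the account
CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
CloudQueueClient queueClient = storageAccount.createCloudQueueClient();

// Set a default timeout of 30 seconds for objects associated with the blob client
blobClient.setTimeoutInMs(30 * 1000);

// List all queues in the account
for (CloudQueue queue : queueClient.listQueues()) {
    // work with each queue here
}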

Cloud Objects

Once a user has created a service client for the given service, it’s time to start working directly with the Cloud Objects of that service. The CloudObjects are CloudBlockBlob, CloudPageBlob, CloudBlobContainer, and CloudQueue, each of which contains methods to interact with the resource it represents in the service.

Below are basic samples showing how to create a Blob Container, a Queue, and a Table. See the samples in the Services section for examples of how to interact with CloudObjects.

Blobs

// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.getContainerReference("mycontainer");

// Create the container if it doesn't already exist
container.createIfNotExist();

Queues

// Retrieve a reference to a queue
CloudQueue queue = queueClient.getQueueReference("myqueue");

// Create the queue if it doesn't already exist
queue.createIfNotExist();

Tables

Note: You may notice that, unlike blob and queue, the table service does not use a CloudObject to represent an individual table. This is due to the unique nature of the table service, which is covered in more depth in the Tables deep dive blog post. Instead, table operations are performed via the CloudTableClient object:

// Create the table if it doesn't already exist
tableClient.createTableIfNotExists("people");

 

Configuration and Execution

The maximum overload of each method provided in the library takes two or three extra optional parameters, depending on the service, all of which accept null so that users can utilize just the subset of features they require. For example, to utilize only RequestOptions, simply pass null for AccessCondition and OperationContext. These optional parameters give the user an easy way to determine whether an operation should execute, control how it executes, and retrieve additional information about how it was executed once it completes.
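
For instance, using the queue object from the earlier sample, the minimum and maximum overloads of createIfNotExist look like this (a brief sketch; the maximum overload mirrors the OperationContext example later in this section):

// Minimum overload – only the required parameters
queue.createIfNotExist();

// Maximum overload – pass null for the optional parameters you do not need
queue.createIfNotExist(null /* RequestOptions */, null /* OperationContext */);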

AccessCondition

An AccessCondition’s primary purpose is to determine if an operation should execute, and it is supported when using the Blob service. Specifically, AccessCondition encapsulates Blob leases as well as the If-Match, If-None-Match, If-Modified-Since, and If-Unmodified-Since HTTP headers. An AccessCondition may be reused across operations as long as the given condition is still valid. For example, a user may only wish to delete a blob if it hasn’t been modified since last week. By using an AccessCondition, the library will send the HTTP "If-Unmodified-Since" header to the server, which may not process the operation if the condition is not true. Additionally, blob leases can be specified through an AccessCondition so that only operations from users holding the appropriate lease on a blob may succeed.

AccessCondition provides convenient static factory methods to generate an AccessCondition instance for the most common scenarios (IfMatch, IfNoneMatch, IfModifiedSince, IfNotModifiedSince, and Lease); however, it is possible to utilize a combination of these by simply calling the appropriate setter on the condition you are using.

The following example illustrates how to use an AccessCondition to only upload the metadata on a blob if it is a specific version.

blob.uploadMetadata(AccessCondition.generateIfMatchCondition(currentETag), null /* RequestOptions */, null/* OperationContext */);

Here are some examples:

//Perform Operation if the given resource is not a specified version:
AccessCondition.generateIfNoneMatchCondition(eTag);

//Perform Operation if the given resource has been modified since a given date:
AccessCondition.generateIfModifiedSinceCondition(lastModifiedDate);

//Perform Operation if the given resource has not been modified since a given date:
AccessCondition.generateIfNotModifiedSinceCondition(date);

//Perform Operation with the given lease id (Blobs only):
AccessCondition.generateLeaseCondition(leaseID);

//Perform Operation with the given lease id if it has not been modified since a given date:
AccessCondition condition = AccessCondition.generateLeaseCondition(leaseID);
condition.setIfUnmodifiedSinceDate(date);

RequestOptions

Each Client defines a service specific RequestOptions (i.e. BlobRequestOptions, QueueRequestOptions, and TableRequestOptions) that can be used to modify the execution of a given request. All service request options provide the ability to specify a different timeout and retry policy for a given operation; however some services may provide additional options. For example the BlobRequestOptions includes an option to specify the concurrency to use when uploading a given blob. RequestOptions are not stateful and may be reused across operations. As such, it is common for applications to design RequestOptions for different types of workloads. For example an application may define a BlobRequestOptions for uploading large blobs concurrently, and a BlobRequestOptions with a smaller timeout when uploading metadata.

The following example illustrates how to use BlobRequestOptions to upload a blob using up to 8 concurrent operations with a timeout of 30 seconds each.

BlobRequestOptions options = new BlobRequestOptions();

// Set ConcurrentRequestCount to 8
options.setConcurrentRequestCount(8);

// Set timeout to 30 seconds
options.setTimeoutIntervalInMs(30 * 1000); 

blob.upload(new ByteArrayInputStream(buff),
     blobLength,
     null /* AccessCondition */,
     options,
     null /* OperationContext */);

OperationContext

The OperationContext is used to provide relevant information about how a given operation executed. This object is by definition stateful and should not be reused across operations. Additionally the OperationContext defines an event handler that can be subscribed to in order to receive notifications when a response is received from the server. With this functionality, a user could start uploading a 100 GB blob and update a progress bar after every 4 MB block has been committed.

Perhaps the most powerful function of the OperationContext is to provide the ability for the user to inspect how an operation executed. For each REST request made against a server, the OperationContext stores a RequestResult object that contains relevant information such as the HTTP status code, the request ID from the service, start and stop date, etag, and a reference to any exception that may have occurred. This can be particularly helpful to determine if the retry policy was invoked and an operation took more than one attempt to succeed. Additionally, the Service Request ID and start/end times are useful when escalating an issue to Microsoft.

The following example illustrates how to use OperationContext to print out the HTTP status code of the last operation.

OperationContext opContext = new OperationContext();
queue.createIfNotExist(null /* RequestOptions */, opContext);
System.out.println(opContext.getLastResult().getStatusCode());
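
Building on the snippet above, the per-request results can also be inspected to see whether the retry policy was invoked for an operation. The sketch below assumes the results are exposed through a getRequestResults() accessor; confirm the accessor name against the SDK version you are using:

OperationContext opContext = new OperationContext();
queue.createIfNotExist(null /* RequestOptions */, opContext);

// One RequestResult is recorded per REST request made for this logical operation;
// more than one result means at least one retry occurred.
// NOTE: getRequestResults() is assumed here - verify it exists in your SDK version.
int attempts = opContext.getRequestResults().size();
if (attempts > 1) {
    System.out.println("Operation required " + attempts + " attempts to succeed.");
}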

 

Retry Policies

Retry Policies have been engineered so that the policies can evaluate whether to retry on various HTTP status codes. Although the default policies will not retry 400 class status codes, a user can override this behavior by creating their own retry policy. Additionally, RetryPolicies are stateful per operation, which allows greater flexibility in fine-tuning the retry policy for a given scenario.

The Storage Client for Java ships with 3 standard retry policies which can be customized by the user. The default retry policy for all operations is an exponential backoff with up to 3 additional attempts as shown below:

new RetryExponentialRetry(  
    3000 /* minBackoff in milliseconds */,
    30000 /* deltaBackoff in milliseconds */,
    90000 /* maxBackoff in milliseconds */,
    3 /* maxAttempts */);

With the above default policy, the retries will occur after approximately 3,000ms, 35,691ms, and 90,000ms.

If the number of attempts should be increased, one can use the following:

new RetryExponentialRetry(  
    3000 /* minBackoff in milliseconds */,
    30000 /* deltaBackoff in milliseconds */,
    90000 /* maxBackoff in milliseconds */,
    6 /* maxAttempts */);

With the above policy, the retries will occur after approximately 3,000ms, 28,442ms, 80,000ms, 90,000ms, 90,000ms, and 90,000ms.

NOTE: the time provided is an approximation because the exponential policy introduces a +/-20% random delta as described below.

NoRetry - Operations will not be retried

LinearRetry - Represents a retry policy that performs a specified number of retries, using a specified fixed time interval between retries.

ExponentialRetry (default) - Represents a retry policy that performs a specified number of retries, using a randomized exponential backoff scheme to determine the interval between retries. This policy introduces a +/- 20% random delta to even out traffic in the case of throttling.

A user can configure the retry policy for all operations directly on a service client, or specify one in the RequestOptions for a specific method call. The following illustrates how to configure a client to use a linear retry with a 3 second backoff between attempts and a maximum of 3 additional attempts for a given operation.

serviceClient.setRetryPolicyFactory(new RetryLinearRetry(3000,3));

Or

TableRequestOptions options = new TableRequestOptions();
options.setRetryPolicyFactory(new RetryLinearRetry(3000, 3));

Custom Policies

There are two aspects of a retry policy: the policy itself and an associated factory. To implement a custom policy, a user must derive from the abstract base class RetryPolicy and implement the relevant methods. Additionally, an associated factory class must be provided that implements the RetryPolicyFactory interface to generate unique instances for each logical operation. For simplicity’s sake, the policies mentioned above implement the RetryPolicyFactory interface themselves; however, it is possible to use two separate classes.

Note about .NET Storage Client

During the development of the Java library we have identified many substantial improvements in the way our API can work. We are committed to bringing these improvements back to .NET while keeping in mind that many clients have built and deployed applications on the current API, so stay tuned.

Summary

We have put a lot of work into providing a truly first class development experience for the Java community to work with Windows Azure Storage. We very much appreciate all the feedback we have gotten from customers and through the forums; please keep it coming. Feel free to leave comments below.

Joe Giardino
Developer
Windows Azure Storage

Resources

Get the Windows Azure SDK for Java

Learn more about the Windows Azure Storage Client for Java

Learn more about Windows Azure Storage

Getting the Page Ranges of a Large Page Blob in Segments


One of the blob types supported by Windows Azure Storage is the Page Blob. Page Blobs provide efficient storage of sparse data by physically storing only pages that have been written and not cleared. Each page is 512 bytes in size. The Get Page Ranges REST service call returns a list of all contiguous page ranges that contain valid data. In the Windows Azure Storage Client Library, the method GetPageRanges exposes this functionality.

Get Page Ranges may fail in certain circumstances where the service takes too long to process the request. Like all Blob REST APIs, Get Page Ranges takes a timeout parameter that specifies the time a request is allowed, including the reading/writing over the network. However, the server is allowed a fixed amount of time to process the request and begin sending the response. If this server timeout expires then the request fails, even if the time specified by the API timeout parameter has not elapsed.

In a highly fragmented page blob with a large number of writes, populating the list returned by Get Page Ranges may take longer than the server timeout and hence the request will fail. Therefore, it is recommended that if your application usage pattern has page blobs with a large number of writes and you want to call GetPageRanges, then your application should retrieve a subset of the page ranges at a time.

For example, suppose a 500 GB page blob was populated with 500,000 writes throughout the blob. By default the storage client specifies a timeout of 90 seconds for the Get Page Ranges operation. If Get Page Ranges does not complete within the server timeout interval then the call will fail. This can be solved by fetching the ranges in groups of, say, 50 GB. This splits the work into ten requests. Each of these requests would then individually complete within the server timeout interval, allowing all ranges to be retrieved successfully.

To be certain that the requests complete within the server timeout interval, fetch ranges in segments spanning 150 MB each. This is safe even for maximally fragmented page blobs. If a page blob is less fragmented then larger segments can be used.

Client Library Extension

We present below a simple extension method for the storage client that addresses this issue by providing a rangeSize parameter and splitting the requests into ranges of the given size. The resulting IEnumerable object lazily iterates through page ranges, making service calls as needed.

As a consequence of splitting the request into ranges, any page ranges that span across the rangeSize boundary are split into multiple page ranges in the result. Thus for a range size of 10 GB, the following range spanning 40 GB

[0 – 42949672959]

would be split into four ranges spanning 10 GB each:

[0 – 10737418239]
[10737418240 – 21474836479]
[21474836480 – 32212254719]
[32212254720 – 42949672959].

With a range size of 20 GB the above range would be split into just two ranges.

Note that a custom timeout may be used by specifying a BlobRequestOptions object as a parameter, but the method below does not use any retry policy. The specified timeout is applied to each of the service calls individually. If a service call fails for any reason then GetPageRanges throws an exception.

namespace Microsoft.WindowsAzure.StorageClient
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using Microsoft.WindowsAzure.StorageClient.Protocol;
 
    /// <summary>
    /// Class containing an extension method for the <see cref="CloudPageBlob"/> class.
    /// </summary>
    public static class CloudPageBlobExtensions
    {
        /// <summary>
        /// Enumerates the page ranges of a page blob, sending one service call as needed for each
        /// <paramref name="rangeSize"/> bytes.
        /// </summary>
        /// <param name="pageBlob">The page blob to read.</param>
        /// <param name="rangeSize">The range, in bytes, that each service call will cover. This must be a multiple of
        ///     512 bytes.</param>
        /// <param name="options">The request options, optionally specifying a timeout for the requests.</param>
        /// <returns>An <see cref="IEnumerable"/> object that enumerates the page ranges.</returns>
        public static IEnumerable<PageRange> GetPageRanges(
            this CloudPageBlob pageBlob,
            long rangeSize,
            BlobRequestOptions options)
        {
            int timeout;
 
            if (options == null || !options.Timeout.HasValue)
            {
                timeout = (int)pageBlob.ServiceClient.Timeout.TotalSeconds;
            }
            else
            {
                timeout = (int)options.Timeout.Value.TotalSeconds;
            }
 
            if ((rangeSize % 512) != 0)
            {
                throw new ArgumentOutOfRangeException("rangeSize", "The range size must be a multiple of 512 bytes.");
            }
 
            long startOffset = 0;
            long blobSize;
 
            do
            {
                // Generate a web request for getting page ranges
                HttpWebRequest webRequest = BlobRequest.GetPageRanges(
                    pageBlob.Uri,
                    timeout,
                    pageBlob.SnapshotTime,
                    null /* lease ID */);
 
                // Specify a range of bytes to search
                webRequest.Headers["x-ms-range"] = string.Format(
                    "bytes={0}-{1}",
                    startOffset,
                    startOffset + rangeSize - 1);
 
                // Sign the request
                pageBlob.ServiceClient.Credentials.SignRequest(webRequest);
 
                List<PageRange> pageRanges;
 
                using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
                {
                    // Refresh the size of the blob
                    blobSize = long.Parse(webResponse.Headers["x-ms-blob-content-length"]);
 
                    GetPageRangesResponse getPageRangesResponse = BlobResponse.GetPageRanges(webResponse);
 
                    // Materialize response so we can close the webResponse
                    pageRanges = getPageRangesResponse.PageRanges.ToList();
                }
 
                // Lazily return each page range in this result segment.
                foreach (PageRange range in pageRanges)
                {
                    yield return range;
                }
 
                startOffset += rangeSize;
            }
            while (startOffset < blobSize);
        }
    }
}

Usage Examples:

pageBlob.GetPageRanges(10L * 1024 * 1024 * 1024 /* 10 GB */, null);
pageBlob.GetPageRanges(150 * 1024 * 1024 /* 150 MB */, options /* custom timeout in options */);

Summary

For some fragmented page blobs, the GetPageRanges API call might not complete within the maximum server timeout interval. To solve this, the page ranges can be incrementally fetched for a fraction of the page blob at a time, thus decreasing the time any single service call takes. We present an extension method implementing this technique in the Windows Azure Storage Client Library.

Michael Roberson

CloudDrive::Mount() API takes a long time when the drive has millions of files


Windows Azure Drive is in Preview, and we have identified an issue with the CloudDrive::Mount() API where it will take 5 to 20 minutes to mount a drive that contains millions of files.  In these cases, the majority of time used by CloudDrive::Mount is spent updating the ACLs (access control lists) on all the files on the drive.  The Mount() API attempts to change these ACLs on the root of the drive so that lower privileged roles (web and worker roles) will be able to access the contents of the drive after it is mounted.  However, the default setting for ACLs on NTFS is to inherit the ACLs from the parent, so these ACL changes are then propagated to all files on the drive. 

The workaround for this issue is to mount the drive once the slow way, and then permanently break the ACL inheritance chain on the drive.  At that point, the CloudDrive::Mount() API should always take less than one minute to mount the drive.

To break the ACL inheritance chain perform the following steps:

  1. Mount the drive
  2. Open a command shell
  3. Run the following commands (assuming that z: is where the drive is mounted):
    z:
    cd \
    icacls.exe * /inheritance:d
  4. icacls.exe will print out a list of files and directories it is processing, followed by some statistics:
    processed file: dir1
    processed file: dir2
    processed file: dir3
    processed file: dir4
    processed file: dir5
    Successfully processed 5 files; Failed processing 0 files
  5. Finally you should unmount the drive. 

Once you have done the above, subsequent calls to CloudDrive::Mount will be faster.

Andrew Edwards

CloudDrive: Possible Data Loss when calling Create() or CreateIfNotExist() on existing drives


Windows Azure Drive is in Preview, and we recently identified a timing bug in the CloudDrive Client Library (SDK 1.6 and earlier) which can cause your CloudDrive to be accidentally deleted when you call ‘Create()’ or ‘CreateIfNotExist()’ on an existing drive. For your existing drive to be accidentally deleted, there must be a period of unavailability of your Windows Azure Storage account during the call to ‘Create()’ or ‘CreateIfNotExist()’.

Your service is more likely to hit this bug if you frequently call ‘Create()’, which is sometimes done if you use the following pattern where you call ‘Create()’ before you call ‘Mount()’ to ensure that the drive exists before you try to mount it:

try
{
    drive.Create(...);
}
catch(CloudDriveException)
{
    ...
}

drive.Mount(...);

Another common pattern can occur when using the new ‘CreateIfNotExist()’ API followed by a ‘Mount()’ call:

drive.CreateIfNotExist(...);
drive.Mount(...);

We will fix this timing bug in SDK 1.7.

To avoid this timing bug now, you should add a test for the existence of the blob before attempting to create it using the following code:

CloudPageBlob pageBlob =
    new CloudPageBlob(drive.Uri.AbsoluteUri, drive.Credentials);

try
{
    pageBlob.FetchAttributes();
}
catch (StorageClientException ex)
{
    if (ex.ErrorCode.Equals(StorageErrorCode.ResourceNotFound))
    {
        // Blob not found, try to create it
        drive.Create(...);
    }
}

Andrew Edwards

PartitionKey or RowKey containing the percent ‘%’ character causes some Windows Azure Tables APIs to fail


Description and Symptoms

We have identified an issue that would affect services using Windows Azure Tables whenever the percent character ‘%’ appears as part of the PartitionKey or RowKey.

The affected APIs are GET entity, Merge Entity, Update Entity, Delete Entity, Insert Or Merge Entity and Insert Or Replace Entity APIs. If any of these APIs are invoked with a PartitionKey or RowKey that contains the ‘%’ character, the user could erroneously receive a 404 Not Found or 400 Bad Request error code. In addition, in the case of upsert (Insert Or Merge Entity and Insert Or Replace APIs), the request might succeed but the stored string might not be what the user intended it to be.

Note that the Insert Entity, Entity Group Transactions and Query Entities APIs are not affected since the PartitionKey and RowKey are not part of the URL path segment.

Root Cause

The Windows Azure Table Service is double decoding the URL path segment when processing a request, which results in an erroneous interpretation of the string whenever the ‘%’ character appears. Note that the query string portion of the URL is not affected by this issue, nor is any URL that appears as part of the HTTP body. Therefore, any other property filters used in a query will be unaffected by this issue – only PartitionKey and RowKey are affected.

Here is an example of how this issue occurs: Inserting an entity with PartitionKey = “Metric%25” and RowKey = “Count” would succeed, since PartitionKey, RowKey and custom values are part of the request payload and not the URL path segment. Now, when you intend to retrieve this existing entity, the Get Entity HTTP URL will look like:

http://foo.table.core.windows.net/Metrics(PartitionKey='Metric%2525',RowKey='Count')

However due to the double decoding bug, the PartitionKey is getting interpreted as “Metric%” on the server side which is not what the user intended. In this case, a 404 Not Found is returned.

Workarounds

If you have not yet committed any entities where ‘%’ is used as part of the PartitionKey or RowKey, we suggest that you consider the following:

  1. Avoid using ‘%’ as part of your PartitionKey and RowKey and consider replacing it with another character, for example ‘-‘.
  2. Consider using URL safe Base64 encoding for your PartitionKey and RowKey values.
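
As an illustration of the second suggestion, keys can be wrapped in URL-safe Base64 before they are stored and unwrapped after they are read. The sketch below uses java.util.Base64, which is available in Java 8 and later and therefore postdates the client libraries discussed in this post; any URL-safe Base64 implementation works the same way:

// You will need these imports (Java 8 or later)
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Encode an arbitrary key into a URL-safe form that contains no '%' characters
String rawKey = "Metric%25";
String safeKey = Base64.getUrlEncoder().withoutPadding()
        .encodeToString(rawKey.getBytes(StandardCharsets.UTF_8));

// Decode the stored value back to the original key when reading the entity
String originalKey = new String(
        Base64.getUrlDecoder().decode(safeKey), StandardCharsets.UTF_8);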

Note: Do not double encode your PartitionKey and RowKey values as a workaround, since this would not be compatible with future Windows Azure Tables releases when a fix is applied on the server side.

In case you already have inserted entities where ‘%’ appears as part of the PartitionKey or RowKey, we suggest the following workarounds:

  1. For Get Entity:
    • Use the Entity Group Transaction with an inner GET Entity command (refer to the example in the subsequent section)
    • Use the Query Entities API by relying on the $Filter when retrieving a single entity. While this is not possible for users of the Windows Azure Storage Client library or the WCF Data Services Client library, this workaround is available to users who have control over the wire protocol. As an example, consider the following URL syntax when querying for the same entity mentioned in the “Root Cause” section above:
      http://foo.table.core.windows.net/Metrics()?$filter=(PartitionKey%20eq%20'Metric%2525')%20and%20(RowKey%20eq%20'Count')
  2. For Update Entity, Merge Entity, Delete Entity, Insert Or Merge Entity and Insert Or Replace Entity APIs, use the Entity Group Transaction with the inner operation that you wish to perform (refer to the example in the subsequent section).

Windows Storage Client Library Workaround Code Example

Consider the case where the user has already inserted an entity with PartitionKey = “Metric%25” and RowKey = “Count”. The following code shows how to use the Windows Azure Storage Client Library in order to retrieve and update that entity. The code uses the Entity Group Transaction workaround mentioned in the previous section. Note that both the Get Entity and Update Entity operations are performed as a batch operation.

// Creating a Table Service Context
TableServiceContext tableServiceContext = new TableServiceContext(tableClient.BaseUri.ToString(), tableClient.Credentials);
 
// Create a single point query
DataServiceQuery<MetricEntity> getEntityQuery = (DataServiceQuery<MetricEntity>) 
     from entity in tableServiceContext.CreateQuery<MetricEntity>(customersTableName)
     where entity.PartitionKey == "Metric%25" && entity.RowKey == "Count"
     select entity;
 
// Create an entity group transaction with an inner Get Entity request
DataServiceResponse batchResponse = tableServiceContext.ExecuteBatch(getEntityQuery);
            
// There is only one response as part of this batch
QueryOperationResponse response = (QueryOperationResponse) batchResponse.First();
 
if (response.StatusCode == (int) HttpStatusCode.OK)
{
    IEnumerator queryResponse = response.GetEnumerator();
    queryResponse.MoveNext();
    // Read this single entity
    MetricEntity  singleEntity = (MetricEntity)queryResponse.Current;
 
    // Updating the entity
    singleEntity.MetricValue = 100;
    tableServiceContext.UpdateObject(singleEntity);
    
    // Make sure to save with the Batch option
    tableServiceContext.SaveChanges(SaveChangesOptions.Batch);
}

Java Storage Client Workaround Code Example

As the issue discussed above is related to the service, the same behavior will be exhibited when performing single entity operations using the Storage Client Library for Java. However, it is also possible to use Entity Group Transaction to work around this issue. The latest version that can be used to implement the proposed workaround can be found here.

// Define a batch operation.
TableBatchOperation batchOperation = new TableBatchOperation();

// Retrieve the entity
batchOperation.retrieve("Metric%25", "Count", MetricEntity.class);

// Submit the operation to the table service.
tableClient.execute("foo", batchOperation);

For more on working with Tables via the Java Storage Client see: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/03/05/windows-azure-storage-client-for-java-tables-deep-dive.aspx

Long Term Fix

We will be fixing this issue as part of a version change in a future release. We will update this post with the storage version that contains the fix.

We apologize for any inconvenience this may have caused.

Jean Ghanem

Character Encoding Issues Related to Copy Blob API


This blog applies to the 2011-08-18 storage version or earlier of the Copy Blob API and the Windows Azure Storage Client Library version 1.6.

Two separate problems are discussed in this blog:

  1. Over REST, the service expects the ‘+’ character appearing as part of the x-ms-copy-source header to be percent encoded. When the ‘+’ is not URL encoded, the service interprets it as a space (‘ ’) character.
  2. The Windows Azure Storage Client Library is not URL percent encoding the x-ms-copy-source header value. This leads to a misinterpretation of x-ms-copy-source blob names that include the percent ‘%’ character.

When using Copy Blob, character ‘+’ appearing as part of the x-ms-copy-source header must be URL percent encoded

When using the Copy Blob API, the x-ms-copy-source header value must be URL percent encoded. However, when the server decodes the string, it converts the ‘+’ character to a space, which might not be compatible with the encoding rule applied by the client, in particular the Windows Azure Storage Client Library.

Example: Assume that an application wants to copy from a source blob with the following key information: AccountName = “foo” ContainerName = “container” BlobName = “audio+video.mp4”

Using the Windows Azure Storage Client Library, the following value for the x-ms-copy-source header is generated and transmitted over the wire:

x-ms-copy-source: /foo/container/audio+video.mp4

When the data is received by the server, the blob name would then be interpreted as “audio video.mp4” which is not what the user intended. A compatible header would be:

x-ms-copy-source: /foo/container/audio%2bvideo.mp4

In that case, the server when decoding this header would interpret the blob name correctly as “audio+video.mp4”

NOTE: The described server behavior in this blog does not apply to the request URL but only applies to the x-ms-copy-source header that is used as part of the Copy Blob API with version 2011-08-18 or earlier.

To get correct Copy Blob behavior, please consider applying the following encoding rules for the x-ms-copy-source header:

  1. URL percent encode character ‘+’ to “%2b”.
  2. URL percent encode the space character (‘ ‘) to “%20”. Note that if you currently happen to encode the space character as ‘+’, the current server behavior will interpret it as a space when decoding. However, this behavior is not compatible with the rule for decoding request URLs, where the character ‘+’ is kept as a ‘+’ after decoding.
  3. In case you are using the Windows Azure Storage Client Library, please apply the workaround at the end of this post.

Windows Azure Storage Client Library is not URL encoding the x-ms-copy-source header

As described in the previous section, x-ms-copy-source header must be URL percent encoded. However the Windows Azure Storage Client Library is transmitting the blob name in an un-encoded manner. Therefore any blob name that has percent ‘%’ in its name followed by a hex number will be misinterpreted on the server side.

Example: Assume that an application wants to copy from a source blob with the following key information: AccountName = “foo” ContainerName = “container” BlobName = “data%25.txt”

Using the Windows Azure Storage Client Library, the following un-encoded value for the x-ms-copy-source header is generated and transmitted over the wire:

x-ms-copy-source: /foo/container/data%25.txt

Data received by the server will be URL decoded and therefore the blob name would be interpreted as “data%.txt” which is not what the user intended. A compatible header would be:

x-ms-copy-source: /foo/container/data%2525.txt

In that case, the server when decoding this header would interpret the blob name correctly as “data%25.txt”

Note that this bug exists in Version 1.6 of the client library and will be fixed in future releases.

Windows Azure Storage Client Library Code Workaround

As described in the previous sections, the current behavior of Copy Blob APIs exposed by the client library will not work properly in case the characters ‘+’ or ‘%’ appear as part of the source blob name. The affected APIs are CloudBlob.CopyFromBlob and CloudBlob.BeginCopyFromBlob.

To get around this issue, we have provided the following extension method which creates a safe CloudBlob object that can be used as the sourceBlob with any of the copy blob APIs. Please note that the returned object should not be used to access the blob or to perform any action on it.

Note: This workaround is needed for Windows Azure Storage Library version 1.6.

public static class CloudBlobCopyExtensions
{
    /// <summary>
    /// This method converts a CloudBlob to a version that can be safely used as a source for the CopyFromBlob or BeginCopyFromBlob APIs only.
    /// The returned object must not be used to access the blob, neither should any of its API be invoked.
    /// This method should only be used against storage version 2011-08-18 or earlier
    /// and with Windows Azure Storage Client version 1.6.
    /// </summary>
    /// <param name="originBlob">The source blob this being copied</param>
    /// <returns>CloudBlob that can be safely used as a source for the CopyFromBlob or BeginCopyFromBlob APIs only.</returns>
    public static CloudBlob GetCloudBlobReferenceAsSourceBlobForCopy(this CloudBlob originBlob)
        {
            UriBuilder uriBuilder = new UriBuilder();
            Uri srcUri = originBlob.Uri;
 
            // Encode the segment using UrlEncode
            string encodedBlobName = HttpUtility.UrlEncode(
                                        HttpUtility.UrlEncode(
                                            originBlob.Name));
 
            string firstPart = srcUri.OriginalString.Substring(
                0, srcUri.OriginalString.Length - Uri.EscapeUriString(originBlob.Name).Length);
            string encodedUrl = firstPart + encodedBlobName;
 
            return new CloudBlob(
                encodedUrl,
                originBlob.SnapshotTime,
                originBlob.ServiceClient);
        }

}

Here is how the above method can be used:

// Create a blob by uploading data to it
CloudBlob someBlob = container.GetBlobReference("a+b.txt");
someBlob.UploadText("test");
 
CloudBlob destinationBlob = container.GetBlobReference("a+b(copy).txt");
                
// The below object should only be used when issuing a copy. Do not use sourceBlobForCopy to access the blob
CloudBlob sourceBlobForCopy = someBlob.GetCloudBlobReferenceAsSourceBlobForCopy();
destinationBlob.CopyFromBlob(sourceBlobForCopy);

We will update this blog once we have fixed the service. We apologize for any inconvenience that this may have caused.

Jean Ghanem


10x Price Reduction for Windows Azure Storage Transactions


We heard you loud and clear that you want cheaper transaction costs for Windows Azure Blobs, Tables, Queues, and Drives. We are therefore very pleased today to slash transaction prices 10-fold for Windows Azure Storage and CDN. This means that it now costs $0.01 for 100,000 transactions ($1 per 10 million). This applies to all transactions for Windows Azure Storage Blobs (PutBlob, GetBlob, DeleteBlob, etc.), Tables (PutEntity, GetEntity, Table Queries, Batch Transactions, etc.), Queues (PutMessage, GetMessage, DeleteMessage, UpdateMessage, etc.), as well as transactions to VHDs stored in Windows Azure Storage from Drives and the new IaaS Data Disks that were just released. Pricing details can be found here.

The Windows Azure Storage service was built from the ground up to provide storage at massive scale that is highly available and durable. We have provided a storage solution that scales out and load balances automatically, so it does not require manual sharding techniques to be applied. Our storage stack is layered to provide different types of storage abstractions, as described in our SOSP paper. It provides the following four data abstractions:

Windows Azure Blob Service: supports storing large scale unstructured data. Think of it as your file store in the cloud. It empowers developers to build internet scale applications like a document store, media sharing for social networking sites, device backups, etc. In addition, our Windows Azure CDN can be utilized to ensure that the blobs stored are delivered to end users efficiently by making use of the 24+ worldwide caching locations.

Windows Azure Table Service: is a NoSQL structured store system that auto-scales, enabling users to build applications requiring massive scale structured storage. It provides an OData interface to access the structured store system. Distributed systems that require massive scale can benefit from storing their structured data in this NoSQL store – example scenarios include: keeping track of users for social sites that can grow to support millions of users, CRM data, queryable metadata for massive numbers of items/objects, etc.

Windows Azure Queue Service: is an asynchronous messaging system that enables reliable inter-role or component communication for large scale distributed systems. It provides a lease-based message processing system to effectively deal with failures during message processing. It also allows updating of messages that enables more efficient continuation on failure. Example scenarios – web role enqueues work for worker roles to process asynchronously (image processing, virus scan, report building etc.), queues are used for workflow like order processing, etc.

Windows Azure Disks, Images and Drives: A Windows Azure Virtual Machine allows you to easily deploy an IaaS Windows Server or Linux virtual machine and hence migrate your legacy applications to the cloud without having to change them. With a Windows Azure Virtual Machine, you need to associate at least one disk with the VM for your operating system. This disk is a VHD stored as a page blob in Windows Azure Storage. In addition, you can attach multiple data disks to the virtual machine, and these data disks are VHDs stored as page blobs. All VHDs are fixed-format, and all writes to the disk are converted to PutPage transactions that are sent to your storage account in the Windows Azure Blob Service, which provides durability for all writes to the IaaS disks. In addition, if you take an image of your virtual machine, it is also stored as a VHD-formatted page blob in the Windows Azure Blob Service. These images can then be used to load virtual machines. Then for PaaS, we also have Windows Azure Drives, which allow Windows Azure PaaS roles to dynamically network mount a page blob formatted as a single volume VHD. Both Disks (used for IaaS) and Drives (used for PaaS) are network mounted durable VHDs stored in page blobs, and all transactions to those blobs count towards the billable transactions for the storage account in which they are contained.

To get started, please visit the Windows Azure website and register your Windows Azure Storage account. We provide easy-to-use and open REST APIs in addition to client libraries in various languages such as .NET, Java, and Node.js, making the storage service available to a large number of developers. You can download easy-to-use storage client libraries for your favorite language here and start building applications that require large scale storage.

The following resources provide additional information:

Brad Calder

New Storage Features on the Windows Azure Portal


We are excited to announce several new storage features available on the Windows Azure Portal. With the updated Portal you have the ability to choose the level of redundancy for your storage, enable/disable and configure both Metrics and Logging, and view metrics in tabular and graphic format.

When you create your storage account, you can now configure the type of redundant storage that meets your business needs – Geo Redundant Storage (geo-replication is enabled) or Locally Redundant Storage (geo-replication is disabled). To learn more about the different types of redundant storage, please read this blog post. You can also update your storage selection (enable/disable geo replication) after your account has been created in the ‘Storage/Configure’ section of the portal.

In the portal you can now also configure Windows Azure Storage Analytics. You can use the portal to enable/disable Metrics and Logging as well as to configure all settings; for full details on analytics, please read this blog post. After you configure metrics, you can choose which of the available metrics you want to monitor in the portal. You can also select which metrics to plot on metrics charts. Note that all available metrics (based on what you configured) will be captured as described in this blog post.

As always, the portal also provides the ability to update your storage keys and delete your storage account. Please note that if you delete your storage account, there is no way to restore your data and the account name will no longer be reserved.

The following resources provide additional information:

  • For details on account creation in the portal, please review this How To documentation.
  • For details on selecting your redundant storage choice for existing accounts, and related pricing information, please review this How To documentation.
  • For details on how to configure analytics in the Portal, please review this How To documentation.
  • For more details on configuring analytics via APIs, please review this blog post.

We hope you enjoy the new storage features in the portal, and welcome your feedback and suggestions for further improvements!

Monilee Atkinson

Introducing Locally Redundant Storage for Windows Azure Storage


We are excited to offer two types of redundant storage for Windows Azure: Locally Redundant Storage and Geo Redundant Storage.

Locally Redundant Storage (LRS) provides highly durable and available storage within a single location (sub region). We maintain an equivalent of 3 copies (replicas) of your data within the primary location as described in our SOSP paper; this ensures that we can recover from common failures (disk, node, rack) without impacting your storage account’s availability and durability. All storage writes are performed synchronously across three replicas in three separate fault domains before success is returned back to the client. If there was a major data center disaster, where part of a data center was lost, we would contact customers about potential data loss for Locally Redundant Storage using the customer’s subscription contact information.

Geo Redundant Storage (GRS) provides our highest level of durability by additionally storing your data in a second location (sub region) within the same region hundreds of miles away from the primary location. All Windows Azure Blob and Table data is geo-replicated, but Queue data is not geo-replicated at this time. With Geo Redundant Storage we maintain 3 copies (replicas) of your data in both the primary location and in the secondary location. This ensures that each data center can recover from common failures on its own and also provides a geo-replicated copy of the data in case of a major disaster. As in LRS, data updates are committed to the primary location before success is returned back to the client. Once this is completed, with GRS these updates are asynchronously geo-replicated to the secondary location. For more information about geo replication, please see Introducing Geo-Replication for Windows Azure.

Geo Redundant Storage is enabled by default for all existing storage accounts in production today. You can choose to disable this by turning off geo-replication in the Windows Azure portal for your accounts. You can also configure your redundant storage option when you create a new account via the Windows Azure Portal.

Pricing Details: The default storage is Geo Redundant Storage, and its current pricing does not change. The current price of GRS is the same as it was before the announced pricing change. With these changes, we are pleased to announce that Locally Redundant Storage is offered at a discounted price (23% to 34% depending upon how much data is stored) relative to the price of GRS. Note that if you have turned off geo-replication and choose to enable geo-replication at a later time, this action will incur a one-time bandwidth charge to bootstrap your data from the primary to its secondary location. The amount of bandwidth charged for this bootstrap will be equal to the amount of data in your storage account at the time of the bootstrap. The price of the bandwidth for the bootstrap is the egress (outbound data transfer) rate for the region (zone) your storage account is in. After the bootstrap is done, there are no additional bandwidth charges to geo-replicate your data from the primary to the secondary. Also, if you use GRS from the start for your storage account, there is no bootstrap bandwidth charge. For full details, please review the pricing details.

Some customers may choose Locally Redundant Storage for storage that does not require the additional durability of Geo Redundant Storage and want to benefit from the discounted price. This data typically falls into the categories of (a) non-critical or temporary data (such as logs), or (b) data that can be recreated if it is ever lost from sources stored elsewhere. An example of the latter is encoded media files that could be recreated from the golden bits stored in another Windows Azure Storage account that uses Geo Redundant Storage. In addition, some companies have geographical restrictions about what countries their data can be stored in, and choosing Locally Redundant Storage ensures that the data is only stored in the location chosen for the storage account (details on where data is replicated for Geo Redundant Storage can be found here).

Monilee Atkinson and Brad Calder

New Blob Lease Features: Infinite Leases, Smaller Lease Times, and More


We are excited to introduce some new features with the Lease Blob API with the 2012-02-12 version of Windows Azure Blob Storage service. The 2012-02-12 version also includes some versioned changes. This blog post covers the new features and changes as well as some scenarios and code snippets. The code snippets show how to use lease APIs using the Windows Azure Storage Client Library 1.7.1 (available on GitHub) that supports the 2012-02-12 version of the REST API.

We will begin by giving a brief description of what is new and what semantics have changed when compared to earlier versions, and then deep dive into some scenarios that these changes enable. The following is the list of new features that the 2012-02-12 version brings for leases:

  1. You can acquire leases for durations from 15s up to 60s, or you can acquire a lease for an infinite time period.
  2. You can change the lease id on an active lease.
  3. You can provide the lease id when trying to acquire a lease.
  4. You can provide a time period up to which a lease should continue to remain active when breaking an acquired lease.
  5. Lease is now available for containers to prevent clients from deleting a container which may be in use.

The 2012-02-12 version also brings some versioned changes when compared to previous versions of the REST API. The following is the list of versioned changes:

  1. You have to provide lease duration when trying to acquire a lease on a blob. If the lease duration is not provided, the call to acquire a lease will fail with 400 (Bad Request). Previous versions of the API did not take lease duration as the lease duration was fixed to 60s.
  2. Once a lease is released, it cannot be broken or renewed. Breaking or renewing a lease that has been released will fail with 409 (Conflict). Previously these operations were allowed.  Applications that require a lease to be given up by calling Break Lease would now fail with 409 (Conflict). This error should be ignored since the lease is not active any more.
  3. You can now call Break Lease on a breaking or broken lease hence making break operations idempotent. In previous versions, when a lease has already been broken, a new Break Lease request failed with 409 (Conflict). Applications that want to shorten the duration of a break can now provide shorter duration than the remaining break period (See Breaking Leases section for more details).

Acquire Lease - Lease ID and Duration

Earlier versions of the acquire operation did not allow users to specify the lease-id or the duration. The duration was fixed to 60 seconds and the lease-id was determined by the service. Once a lease id was returned in the response of acquire, it could not be changed. With the 2012-02-12 version, users have the option to propose the lease-id and also to specify a duration from 15s up to 60s, or to define the lease duration to be infinite.

An important property of acquire is that as long as the proposed lease-id matches the existing lease-id on a blob with an active lease, the acquire operation will succeed. The advantage of proposing the lease-id on an acquire operation is that if the acquire operation succeeds on the server but fails before the server can send the response to the client (i.e. intermittent network errors), then the client can retry with the same proposed lease-id and recover from the failure on a success response, knowing it still holds the lease. Another property of the proposed lease-id on acquire operations is that on each successful acquire operation, the lease is set to expire once the specified duration of that operation elapses. This allows a client to change the lease duration by reissuing the acquire operation using the same proposed lease-id. Here is sample code that acquires a lease for 30 seconds by proposing the lease-id. The code later reacquires the lease, reducing the lease duration to 15 seconds. For the lease to remain active after the provided time period, the client application would need to periodically call renew before the lease period expires.

CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference(containerName);
CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

// acquire lease for 30s with a proposed lease id
// NOTE: null duration will acquire an infinite lease
blob.AcquireLease(TimeSpan.FromSeconds(30), leaseId);
     
   …

// re-acquire lease but reduce the duration to 15s
// could also re-acquire with increased duration, 60s for example
blob.AcquireLease(TimeSpan.FromSeconds(15), leaseId); 

 

Why would you change the lease ID?

As we mentioned above, with the 2012-02-12 version, we now allow the lease id to be changed. The Change Lease operation takes the existing lease-id and the proposed new id and changes the lease-id to the proposed id. Change Lease is valuable in scenarios where lease ownership needs to be transferred from component A to component B of your system. For example, component A has a lease on a blob, but needs to allow component B to operate on it. Component B could remember the current lease-id passed to it from component A, change it to a new lease-id, perform its operation, and then change the lease-id back to the previous one that component A knows about. This allows component B to own exclusive access to the blob, prevent the prior owner from modifying the blob until it is done, and then give access back to it.

Another example is a workflow where we just want to keep changing the lease-id as the blob passes through the different parts of the workflow. Let us consider a blog publishing process flow that consists of running the document through:

  1. A service that deletes all objectionable words/phrases
  2. A service that runs spell correction
  3. A service that formats the document

Each of the above steps involves changing the content of the document and is done by a separate service. In this scenario, each service will receive a lease on the document which should be maintained to ensure no one else changes the document. In addition, each service will also change the lease id to prevent previous owners from inadvertently changing the document and to ensure that only it can work on the document upon receiving the request to start processing it. Once it completes its processing step, it will submit the request to the next service in the pipeline, passing it the lease id it maintained.

string newLeaseId = Guid.NewGuid().ToString();

blob.ChangeLease(
    // new proposed leaseId
    newLeaseId,
    // leaseId is the id received from the previous service
    AccessCondition.GenerateLeaseCondition(leaseId));

// change duration required by this service
blob.AcquireLease(TimeSpan.FromSeconds(30), newLeaseId);

 

Breaking Leases

The Break operation is used to release an existing lease by rejecting future requests to renew the lease, and it does not require the issuer to know the current lease-id being held. This is generally used by administrators to reset the lease state. In previous versions, Break Lease allowed the lease to be held until the remaining time on the lease elapsed. With the 2012-02-12 version, this is still the default behavior, with an added option to specify the break period, which defines how long to wait before the current lease expires.

The user can specify a break period between 0 and 60 seconds. However, this is only a proposed value; the actual break period is the minimum of this value and the time remaining on the lease. In other words, the client can shorten the break period relative to the remaining lease time, but cannot extend it.

// If duration to break is null, it implies that lease is active for 
// remaining duration, otherwise min(break period, lease remaining time)
blob.BreakLease(TimeSpan.FromSeconds(20)); 

Infinite Leases

With the 2012-02-12 version, an infinite lease can be acquired by specifying a lease duration of -1 in the REST API. The storage client library’s Acquire Lease allows null to be passed as the duration to acquire an infinite lease. An infinite lease never expires unless it is explicitly released or broken, and hence acts like a lock.

blob.AcquireLease(null /* infinite lease */, leaseId);
// Note: Acquire can be called again with a valid duration to convert an 
// infinite lease to a finite one.
 

A useful scenario for infinite leases is a blob used by a client that wishes to hold a lease on it at all times (i.e. to hold a lock on the blob). Instead of having to renew the lease continuously as in previous versions, the client now needs just one acquire specifying an infinite lease duration.

For Break Lease on an infinite lease, the default behavior is to break the lease immediately. The break operation also allows a break period to be specified, as shown below.
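As a minimal sketch (assuming, as with Acquire above, that the library accepts a null break period), breaking an infinite lease immediately versus with a grace period would look like:

// Break the infinite lease immediately (default when no break period is given)
blob.BreakLease(null);

// Or propose a 30 second break period before the lease is freed
blob.BreakLease(TimeSpan.FromSeconds(30));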

Container Leases

We have added the Lease Container API to prevent container deletion. Holding a lease on a container does not prevent anyone from adding, deleting, or updating blob content in the container; it only prevents deletion of the container itself. The lease operations provided are similar to those on blobs, with the one exception that the container lease is a “delete” lease. The operations are:

  • Acquire lease – Issuer must provide lease duration and optionally propose lease-id
  • Change lease-id - to allow changing the current id to a new lease-id
  • Renew lease - to renew the duration
  • Break lease - to break existing lease without having knowledge of existing lease-id
  • Release lease - so that another prospective owner can acquire the lease

Consider a scenario where all blobs need to be moved to a different account. Multiple job instances can run in parallel, each working on a given container. When an instance of the job starts, it acquires an infinite lease on the container to prevent anyone from deleting the container prematurely. In addition, since each instance tries to acquire a lease, the acquire will fail if the container is already being worked on by a different instance, preventing two job instances from migrating the same container.

CloudBlobClient client = storageAccount.CreateCloudBlobClient();

// each migration job is assigned a fixed instance id and it will be used
// as the lease id.
string leaseId = instanceId;

IEnumerable<CloudBlobContainer> containerList = client.ListContainers();
foreach (CloudBlobContainer container in containerList)
{
    try
    {
        container.AcquireLease(null /* Infinite lease */, leaseId);

        // if successful - start migration job which will delete the container
        // once it completes migration
        …
    }
    catch (Exception e)
    {
        // Check for lease conflict exception – implies some other instance
        // is working on this container
    }
}

Lease Properties

With the 2012-02-12 version and later, the service returns lease-specific properties for containers and blobs on List Containers, List Blobs, Get Container Properties, Get Blob Properties and Get Blob. The lease-specific properties returned are:

x-ms-lease-status (or LeaseStatus): Returns the status of the lease on the blob or container. The possible values are locked or unlocked.

x-ms-lease-state (or LeaseState): Returns the state of the lease on the blob or container. The possible values are available, leased, expired, breaking, or broken. This information can be used by applications for diagnostics or to take further action. For example, if the lease is in the breaking, broken, or expired state, one of the redundant master instances may try to acquire the lease.

While the lease status tells you whether the lease is active, the lease state property provides more granular information. For example, the lease status may be locked while the state is breaking.

x-ms-lease-duration (or LeaseDuration): Returns the duration type of the lease – finite or infinite.
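For example, a client can inspect these properties after fetching a blob’s attributes. The sketch below assumes the 1.7.1 storage client library surfaces them as LeaseStatus, LeaseState, and LeaseDuration on the blob’s Properties object (names follow the headers above and may differ slightly in your library version).

CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

// Populates blob.Properties, including the lease properties returned
// by the Get Blob Properties REST API
blob.FetchAttributes();

Console.WriteLine("Lease status:   {0}", blob.Properties.LeaseStatus);
Console.WriteLine("Lease state:    {0}", blob.Properties.LeaseState);
Console.WriteLine("Lease duration: {0}", blob.Properties.LeaseDuration);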

Weiping Zhang, Michael Roberson, Jai Haridas, Brad Calder

Introducing Table SAS (Shared Access Signature), Queue SAS and update to Blob SAS


We’re excited to announce that, as part of version 2012-02-12, we have introduced Table Shared Access Signatures (SAS), Queue SAS and updates to Blob SAS. In this blog, we will highlight usage scenarios for these new features along with sample code using the Windows Azure Storage Client Library v1.7.1, which is available on GitHub.

Shared Access Signatures allow granular access to tables, queues, blob containers, and blobs. A SAS token can be configured to provide specific access rights, such as read, write, update, delete, etc. to a specific table, key range within a table, queue, blob, or blob container; for a specified time period or without any limit. The SAS token appears as part of the resource’s URI as a series of query parameters. Prior to version 2012-02-12, Shared Access Signature could only grant access to blobs and blob containers.
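For illustration only (the account name, table name, permissions, expiry, and signature below are placeholders), a Table SAS URI carrying such query parameters could look like:

http://myaccount.table.core.windows.net/AddressBook?sv=2012-02-12&tn=AddressBook&sp=raud&se=2012-12-31T00%3A00%3A00Z&sig=<signature>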

SAS Update to Blob in version 2012-02-12

In the 2012-02-12 version, Blob SAS has been extended to allow unbounded access time to a blob resource instead of the previously limited one hour expiry time for non-revocable SAS tokens. To make use of this additional feature, the sv (signed version) query parameter must be set to "2012-02-12" which would allow the difference between se (signed expiry, which is mandatory) and st (signed start, which is optional) to be larger than one hour. For more details, refer to the MSDN documentation.
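As a hedged sketch using the 1.7.1 client library (the container and blob names are placeholders), a long-lived, non-revocable Blob SAS could be generated as follows; the 1.7.1 library targets the 2012-02-12 version, so the expiry can be more than one hour away:

CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
CloudBlockBlob blob = container.GetBlockBlobReference("myblob.txt");

// Read-only access for 7 days; prior to 2012-02-12 a non-revocable
// SAS was limited to a one hour window
SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy()
{
    SharedAccessExpiryTime = DateTime.UtcNow.AddDays(7),
    Permissions = SharedAccessBlobPermissions.Read
};

string sasToken = blob.GetSharedAccessSignature(
    policy /* access policy */,
    null   /* access policy identifier - non-revocable */);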

Best Practices When Using SAS

The following are best practices to follow when using Shared Access Signatures.

  1. Always use HTTPS when making SAS requests. SAS tokens are sent over the wire as part of a URL, and can potentially be leaked if HTTP is used. A leaked SAS token grants access until it either expires or is revoked.
  2. Use server stored access policies for revokable SAS. Each container, table, and queue can now have up to five server stored access policies at once. Revoking one of these policies invalidates all SAS tokens issued using that policy. Consider grouping SAS tokens such that logically related tokens share the same server stored access policy. Avoid inadvertently reusing revoked access policy identifiers by including a unique string in them, such as the date and time the policy was created.
  3. Don’t specify a start time or allow at least five minutes for clock skew. Due to clock skew, a SAS token might start or expire earlier or later than expected. If you do not specify a start time, then the start time is considered to be now, and you do not have to worry about clock skew for the start time.
  4. Limit the lifetime of SAS tokens and treat them as leases. Clients that need more time can request an updated SAS token.
  5. Be aware of the version: Starting with the 2012-02-12 version, SAS tokens contain a new version parameter (sv). sv defines how the various parameters in the SAS token must be interpreted and which version of the REST API to use to execute the operation. This implies that services responsible for providing SAS tokens to client applications should issue tokens for the version of the REST protocol that those clients understand. Make sure clients understand the REST protocol version specified by sv when they are given a SAS to use.

Table SAS

SAS for table allows account owners to grant SAS token access by defining the following restriction on the SAS policy:

1. Table granularity: users can grant access to an entire table (tn) or to a table range defined by a table (tn) along with a partition key range (startpk/endpk) and row key range (startrk/endrk).

To better understand the range to which access is granted, let us take an example data set:

Row Number | PartitionKey | RowKey
1          | PK001        | RK001
2          | PK001        | RK002
3          | PK001        | RK003
300        | PK001        | RK300
301        | PK002        | RK001
302        | PK002        | RK002
303        | PK002        | RK003
600        | PK002        | RK300
601        | PK003        | RK001
602        | PK003        | RK002
603        | PK003        | RK003
900        | PK003        | RK300

The permission is specified as a range of rows from (startpk, startrk) until (endpk, endrk).

Example 1: (startpk, startrk) = (,) (endpk, endrk) = (,)
Allowed Range = All rows in the table

Example 2: (startpk, startrk) = (PK002,) (endpk, endrk) = (,)
Allowed Range = All rows starting from row # 301

Example 3: (startpk, startrk) = (PK002,) (endpk, endrk) = (PK002,)
Allowed Range = All rows starting from row # 301 and ending at row # 600

Example 4: (startpk, startrk) = (PK001,RK002) (endpk, endrk) = (PK003,RK003)
Allowed Range = All rows starting from row # 2 and ending at row # 603.
NOTE: The row (PK002, RK100) is accessible because the row key limits are hierarchical, not absolute (i.e. the filter is not applied as startrk <= rowkey <= endrk).

2. Access permissions (sp): users can grant access rights to the specified table or table range, such as Query (r), Add (a), Update (u), Delete (d), or a combination of them.

3. Time range (st/se): users can limit the SAS token access time. The start time (st) is optional but the expiry time (se) is mandatory, and no limits are enforced on these parameters, so a SAS token may be valid for a very long time period.

4. Server stored access policy (si): users can either generate offline SAS tokens, where the policy permissions described above are part of the SAS token, or they can choose to store specific policy settings associated with a table. These policy settings are limited to the time range (start time and end time) and the access permissions. A stored access policy provides additional control over generated SAS tokens: the policy settings can be changed at any time without having to re-issue a new token, and SAS access can be revoked without having to change the account’s key.

For more information on the different policy settings for Table SAS and the REST interface, please refer to the SAS MSDN documentation.

Though a non-revocable Table SAS can provide access for a long time period, we highly recommend that you always limit its validity to the minimum required amount of time, in case the SAS token is leaked or the holder of the token is no longer trusted. In that case, the only way to revoke access is to rotate the account’s key that was used to generate the SAS, which also revokes any other SAS tokens that were already issued and are currently in use. In cases where long-lived access is needed, we recommend that you use a server stored access policy as described above.

Most Shared Access Signature usage falls into two different scenarios:

  1. A service granting access to clients, so those clients can access their parts of the storage account or access the storage account with restricted permissions. Example: a Windows Phone app for a service running on Windows Azure. A SAS token would be distributed to clients (the Windows Phone app) so it can have direct access to storage.
  2. A service owner who needs to keep his production storage account credentials confined within a limited set of machines or Windows Azure roles which act as a key management system. In this case, a SAS token will be issued on an as-needed basis to worker or web roles that require access to specific storage resources. This allows services to reduce the risk of getting their keys compromised.

Along with the different usage scenarios, SAS token generation usually follows the models below:

  • A SAS Token Generator or producer service responsible for issuing SAS tokens to applications, referred to as SAS consumers. The SAS token generated is usually valid for a limited amount of time to control access. This model usually works best with the first scenario described earlier, where a phone app (SAS consumer) would request access to a certain resource by contacting a SAS generator service running in the cloud. Before the SAS token expires, the consumer would contact the service again for a renewed SAS. The service can refuse to produce any further tokens for certain applications or users, for example when a user’s subscription to the service has expired. Diagram 1 illustrates this model.


Diagram 1: SAS Consumer/Producer Request Flow

  • The communication channel between the application (SAS consumer) and SAS Token Generator could be service specific where the service would authenticate the application/user (for example, using OAuth authentication mechanism) before issuing or renewing the SAS token. We highly recommend that the communication be a secure one in order to avoid any SAS token leak. Note that steps 1 and 2 would only be needed whenever the SAS token approaches its expiry time or the application is requesting access to a different resource. A SAS token can be used as long as it is valid which means multiple requests could be issued (steps 3 and 4) before consulting back with the SAS Token Generator service.
  • A one-time generated SAS token tied to a signed identifier controlled as part of a stored access policy. This model would work best in the second scenario described earlier where the SAS token could either be part of a worker role configuration file, or issued once by a SAS token generator/producer service where maximum access time could be provided. In case access needs to be revoked or permission and/or duration changed, the account owner can use the Set Table ACL API to modify the stored policy associated with issued SAS token.

Table SAS - Sample Scenario Code

In this section we will provide a usage scenario for Table SAS along with a sample code using the Storage Client Library 1.7.1.

Consider an address book service implementation that needs to scale to a large number of users. The service allows its customers to store their address book in the cloud and access it anywhere using a wide range of clients such as a phone app, desktop app, a website, etc., which we will refer to as the client app. Once a user subscribes to the service, he would be able to add, edit, and query his address book entries. One way to build such a system is to run a service in Windows Azure Compute consisting of web and worker roles. The service would act as a middle tier between the client app and the Windows Azure storage system. After the service authenticates it, the client app would be able to access its own address book through a web interface defined by the service. The service would then serve all of the client requests by accessing a Windows Azure Table where the address book entries for each customer reside. Since the service is involved in processing every request issued by the client, the service would need to scale out its number of Windows Azure Compute instances linearly with the growth of its customer base.

With Table SAS, this scenario becomes simpler to implement. Table SAS can be used to allow the client app to directly access the customer’s address book data stored in a Windows Azure Table. This approach greatly improves the scalability of the system and reduces cost by taking the service out of the data path whenever the client app accesses the address book data. The service’s role is then restricted to processing user subscriptions and generating the SAS tokens that the client app uses to access the stored data directly. Since a token can be granted for any selected time period, the application only needs to communicate with the token-generating service once every such period for a given type of access per table. In this way, Table SAS improves performance and helps scale the system while decreasing operational cost, since fewer servers are needed.

The design of the system using Table SAS would be as follows: a Windows Azure Table called “AddressBook” will be used to store the address book entries for all customers. The PartitionKey will be the customer’s username or customerID and the RowKey will represent the address book entry key, defined as the contact’s LastName,FirstName. This means that all the entries for a certain customer share the same PartitionKey (the customerID), so a customer’s whole address book is contained within a single PartitionKey. The following C# class describes the address book entity.

[DataServiceKey("PartitionKey", "RowKey")]
public class AddressBookEntry
{
    public AddressBookEntry(string partitionKey, string rowKey)
    {
        this.PartitionKey = partitionKey;
        this.RowKey = rowKey;
    }
 
    public AddressBookEntry() { }
 
    /// <summary>
    /// Account CustomerID
    /// </summary>
    public string PartitionKey { get; set; }
 
    /// <summary>
   /// Contact Identifier LastName,FirstName
    /// </summary>
    public string RowKey { get; set; }
 
    /// <summary>
    /// The last modified time of the entity set by
    /// the Windows Azure Storage
    /// </summary>
    public DateTime Timestamp { get; set; }
 
    public string Address { get; set; }
 
    public string Email { get; set; }
 
    public string PhoneNumber { get; set; }
}

The address book service consists of the following 2 components:

  1. A SAS token producer, running as part of a service on Windows Azure Compute, accepts requests from the client app asking for a SAS token that gives it access to a particular customer’s address book data. This service would first authenticate the client app through its preferred authentication scheme, and then generate a SAS token that grants access to the “AddressBook” table while restricting the view to the PartitionKey that is equal to the customerID. Full permission access would be given in order to allow the client app to query (r), update (u), add (a) and delete (d) address book entries. Access time would be restricted to 30 minutes so that the service can deny further access to a customer if, for example, his address book subscription expires; in that case, no further renewal of the SAS token would be permitted. The 30 minute period also largely reduces the load on the SAS token producer compared to a service that acts as a proxy for every request.
  2. The client app is responsible for interacting with the customer, where it would query, update, insert, and delete address book entries. The client app would first contact the SAS producer service in order to retrieve a SAS token and cache it locally while the token is still valid. The SAS token would be used with any Table REST request against Windows Azure Storage. The client app would request a new SAS token whenever the current one approaches its expiry time. A standard approach is to renew the SAS every N minutes, where N is half of the time the allocated SAS tokens are valid. For this example, the SAS tokens are valid for 30 minutes, so the client renews the SAS once every 15 minutes. This gives the client time to alert and retry if there is any issue obtaining a SAS renewal. It also helps in cases where application and network latencies cause requests to be delayed in reaching the Windows Azure Storage system.

The SAS producer code can be found below. It is represented by the SasProducer class, which implements the RequestSasToken method responsible for issuing a SAS token to the client app. In this example, the communication between the client app and the SAS producer is assumed to be a method call for illustration purposes: the client app simply invokes the RequestSasToken method whenever it requires a new token.

/// <summary>
/// The producer class that controls access to the address book
/// by generating sas tokens to clients requesting access to their
/// own address book data
/// </summary>
public class SasProducer
{
    /* ... */
 
    /// <summary>
    /// Issues a SAS token authorizing access to the address book for a given customer ID.
    /// </summary>
    /// <param name="customerId">The customer ID requesting access.</param>
    /// <returns>A SAS token authorizing access to the customer's address book entries.</returns>
    public string RequestSasToken(string customerId)
    {
        // Omitting any authentication code since this is beyond the scope of
        // this sample code
 
        // creating a shared access policy that expires in 30 minutes.
        // No start time is specified, which means that the token is valid immediately.
        // The policy specifies full permissions.
        SharedAccessTablePolicy policy = new SharedAccessTablePolicy()
        {
            SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(
                SasProducer.AccessPolicyDurationInMinutes),
            Permissions = SharedAccessTablePermissions.Add
                | SharedAccessTablePermissions.Query
                | SharedAccessTablePermissions.Update
                | SharedAccessTablePermissions.Delete
        };
 
        // Generate the SAS token. No access policy identifier is used which
        // makes it a non-revocable token
        // limiting the table SAS access to only the request customer's id
        string sasToken = this.addressBookTable.GetSharedAccessSignature(
            policy   /* access policy */,
            null     /* access policy identifier */,
            customerId /* start partition key */,
            null     /* start row key */,
            customerId /* end partition key */,
            null     /* end row key */);
 
        return sasToken;
    }
 }

Note that by not setting the SharedAccessStartTime, Windows Azure Storage would assume that the SAS is valid upon the receipt of the request.

The client app code can be found below. It is represented by the Client class, which exposes public methods for manipulating the customer’s address book, such as UpsertEntry and LookupByName, and which internally requests a SAS token from the service front-end (represented by the SasProducer) when needed.

 

/// <summary>
/// The address book client class.
/// </summary>
public class Client
{
    /// <summary>
    /// When to refresh the credentials, measured as a number of minutes before expiration.
    /// </summary>
    private const int SasRefreshThresholdInMinutes = 15;
 
    /// <summary>
    /// the cached copy of the sas credentials of the customer's addressbook
    /// </summary>
    private StorageCredentialsSharedAccessSignature addressBookSasCredentials;
 
    /// <summary>
    /// Sas expiration time, used to determine when a refresh is needed
    /// </summary>
    private DateTime addressBookSasExpiryTime;
 
    /* ... */
 
    /// <summary>
    /// Gets the Table SAS storage credentials accessing the address book
    /// of this particular customer.
    /// The method automatically refreshes the credentials as needed
    /// and caches it locally
    /// </summary>
    public StorageCredentials GetAddressBookSasCredentials()
    {
        // Refresh the credentials if needed.
        if (this.addressBookSasCredentials == null ||
            DateTime.UtcNow.AddMinutes(SasRefreshThresholdInMinutes) >= this.addressBookSasExpiryTime)
        {
            this.RefreshAccessCredentials();
        }
 
        return this.addressBookSasCredentials;
    }
 
    /// <summary>
    /// Requests a new SAS token from the producer, and updates the cached credentials
    /// and the expiration time.
    /// </summary>
    public void RefreshAccessCredentials()
    {
        // Request the SAS token.
        string sasToken = this.addressBookService.RequestSasToken(this.customerId);
 
        // Create credentials using the new token.
        this.addressBookSasCredentials = new StorageCredentialsSharedAccessSignature(sasToken);
        this.addressBookSasExpiryTime = DateTime.UtcNow.AddMinutes(
            SasProducer.AccessPolicyDurationInMinutes);
    }
 
    /// <summary>
    /// Retrieves the address book entry for the given contact name.
    /// </summary>
    /// <param name="contactname">
    /// The lastName,FirstName for the requested address book entry.</param>
    /// <returns>An address book entry with a certain contact card</returns>
    public AddressBookEntry LookupByName(string contactname)
    {
        StorageCredentials credentials = GetAddressBookSasCredentials();
        CloudTableClient tableClient = new CloudTableClient(this.tableEndpoint, credentials);
 
        TableServiceContext context = tableClient.GetDataServiceContext();
 
        CloudTableQuery<AddressBookEntry> query = 
            (from entry in context.CreateQuery<AddressBookEntry>(Client.AddressBookTableName)
             where entry.PartitionKey == this.customerId && entry.RowKey == contactname
             select entry).AsTableServiceQuery();
 
        return query.Execute().SingleOrDefault();
    }
 
    /// <summary>
    /// Inserts a new address book entry or updates an existing entry.
    /// </summary>
    /// <param name="entry">The address book entry to insert or merge.</param>
    public void UpsertEntry(AddressBookEntry entry)
    {
        StorageCredentials credentials = GetAddressBookSasCredentials();
        CloudTableClient tableClient = new CloudTableClient(this.tableEndpoint, credentials);
 
        TableServiceContext context = tableClient.GetDataServiceContext();
 
        // Set the correct customer ID.
        entry.PartitionKey = this.customerId;
 
        // Upsert the entry (Insert or Merge).
        context.AttachTo(Client.AddressBookTableName, entry);
        context.UpdateObject(entry);
        context.SaveChangesWithRetries();
    }
}

Stored Access Policy Sample Code

As an extension to the previous example, consider that the address book service implements a garbage collector (GC) that deletes the address book data of users who are no longer consumers of the service. In this case, and in order to reduce the impact if the storage account credentials are ever compromised, the GC worker role would use a Table SAS token with maximum access time that is backed by a stored access policy associated with a signed identifier. The Table SAS token would grant access to the “AddressBook” table without specifying any range restrictions on the PartitionKey and RowKey, but with only query and delete permissions. In case the SAS token gets leaked, the service owner can revoke the SAS access by deleting the signed identifier associated with the “AddressBook” table, as will be highlighted later through code. To be sure that the SAS access does not get inadvertently reinstated after revocation, the policy identifier has the policy’s date and time of creation as part of its name. (See the section on Best Practices When Using SAS above.)

In addition, assume that the GC worker role learns of the customerIDs that it needs to GC through a queue called “gcqueue”. Whenever a customer subscription expires, a message is enqueued into the “gcqueue” queue. The GC worker role keeps polling that queue at a regular interval. Once a customerID is dequeued, the worker role deletes that customer’s data and, on completion, deletes the queue message associated with that customer. For the same reasons a SAS token is used to access the “AddressBook” table, the GC worker role also uses a Queue SAS token associated with the “gcqueue” queue, again backed by a stored access policy. The permission needed in this case is process-only. More details on Queue SAS are available in the subsequent sections of this post.

To build this additional GC feature, the SAS token producer is extended to generate a one-time Table SAS token against the “AddressBook” table and a one-time Queue SAS token against the “gcqueue” queue by associating each with a stored access signed identifier on its respective table and queue, as explained earlier. The GC role, upon initialization, contacts the SAS token producer in order to retrieve these two SAS tokens.

The additional code needed as part of the SAS producer is as follows.

public const string GCQueueName = "gcqueue";

/// <summary>
/// The garbage collection queue.
/// </summary>
private CloudQueue gcQueue;

/// <summary>
/// Generates an address book table and a GC queue 
/// revocable SAS tokens that is used by the GC worker role
/// </summary>
/// <param name="tableSasToken">
/// An out parameter which returns a revocable SAS token to 
/// access the AddressBook table with query and delete permissions</param>
/// <param name="queueSasToken">
/// An out parameter which returns a revocable SAS token to 
/// access the gcqueue with process permissions</param>
public void GetGCSasTokens(out string tableSasToken, out string queueSasToken)
{
    string gcPolicySignedIdentifier = "GCAccessPolicy" + DateTime.UtcNow.ToString();
 
    // Create the GC worker's address book SAS policy 
    // that will be associated with a signed identifer
    TablePermissions addressBookPermissions = new TablePermissions();
    SharedAccessTablePolicy gcTablePolicy = new SharedAccessTablePolicy()
    {
        // Providing the max duration
        SharedAccessExpiryTime = DateTime.MaxValue,
        // Permission is granted to query and delete entries.
        Permissions = SharedAccessTablePermissions.Query | SharedAccessTablePermissions.Delete
    };
 
    // Associate the above policy with a signed identifier
    addressBookPermissions.SharedAccessPolicies.Add(
        gcPolicySignedIdentifier,
        gcTablePolicy);
 
    // The below call will result in a Set Table ACL request to be sent to 
    // Windows Azure Storage in order to store the policy and associate it with the 
    // "GCAccessPolicy" signed identifier that will be referred to
    // by the generated SAS token
    this.addressBookTable.SetPermissions(addressBookPermissions);
 
    // Create the SAS tokens using the above policies.
    // There are no restrictions on partition key and row key.
    // It also uses the signed identifier as part of the token.
    // No requests will be sent to Windows Azure Storage when the below call is made.
    tableSasToken = this.addressBookTable.GetSharedAccessSignature(
        new SharedAccessTablePolicy(),
        gcPolicySignedIdentifier,
        null /* start partition key */,
        null /* start row key */,
        null /* end partition key */,
        null /* end row key */);
 
    // Initializing the garbage collection queue and creating a Queue SAS token
    // by following similar steps as the table SAS
    CloudQueueClient queueClient = 
        this.serviceStorageAccount.CreateCloudQueueClient();
    this.gcQueue = queueClient.GetQueueReference(GCQueueName);
    this.gcQueue.CreateIfNotExist();
 
    // Create the GC queue SAS policy.
    QueuePermissions gcQueuePermissions = new QueuePermissions();
    SharedAccessQueuePolicy gcQueuePolicy = new SharedAccessQueuePolicy()
    {
        // Providing the max duration
        SharedAccessExpiryTime = DateTime.MaxValue,
        // Permission is granted to process queue messages.
        Permissions = SharedAccessQueuePermissions.ProcessMessages
    };
 
    // Associate the above policy with a signed identifier
    gcQueuePermissions.SharedAccessPolicies.Add(
        gcPolicySignedIdentifier,
        gcQueuePolicy);
 
    // The below call will result in a Set Queue ACL request to be sent to 
    // Windows Azure Storage in order to store the policy and associate it with the 
    // "GCAccessPolicy" signed identifier that will be referred to
    // by the generated SAS token
    this.gcQueue.SetPermissions(gcQueuePermissions);
 
    // Create the SAS tokens using the above policy which 
    // uses the signed identifier as part of the token.
    // No requests will be sent to Windows Azure Storage when the below call is made.
    queueSasToken = this.gcQueue.GetSharedAccessSignature(
        new SharedAccessQueuePolicy(),
        gcPolicySignedIdentifier);
}

Whenever a customer’s data needs to be deleted, the following method is called; for simplicity, it is assumed to be part of the SasProducer class.

/// <summary>
/// Flags the given customer ID for garbage collection.
/// </summary>
/// <param name="customerId">The customer ID to delete.</param>
public void DeleteCustomer(string customerId)
{
    // Add the customer ID to the GC queue.
    CloudQueueMessage message = new CloudQueueMessage(customerId);
    this.gcQueue.AddMessage(message);
}

In case a SAS token needs to be revoked, the following method would be invoked. Once it is called, any malicious user who might have gained access to these SAS tokens will be denied access. The garbage collector could in that case request a new token from the SAS producer.

/// <summary>
/// Revokes Revocable SAS access to a Table that is associated
/// with a policy referred to by the signedIdentifier
/// </summary>
/// <param name="table">
/// Reference to the CloudTable in question. 
/// The table must be created with a signed key access, 
/// since otherwise Set/Get Table ACL would fail</param>
/// <param name="signedIdentifier">the SAS signedIdentifier to revoke</param>
public void RevokeAccessToTable(CloudTable table, string signedIdentifier)
{
    // Retrieve the current policies and SAS signedIdentifier 
    // associated with the table by invoking Get Table ACL
    TablePermissions tablePermissions = table.GetPermissions();
    
    // Attempt to remove the signedIdentifier to revoke from the list
    bool success = tablePermissions.SharedAccessPolicies.Remove(signedIdentifier);
 
    if (success)
    {
        // Commit the changes by invoking Set Table ACL 
        // without the signedidentifier that needs revoking
        this.addressBookTable.SetPermissions(tablePermissions);
    }
    // else the signedIdentifier does not exist, therefore no need to 
    // call Set Table ACL
}

The garbage collection code that uses the generated SAS tokens is as follows.

/// <summary>
/// The garbage collection worker class.
/// </summary>
public class GCWorker
{
    /// <summary>
    /// The address book table.
    /// </summary>
    private CloudTable addressBook;
 
    /// <summary>
    /// The garbage collection queue.
    /// </summary>
    private CloudQueue gcQueue;
 
    /// <summary>
    /// Initializes a new instance of the GCWorker class
    /// by passing in the required SAS credentials to access the 
    /// AddressBook Table and the gcqueue Queue
    /// </summary>
    public GCWorker(
        string tableEndpoint,
        string sasTokenForAddressBook,
        string queueEndpoint,
        string sasTokenForQueue)
    {
        StorageCredentials credentialsForAddressBook = 
            new StorageCredentialsSharedAccessSignature(sasTokenForAddressBook);
        CloudTableClient tableClient = 
            new CloudTableClient(tableEndpoint, credentialsForAddressBook);
        this.addressBook = 
            tableClient.GetTableReference(SasProducer.AddressBookTableName);
 
        StorageCredentials credentialsForQueue = 
            new StorageCredentialsSharedAccessSignature(sasTokenForQueue);
        CloudQueueClient queueClient = 
            new CloudQueueClient(queueEndpoint, credentialsForQueue);
        this.gcQueue = 
            queueClient.GetQueueReference(SasProducer.GCQueueName);
    }
 
    /// <summary>
    /// Starts the GC worker, which polls the GC queue for messages 
    /// containing customerID to be garbage collected.
    /// </summary>
    public void Start()
    {
        while (true)
        {
            // Get a message from the queue by setting its visibility timeout to 2 minutes
            CloudQueueMessage message = this.gcQueue.GetMessage(TimeSpan.FromMinutes(2));
 
            // If there are no messages, sleep and retry.
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromMinutes(1));
                continue;
            }
 
            // The customer ID is the message body.
            string customerIDToGC = message.AsString;
 
            // Create a context for querying and modifying the address book.
            TableServiceContext context = this.addressBook.ServiceClient.GetDataServiceContext();
 
            // Find all entries in a given account.
            CloudTableQuery<AddressBookEntry> query = 
                (from entry in context.CreateQuery<AddressBookEntry>(this.addressBook.Name)
                 where entry.PartitionKey == customerIDToGC
                 select entry).AsTableServiceQuery();
 
            int numberOfEntriesInBatch = 0;
 
            // Delete entries in batches since all of the contact entries share 
            // the same partitionKey
            foreach (AddressBookEntry r in query.Execute())
            {
                context.DeleteObject(r);
                numberOfEntriesInBatch++;
 
                if (numberOfEntriesInBatch == 100)
                {
                    // Commit the batch of 100 deletions to the service.
                    context.SaveChangesWithRetries(SaveChangesOptions.Batch);
                    numberOfEntriesInBatch = 0;
                }
            }
 
            if (numberOfEntriesInBatch > 0)
            {
                // Commit the remaining deletions (if any) to the service.
                context.SaveChangesWithRetries(SaveChangesOptions.Batch);
            }
 
            // Delete the message from the queue.
            this.gcQueue.DeleteMessage(message);
        }
    }
}

For completeness, we provide the following Main method to illustrate the above classes and allow you to test the sample code.

public static void Main()
{
    string accountName = "someaccountname";
    string accountKey = "someaccountkey";
    
    string tableEndpoint = string.Format(
        "http://{0}.table.core.windows.net", accountName);
    string queueEndpoint = string.Format(
        "http://{0}.queue.core.windows.net", accountName);
 
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        string.Format(
            "DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}",
             accountName, accountKey));

    SasProducer sasProducer = new SasProducer(storageAccount);
    
    string sasTokenForAddressBook, sasTokenForQueue;
    // Get the revocable GC SAS tokens
    sasProducer.GetGCSasTokens(out sasTokenForAddressBook, out sasTokenForQueue);
    
    // Initialize and start the GC Worker
    GCWorker gcWorker = new GCWorker(
        tableEndpoint,
        sasTokenForAddressBook,
        queueEndpoint,
        sasTokenForQueue);
    ThreadPool.QueueUserWorkItem((state) => gcWorker.Start());
 
    string customerId = "davidhamilton";
 
    // Create a client object
    Client client = new Client(sasProducer, tableEndpoint, customerId);
 
    // Add some address book entries
    AddressBookEntry contactEntry = new AddressBookEntry
    {
        RowKey = "Harp,Walter",
        Address = "1345 Fictitious St, St Buffalo, NY 98052",
        PhoneNumber = "425-555-0101"
    };
 
    client.UpsertEntry(contactEntry);
 
    contactEntry = new AddressBookEntry
    {
        RowKey = "Foster,Jonathan",
        Email = "Jonathan@fourthcoffee.com"
    };
 
    client.UpsertEntry(contactEntry);
 
    contactEntry = new AddressBookEntry
    {
        RowKey = "Miller,Lisa",
        PhoneNumber = "425-555-2141"
    };
 
    client.UpsertEntry(contactEntry);
 
    // Update Walter's Contact entry with an email address
    contactEntry = new AddressBookEntry
    {
        RowKey = "Harp,Walter",
        Email = "Walter@contoso.com"
    };
 
    client.UpsertEntry(contactEntry);
 
    // Look up an entry
    contactEntry = client.LookupByName("Foster,Jonathan");
 
    // Delete the customer
    sasProducer.DeleteCustomer(customerId);
 
    // Wait for GC
    Thread.Sleep(TimeSpan.FromSeconds(120));
}

 

Queue SAS

SAS for queue allows account owners to grant SAS access to a queue by defining the following restriction on the SAS policy:

  1. Access permissions (sp): users can grant access rights to the specified queue such as Read or Peek at messages (r), Add message (a), Update message (u), and Process message (p) which allows the Get Messages and Delete Message REST APIs to be invoked, or a combination of permissions. Note that Process message (p) permissions potentially allow a client to get and delete every message in the queue. Therefore the clients receiving these permissions must be sufficiently trusted for the queue being accessed.
  2. Time range (st/se): users can limit the SAS token access time. You can also choose to provide access for maximum duration.
  3. Server stored access policy (si): users can either generate offline SAS tokens, where the policy permissions described above are part of the SAS token, or they can choose to store specific policy settings associated with a queue. These policy settings are limited to the time range (start time and end time) and the access permissions. Stored access policies provide additional control over generated SAS tokens: policy settings can be changed at any time without having to re-issue a new token, and SAS access can be revoked without having to change the account’s key.

For more information on the different policy settings for Queue SAS and the REST interface, please refer to the SAS MSDN documentation.

A typical scenario where Queue SAS can be used is for a notification system where the notification producer would need add-only access to the queue and the consumer needs processing and read access to the queue.
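As a rough sketch of such a notification system (the queue name and token lifetimes here are arbitrary), the account owner could hand the producer an add-only token and the consumer a read/process token:

CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
CloudQueue notificationQueue = queueClient.GetQueueReference("notifications");

// Producer token: may only add messages
string producerSas = notificationQueue.GetSharedAccessSignature(
    new SharedAccessQueuePolicy()
    {
        SharedAccessExpiryTime = DateTime.UtcNow.AddHours(2),
        Permissions = SharedAccessQueuePermissions.Add
    },
    null /* access policy identifier */);

// Consumer token: may peek at messages and get/delete them
string consumerSas = notificationQueue.GetSharedAccessSignature(
    new SharedAccessQueuePolicy()
    {
        SharedAccessExpiryTime = DateTime.UtcNow.AddHours(2),
        Permissions = SharedAccessQueuePermissions.Read
            | SharedAccessQueuePermissions.ProcessMessages
    },
    null /* access policy identifier */);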

As an example, consider a video processing service that works on videos provided by its customers. The source videos are stored in the customer’s Windows Azure Storage account. Once a video is processed by the service, the resultant video is stored back in the customer’s account. The service provides transcoding to different video qualities such as 240p, 480p and 720p. Whenever there are new videos to be processed, the customer client app sends a request to the service that includes the source video blob, the destination video blob and the requested video transcoding quality. The service then transcodes the source video and stores the resultant video back to the customer account location denoted by the destination blob. To design such a service without Queue SAS, the system would include 3 different components:

  • Client, creates a SAS token access to the source video blob with read permissions and a destination blob SAS token access with write permissions. The client then sends a request to the processing service front-end along with the needed video transcoding quality.
  • Video processing service front-end, accepts requests by first authenticating the sender using its own preferred authentication scheme. Once authenticated, the front-end enqueues a work item into a Windows Azure Queue called “videoprocessingqueue” that gets processed by a number of video processor worker role instances.
  • Video processor worker role: the worker role would dequeue work items from the “videoprocessingqueue” and processes the request by transcoding the video. The worker role could also extend the visibility time of the work item if more processing time is needed.

The above system design requires the number of front-ends to scale up with the increasing number of requests and customers in order to keep up with the service demand. In addition, client applications are not isolated from unavailability of the video processing service front-ends. Having the client application directly interface with the scalable, highly available, and durable queue using SAS greatly alleviates this requirement, helps the service run more efficiently with fewer computational resources, and decouples the client applications from the availability of the video processing service front-ends. In this case, the front-end role could instead issue SAS tokens granting access to the “videoprocessingqueue” with add message permission for, say, 2 hours. The client can then use the SAS token in order to enqueue requests. When using Queue SAS, the load on the front-end greatly decreases, since the enqueue requests go directly from the client to storage instead of through the front-end service. The system design would then look like:

  • Client, which creates a SAS token access to the source video blob with read permissions and a destination blob SAS token access with write permissions. The client would then contact the front-end and retrieves a SAS token for the “videoprocessingqueue” queue and then enqueues a video processing work item. The client would cache the SAS token for 2 hours and renew it well before it expires.
  • Video processing service front-end, which accepts requests by first authenticating the sender. Once authenticated, it would issue SAS tokens to the “videoprocessingqueue” queue with add message permission and duration limited to 2 hours.
  • Video processor worker role: The responsibility of this worker role would remain unchanged from the previous design.

We will now highlight the usage of Queue SAS through code for the video processing service. Authentication and actual video transcoding code will be omitted for simplicity reasons.

We will first define the video processing work item, referred to as TranscodingWorkItem, as follows.
/// <summary>
/// Enum representing the target video quality requested by the client
/// </summary>
public enum VideoQuality
{
    quality240p,
    quality480p,
    quality720p
}

/// <summary>
/// class representing the queue message Enqueued by the client
/// and processed by the video processing worker role
/// </summary>
public class TranscodingWorkItem
{
    /// <summary>
    /// Blob URL for the source Video that needs to be transcoded
    /// </summary>
    public string SourceVideoUri { get; set; }
 
    /// <summary>
    /// Blob URl for the resultant video that would be produced
    /// </summary>
    public string DestinationVideoUri { get; set; }
 
    /// <summary>
    /// SAS token for the source video with read-only access
    /// </summary>
    public string SourceSasToken { get; set; }
 
    /// <summary>
    /// SAS token for destination video with write-only access
    /// </summary>
    public string DestinationSasToken { get; set; }
 
    /// <summary>
    /// The requested video quality
    /// </summary>
    public VideoQuality TargetVideoQuality { get; set; }
 
    /// <summary>
    /// Converts the xml representation of the queue message into a TranscodingWorkItem object
    /// This API is used by the Video Processing Worker role
    /// </summary>
    /// <param name="messageContents">XML snippet representing the TranscodingWorkItem</param>
    /// <returns></returns>
    public static TranscodingWorkItem FromMessage(string messageContents)
    {
        XmlSerializer mySerializer = new XmlSerializer(typeof(TranscodingWorkItem));
        StringReader reader = new StringReader(messageContents);
        return (TranscodingWorkItem)mySerializer.Deserialize(reader);
    }
    /// <summary>
    /// Serializes this TranscodingWorkItem object to an xml string that would be 
    /// used a queue message.
    /// This API is used by the client
    /// </summary>
    /// <returns></returns>
    public string ToMessage()
    {
        XmlSerializer mySerializer = new XmlSerializer(typeof(TranscodingWorkItem));
        StringWriter writer = new StringWriter();
        mySerializer.Serialize(writer, this);
        writer.Close();
 
        return writer.ToString();
    }
}

Below, we highlight the code needed by the front-end part of the service, which acts as a SAS generator. This component generates two types of SAS tokens: a non-revocable one, limited to 2 hours, that is consumed by clients, and a one-time, maximum-duration, revocable one that is used by the video processing worker role.

/// <summary>
/// SAS Generator component that is running as part of the service front-end
/// </summary>
public class SasProducer
{
    /* ... */

    /// <summary>
    /// API invoked by Clients in order to get a SAS token 
    /// that allows them to add messages to the queue.
    /// The token will have add-message permission with a 2 hour limit.
    /// </summary>
    /// <returns>A SAS token authorizing access to the video processing queue.</returns>
    public string GetClientSasToken()
    {
        // The shared access policy should expire in two hours.
        // No start time is specified, which means that the token is valid immediately.
        // The policy specifies add-message permissions.
        SharedAccessQueuePolicy policy = new SharedAccessQueuePolicy()
        {
            SharedAccessExpiryTime = DateTime.UtcNow.Add(SasProducer.SasTokenDuration),
            Permissions = SharedAccessQueuePermissions.Add
        };
 
        // Generate the SAS token. No access policy identifier is used 
        // which makes it non revocable.
        // the token is generated by the client without issuing any calls
        // against the Windows Azure Storage.
        string sasToken = this.videoProcessingQueue.GetSharedAccessSignature(
            policy   /* access policy */,
            null     /* access policy identifier */);
 
        return sasToken;
    }
 
    /// <summary>
    /// This method will generate a revocable SAS token that will be used by 
    /// the video processing worker roles. The role will have process and update
    /// message permissions.
    /// </summary>
    /// <returns></returns>
    public string GetSasTokenForProcessingMessages()
    {
        // A signed identifier is needed to associate a SAS with a server stored policy
        string workerPolicySignedIdentifier = 
            "VideoProcessingWorkerAccessPolicy" + DateTime.UtcNow.ToString();
 
        // Create the video processing worker's queue SAS policy.
        // Permission is granted to process and update queue messages.            
        QueuePermissions workerQueuePermissions = new QueuePermissions();
        SharedAccessQueuePolicy workerQueuePolicy = new SharedAccessQueuePolicy()
        {    
            // Making the duration max
            SharedAccessExpiryTime = DateTime.MaxValue,
            Permissions = SharedAccessQueuePermissions.ProcessMessages | SharedAccessQueuePermissions.Update
        };
 
        // Associate the above policy with a signed identifier
        workerQueuePermissions.SharedAccessPolicies.Add(
            workerPolicySignedIdentifier,
            workerQueuePolicy);
 
        // The below call will result in a Set Queue ACL request to be sent to 
        // Windows Azure Storage in order to store the policy and associate it with the 
        // "VideoProcessingWorkerAccessPolicy" signed identifier that will be referred to
        // by the SAS token
        this.videoProcessingQueue.SetPermissions(workerQueuePermissions);
 
        // Use the signed identifier in order to generate a SAS token. No requests will be
        // sent to Windows Azure Storage when the below call is made.
        string revocableSasTokenQueue = this.videoProcessingQueue.GetSharedAccessSignature(
            new SharedAccessQueuePolicy(),
            workerPolicySignedIdentifier);
 
        return revocableSasTokenQueue;
    }
}

We will now look at the client library code that runs as part of the customer’s application. We assume that the communication between the client and the service front-end is a simple method call on the SasProducer object. In reality, this could be an HTTPS web request that is processed by the front-end, with the SAS token returned as part of the HTTPS response. The client library uses the customer’s storage credentials to create SAS tokens for the source and destination video blobs. It also retrieves the video processing Queue SAS token from the service and enqueues a transcoding work item.

/// <summary>
/// A class representing the client using the video processing service.
/// </summary>
public class Client
{
    /// <summary>
    /// When to refresh the credentials, measured as a number of minutes before expiration.
    /// </summary>
    private const int CredsRefreshThresholdInMinutes = 60;
 
    /// <summary>
    /// The handle to the video processing service, for requesting sas tokens
    /// </summary>
    private SasProducer videoProcessingService;
 
    /// <summary>
    /// a cached copy of the SAS credentials.
    /// </summary>
    private StorageCredentialsSharedAccessSignature serviceQueueSasCredentials;
 
    /// <summary>
    /// Expiration time for the service SAS token.
    /// </summary>
    private DateTime serviceQueueSasExpiryTime;
 
    /// <summary>
    /// the video processing service storage queue endpoint that is used to
    /// enqueue workitems to
    /// </summary>
    private string serviceQueueEndpoint;
 
    /// <summary>
    /// Initializes a new instance of the Client class.
    /// </summary>
    /// <param name="service">
    /// A handle to the video processing service object.</param>
    /// <param name="serviceQueueEndpoint">
    /// The video processing service storage queue endpoint that is used to
    /// enqueue workitems to</param>
    public Client(SasProducer service, string serviceQueueEndpoint)
    {
        this.videoProcessingService = service;
        this.serviceQueueEndpoint = serviceQueueEndpoint;
    }
 
    /// <summary>
    /// Called by the application in order to request a video to
    /// be transcoded.
    /// </summary>
    /// <param name="clientStorageAccountName">
    /// The customer's storage account name; Not to be confused
    /// with the service account info</param>
    /// <param name="clientStorageKey">the customer's storage account key.
    /// It is used to generate the SAS access to the customer's videos</param>
    /// <param name="sourceVideoBlobUri">The raw source blob uri</param>
    /// <param name="destinationVideoBlobUri">The raw destination blob uri</param>
    /// <param name="videoQuality">the video quality requested</param>
    public void SubmitTranscodeVideoRequest(
        string clientStorageAccountName,
        string clientStorageKey,
        string sourceVideoBlobUri,
        string destinationVideoBlobUri,
        VideoQuality videoQuality)
    {
        // Create a reference to the customer's storage account
        // that will be used to generate SAS tokens to the source and destination
        // videos
        CloudStorageAccount clientStorageAccount = CloudStorageAccount.Parse(
            string.Format("DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}", 
            clientStorageAccountName, clientStorageKey));
 
        CloudBlobClient blobClient = clientStorageAccount.CreateCloudBlobClient();
 
        CloudBlob sourceVideo = new CloudBlob(
            sourceVideoBlobUri /*blobUri*/,
            blobClient /*serviceClient*/);
 
        CloudBlob destinationVideo = new CloudBlob(
            destinationVideoBlobUri /*blobUri*/,
            blobClient /*serviceClient*/);
        
        // Create the SAS policies for the videos
        // The permissions are restricted to read-only for the source 
        // and write-only for the destination.
        SharedAccessBlobPolicy sourcePolicy = new SharedAccessBlobPolicy
        {
            // Allow 24 hours for reading and transcoding the video
            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24),
            Permissions = SharedAccessBlobPermissions.Read
        };
 
        SharedAccessBlobPolicy destinationPolicy = new SharedAccessBlobPolicy
        {
            // Allow 24 hours for reading and transcoding the video
            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24),
            Permissions = SharedAccessBlobPermissions.Write
        };
 
        // Generate SAS tokens for the source and destination
        string sourceSasToken = sourceVideo.GetSharedAccessSignature(
            sourcePolicy,
            null /* access policy identifier */);
 
        string destinationSasToken = destinationVideo.GetSharedAccessSignature(
            destinationPolicy,
            null /* access policy identifier */);
 
        // Create a workitem for transcoding the video
        TranscodingWorkItem workItem = new TranscodingWorkItem
        {
            SourceVideoUri = sourceVideo.Uri.AbsoluteUri,
            DestinationVideoUri = destinationVideo.Uri.AbsoluteUri,
            SourceSasToken = sourceSasToken,
            DestinationSasToken = destinationSasToken,
            TargetVideoQuality = videoQuality
        };
 
        // Get the credentials for the service queue. This uses the cached
        // credentials if they have not expired; otherwise it contacts the
        // video processing service.
        StorageCredentials serviceQueueCredentials = GetServiceQueueSasCredentials();
        CloudQueueClient queueClient = new CloudQueueClient(
            this.serviceQueueEndpoint /*baseAddress*/,
            serviceQueueCredentials /*credentials*/);
 
        CloudQueue serviceQueue = queueClient.GetQueueReference(SasProducer.WorkerQueueName);
 
        // Add the workitem to the queue which would 
        // result in a Put Message API to be called on a SAS URL
        CloudQueueMessage message = new CloudQueueMessage(
            workItem.ToMessage() /*content*/);
        serviceQueue.AddMessage(message);
    }
 
    /// <summary>
    /// Gets the SAS storage credentials object for accessing 
    /// the video processing queue.
    /// This method will automatically refresh the credentials as needed.
    /// </summary>
    public StorageCredentials GetServiceQueueSasCredentials()
    {
        // Refresh the credentials if needed.
        if (this.serviceQueueSasCredentials == null ||
            DateTime.UtcNow.AddMinutes(CredsRefreshThresholdInMinutes) 
                >= this.serviceQueueSasExpiryTime)
        {
            this.RefreshAccessCredentials();
        }
 
        return this.serviceQueueSasCredentials;
    }
 
    /// <summary>
    /// Requests a new SAS token from the service and updates the 
    /// cached credentials and the expiration time.
    /// </summary>
    public void RefreshAccessCredentials()
    {
        // Request the SAS token. This is currently emulated as a 
        // method call against the SasProducer object
 
        string sasToken = this.videoProcessingService.GetClientSasToken();
 
        // Create credentials using the new token.
        this.serviceQueueSasCredentials = new StorageCredentialsSharedAccessSignature(sasToken);
        this.serviceQueueSasExpiryTime = DateTime.UtcNow.Add(SasProducer.SasTokenDuration);
    }
}

We then look at the video processing worker role code. The worker uses a SAS token that can either be supplied through a configuration file or obtained by contacting the SAS Producer role; a configuration-based sketch is shown after the worker class below.

/// <summary>
/// A class representing a video processing worker role
/// </summary>
public class VideoProcessingWorker
{
    public const string WorkerQueueName = "videoprocessingqueue";
 
    /// <summary>
    /// A reference to the video processing queue
    /// </summary>
    private CloudQueue videoProcessingQueue;
 
    /// <summary>
    /// Initializes a new instance of the VideoProcessingWorker class.
    /// </summary>
    /// <param name="sasTokenForWorkQueue">
    /// The SAS token for accessing the work queue.</param>
    /// <param name="storageAccountName">
    /// The storage account name used by this service</param>
    public VideoProcessingWorker(string sasTokenForWorkQueue, string storageAccountName)
    {
        string queueEndpoint = 
            string.Format("http://{0}.queue.core.windows.net", storageAccountName);
 
        StorageCredentials queueCredentials = 
            new StorageCredentialsSharedAccessSignature(sasTokenForWorkQueue);
        CloudQueueClient queueClient = 
            new CloudQueueClient(queueEndpoint, queueCredentials);
        this.videoProcessingQueue = 
            queueClient.GetQueueReference(VideoProcessingWorker.WorkerQueueName);
    }
 
    /// <summary>
    /// Starts the worker, which polls the queue for messages containing videos to be transcoded.
    /// </summary>
    public void Start()
    {
        while (true)
        {
            // Get a message from the queue, setting an initial visibility timeout of 5 minutes
            CloudQueueMessage message = this.videoProcessingQueue.GetMessage(
                TimeSpan.FromMinutes(5) /*visibilityTimeout*/);
 
            // If there are no messages, sleep and retry.
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(5));
                continue;
            }
 
            TranscodingWorkItem workItem;
 
            try
            {
                // Deserialize the work item
                workItem = TranscodingWorkItem.FromMessage(message.AsString);
            }
            catch (InvalidOperationException)
            {
                // The message is malformed
                // Log an error (or an alert) and delete it from the queue
                this.videoProcessingQueue.DeleteMessage(message);
                continue;
            }
 
            // Create the source and destination CloudBlob objects
            // from the workitem's blob uris and sas tokens
            StorageCredentials sourceCredentials = 
                new StorageCredentialsSharedAccessSignature(workItem.SourceSasToken);
            CloudBlob sourceVideo = new CloudBlob(workItem.SourceVideoUri, sourceCredentials);
 
            StorageCredentials destinationCredentials = 
                new StorageCredentialsSharedAccessSignature(workItem.DestinationSasToken);
            CloudBlob destinationVideo = 
                new CloudBlob(workItem.DestinationVideoUri, destinationCredentials);
 
            // Process the video
            this.ProcessVideo(sourceVideo, destinationVideo, workItem.TargetVideoQuality);
 
            // Delete the message from the queue.
            this.videoProcessingQueue.DeleteMessage(message);
        }
    }
 
    /// <summary>
    /// Transcodes the video.
    /// This sample does not do any actual video processing; it simply copies
    /// the source content and appends a marker string.
    /// </summary>
    private void ProcessVideo(
        CloudBlob sourceVideo,
        CloudBlob destinationVideo,
        VideoQuality targetVideoQuality)
    {
        // Open the source for reading and the destination for writing.
        using (Stream inStream = sourceVideo.OpenRead())
        using (Stream outStream = destinationVideo.OpenWrite())
        {
            // This is where the real work is done.
            // In this example, we just copy inStream to outStream plus some extra text.
            byte[] buffer = new byte[1024];
            int count;

            while ((count = inStream.Read(buffer, 0, buffer.Length)) != 0)
            {
                outStream.Write(buffer, 0, count);
            }

            // Write the extra text; disposing the writer also closes outStream
            // and commits the destination blob.
            using (TextWriter writer = new StreamWriter(outStream))
            {
                writer.WriteLine(" (transcoded to {0})", targetVideoQuality);
            }
        }
    }
}
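
As mentioned above, the worker's SAS token could instead be supplied through the role's configuration rather than obtained from the SAS Producer directly. The following is a minimal sketch of that wiring, not part of the sample itself; the appSettings key name "WorkQueueSasToken" is an assumption made for illustration.

using System.Configuration;
using System.Threading;

// Read the queue SAS token from configuration.
// (The key name "WorkQueueSasToken" is hypothetical and used only for illustration.)
string sasTokenForWorkQueue = ConfigurationManager.AppSettings["WorkQueueSasToken"];

// Pass the token to the worker exactly as the constructor above expects,
// along with the name of the service storage account that owns the queue.
VideoProcessingWorker worker = 
    new VideoProcessingWorker(sasTokenForWorkQueue, "someserviceaccountname");
ThreadPool.QueueUserWorkItem((state) => worker.Start());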

For completeness, we provide the following Main method, which lets you exercise the sample code above; a short verification sketch follows it.

public static void Main()
{
    string serviceAccountName = "someserviceaccountname";
    string serviceAccountKey = "someserviceAccountKey";
 
    string serviceQueueEndpoint = 
        string.Format("http://{0}.queue.core.windows.net", serviceAccountName);
 
    // Set up the SAS producer as part of the front-end
    SasProducer sasProducer = new SasProducer(serviceAccountName, serviceAccountKey);
 
    // Get the maximum-duration SAS token that is used by the service worker role
    string sasTokenForQueue = sasProducer.GetSasTokenForProcessingMessages();
 
    // Start the video processing worker
    VideoProcessingWorker transcodingWorker = 
        new VideoProcessingWorker(sasTokenForQueue, serviceAccountName);
    ThreadPool.QueueUserWorkItem((state) => transcodingWorker.Start());
 
    // Set up the client library
    Client client = new Client(sasProducer, serviceQueueEndpoint);
 
    // Use the client library to submit transcoding workitems
    string customerAccountName = "clientaccountname";
    string customerAccountKey = "CLIENTACCOUNTKEY";
 
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        string.Format("DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}",
        customerAccountName,
        customerAccountKey));
 
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    
    // Create a source container
    CloudBlobContainer sourceContainer = 
        blobClient.GetContainerReference("sourcevideos");
    sourceContainer.CreateIfNotExist();
 
    // Create destination container
    CloudBlobContainer destinationContainer = 
        blobClient.GetContainerReference("transcodedvideos");
    destinationContainer.CreateIfNotExist();
 
    List<CloudBlob> sourceVideoList = new List<CloudBlob>();
 
    // Upload 10 source videos
    for (int i = 0; i < 10; i++)
    {
        CloudBlob sourceVideo = sourceContainer.GetBlobReference("Video" + i);
 
        // Upload the video
        // This example uses a placeholder string
        sourceVideo.UploadText("Content of video" + i);
 
        sourceVideoList.Add(sourceVideo);
    }
             
    // Submit Video Processing Requests to the service using Queue SAS
    for (int i = 0; i < 10; i++)
    {
        CloudBlob sourceVideo = sourceVideoList[i];
        CloudBlob destinationVideo = 
            destinationContainer.GetBlobReference("Video" + i);
 
        client.SubmitTranscodeVideoRequest(
            customerAccountName,
            customerAccountKey,
            sourceVideo.Uri.AbsoluteUri,
            destinationVideo.Uri.AbsoluteUri,
            VideoQuality.quality480p);
    }
 
    // Let the worker finish processing
    Thread.Sleep(TimeSpan.FromMinutes(5));
}
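
To confirm that the workitems were actually processed, you could read the transcoded blobs back from the destination container once the worker has drained the queue. The following is a minimal verification sketch, assuming the same customer account and placeholder content used in the Main method above.

// Re-create a reference to the customer's destination container.
CloudStorageAccount customerAccount = CloudStorageAccount.Parse(
    string.Format("DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}",
    "clientaccountname",
    "CLIENTACCOUNTKEY"));

CloudBlobClient verificationClient = customerAccount.CreateCloudBlobClient();
CloudBlobContainer transcodedContainer = 
    verificationClient.GetContainerReference("transcodedvideos");

// Each destination blob should now contain the copied placeholder content
// followed by the "(transcoded to ...)" suffix written by ProcessVideo.
for (int i = 0; i < 10; i++)
{
    CloudBlob transcodedVideo = transcodedContainer.GetBlobReference("Video" + i);
    Console.WriteLine("Video{0}: {1}", i, transcodedVideo.DownloadText());
}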

Jean Ghanem, Michael Roberson, Weiping Zhang, Jai Haridas, Brad Calder
