Microsoft Azure Storage Team Blog

Introducing Locally Redundant Storage for Windows Azure Storage


We are excited to offer two types of redundant storage for Windows Azure: Locally Redundant Storage and Geo Redundant Storage.

Locally Redundant Storage (LRS) provides highly durable and available storage within a single location (sub region). We maintain an equivalent of 3 copies (replicas) of your data within the primary location as described in our SOSP paper; this ensures that we can recover from common failures (disk, node, rack) without impacting your storage account’s availability and durability. All storage writes are performed synchronously across three replicas in three separate fault domains before success is returned back to the client. If there was a major data center disaster, where part of a data center was lost, we would contact customers about potential data loss for Locally Redundant Storage using the customer’s subscription contact information.

Geo Redundant Storage (GRS) provides our highest level of durability by additionally storing your data in a second location (sub region) within the same region hundreds of miles away from the primary location. All Windows Azure Blob and Table data is geo-replicated, but Queue data is not geo-replicated at this time. With Geo Redundant Storage we maintain 3 copies (replicas) of your data in both the primary location and in the secondary location. This ensures that each data center can recover from common failures on its own and also provides a geo-replicated copy of the data in case of a major disaster. As in LRS, data updates are committed to the primary location before success is returned back to the client. Once this is completed, with GRS these updates are asynchronously geo-replicated to the secondary location. For more information about geo replication, please see Introducing Geo-Replication for Windows Azure.

Geo Redundant Storage is enabled by default for all existing storage accounts in production today. You can choose to disable this by turning off geo-replication in the Windows Azure portal for your accounts. You can also configure your redundant storage option when you create a new account via the Windows Azure Portal.

Pricing Details: The default storage is Geo Redundant Storage, and its current pricing does not change; the price of GRS remains the same as it was before this announcement. With these changes, we are pleased to announce that Locally Redundant Storage is offered at a discounted price (23% to 34%, depending upon how much data is stored) relative to the price of GRS. Note that if you have turned off geo-replication and choose to enable it at a later time, this action will incur a one-time bandwidth charge to bootstrap your data from the primary to its secondary location. The amount of bandwidth charged for this bootstrap will be equal to the amount of data in your storage account at the time of the bootstrap. The price of the bandwidth for the bootstrap is the egress (outbound data transfer) rate for the region (zone) your storage account is in. After the bootstrap is done, there are no additional bandwidth charges to geo-replicate your data from the primary to the secondary. Also, if you use GRS from the start for your storage account, there is no bootstrap bandwidth charge. For full details, please review the pricing details.

Some customers may choose Locally Redundant Storage for storage that does not require the additional durability of Geo Redundant Storage and want to benefit from the discounted price. This data typically falls into the categories of (a) non-critical or temporary data (such as logs), or (b) data that can be recreated if it is ever lost from sources stored elsewhere. An example of the latter is encoded media files that could be recreated from the golden bits stored in another Windows Azure Storage account that uses Geo Redundant Storage. In addition, some companies have geographical restrictions about what countries their data can be stored in, and choosing Locally Redundant Storage ensures that the data is only stored in the location chosen for the storage account (details on where data is replicated for Geo Redundant Storage can be found here).

Monilee Atkinson and Brad Calder


New Blob Lease Features: Infinite Leases, Smaller Lease Times, and More


We are excited to introduce some new features with the Lease Blob API with the 2012-02-12 version of Windows Azure Blob Storage service. The 2012-02-12 version also includes some versioned changes. This blog post covers the new features and changes as well as some scenarios and code snippets. The code snippets show how to use lease APIs using the Windows Azure Storage Client Library 1.7.1 (available on GitHub) that supports the 2012-02-12 version of the REST API.

We will begin by giving a brief description of what is new and what semantics have changed when compared to earlier versions and then deep dive into some scenarios that these changes enable. The following is the list of new features that 2012-02-12 version brings for leases:

  1. You can acquire leases for durations from 15 seconds up to 60 seconds, or you can acquire a lease for an infinite time period.
  2. You can change the lease id on an active lease.
  3. You can provide the lease id when trying to acquire a lease.
  4. You can provide a time period up to which a lease should continue to remain active when breaking an acquired lease.
  5. Lease is now available for containers to prevent clients from deleting a container which may be in use.

The 2012-02-12 version also brings about some versioned changes when compared to previous versions of the REST API. The following are the list of versioned changes:

  1. You have to provide lease duration when trying to acquire a lease on a blob. If the lease duration is not provided, the call to acquire a lease will fail with 400 (Bad Request). Previous versions of the API did not take lease duration as the lease duration was fixed to 60s.
  2. Once a lease is released, it cannot be broken or renewed. Breaking or renewing a lease that has been released will fail with 409 (Conflict). Previously these operations were allowed. Applications that force a released lease to be given up by calling Break Lease will now receive 409 (Conflict); this error should be ignored since the lease is not active any more (a short snippet illustrating this appears after this list).
  3. You can now call Break Lease on a breaking or broken lease hence making break operations idempotent. In previous versions, when a lease has already been broken, a new Break Lease request failed with 409 (Conflict). Applications that want to shorten the duration of a break can now provide shorter duration than the remaining break period (See Breaking Leases section for more details).
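
For illustration, the snippet below sketches how an application might ignore this error when it defensively breaks a lease that may already have been released. It is a sketch only, assuming the 1.7.1 library's StorageClientException type (with its HTTP StatusCode property), the Microsoft.WindowsAzure.StorageClient and System.Net namespaces, and a blob reference obtained as in the samples later in this post.

try
{
    // Force any outstanding lease to be broken immediately (0 second break period).
    blob.BreakLease(TimeSpan.Zero);
}
catch (StorageClientException e)
{
    // With the 2012-02-12 version, 409 (Conflict) here means the lease was
    // already released (or never existed), so there is nothing left to break.
    if (e.StatusCode != HttpStatusCode.Conflict)
    {
        throw;
    }
}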

Acquire Lease - Lease ID and Duration

Earlier versions of the acquire operation did not allow users to specify the lease-id or the duration. The duration was fixed at 60 seconds and the lease-id was determined by the service. Once a lease-id was returned in the response to acquire, it could not be changed. With the 2012-02-12 version, users have the option to propose the lease-id and also to specify a duration from 15 seconds up to 60 seconds, or to define the lease duration to be infinite.

An important property of acquire is that as long as the proposed lease-id matches the existing lease-id on a blob with an active lease, the acquire operation will succeed. The advantage of proposing the lease-id on an acquire operation is that if the acquire succeeds on the server but fails before the server can send the response to the client (e.g. due to intermittent network errors), the client can retry with the same proposed lease-id and, on a success response, know that it still holds the lease. Another property of the proposed lease-id on acquire operations is that on each successful acquire, the lease is set to expire once the duration specified in that operation elapses. This allows a client to change the lease duration by reissuing the acquire operation with the same proposed lease-id. Here is sample code that acquires a lease for 30 seconds with a proposed lease-id and later reacquires the lease, reducing the lease duration to 15 seconds. For the lease to remain active after the provided time period, the client application would need to periodically call renew before the lease period expires (a renewal sketch follows the snippet).

CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference(containerName);
CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

// acquire lease for 30s with a proposed lease id
// NOTE: null duration will acquire an infinite lease
blob.AcquireLease(TimeSpan.FromSeconds(30), leaseId);

...

// re-acquire lease but reduce the duration to 15s
// could also re-acquire with increased duration, 60s for example
blob.AcquireLease(TimeSpan.FromSeconds(15), leaseId);
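
To keep a finite lease active beyond its duration, the holder must renew it before it expires. Below is a minimal sketch of that renewal, assuming the 1.7.1 library's RenewLease takes an AccessCondition carrying the lease id (the same pattern ChangeLease uses in the next section), and reusing the blob and leaseId variables from above.

// Renew before the current lease period elapses; a successful renew
// restarts the lease clock using the duration from the last acquire.
blob.RenewLease(AccessCondition.GenerateLeaseCondition(leaseId));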

 

Why would you change the lease ID?

As we mentioned above, with the 2012-02-12 version we now allow the lease id to be changed. The Change Lease operation takes the existing lease-id and a proposed new id and changes the lease-id to the proposed id. Change Lease is valuable in scenarios where lease ownership needs to be transferred from component A to component B of your system. For example, component A has a lease on a blob, but needs to allow component B to operate on it. Component B could remember the current lease-id passed to it from component A, change it to a new lease-id, perform its operation, and then change the lease-id back to the previous one that component A knows about. This would allow component B to own exclusive access to the blob, prevent the prior owner from modifying the blob until it is done, and then give access back to it.

Another example is a workflow where we keep changing the lease-id as the blob passes through the different parts of the workflow. Let us consider a blog publishing process flow that consists of running the document through:

  1. A service that deletes all objectionable words/phrases
  2. A service that runs spell correction
  3. A service that formats the document

Each of the above steps involves changing the content of the document and is done by a separate service. In this scenario, each service will receive a lease on the document, which should be maintained to ensure no one else changes the document. In addition, each service will also change the lease id to prevent previous owners from inadvertently changing the document and to ensure that only it can work on the document upon receiving the request to start processing. Once it completes its processing step, it will submit the request to the next service in the pipeline, passing it the lease id it maintained.

string newLeaseId = Guid.NewGuid().ToString();

blob.ChangeLease(
    // new proposed leaseId
    newLeaseId,
    // leaseId is the id received from the previous service
    AccessCondition.GenerateLeaseCondition(leaseId));

// change duration required by this service
blob.AcquireLease(TimeSpan.FromSeconds(30), newLeaseId);

 

Breaking Leases

The Break Lease operation is used to end an existing lease by rejecting future requests to renew the lease; it does not require the issuer to know the current lease-id being held. This is generally used by administrators to reset the lease state. In previous versions, Break Lease allows the lease to be held until the remaining time on the lease elapses. With the 2012-02-12 version, this is still the default behavior, with an added option to specify a break period which defines how long to wait before the current lease expires.

The user can specify a break period between 0 and 60 seconds. However, this is only a proposed value, as the actual break period will be the minimum of this value and the remaining time left on the lease. In other words, the client will now be able to shorten the break period from the remaining time left on the lease, but not extend it.

// If duration to break is null, it implies that lease is active for the
// remaining duration, otherwise min(break period, lease remaining time)
blob.BreakLease(TimeSpan.FromSeconds(20));

Infinite Leases

With the 2012-02-12 version, infinite leases can be acquired by setting the lease duration to -1 in the REST API. The storage client library's Acquire Lease allows null to be passed in as the duration to acquire an infinite lease. An infinite lease will never expire unless explicitly released or broken, and hence acts like a lock.

blob.AcquireLease(null /* infinite lease */, leaseId);

// Note: Acquire can be called again with a valid duration to convert an
// infinite lease to a finite one.
 

A useful scenario for infinite leases is a blob used by a client that wishes to hold a lease on it at all times (i.e. to acquire a lock on the blob). Instead of having to renew the lease continuously as in previous versions, the client now just needs a single acquire that specifies an infinite lease duration.

For Break Lease on an infinite lease, the default behavior is to break the lease immediately. The break operation also allows a break period to be specified, as shown below.
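
As a brief sketch (reusing the blob and leaseId variables from the earlier snippets), breaking an infinite lease with and without a break period looks like this:

// No break period specified: an infinite lease is broken immediately.
blob.BreakLease(null);

// Alternatively, give the current holder up to 30 seconds to finish
// its work before the lease is actually broken.
blob.BreakLease(TimeSpan.FromSeconds(30));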

Container Leases

We have added the Lease Container API to prevent container deletion. Holding a lease on a container does not prevent anyone from adding, deleting, or updating any blob content in the container. This is meant to only prevent deletion of the container itself. The lease operations provided are similar to those provided on the blob with the only exception that the lease is a “delete” lease. The operations are:

  • Acquire lease – Issuer must provide lease duration and optionally propose lease-id
  • Change lease-id - to allow changing the current id to a new lease-id
  • Renew lease - to renew the duration
  • Break lease - to break existing lease without having knowledge of existing lease-id
  • Release lease - so that another prospective owner can acquire the lease

Consider a scenario where all blobs need to be moved to a different account. Multiple instances can be used in parallel and each instance will work on a given container. When an instance of the job starts, it acquires an infinite lease on the container to prevent anyone from deleting the container prematurely. In addition, since an instance would try to acquire a lease, it will fail if the container is being worked on by a different instance – hence preventing two job instances from migrating the same container.

CloudBlobClient client = storageAccount.CreateCloudBlobClient();

// each migration job is assigned a fixed instance id and it will be used
// as the lease id.
string leaseId = instanceId;

IEnumerable<CloudBlobContainer> containerList = client.ListContainers();

foreach (CloudBlobContainer container in containerList)
{
    try
    {
        container.AcquireLease(null /* Infinite lease */, leaseId);

        // if successful - start migration job which will delete the container
        // once it completes migration
        ...
    }
    catch (Exception e)
    {
        // Check for lease conflict exception - implies some other instance
        // is working on this container
    }
}

Lease Properties

With the 2012-02-12 version and later, the service returns lease-specific properties for containers and blobs on List Containers, List Blobs, Get Container Properties, Get Blob Properties and Get Blob. The lease-specific properties returned are:

x-ms-lease-status (or LeaseStatus): Returns the status of the lease on the blob or container. The possible values are locked or unlocked.

x-ms-lease-state (or LeaseState): Returns the state of the lease on the blob or container. The possible values are available, leased, expired, breaking, or broken. This information can be used by applications for diagnostics or to take further action. For example: If the lease is in breaking, broken or expired state, one of the redundant master instances may try to acquire the lease.

While the lease status tells you if the lease is active or not, the lease state property provides more granular information. Example – the lease status may be locked but state may be breaking.

x-ms-lease-duration (or LeaseDuration): Returns the duration type of the lease – finite or infinite.
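
As a quick sketch of how these properties might be inspected with the 1.7.1 client library (this assumes BlobProperties surfaces LeaseStatus, LeaseState, and LeaseDuration in that version; storageAccount, containerName, and blobName are placeholders), an application could check a blob's lease before deciding whether to try to acquire it:

CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference(containerName);
CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

// Get Blob Properties populates the blob's properties, including
// the lease properties described above.
blob.FetchAttributes();

Console.WriteLine("Lease status:   {0}", blob.Properties.LeaseStatus);
Console.WriteLine("Lease state:    {0}", blob.Properties.LeaseState);
Console.WriteLine("Lease duration: {0}", blob.Properties.LeaseDuration);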

Weiping Zhang, Michael Roberson, Jai Haridas, Brad Calder

Introducing Table SAS (Shared Access Signature), Queue SAS and update to Blob SAS


We’re excited to announce that, as part of version 2012-02-12, we have introduced Table Shared Access Signatures (SAS), Queue SAS and updates to Blob SAS. In this blog, we will highlight usage scenarios for these new features along with sample code using the Windows Azure Storage Client Library v1.7.1, which is available on GitHub.

Shared Access Signatures allow granular access to tables, queues, blob containers, and blobs. A SAS token can be configured to provide specific access rights, such as read, write, update, delete, etc. to a specific table, key range within a table, queue, blob, or blob container; for a specified time period or without any limit. The SAS token appears as part of the resource’s URI as a series of query parameters. Prior to version 2012-02-12, Shared Access Signature could only grant access to blobs and blob containers.

SAS Update to Blob in version 2012-02-12

In the 2012-02-12 version, Blob SAS has been extended to allow unbounded access time to a blob resource instead of the previously limited one hour expiry time for non-revocable SAS tokens. To make use of this additional feature, the sv (signed version) query parameter must be set to "2012-02-12" which would allow the difference between se (signed expiry, which is mandatory) and st (signed start, which is optional) to be larger than one hour. For more details, refer to the MSDN documentation.
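
A minimal sketch of generating such a long-lived, non-revocable blob SAS with the 1.7.1 library follows. It assumes the library's SharedAccessPolicy and SharedAccessPermissions types and the CloudBlob.GetSharedAccessSignature overload that takes only a policy, and that the library emits sv=2012-02-12 tokens; the container and blob names are hypothetical.

CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("media");
CloudBlob blob = container.GetBlobReference("video.mp4");

// Read-only access for 30 days; with sv=2012-02-12 the expiry of a
// non-revocable token is no longer limited to one hour.
SharedAccessPolicy policy = new SharedAccessPolicy()
{
    Permissions = SharedAccessPermissions.Read,
    SharedAccessExpiryTime = DateTime.UtcNow.AddDays(30)
};

// No stored access policy identifier is used, so the token is non-revocable.
string sasToken = blob.GetSharedAccessSignature(policy);

// The token is appended to the blob URI as query parameters.
string blobUriWithSas = blob.Uri.AbsoluteUri + sasToken;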

Best Practices When Using SAS

The following are best practices to follow when using Shared Access Signatures.

  1. Always use HTTPS when making SAS requests. SAS tokens are sent over the wire as part of a URL, and can potentially be leaked if HTTP is used. A leaked SAS token grants access until it either expires or is revoked.
  2. Use server stored access policies for revokable SAS. Each container, table, and queue can now have up to five server stored access policies at once. Revoking one of these policies invalidates all SAS tokens issued using that policy. Consider grouping SAS tokens such that logically related tokens share the same server stored access policy. Avoid inadvertently reusing revoked access policy identifiers by including a unique string in them, such as the date and time the policy was created.
  3. Don’t specify a start time or allow at least five minutes for clock skew. Due to clock skew, a SAS token might start or expire earlier or later than expected. If you do not specify a start time, then the start time is considered to be now, and you do not have to worry about clock skew for the start time.
  4. Limit the lifetime of SAS tokens and treat each token as a lease. Clients that need more time can request an updated SAS token.
  5. Be aware of the version: starting with the 2012-02-12 version, SAS tokens contain a new version parameter (sv). sv defines how the various parameters in the SAS token must be interpreted and the version of the REST API to use to execute the operation. This implies that services responsible for providing SAS tokens to client applications should issue tokens for the version of the REST protocol that those clients understand. Make sure clients understand the REST protocol version specified by sv when they are given a SAS to use.

Table SAS

SAS for table allows account owners to grant SAS token access by defining the following restrictions on the SAS policy:

1. Table granularity: users can grant access to an entire table (tn) or to a table range defined by a table (tn) along with a partition key range (startpk/endpk) and row key range (startrk/endrk).

To better understand the range to which access is granted, let us take an example data set:

Row Number    PartitionKey    RowKey
1             PK001           RK001
2             PK001           RK002
3             PK001           RK003
...           ...             ...
300           PK001           RK300
301           PK002           RK001
302           PK002           RK002
303           PK002           RK003
...           ...             ...
600           PK002           RK300
601           PK003           RK001
602           PK003           RK002
603           PK003           RK003
...           ...             ...
900           PK003           RK300

The permission is specified as a range of rows from (startpk, startrk) until (endpk, endrk).

Example 1: (startpk,startrk) = (,) (endpk,endrk) = (,)
Allowed Range = All rows in the table

Example 2: (startpk,startrk) = (PK002,) (endpk,endrk) = (,)
Allowed Range = All rows starting from row # 301

Example 3: (startpk,startrk) = (PK002,) (endpk,endrk) = (PK002,)
Allowed Range = All rows starting from row # 301 and ending at row # 600

Example 4: (startpk,startrk) = (PK001,RK002) (endpk,endrk) = (PK003,RK003)
Allowed Range = All rows starting from row # 2 and ending at row # 603.
NOTE: The row (PK002, RK100) is accessible because the row key limit is hierarchical and not absolute (i.e. it is not applied as startrk <= rowkey <= endrk).

2. Access permissions (sp): user can grant access rights to the specified table or table range such as Query (r), Add (a), Update (u), Delete (d) or a combination of them.

3. Time range (st/se): users can limit the SAS token access time. Start time (st) is optional but Expiry time (se) is mandatory, and no limits are enforced on these parameters. Therefore a SAS token may be valid for a very large time period.

4. Server stored access policy (si): users can either generate offline SAS tokens where the policy permissions described above are part of the SAS token, or they can choose to store specific policy settings associated with a table. These policy settings are limited to the time range (start time and end time) and the access permissions. Stored access policies provide additional control over generated SAS tokens where policy settings could be changed at any time without the need to re-issue a new token. In addition, revoking SAS access would become possible without the need to change the account’s key.

For more information on the different policy settings for Table SAS and the REST interface, please refer to the SAS MSDN documentation.

Though non-revocable Table SAS provides large time period access to a resource, we highly recommend that you always limit its validity to a minimum required amount of time in case the SAS token is leaked or the holder of the token is no longer trusted. In that case, the only way to revoke access is to rotate the account’s key that was used to generate the SAS, which would also revoke any other SAS tokens that were already issued and are currently in use. In cases where large time period access is needed, we recommend that you use a server stored access policy as described above.

Most Shared Access Signature usage falls into two different scenarios:

  1. A service granting access to clients, so those clients can access their parts of the storage account or access the storage account with restricted permissions. Example: a Windows Phone app for a service running on Windows Azure. A SAS token would be distributed to clients (the Windows Phone app) so it can have direct access to storage.
  2. A service owner who needs to keep his production storage account credentials confined within a limited set of machines or Windows Azure roles which act as a key management system. In this case, a SAS token will be issued on an as-needed basis to worker or web roles that require access to specific storage resources. This allows services to reduce the risk of getting their keys compromised.

Along with the different usage scenarios, SAS token generation usually follows the models below:

  • A SAS Token Generator or producer service responsible for issuing SAS tokens to applications, referred to as SAS consumers. The SAS token generated is usually valid for a limited amount of time in order to control access. This model usually works best with the first scenario described earlier, where a phone app (SAS consumer) would request access to a certain resource by contacting a SAS generator service running in the cloud. Before the SAS token expires, the consumer would again contact the service for a renewed SAS. The service can refuse to produce further tokens for certain applications or users, for example when a user’s subscription to the service has expired. Diagram 1 illustrates this model.


Diagram 1: SAS Consumer/Producer Request Flow

  • The communication channel between the application (SAS consumer) and SAS Token Generator could be service specific where the service would authenticate the application/user (for example, using OAuth authentication mechanism) before issuing or renewing the SAS token. We highly recommend that the communication be a secure one in order to avoid any SAS token leak. Note that steps 1 and 2 would only be needed whenever the SAS token approaches its expiry time or the application is requesting access to a different resource. A SAS token can be used as long as it is valid which means multiple requests could be issued (steps 3 and 4) before consulting back with the SAS Token Generator service.
  • A one-time generated SAS token tied to a signed identifier controlled as part of a stored access policy. This model would work best in the second scenario described earlier where the SAS token could either be part of a worker role configuration file, or issued once by a SAS token generator/producer service where maximum access time could be provided. In case access needs to be revoked or permission and/or duration changed, the account owner can use the Set Table ACL API to modify the stored policy associated with issued SAS token.

Table SAS - Sample Scenario Code

In this section we will provide a usage scenario for Table SAS along with a sample code using the Storage Client Library 1.7.1.

Consider an address book service implementation that needs to scale to a large number of users. The service allows its customers to store their address book in the cloud and access it anywhere using a wide range of clients such as a phone app, desktop app, a website, etc., which we will refer to as the client app. Once a user subscribes to the service, he would be able to add, edit, and query his address book entries. One way to build such a system is to run a service in Windows Azure Compute consisting of web and worker roles. The service would act as a middle tier between the client app and the Windows Azure storage system. After the service authenticates the client app, the client app would be able to access its own address book through a web interface defined by the service. The service would then serve all of the client requests by accessing a Windows Azure Table where the address book entries for each customer reside. Since the service is involved in processing every request issued by the client, the service would need to scale out its number of Windows Azure Compute instances linearly with the growth of its customer base.

With Table SAS, this scenario becomes simpler to implement. Table SAS can be used to allow the client app to directly access the customer’s address book data that is stored in a Windows Azure Table. This approach tremendously improves the scalability of the system and reduces cost by taking the service out of the data path whenever the client app accesses the address book data. The service’s role in this case is restricted to processing user subscriptions and generating the SAS tokens that the client app uses to access the stored data directly. Since the token can be granted for any selected time period, the application would need to communicate with the token-generating service only once every selected time period for a given type of access per table. In this way, Table SAS improves performance and helps the system scale easily while decreasing the operating cost, since fewer servers are needed.

The design of the system using Table SAS would be as follows: a Windows Azure Table called “AddressBook” will be used to store the address book entries for all the customers. The PartitionKey will be the customer’s username or customerID and the RowKey will represent the address book entry key, defined as the contact’s LastName,FirstName. This means that all the entries for a certain customer share the same PartitionKey, the customerID, so the whole address book for a customer is contained within a single PartitionKey. The following C# class describes the address book entity.

[DataServiceKey("PartitionKey", "RowKey")]
public class AddressBookEntry
{
    public AddressBookEntry(string partitionKey, string rowKey)
    {
        this.PartitionKey = partitionKey;
        this.RowKey = rowKey;
    }

    public AddressBookEntry() { }

    /// <summary>
    /// Account CustomerID
    /// </summary>
    public string PartitionKey { get; set; }

    /// <summary>
    /// Contact Identifier LastName,FirstName
    /// </summary>
    public string RowKey { get; set; }

    /// <summary>
    /// The last modified time of the entity set by
    /// the Windows Azure Storage
    /// </summary>
    public DateTime Timestamp { get; set; }

    public string Address { get; set; }

    public string Email { get; set; }

    public string PhoneNumber { get; set; }
}

The address book service consists of the following 2 components:

  1. A SAS token producer, which runs as part of a service on Windows Azure Compute and accepts requests from the client app asking for a SAS token that gives it access to a particular customer’s address book data. This service would first authenticate the client app through its preferred authentication scheme, and then generate a SAS token that grants access to the “AddressBook” table while restricting the view to the PartitionKey that is equal to the customerID. Full permission access would be given in order to allow the client app to query (r), update (u), add (a) and delete (d) address book entries. Access time would be restricted to 30 minutes so that the service can deny further access to a certain customer if, for example, his address book subscription has expired; in that case, no further renewal of the SAS token would be permitted. The 30 minute period also largely reduces the load on the SAS token producer compared to a service that acts as a proxy for every request.
  2. The client app is responsible for interacting with the customer where it would query, update, insert, and delete address book entries. The client app would first contact the SAS producer service in order to retrieve a SAS token and cache it locally while the token is still valid. The SAS token would be used with any Table REST request against Windows Azure Storage. The client app would request a new SAS token whenever the current one approaches its expiry time. A standard approach is to renew the SAS every N minutes, where N is half of the time the allocated SAS tokens are valid. For this example, the SAS tokens are valid for 30 minutes, so the client renews the SAS once every 15 minutes. This gives the client time to alert and retry if there is any issue obtaining a SAS renewal. It also helps in cases where application and network latencies cause requests to be delayed in reaching the Windows Azure Storage system.

The SAS Producer code can be found below. It is represented by the SasProducer class, which implements the RequestSasToken method responsible for issuing a SAS token to the client app. In this example, the communication between the client app and the SAS producer is assumed to be a method call for illustration purposes, where the client app simply invokes the RequestSasToken method whenever it requires a new token to be generated.

/// <summary>
/// The producer class that controls access to the address book
/// by generating sas tokens to clients requesting access to their
/// own address book data
/// </summary>
public class SasProducer
{
    /* ... */

    /// <summary>
    /// Issues a SAS token authorizing access to the address book for a given customer ID.
    /// </summary>
    /// <param name="customerId">The customer ID requesting access.</param>
    /// <returns>A SAS token authorizing access to the customer's address book entries.</returns>
    public string RequestSasToken(string customerId)
    {
        // Omitting any authentication code since this is beyond the scope of
        // this sample code

        // creating a shared access policy that expires in 30 minutes.
        // No start time is specified, which means that the token is valid immediately.
        // The policy specifies full permissions.
        SharedAccessTablePolicy policy = new SharedAccessTablePolicy()
        {
            SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(
                SasProducer.AccessPolicyDurationInMinutes),
            Permissions = SharedAccessTablePermissions.Add
                | SharedAccessTablePermissions.Query
                | SharedAccessTablePermissions.Update
                | SharedAccessTablePermissions.Delete
        };

        // Generate the SAS token. No access policy identifier is used which
        // makes it a non-revocable token
        // limiting the table SAS access to only the request customer's id
        string sasToken = this.addressBookTable.GetSharedAccessSignature(
            policy     /* access policy */,
            null       /* access policy identifier */,
            customerId /* start partition key */,
            null       /* start row key */,
            customerId /* end partition key */,
            null       /* end row key */);

        return sasToken;
    }
}

Note that by not setting the SharedAccessStartTime, Windows Azure Storage would assume that the SAS is valid upon the receipt of the request.

The client app code can be found below. It is represented by the Client class that exposes public methods for manipulating the customer’s address book, such as UpsertEntry and LookupByName, which internally request a SAS token from the service front-end (represented by the SasProducer) if needed.

 

/// <summary>
/// The address book client class.
/// </summary>
public class Client
{
    /// <summary>
    /// When to refresh the credentials, measured as a number of minutes before expiration.
    /// </summary>
    private const int SasRefreshThresholdInMinutes = 15;

    /// <summary>
    /// the cached copy of the sas credentials of the customer's addressbook
    /// </summary>
    private StorageCredentialsSharedAccessSignature addressBookSasCredentials;

    /// <summary>
    /// Sas expiration time, used to determine when a refresh is needed
    /// </summary>
    private DateTime addressBookSasExpiryTime;

    /* ... */

    /// <summary>
    /// Gets the Table SAS storage credentials accessing the address book
    /// of this particular customer.
    /// The method automatically refreshes the credentials as needed
    /// and caches them locally
    /// </summary>
    public StorageCredentials GetAddressBookSasCredentials()
    {
        // Refresh the credentials if needed.
        if (this.addressBookSasCredentials == null ||
            DateTime.UtcNow.AddMinutes(SasRefreshThresholdInMinutes) >= this.addressBookSasExpiryTime)
        {
            this.RefreshAccessCredentials();
        }

        return this.addressBookSasCredentials;
    }

    /// <summary>
    /// Requests a new SAS token from the producer, and updates the cached credentials
    /// and the expiration time.
    /// </summary>
    public void RefreshAccessCredentials()
    {
        // Request the SAS token.
        string sasToken = this.addressBookService.RequestSasToken(this.customerId);

        // Create credentials using the new token.
        this.addressBookSasCredentials = new StorageCredentialsSharedAccessSignature(sasToken);
        this.addressBookSasExpiryTime = DateTime.UtcNow.AddMinutes(
            SasProducer.AccessPolicyDurationInMinutes);
    }

    /// <summary>
    /// Retrieves the address book entry for the given contact name.
    /// </summary>
    /// <param name="contactname">
    /// The lastName,FirstName for the requested address book entry.</param>
    /// <returns>An address book entry with a certain contact card</returns>
    public AddressBookEntry LookupByName(string contactname)
    {
        StorageCredentials credentials = GetAddressBookSasCredentials();
        CloudTableClient tableClient = new CloudTableClient(this.tableEndpoint, credentials);
        TableServiceContext context = tableClient.GetDataServiceContext();

        CloudTableQuery<AddressBookEntry> query =
            (from entry in context.CreateQuery<AddressBookEntry>(Client.AddressBookTableName)
             where entry.PartitionKey == this.customerId && entry.RowKey == contactname
             select entry).AsTableServiceQuery();

        return query.Execute().SingleOrDefault();
    }

    /// <summary>
    /// Inserts a new address book entry or updates an existing entry.
    /// </summary>
    /// <param name="entry">The address book entry to insert or merge.</param>
    public void UpsertEntry(AddressBookEntry entry)
    {
        StorageCredentials credentials = GetAddressBookSasCredentials();
        CloudTableClient tableClient = new CloudTableClient(this.tableEndpoint, credentials);
        TableServiceContext context = tableClient.GetDataServiceContext();

        // Set the correct customer ID.
        entry.PartitionKey = this.customerId;

        // Upsert the entry (Insert or Merge).
        context.AttachTo(Client.AddressBookTableName, entry);
        context.UpdateObject(entry);
        context.SaveChangesWithRetries();
    }
}

Stored Access Policy Sample Code

As an extension to the previous example, consider that the address book service implements a garbage collector (GC) that deletes the address book data of users who are no longer consumers of the service. In this case, and in order to avoid the chance of having the storage account credentials be compromised, the GC worker role would use a Table SAS token with maximum access time that is backed by a stored access policy associated with a signed identifier. The Table SAS token would grant access to the “AddressBook” table without specifying any range restrictions on the PartitionKey and RowKey, but with only query and delete permissions. In case the SAS token gets leaked, the service owner would be able to revoke the SAS access by deleting the signed identifier associated with the “AddressBook” table, as will be highlighted later through code. To be sure that the SAS access does not get inadvertently reinstated after revocation, the policy identifier has as part of its name the policy’s date and time of creation. (See the section on Best Practices When Using SAS above.)

In addition, assume that the GC worker role learns which customerIDs it needs to garbage collect through a Queue called “gcqueue”. Whenever a customer subscription expires, a message is enqueued into the “gcqueue” queue. The GC worker role would keep polling that queue at a regular interval. Once a customerID is dequeued, the worker role would delete that customer’s data and, on completion, delete the queue message associated with that customer. For the same reasons a SAS token is used to access the “AddressBook” table, the GC worker thread would also use a Queue SAS token associated with the “gcqueue” queue, again backed by a stored access policy. The permissions needed in this case would be Process-only. More details on Queue SAS are available in the subsequent sections of this post.

To build this additional GC feature, the SAS token producer will be extended to generate a one-time Table SAS token against the “AddressBook” table and a one-time Queue SAS token against the “gcqueue” queue, by associating each of them with a stored access policy (signed identifier) on its respective table or queue, as explained earlier. The GC role, upon initialization, would contact the SAS token producer in order to retrieve these two SAS tokens.

The additional code needed as part of the SAS producer is as follows.

public const string GCQueueName = "gcqueue";

/// <summary>
/// The garbage collection queue.
/// </summary>
private CloudQueue gcQueue;

/// <summary>
/// Generates revocable SAS tokens for the address book table and the GC queue
/// that are used by the GC worker role
/// </summary>
/// <param name="tableSasToken">
/// An out parameter which returns a revocable SAS token to
/// access the AddressBook table with query and delete permissions</param>
/// <param name="queueSasToken">
/// An out parameter which returns a revocable SAS token to
/// access the gcqueue with process permissions</param>
public void GetGCSasTokens(out string tableSasToken, out string queueSasToken)
{
    string gcPolicySignedIdentifier = "GCAccessPolicy" + DateTime.UtcNow.ToString();

    // Create the GC worker's address book SAS policy
    // that will be associated with a signed identifier
    TablePermissions addressBookPermissions = new TablePermissions();
    SharedAccessTablePolicy gcTablePolicy = new SharedAccessTablePolicy()
    {
        // Providing the max duration
        SharedAccessExpiryTime = DateTime.MaxValue,
        // Permission is granted to query and delete entries.
        Permissions = SharedAccessTablePermissions.Query | SharedAccessTablePermissions.Delete
    };

    // Associate the above policy with a signed identifier
    addressBookPermissions.SharedAccessPolicies.Add(
        gcPolicySignedIdentifier,
        gcTablePolicy);

    // The below call will result in a Set Table ACL request to be sent to
    // Windows Azure Storage in order to store the policy and associate it with the
    // "GCAccessPolicy" signed identifier that will be referred to
    // by the generated SAS token
    this.addressBookTable.SetPermissions(addressBookPermissions);

    // Create the SAS tokens using the above policies.
    // There are no restrictions on partition key and row key.
    // It also uses the signed identifier as part of the token.
    // No requests will be sent to Windows Azure Storage when the below call is made.
    tableSasToken = this.addressBookTable.GetSharedAccessSignature(
        new SharedAccessTablePolicy(),
        gcPolicySignedIdentifier,
        null /* start partition key */,
        null /* start row key */,
        null /* end partition key */,
        null /* end row key */);

    // Initializing the garbage collection queue and creating a Queue SAS token
    // by following similar steps as the table SAS
    CloudQueueClient queueClient = this.serviceStorageAccount.CreateCloudQueueClient();
    this.gcQueue = queueClient.GetQueueReference(GCQueueName);
    this.gcQueue.CreateIfNotExist();

    // Create the GC queue SAS policy.
    QueuePermissions gcQueuePermissions = new QueuePermissions();
    SharedAccessQueuePolicy gcQueuePolicy = new SharedAccessQueuePolicy()
    {
        // Providing the max duration
        SharedAccessExpiryTime = DateTime.MaxValue,
        // Permission is granted to process queue messages.
        Permissions = SharedAccessQueuePermissions.ProcessMessages
    };

    // Associate the above policy with a signed identifier
    gcQueuePermissions.SharedAccessPolicies.Add(
        gcPolicySignedIdentifier,
        gcQueuePolicy);

    // The below call will result in a Set Queue ACL request to be sent to
    // Windows Azure Storage in order to store the policy and associate it with the
    // "GCAccessPolicy" signed identifier that will be referred to
    // by the generated SAS token
    this.gcQueue.SetPermissions(gcQueuePermissions);

    // Create the SAS tokens using the above policy which
    // uses the signed identifier as part of the token.
    // No requests will be sent to Windows Azure Storage when the below call is made.
    queueSasToken = this.gcQueue.GetSharedAccessSignature(
        new SharedAccessQueuePolicy(),
        gcPolicySignedIdentifier);
}

Whenever a customer’s data needs to be deleted, the following method will be called; for simplicity, it is assumed to be part of the SasProducer class.

/// <summary>
/// Flags the given customer ID for garbage collection.
/// </summary>
/// <param name="customerId">The customer ID to delete.</param>
public void DeleteCustomer(string customerId)
{
    // Add the customer ID to the GC queue.
    CloudQueueMessage message = new CloudQueueMessage(customerId);
    this.gcQueue.AddMessage(message);
}

In case a SAS token needs to be revoked, the following method would need to be invoked. Once the method is called, any malicious user who might have gained access to these SAS tokens will be denied access. The garbage collector could in this case request a new token from the SAS Producer.

/// <summary>
/// Revokes revocable SAS access to a Table that is associated
/// with a policy referred to by the signedIdentifier
/// </summary>
/// <param name="table">
/// Reference to the CloudTable in question.
/// The table must be accessed with signed key credentials,
/// since otherwise Set/Get Table ACL would fail</param>
/// <param name="signedIdentifier">the SAS signedIdentifier to revoke</param>
public void RevokeAccessToTable(CloudTable table, string signedIdentifier)
{
    // Retrieve the current policies and SAS signedIdentifiers
    // associated with the table by invoking Get Table ACL
    TablePermissions tablePermissions = table.GetPermissions();

    // Attempt to remove the signedIdentifier to revoke from the list
    bool success = tablePermissions.SharedAccessPolicies.Remove(signedIdentifier);

    if (success)
    {
        // Commit the changes by invoking Set Table ACL
        // without the signedIdentifier that needs revoking
        table.SetPermissions(tablePermissions);
    }
    // else the signedIdentifier does not exist, therefore no need to
    // call Set Table ACL
}

The garbage collection code that uses the generated SAS tokens is as follows.

/// <summary>
/// The garbage collection worker class.
/// </summary>
public class GCWorker
{
    /// <summary>
    /// The address book table.
    /// </summary>
    private CloudTable addressBook;

    /// <summary>
    /// The garbage collection queue.
    /// </summary>
    private CloudQueue gcQueue;

    /// <summary>
    /// Initializes a new instance of the GCWorker class
    /// by passing in the required SAS credentials to access the
    /// AddressBook Table and the gcqueue Queue
    /// </summary>
    public GCWorker(
        string tableEndpoint,
        string sasTokenForAddressBook,
        string queueEndpoint,
        string sasTokenForQueue)
    {
        StorageCredentials credentialsForAddressBook =
            new StorageCredentialsSharedAccessSignature(sasTokenForAddressBook);
        CloudTableClient tableClient = new CloudTableClient(tableEndpoint, credentialsForAddressBook);
        this.addressBook = tableClient.GetTableReference(SasProducer.AddressBookTableName);

        StorageCredentials credentialsForQueue =
            new StorageCredentialsSharedAccessSignature(sasTokenForQueue);
        CloudQueueClient queueClient = new CloudQueueClient(queueEndpoint, credentialsForQueue);
        this.gcQueue = queueClient.GetQueueReference(SasProducer.GCQueueName);
    }

    /// <summary>
    /// Starts the GC worker, which polls the GC queue for messages
    /// containing customerIDs to be garbage collected.
    /// </summary>
    public void Start()
    {
        while (true)
        {
            // Get a message from the queue by setting its visibility timeout to 2 minutes
            CloudQueueMessage message = this.gcQueue.GetMessage(TimeSpan.FromMinutes(2));

            // If there are no messages, sleep and retry.
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromMinutes(1));
                continue;
            }

            // The customer ID to garbage collect is the message body.
            string customerIDToGC = message.AsString;

            // Create a context for querying and modifying the address book.
            TableServiceContext context = this.addressBook.ServiceClient.GetDataServiceContext();

            // Find all entries for the given customer.
            CloudTableQuery<AddressBookEntry> query =
                (from entry in context.CreateQuery<AddressBookEntry>(this.addressBook.Name)
                 where entry.PartitionKey == customerIDToGC
                 select entry).AsTableServiceQuery();

            int numberOfEntriesInBatch = 0;

            // Delete entries in batches since all of the contact entries share
            // the same partitionKey
            foreach (AddressBookEntry r in query.Execute())
            {
                context.DeleteObject(r);
                numberOfEntriesInBatch++;

                if (numberOfEntriesInBatch == 100)
                {
                    // Commit the batch of 100 deletions to the service.
                    context.SaveChangesWithRetries(SaveChangesOptions.Batch);
                    numberOfEntriesInBatch = 0;
                }
            }

            if (numberOfEntriesInBatch > 0)
            {
                // Commit the remaining deletions (if any) to the service.
                context.SaveChangesWithRetries(SaveChangesOptions.Batch);
            }

            // Delete the message from the queue.
            this.gcQueue.DeleteMessage(message);
        }
    }
}

For completeness, we are providing the following Main method to illustrate the above classes and allow you to test the sample code.

public static void Main()
{
    string accountName = "someaccountname";
    string accountKey = "someaccountkey";
    string tableEndpoint = string.Format("http://{0}.table.core.windows.net", accountName);
    string queueEndpoint = string.Format("http://{0}.queue.core.windows.net", accountName);

    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        string.Format("DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}",
            accountName, accountKey));

    SasProducer sasProducer = new SasProducer(storageAccount);

    string sasTokenForAddressBook, sasTokenForQueue;

    // Get the revocable GC SAS tokens
    sasProducer.GetGCSasTokens(out sasTokenForAddressBook, out sasTokenForQueue);

    // Initialize and start the GC Worker
    GCWorker gcWorker = new GCWorker(
        tableEndpoint,
        sasTokenForAddressBook,
        queueEndpoint,
        sasTokenForQueue);
    ThreadPool.QueueUserWorkItem((state) => gcWorker.Start());

    string customerId = "davidhamilton";

    // Create a client object
    Client client = new Client(sasProducer, tableEndpoint, customerId);

    // Add some address book entries
    AddressBookEntry contactEntry = new AddressBookEntry
    {
        RowKey = "Harp,Walter",
        Address = "1345 Fictitious St, St Buffalo, NY 98052",
        PhoneNumber = "425-555-0101"
    };
    client.UpsertEntry(contactEntry);

    contactEntry = new AddressBookEntry
    {
        RowKey = "Foster,Jonathan",
        Email = "Jonathan@fourthcoffee.com"
    };
    client.UpsertEntry(contactEntry);

    contactEntry = new AddressBookEntry
    {
        RowKey = "Miller,Lisa",
        PhoneNumber = "425-555-2141"
    };
    client.UpsertEntry(contactEntry);

    // Update Walter's Contact entry with an email address
    contactEntry = new AddressBookEntry
    {
        RowKey = "Harp,Walter",
        Email = "Walter@contoso.com"
    };
    client.UpsertEntry(contactEntry);

    // Look up an entry
    contactEntry = client.LookupByName("Foster,Jonathan");

    // Delete the customer
    sasProducer.DeleteCustomer(customerId);

    // Wait for GC
    Thread.Sleep(TimeSpan.FromSeconds(120));
}

 

Queue SAS

SAS for queue allows account owners to grant SAS access to a queue by defining the following restrictions on the SAS policy:

  1. Access permissions (sp): users can grant access rights to the specified queue such as Read or Peek at messages (r), Add message (a), Update message (u), and Process message (p) which allows the Get Messages and Delete Message REST APIs to be invoked, or a combination of permissions. Note that Process message (p) permissions potentially allow a client to get and delete every message in the queue. Therefore the clients receiving these permissions must be sufficiently trusted for the queue being accessed.
  2. Time range (st/se): users can limit the SAS token access time. You can also choose to provide access for maximum duration.
  3. Server stored access policy (si): users can either generate offline SAS tokens where the policy permissions described above are part of the SAS token, or they can choose to store specific policy settings associated with a queue. These policy settings are limited to the time range (start time and end time) and the access permissions. Stored access policies provide additional control over generated SAS tokens where policy settings could be changed at any time without the need to re-issue a new token. In addition, revoking SAS access would become possible without the need to change the account’s key.

For more information on the different policy settings for Queue SAS and the REST interface, please refer to the SAS MSDN documentation.

A typical scenario where Queue SAS can be used is for a notification system where the notification producer would need add-only access to the queue and the consumer needs processing and read access to the queue.
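
For illustration, here is a minimal sketch of the producer side of such a notification system, using the same 1.7.1 client types that appear elsewhere in this post. The endpoint, queue name, and token value are hypothetical placeholders, and the SAS token is assumed to have been issued with add-only (a) permission.

// The add-only SAS token handed out by the service that owns the queue.
string addOnlySasToken = "?sv=2012-02-12&sig=...";  // hypothetical token value
string queueEndpoint = "http://someaccount.queue.core.windows.net";

// Build a queue client whose credentials are just the SAS token.
StorageCredentials credentials = new StorageCredentialsSharedAccessSignature(addOnlySasToken);
CloudQueueClient queueClient = new CloudQueueClient(queueEndpoint, credentials);
CloudQueue notificationQueue = queueClient.GetQueueReference("notificationqueue");

// With Add permission only, the producer can enqueue messages but cannot
// peek, get, or delete them.
notificationQueue.AddMessage(new CloudQueueMessage("new notification payload"));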

As an example, consider a video processing service that works on videos provided by its customers. The source videos are stored as part of the customer’s Windows Azure Storage account. Once a video is processed by the processing service, the resultant video is stored back as part of the customer’s account. The service provides transcoding to different video qualities such as 240p, 480p and 720p. Whenever there are new videos to be processed, the customer’s client app would send a request to the service which includes the source video blob, the destination video blob and the requested video transcoding quality. The service would then transcode the source video and store the resultant video back to the customer account location denoted by the destination blob. To design such a service without Queue SAS, the system design would include 3 different components:

  • Client: creates a SAS token with read permissions for the source video blob and a SAS token with write permissions for the destination blob. The client then sends a request to the processing service front-end along with the needed video transcoding quality.
  • Video processing service front-end, accepts requests by first authenticating the sender using its own preferred authentication scheme. Once authenticated, the front-end enqueues a work item into a Windows Azure Queue called “videoprocessingqueue” that gets processed by a number of video processor worker role instances.
  • Video processor worker role: the worker role would dequeue work items from the “videoprocessingqueue” and processes the request by transcoding the video. The worker role could also extend the visibility time of the work item if more processing time is needed.

The above system design would require the number of front-ends to scale up with the increasing number of requests and customers in order to keep up with the service demand. In addition, client applications are not isolated from unavailability of the video processing service front-ends. Having the client application directly interface with the scalable, highly available and durable Queue using SAS would greatly alleviate this requirement and would help the service run more efficiently and with less computational resources. It also decouples the client applications from the availability of the video processing service front-ends. In this case, the front-end role could instead issue SAS tokens granting access to the “videoprocessingqueue” with add message permission for, say, 2 hours. The client can then use the SAS token in order to enqueue requests. When using Queue SAS, the load on the front-end greatly decreases, since the enqueue requests go directly from the client to storage, instead of through the front-end service. The system design would then look like:

  • Client: creates a SAS token with read permissions for the source video blob and a SAS token with write permissions for the destination blob. The client would then contact the front-end, retrieve a SAS token for the “videoprocessingqueue” queue, and enqueue a video processing work item. The client would cache the SAS token for 2 hours and renew it well before it expires.
  • Video processing service front-end, which accepts requests by first authenticating the sender. Once authenticated, it would issue SAS tokens to the “videoprocessingqueue” queue with add message permission and duration limited to 2 hours.
  • Video processor worker role: The responsibility of this worker role would remain unchanged from the previous design.

We will now highlight the usage of Queue SAS through code for the video processing service. Authentication and actual video transcoding code will be omitted for simplicity reasons.

We will first define the video processing work item, referred to as TranscodingWorkItem, as follows.
/// <summary>
/// Enum representing the target video quality requested by the client
/// </summary>
public enum VideoQuality
{
    quality240p,
    quality480p,
    quality720p
}

/// <summary>
/// class representing the queue message enqueued by the client
/// and processed by the video processing worker role
/// </summary>
public class TranscodingWorkItem
{
    /// <summary>
    /// Blob URL for the source video that needs to be transcoded
    /// </summary>
    public string SourceVideoUri { get; set; }

    /// <summary>
    /// Blob URL for the resultant video that would be produced
    /// </summary>
    public string DestinationVideoUri { get; set; }

    /// <summary>
    /// SAS token for the source video with read-only access
    /// </summary>
    public string SourceSasToken { get; set; }

    /// <summary>
    /// SAS token for destination video with write-only access
    /// </summary>
    public string DestinationSasToken { get; set; }

    /// <summary>
    /// The requested video quality
    /// </summary>
    public VideoQuality TargetVideoQuality { get; set; }

    /// <summary>
    /// Converts the xml representation of the queue message into a TranscodingWorkItem object.
    /// This API is used by the Video Processing Worker role
    /// </summary>
    /// <param name="messageContents">XML snippet representing the TranscodingWorkItem</param>
    /// <returns></returns>
    public static TranscodingWorkItem FromMessage(string messageContents)
    {
        XmlSerializer mySerializer = new XmlSerializer(typeof(TranscodingWorkItem));
        StringReader reader = new StringReader(messageContents);
        return (TranscodingWorkItem)mySerializer.Deserialize(reader);
    }

    /// <summary>
    /// Serializes this TranscodingWorkItem object to an xml string that would be
    /// used as a queue message.
    /// This API is used by the client
    /// </summary>
    /// <returns></returns>
    public string ToMessage()
    {
        XmlSerializer mySerializer = new XmlSerializer(typeof(TranscodingWorkItem));
        StringWriter writer = new StringWriter();
        mySerializer.Serialize(writer, this);
        writer.Close();
        return writer.ToString();
    }
}

Below, we will highlight the code needed by the front-end part of the service. It will be acting as a SAS generator. This component will generate two types of SAS tokens: a non-revocable token limited to 2 hours that is consumed by clients, and a one-time, maximum-duration, revocable token that is used by the video processing worker role.

/// <summary>
/// SAS generator component that is running as part of the service front-end
/// </summary>
public class SasProducer
{
    /* ... */

    /// <summary>
    /// API invoked by clients in order to get a SAS token
    /// that allows them to add messages to the queue.
    /// The token will have add-message permission with a 2 hour limit.
    /// </summary>
    /// <returns>A SAS token authorizing access to the video processing queue.</returns>
    public string GetClientSasToken()
    {
        // The shared access policy should expire in two hours.
        // No start time is specified, which means that the token is valid immediately.
        // The policy specifies add-message permissions.
        SharedAccessQueuePolicy policy = new SharedAccessQueuePolicy()
        {
            SharedAccessExpiryTime = DateTime.UtcNow.Add(SasProducer.SasTokenDuration),
            Permissions = SharedAccessQueuePermissions.Add
        };

        // Generate the SAS token. No access policy identifier is used,
        // which makes it non-revocable.
        // The token is generated locally, without issuing any calls
        // against Windows Azure Storage.
        string sasToken = this.videoProcessingQueue.GetSharedAccessSignature(
            policy /* access policy */,
            null   /* access policy identifier */);

        return sasToken;
    }

    /// <summary>
    /// This method will generate a revocable SAS token that will be used by
    /// the video processing worker roles. The role will have process and update
    /// message permissions.
    /// </summary>
    /// <returns>A revocable SAS token for the video processing queue</returns>
    public string GetSasTokenForProcessingMessages()
    {
        // A signed identifier is needed to associate a SAS with a server stored policy
        string workerPolicySignedIdentifier =
            "VideoProcessingWorkerAccessPolicy" + DateTime.UtcNow.ToString();

        // Create the video processing worker's queue SAS policy.
        // Permission is granted to process and update queue messages.
        QueuePermissions workerQueuePermissions = new QueuePermissions();
        SharedAccessQueuePolicy workerQueuePolicy = new SharedAccessQueuePolicy()
        {
            // Making the duration max
            SharedAccessExpiryTime = DateTime.MaxValue,
            Permissions = SharedAccessQueuePermissions.ProcessMessages |
                          SharedAccessQueuePermissions.Update
        };

        // Associate the above policy with a signed identifier
        workerQueuePermissions.SharedAccessPolicies.Add(
            workerPolicySignedIdentifier,
            workerQueuePolicy);

        // The below call will result in a Set Queue ACL request to be sent to
        // Windows Azure Storage in order to store the policy and associate it with the
        // "VideoProcessingWorkerAccessPolicy" signed identifier that will be referred to
        // by the SAS token
        this.videoProcessingQueue.SetPermissions(workerQueuePermissions);

        // Use the signed identifier in order to generate a SAS token. No requests will be
        // sent to Windows Azure Storage when the below call is made.
        string revocableSasTokenQueue = this.videoProcessingQueue.GetSharedAccessSignature(
            new SharedAccessQueuePolicy(),
            workerPolicySignedIdentifier);

        return revocableSasTokenQueue;
    }
}

We will now look at the client library code that is running as part of the customer’s application. We will assume that the communication between the client and the service front-end is a simple method call invoked on the SasProducer object. In reality, this could be an HTTPS web request that is processed by the front-end, with the SAS token returned as part of the HTTPS response. The client library uses the customer’s storage credentials to create SAS tokens for the source and destination video blobs. It also retrieves the video processing queue SAS token from the service and enqueues a transcoding work item.

/// <summary>
/// A class representing the client using the video processing service.
/// </summary>
public class Client
{
    /// <summary>
    /// When to refresh the credentials, measured as a number of minutes before expiration.
    /// </summary>
    private const int CredsRefreshThresholdInMinutes = 60;

    /// <summary>
    /// The handle to the video processing service, for requesting SAS tokens
    /// </summary>
    private SasProducer videoProcessingService;

    /// <summary>
    /// A cached copy of the SAS credentials.
    /// </summary>
    private StorageCredentialsSharedAccessSignature serviceQueueSasCredentials;

    /// <summary>
    /// Expiration time for the service SAS token.
    /// </summary>
    private DateTime serviceQueueSasExpiryTime;

    /// <summary>
    /// The video processing service storage queue endpoint that workitems are enqueued to
    /// </summary>
    private string serviceQueueEndpoint;

    /// <summary>
    /// Initializes a new instance of the Client class.
    /// </summary>
    /// <param name="service">A handle to the video processing service object.</param>
    /// <param name="serviceQueueEndpoint">
    /// The video processing service storage queue endpoint that workitems are enqueued to</param>
    public Client(SasProducer service, string serviceQueueEndpoint)
    {
        this.videoProcessingService = service;
        this.serviceQueueEndpoint = serviceQueueEndpoint;
    }

    /// <summary>
    /// Called by the application in order to request a video to be transcoded.
    /// </summary>
    /// <param name="clientStorageAccountName">
    /// The customer's storage account name; not to be confused with the service account info</param>
    /// <param name="clientStorageKey">The customer's storage account key.
    /// It is used to generate the SAS access to the customer's videos</param>
    /// <param name="sourceVideoBlobUri">The raw source blob uri</param>
    /// <param name="destinationVideoBlobUri">The raw destination blob uri</param>
    /// <param name="videoQuality">The video quality requested</param>
    public void SubmitTranscodeVideoRequest(
        string clientStorageAccountName,
        string clientStorageKey,
        string sourceVideoBlobUri,
        string destinationVideoBlobUri,
        VideoQuality videoQuality)
    {
        // Create a reference to the customer's storage account
        // that will be used to generate SAS tokens to the source and destination videos
        CloudStorageAccount clientStorageAccount = CloudStorageAccount.Parse(
            string.Format(
                "DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}",
                clientStorageAccountName,
                clientStorageKey));
        CloudBlobClient blobClient = clientStorageAccount.CreateCloudBlobClient();

        CloudBlob sourceVideo = new CloudBlob(
            sourceVideoBlobUri /*blobUri*/,
            blobClient /*serviceClient*/);
        CloudBlob destinationVideo = new CloudBlob(
            destinationVideoBlobUri /*blobUri*/,
            blobClient /*serviceClient*/);

        // Create the SAS policies for the videos.
        // The permissions are restricted to read-only for the source
        // and write-only for the destination.
        SharedAccessBlobPolicy sourcePolicy = new SharedAccessBlobPolicy
        {
            // Allow 24 hours for reading and transcoding the video
            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24),
            Permissions = SharedAccessBlobPermissions.Read
        };
        SharedAccessBlobPolicy destinationPolicy = new SharedAccessBlobPolicy
        {
            // Allow 24 hours for reading and transcoding the video
            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24),
            Permissions = SharedAccessBlobPermissions.Write
        };

        // Generate SAS tokens for the source and destination
        string sourceSasToken = sourceVideo.GetSharedAccessSignature(
            sourcePolicy,
            null /* access policy identifier */);
        string destinationSasToken = destinationVideo.GetSharedAccessSignature(
            destinationPolicy,
            null /* access policy identifier */);

        // Create a workitem for transcoding the video
        TranscodingWorkItem workItem = new TranscodingWorkItem
        {
            SourceVideoUri = sourceVideo.Uri.AbsoluteUri,
            DestinationVideoUri = destinationVideo.Uri.AbsoluteUri,
            SourceSasToken = sourceSasToken,
            DestinationSasToken = destinationSasToken,
            TargetVideoQuality = videoQuality
        };

        // Get the credentials for the service queue. This uses the cached
        // credentials if they have not expired; otherwise it contacts the
        // video processing service
        StorageCredentials serviceQueueSasCredentials = GetServiceQueueSasCredentials();
        CloudQueueClient queueClient = new CloudQueueClient(
            this.serviceQueueEndpoint /*baseAddress*/,
            serviceQueueSasCredentials /*credentials*/);
        CloudQueue serviceQueue = queueClient.GetQueueReference(SasProducer.WorkerQueueName);

        // Add the workitem to the queue, which results in
        // a Put Message API call over a SAS URL
        CloudQueueMessage message = new CloudQueueMessage(workItem.ToMessage() /*content*/);
        serviceQueue.AddMessage(message);
    }

    /// <summary>
    /// Gets the SAS storage credentials object for accessing the video processing queue.
    /// This method will automatically refresh the credentials as needed.
    /// </summary>
    public StorageCredentials GetServiceQueueSasCredentials()
    {
        // Refresh the credentials if needed.
        if (this.serviceQueueSasCredentials == null ||
            DateTime.UtcNow.AddMinutes(CredsRefreshThresholdInMinutes) >= this.serviceQueueSasExpiryTime)
        {
            this.RefreshAccessCredentials();
        }

        return this.serviceQueueSasCredentials;
    }

    /// <summary>
    /// Requests a new SAS token from the service, and updates the
    /// cached credentials and the expiration time.
    /// </summary>
    public void RefreshAccessCredentials()
    {
        // Request the SAS token. This is currently emulated as a
        // method call against the SasProducer object
        string sasToken = this.videoProcessingService.GetClientSasToken();

        // Create credentials using the new token.
        this.serviceQueueSasCredentials = new StorageCredentialsSharedAccessSignature(sasToken);
        this.serviceQueueSasExpiryTime = DateTime.UtcNow.Add(SasProducer.SasTokenDuration);
    }
}

We then look at the video processing worker role code. The worker’s SAS token can either be passed in as part of a configuration file, or the video processing role can contact the SAS producer role to obtain it.

/// <summary>
/// A class representing a video processing worker role
/// </summary>
public class VideoProcessingWorker
{
    public const string WorkerQueueName = "videoprocessingqueue";

    /// <summary>
    /// A reference to the video processing queue
    /// </summary>
    private CloudQueue videoProcessingQueue;

    /// <summary>
    /// Initializes a new instance of the VideoProcessingWorker class.
    /// </summary>
    /// <param name="sasTokenForWorkQueue">The SAS token for accessing the work queue.</param>
    /// <param name="storageAccountName">The storage account name used by this service</param>
    public VideoProcessingWorker(string sasTokenForWorkQueue, string storageAccountName)
    {
        string queueEndpoint = string.Format(
            "http://{0}.queue.core.windows.net", storageAccountName);
        StorageCredentials queueCredentials =
            new StorageCredentialsSharedAccessSignature(sasTokenForWorkQueue);
        CloudQueueClient queueClient = new CloudQueueClient(queueEndpoint, queueCredentials);
        this.videoProcessingQueue =
            queueClient.GetQueueReference(VideoProcessingWorker.WorkerQueueName);
    }

    /// <summary>
    /// Starts the worker, which polls the queue for messages containing videos to be transcoded.
    /// </summary>
    public void Start()
    {
        while (true)
        {
            // Get a message from the queue by setting an initial visibility timeout of 5 minutes
            CloudQueueMessage message = this.videoProcessingQueue.GetMessage(
                TimeSpan.FromMinutes(5) /*visibilityTimeout*/);

            // If there are no messages, sleep and retry.
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(5));
                continue;
            }

            TranscodingWorkItem workItem;
            try
            {
                // Deserialize the work item
                workItem = TranscodingWorkItem.FromMessage(message.AsString);
            }
            catch (InvalidOperationException)
            {
                // The message is malformed.
                // Log an error (or an alert) and delete it from the queue
                this.videoProcessingQueue.DeleteMessage(message);
                continue;
            }

            // Create the source and destination CloudBlob objects
            // from the workitem's blob uris and sas tokens
            StorageCredentials sourceCredentials =
                new StorageCredentialsSharedAccessSignature(workItem.SourceSasToken);
            CloudBlob sourceVideo = new CloudBlob(workItem.SourceVideoUri, sourceCredentials);

            StorageCredentials destinationCredentials =
                new StorageCredentialsSharedAccessSignature(workItem.DestinationSasToken);
            CloudBlob destinationVideo = new CloudBlob(workItem.DestinationVideoUri, destinationCredentials);

            // Process the video
            this.ProcessVideo(sourceVideo, destinationVideo, workItem.TargetVideoQuality);

            // Delete the message from the queue.
            this.videoProcessingQueue.DeleteMessage(message);
        }
    }

    /// <summary>
    /// Transcodes the video.
    /// This does not do any actual video processing.
    /// </summary>
    private void ProcessVideo(
        CloudBlob sourceVideo,
        CloudBlob destinationVideo,
        VideoQuality targetVideoQuality)
    {
        Stream inStream = sourceVideo.OpenRead();
        Stream outStream = destinationVideo.OpenWrite();

        // This is where the real work is done.
        // In this example, we just write inStream to outStream plus some extra text.
        byte[] buffer = new byte[1024];
        int count = 1;
        while (count != 0)
        {
            count = inStream.Read(buffer, 0, buffer.Length);
            outStream.Write(buffer, 0, count);
        }

        // Write the extra text
        using (TextWriter writer = new StreamWriter(outStream))
        {
            writer.WriteLine(" (transcoded to {0})", targetVideoQuality);
        }
    }
}

For completeness, here is a Main method that allows you to test the above sample code.

public static void Main()
{
    string serviceAccountName = "someserviceaccountname";
    string serviceAccountKey = "someserviceAccountKey";
    string serviceQueueEndpoint = string.Format(
        "http://{0}.queue.core.windows.net", serviceAccountName);

    // Set up the SAS producer as part of the front-end
    SasProducer sasProducer = new SasProducer(serviceAccountName, serviceAccountKey);

    // Get the SAS token for the max time period that is used by the service worker role
    string sasTokenForQueue = sasProducer.GetSasTokenForProcessingMessages();

    // Start the video processing worker
    VideoProcessingWorker transcodingWorker =
        new VideoProcessingWorker(sasTokenForQueue, serviceAccountName);
    ThreadPool.QueueUserWorkItem((state) => transcodingWorker.Start());

    // Set up the client library
    Client client = new Client(sasProducer, serviceQueueEndpoint);

    // Use the client library to submit transcoding workitems
    string customerAccountName = "clientaccountname";
    string customerAccountKey = "CLIENTACCOUNTKEY";
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        string.Format(
            "DefaultEndpointsProtocol=http;AccountName={0};AccountKey={1}",
            customerAccountName,
            customerAccountKey));
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // Create a source container
    CloudBlobContainer sourceContainer =
        blobClient.GetContainerReference("sourcevideos");
    sourceContainer.CreateIfNotExist();

    // Create a destination container
    CloudBlobContainer destinationContainer =
        blobClient.GetContainerReference("transcodedvideos");
    destinationContainer.CreateIfNotExist();

    List<CloudBlob> sourceVideoList = new List<CloudBlob>();

    // Upload 10 source videos
    for (int i = 0; i < 10; i++)
    {
        CloudBlob sourceVideo = sourceContainer.GetBlobReference("Video" + i);

        // Upload the video.
        // This example uses a placeholder string
        sourceVideo.UploadText("Content of video" + i);
        sourceVideoList.Add(sourceVideo);
    }

    // Submit video processing requests to the service using Queue SAS
    for (int i = 0; i < 10; i++)
    {
        CloudBlob sourceVideo = sourceVideoList[i];
        CloudBlob destinationVideo =
            destinationContainer.GetBlobReference("Video" + i);

        client.SubmitTranscodeVideoRequest(
            customerAccountName,
            customerAccountKey,
            sourceVideo.Uri.AbsoluteUri,
            destinationVideo.Uri.AbsoluteUri,
            VideoQuality.quality480p);
    }

    // Let the worker finish processing
    Thread.Sleep(TimeSpan.FromMinutes(5));
}

Jean Ghanem, Michael Roberson, Weiping Zhang, Jai Haridas, Brad Calder

Introducing Asynchronous Cross-Account Copy Blob


We are excited to introduce some changes to the Copy Blob API with the 2012-02-12 version that allow you to copy blobs between storage accounts. This enables some interesting scenarios, such as:

  • Back up your blobs to another storage account without having to retrieve the content and save it yourself
  • Migrate your blobs from one account to another efficiently with respect to cost and time

NOTE: To allow cross-account copy, the destination storage account needs to have been created on or after June 7th 2012. This limitation is only for cross-account copy, as accounts created prior can still copy within the same account. If the account is created before June 7th 2012, a copy blob operation across accounts will fail with HTTP Status code 400 (Bad Request) and the storage error code will be “CopyAcrossAccountsNotSupported.”

In this blog, we will go over some of the changes that were made along with some of the best practices to use this API. We will also show some sample code on using the new Copy Blob APIs with SDK 1.7.1 which is available on GitHub.

Changes to Copy Blob API

To enable copying between accounts, we have made the following changes:

Copy Source is now a URL

In versions prior to 2012-02-12, the source request header was specified as “/<account name>/<fully qualified blob name with container name and snapshot time if applicable >”. With 2012-02-12 version, we now require x-ms-copy-source to be specified as a URL. This is a versioned change, as specifying the old format with this new version will now fail with 400 (Bad Request). The new format allows users to specify a shared access signature or use a custom storage domain name. When specifying a source blob from a different account than the destination, the source blob must either be

  • A publicly accessible blob (i.e. the container ACL is set to be public)
  • A private blob, only if the source URL is pre-authenticated with a Shared Access Signature (i.e. pre-signed URL), allowing read permissions on the source blob

A copy operation preserves the type of the blob: a block blob will be copied as a block blob and a page blob will be copied to the destination as a page blob. If the destination blob already exists, it will be overwritten. However, if the destination type (for an existing blob) does not match the source type, the operation fails with HTTP status code 400 (Bad Request).

Note: The source blob could even be a blob outside of Windows Azure, as long as it is publicly accessible or accessible via some form of a Signed URL. For source blobs outside of Windows Azure, they will be copied to block blobs.

Copy is now asynchronous

Making copy asynchronous is a major change that greatly differs from previous versions. Previously, the Blob service returned a successful response back to the user only when the copy operation had completed. With version 2012-02-12, the Blob service will instead schedule the copy operation to be completed asynchronously: a success response only indicates that the copy operation has been successfully scheduled. As a consequence, a successful response from Copy Blob will now return HTTP status code 202 (Accepted) instead of 201 (Created).

A few important points:

  1. There can be only one pending copy operation to a given destination blob name URL at a time, but a source blob can be the source for many outstanding copies at once.
  2. The asynchronous copy blob runs in the background using spare bandwidth capacity, so there is no SLA in terms of how fast a blob will be copied.
  3. Currently there is no limit on the number of pending copy blob operations that can be queued up for a storage account, but a pending copy blob operation can live in the system for at most 2 weeks. If it takes longer than that, the copy blob operation will be terminated.
  4. If the source storage account is in a different location from the destination storage account, then the source storage account will be charged egress for the copy using the bandwidth rates as shown here.
  5. When a copy is pending, any attempt to modify, snapshot, or lease the destination blob will fail.

Below we break down the key concepts of the new Copy Blob API.

Copy Blob Scheduling: when the Blob service receives a Copy Blob request, it will first ensure that the source exists and can be accessed. If the source does not exist or cannot be accessed, an HTTP status code 400 (Bad Request) is returned. If any source access conditions are provided, they will be validated too. If conditions do not match, then an HTTP status code 412 (Precondition Failed) error is returned. Once the source is validated, the service then validates any conditions provided for the destination blob (if it exists). If condition checks fail on the destination blob, an HTTP status code 412 (Precondition Failed) is returned. If there is already a pending copy operation, then the service returns an HTTP status code 409 (Conflict). Once the validations are completed, the service initializes the destination blob before scheduling the copy and then returns a success response to the user. If the source is a page blob, the service will create a page blob with the same length as the source blob but with all the bytes zeroed out. If the source blob is a block blob, the service will commit a zero-length block blob for the pending copy blob operation. The service maintains a few copy-specific properties during the copy operation to allow clients to poll the status and progress of their copy operations.

Copy Blob Response: when a copy blob operation returns success to the client, this indicates the Blob service has successfully scheduled the copy operation to be completed. Two new response headers are introduced:

  1. x-ms-copy-status: The status of the copy operation at the time the response was sent. It can be one of the following:
    • success: Copy operation has completed. This is analogous to the scenario in previous versions where the copy operation has completed synchronously.
    • pending: Copy operation is still pending and the user is expected to poll the status of the copy. (See “Polling for Copy Blob properties” below.)
  2. x-ms-copy-id: The string token that is associated with the copy operation. This can be used when polling the copy status, or if the user wishes to abort a “pending” copy operation.

Polling for Copy Blob properties: we now provide the following additional properties that allow users to track the progress of the copy, using Get Blob Properties, Get Blob, or List Blobs:

  1. x-ms-copy-status (or CopyStatus): The current status of the copy operation. It can be one of the following:
    • pending: Copy operation is pending.
    • success: Copy operation completed successfully.
    • aborted: Copy operation was aborted by a client.
    • failed: Copy operation failed to complete due to an error.
  2. x-ms-copy-id (CopyId): The id returned by the copy operation which can be used to monitor the progress or abort a copy.
  3. x-ms-copy-status-description (CopyStatusDescription): Additional error information that can be used for diagnostics.
  4. x-ms-copy-progress (CopyProgress): The amount of the blob copied so far. This has the format X/Y where X=number of bytes copied and Y is the total number of bytes.
  5. x-ms-copy-completion-time (CopyCompletionTime): The completion time of the last copy.

These properties can be monitored to track the progress of a copy operation that returns “pending” status. However, it is important to note that except for Put Page, Put Block and Lease Blob operations, any other write operation (i.e., Put Blob, Put Block List, Set Blob Metadata, Set Blob Properties) on the destination blob will remove the properties pertaining to the copy operation.
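To make the polling concrete, here is a minimal sketch using the 1.7.1 client library types that appear in the samples later in this post; the helper name and polling interval are our own and not part of the SDK.

// Minimal polling sketch: blocks until the copy targeting destBlob leaves the
// "pending" state. FetchAttributes issues a Get Blob Properties call, which
// returns the copy properties described above.
public static CopyStatus WaitForCopyCompletion(CloudBlob destBlob, TimeSpan pollInterval)
{
    destBlob.FetchAttributes();
    while (destBlob.CopyState != null &&
           destBlob.CopyState.Status == CopyStatus.Pending)
    {
        Thread.Sleep(pollInterval);
        destBlob.FetchAttributes();
    }

    // The final status is Success, Failed, or Aborted.
    return destBlob.CopyState.Status;
}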

Asynchronous Copy Blob: for the cases where the Copy Blob response returns with x-ms-copy-status set to “pending”, the copy operation will complete asynchronously.

  1. Block blobs: The source block blob will be retrieved using 4 MB chunks and copied to the destination.
  2. Page blobs: The source page blob’s valid ranges are retrieved and copied to the destination.

Copy Blob operations are retried on any intermittent failures such as network failures, server busy etc. but any failures are recorded in x-ms-copy-status-description which would let users know why the copy is still pending.

When the copy operation is pending, any writes to the destination blob are disallowed and will fail with HTTP status code 409 (Conflict). One would need to abort the copy before writing to the destination.

Data integrity during asynchronous copy: The Blob service will lock onto a version of the source blob by storing the source blob ETag at the time of copy. This is done to ensure that any source blob changes can be detected during the course of the copy operation. If the source blob changes during the copy, the ETag will no longer match its value at the start of the copy, causing the copy operation to fail.

Aborting the Copy Blob operation: To allow canceling a pending copy, we have introduced the Abort Copy Blob operation in the 2012-02-12 version of REST API. The Abort operation takes the copy-id returned by the Copy operation and will cancel the operation if it is in the “pending” state. An HTTP status code 409 (Conflict) is returned if the state is not pending or the copy-id does not match the pending copy. The blob’s metadata is retained but the content is zeroed out on a successful abort.
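As a rough sketch of the abort path (assuming the 1.7.1 client library’s AbortCopy method and the CopyState property used in the samples below), a monitoring component could do something like the following:

// Sketch: abort a copy that is still pending, using the copy id from CopyState.
// AbortCopy maps to the Abort Copy Blob REST operation; the service returns
// 409 (Conflict) if the copy is no longer pending or the id does not match.
destBlob.FetchAttributes();
if (destBlob.CopyState != null &&
    destBlob.CopyState.Status == CopyStatus.Pending)
{
    destBlob.AbortCopy(destBlob.CopyState.CopyId);
}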

Best Practices

How to migrate blobs from a source account’s container to a destination container in another account?

With asynchronous copy, copying blobs from one account to another is as simple as the following:

  1. List blobs in the source container.
  2. For each blob in the source container, copy the blob to a destination container.

Once all the blobs are queued for copy, the monitoring component can do the following:

  1. List all blobs in the destination container.
  2. Check the copy status; if it has failed or has been aborted, start a new copy operation.

Example: Here is a sample that queues asynchronous copies. It ignores snapshots and only copies base blobs. Error handling is excluded for brevity.

public static void CopyBlobs(
    CloudBlobContainer srcContainer,
    string policyId,
    CloudBlobContainer destContainer)
{
    // get the SAS token to use for all blobs
    string blobToken = srcContainer.GetSharedAccessSignature(
        new SharedAccessBlobPolicy(), policyId);

    var srcBlobList = srcContainer.ListBlobs(true, BlobListingDetails.None);
    foreach (var src in srcBlobList)
    {
        var srcBlob = src as CloudBlob;

        // Create appropriate destination blob type to match the source blob
        CloudBlob destBlob;
        if (srcBlob.Properties.BlobType == BlobType.BlockBlob)
        {
            destBlob = destContainer.GetBlockBlobReference(srcBlob.Name);
        }
        else
        {
            destBlob = destContainer.GetPageBlobReference(srcBlob.Name);
        }

        // copy using src blob as SAS
        destBlob.StartCopyFromBlob(new Uri(srcBlob.Uri.AbsoluteUri + blobToken));
    }
}

Example: Monitoring code, with error handling omitted for brevity. NOTE: This sample assumes that no one else starts a different copy operation on the same destination blob. If that assumption is not valid for your scenario, please see “How do I prevent someone else from starting a new copy operation to overwrite my successful copy?” below.

public static void MonitorCopy(CloudBlobContainer destContainer)
{
    bool pendingCopy = true;

    while (pendingCopy)
    {
        pendingCopy = false;
        var destBlobList = destContainer.ListBlobs(true, BlobListingDetails.Copy);

        foreach (var dest in destBlobList)
        {
            var destBlob = dest as CloudBlob;

            if (destBlob.CopyState.Status == CopyStatus.Aborted ||
                destBlob.CopyState.Status == CopyStatus.Failed)
            {
                // Log the copy status description for diagnostics
                // and restart copy
                Log(destBlob.CopyState);
                pendingCopy = true;
                destBlob.StartCopyFromBlob(destBlob.CopyState.Source);
            }
            else if (destBlob.CopyState.Status == CopyStatus.Pending)
            {
                // We need to continue waiting for this pending copy.
                // However, let us log copy state for diagnostics
                Log(destBlob.CopyState);

                pendingCopy = true;
            }
            // else we completed this pending copy
        }

        Thread.Sleep(waitTime);
    }
}
 
How do I prevent the source from changing until the copy completes?

In an asynchronous copy, once authorization is verified on the source, the service locks to that version of the source by using the ETag value. If the source blob is modified while the copy operation is pending, the service will fail the copy operation with HTTP status code 412 (Precondition Failed). To ensure that the source blob is not modified, the client can acquire and maintain a lease on the source blob. (See the Lease Blob REST API.)

With the 2012-02-12 version, we have introduced the concept of a lock (i.e., an infinite lease), which makes it easy for a client to hold on to the lease. A good option is for the copy job to acquire an infinite lease on the source blob before issuing the copy operation. The monitor job can then break the lease when the copy completes.

Example: Sample code that acquires a lock (i.e. infinite lease) on source.

// Acquire infinite lease on source blob
srcBlob.AcquireLease(null, leaseId);

// copy using source blob as SAS and with infinite lease id
string cid = destBlob.StartCopyFromBlob(
    new Uri(srcBlob.Uri.AbsoluteUri + blobToken),
    null /* source access condition */,
    null /* destination access condition */,
    null /* request options */);
 
How do I prevent someone else from starting a new copy operation to overwrite my successful copy?

During a pending copy, the blob service ensures that no client requests can write to the destination blob. The copy blob properties are maintained on the blob after a copy completes (failed/aborted/successful). However, these copy properties are removed when any write command like Put Blob, Put Block List, Set Blob Metadata or Set Blob Properties is issued on the destination blob. The following operations will, however, retain the copy properties: Lease Blob, Put Page, and Put Block. Hence, a monitoring component that needs to confirm that a copy completed requires these properties to be retained until it verifies the copy. To prevent any writes to the destination blob once the copy is completed, the copy job should acquire an infinite lease on the destination blob and provide it as the destination access condition when starting the copy blob operation. The copy operation only allows infinite leases on the destination blob; this is because the service prevents any writes to the destination blob, and any more granular lease would require the client to issue Renew Lease on the destination blob. Acquiring a lease on the destination blob requires the blob to exist, hence the client needs to create an empty blob before the copy operation is issued. To terminate an infinite lease on a destination blob with a pending copy operation, you would have to abort the copy operation before issuing the break request on the lease.
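The following is a minimal sketch of that pattern, based on the 1.7.1 samples in this post. The AccessCondition.GenerateLeaseCondition helper name is an assumption on our part and may differ in your client library version; the empty-blob creation and lease id handling are placeholders.

// Sketch: lock the destination before starting the copy.
// 1. The destination blob must exist before a lease can be acquired on it.
destBlob.UploadText(string.Empty);

// 2. Acquire an infinite lease on the destination (null duration = infinite).
string destLeaseId = Guid.NewGuid().ToString();
destBlob.AcquireLease(null /* infinite lease */, destLeaseId);

// 3. Start the copy, passing the destination lease as the destination access
//    condition (helper name assumed; see note above). Only an infinite lease
//    is accepted on the destination of a pending copy.
destBlob.StartCopyFromBlob(
    new Uri(srcBlob.Uri.AbsoluteUri + blobToken),
    null /* source access condition */,
    AccessCondition.GenerateLeaseCondition(destLeaseId) /* destination access condition */,
    null /* request options */);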

Weiping Zhang, Michael Roberson, Jai Haridas, Brad Calder

TechEd 2012: New Windows Azure Storage Features, Improved Manageability, and Lower Prices


We are very excited to release multiple improvements to Windows Azure Storage. These include price reductions, new manageability features, and new service features for Windows Azure Blobs, Tables, and Queues.

Jai Haridas will be presenting these features and more on Windows Azure Storage at Tech Ed 2012, so for more details please attend his talk today or view his talk online in a few days.

New Service Features

We’ve released a new version of the REST API, “2012-02-12”. We’ve updated the Java Storage Client library to reflect the new features. We’ve also released source code for a CTP of our .NET storage client library. This version contains the following new features:

  • Shared Access Signatures (Signed URLs) for Tables and Queues – similar to the Shared Access Signature feature previously available for Blobs, this allows account owners to issue URL access to specific resources such as tables, table ranges, queues, blobs and containers while specifying granular sets of permissions. In addition, there are some smaller improvements to Shared Access Signatures for Blobs. Learn more: Introducing Table SAS (Shared Access Signature), Queue SAS and update to Blob SAS
  • Expanded Blob Copy – For Blobs, we now support copying blobs between storage accounts and copy blob (even within accounts) is performed as an asynchronous operation. This is available in the new version, but will only work if the destination storage account was created on or after June 7, 2012. Of course, Copy Blob operations within the same account will continue to work for all accounts. Learn more: Introducing Asynchronous Cross-Account Copy Blob
  • Improved Blob Leasing – Leasing is now available for blob containers, and allows infinite lease duration. In addition, lease durations between 15-60 seconds are also supported. Changing the lease id (in order to rotate the lease-id across your components) is now supported. Learn more: New Blob Lease Features: Infinite Leases, Smaller Lease Times, and More

Improved Manageability

Users of the Windows Azure Management Portal will benefit from the following improvements in managing their storage accounts. These portal improvements are detailed further in New Storage Features on the Windows Azure Portal post.

  • Introducing Locally Redundant Storage – Storage users are now able to turn off geo-replication by choosing Locally Redundant Storage (LRS). LRS provides highly durable and available storage within a single location (sub region).
  • Choosing Geo Redundant Storage or Locally Redundant Storage – By default storage accounts are configured for Geo Redundant Storage (GRS), meaning that Table and Blob data is replicated both within the primary location and also to a location hundreds of miles away (geo-replication). As detailed in this blog post, using LRS may be preferable in certain scenarios, and is available at a 23-34% discount compared to GRS. The price of GRS remains unchanged. Please note that a one-time bandwidth charge will apply if you choose to re-enable GRS after switching to LRS. You can also learn more about geo-replication in Introducing Geo-replication for Windows Azure Storage.
  • Configuration of Storage Analytics – While our analytics features (metrics and logging) have been available since last summer, configuring them required the user to call the REST API. In the new management portal, users can easily configure these features. To learn more about metrics and logging, see Windows Azure Storage Analytics.
  • Monitoring Storage Metrics – Storage users can now also monitor any desired set of metrics tracked in their account via the management portal.

Pricing

As mentioned above, users can reduce costs by choosing to use Locally Redundant Storage. Furthermore, we are excited to announce that we are reducing the pricing for storage transactions from $0.01 per 10,000 transactions to $0.01 per 100,000, reducing transaction costs by 90%! Learn more: 10x Price Reduction for Windows Azure Storage Transactions.

Summary

We’ve introduced a number of improvements to Windows Azure Storage and we invite you to read about each of them in the referenced blog posts. As always, we welcome your feedback and hope you’ll enjoy these new features!

Jeffrey Irwin and Brad Calder

USENIX Best Paper Award: Erasure Coding in Windows Azure Storage


We just published a paper describing how we erasure code data in Windows Azure Storage; it won a Best Paper Award at the June 2012 USENIX Annual Technical Conference. This was joint work between Microsoft Research and the Windows Azure Storage team.

The paper can be found here.

Windows Azure Storage is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time, while keeping it highly available and durable. When using Windows Azure Storage, you have access to your data from anywhere, at any time, and only pay for what you use and store.

The internal details of how Windows Azure Storage works are described in our SOSP paper here. One of the areas only briefly touched on in the SOSP paper is the fact that in the background we lazily erasure code data to reduce its storage overhead while keeping your data durable and highly available.

In our USENIX paper we describe how we do erasure coding in Windows Azure Storage. For erasure coding, we introduce a new set of codes we call Local Reconstruction Codes (LRC). LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low. The important benefits of LRC are that it reduces the bandwidth and I/Os required for reconstruction reads over prior codes, while still allowing a significant reduction in storage overhead. It is optimized to efficiently reconstruct fragments in the face of (a) single fragment failures (e.g., failed disk, node, or rack), (b) when fragments are offline due to an upgrade, or (c) when access to a fragment is slow. In the paper we describe how LRC is used in Windows Azure Storage to provide low overhead durable storage with consistently low read latencies. In addition, we describe our erasure coding implementation and important design decisions that went into it.

Brad Calder

Exploring Windows Azure Drives, Disks, and Images


With the preview of Windows Azure Virtual Machines, we have two new special types of blobs stored in Windows Azure Storage: Windows Azure Virtual Machine Disks and Windows Azure Virtual Machine Images. And of course we also have the existing preview of Windows Azure Drives. In the rest of this post, we will refer to these as storage, disks, images, and drives. This post explores what drives, disks, and images are and how they interact with storage.

Virtual Hard Drives (VHDs)

Drives, disks, and images are all VHDs stored as page blobs within your storage account. There are actually several slightly different VHD formats: fixed, dynamic, and differencing. Currently, Windows Azure only supports the format named ‘fixed’. This format lays the logical disk out linearly within the file format, such that disk offset X is stored at blob offset X. At the end of the blob, there is a small footer that describes the properties of the VHD. Everything stored in the page blob adheres to the standard VHD format, so you can take this VHD and mount it on your on-premises server if you choose to. Often, the fixed format wastes space because most disks have large unused ranges in them. However, we store our ‘fixed’ VHDs as page blobs, which are a sparse format, so we get the benefits of both the ‘fixed’ and ‘dynamic’ (expandable) formats at the same time.

Uploading VHDs to Windows Azure Storage

You can upload your VHD into your storage account to use it for either PaaS or IaaS. When you are uploading your VHD into storage, you will want to use a tool that understands that page blobs are sparse, and only uploads the portions of the VHD that have actual data in them. Also, if you have dynamic VHDs, you want to use a tool that will convert your dynamic VHD into a fixed VHD as it is doing the upload. CSUpload will do both of these things for you, and it is included as part of the Windows Azure SDK.

Persistence and Durability

Since drives, disks, and images are all stored in storage, your data will be persisted even when your virtual machine has to be moved to another physical machine. This means your data gets to take advantage of the durability offered by the Windows Azure Storage architecture, where all of your non-buffered and flushed writes to the disk/drive are replicated 3 times in storage to make it durable before returning success back to your application.

Drives (PaaS)

Drives are used by the PaaS roles (Worker Role, Web Role, and VM Role) to mount a VHD and assign a drive letter. There are many details about how you use these drives here. Drives are implemented with a kernel mode driver that runs within your VM, so your disk IO to and from the drive in the VM will cause network IO to and from the VM to your page blob in Windows Azure Storage. The following diagram shows the driver running inside the VM, communicating with storage through the VM’s virtual network adapter.

[Figure: the Windows Azure Drive driver running inside the VM, communicating with storage through the VM’s virtual network adapter]

PaaS roles are allowed to mount up to 16 drives per role.

Disks (IaaS)

When you create a Windows Azure Virtual Machine, the platform will attach at least one disk to the VM for your operating system disk. This disk will also be a VHD stored as a page blob in storage. As you write to the disk in the VM, the changes to the disk will be made to the page blob inside storage. You can also attach additional disks to your VM as data disks, and these will be stored in storage as page blobs as well.

Unlike for drives, the code that communicates with storage on behalf of your disk is not within your VM, so doing IO to the disk will not cause network activity in the VM, although it will cause network activity on the physical node. The following diagram shows how the driver runs in the host operating system, and the VM communicates through the disk interface to the driver, which then communicates through the host network adapter to storage.

[Figure: the disk driver running in the host OS; the VM communicates through the disk interface to the driver, which talks to storage through the host network adapter]

There are limits to the number of disks a virtual machine can mount, varying from 16 data disks for an extra-large virtual machine, to one data disk for an extra small virtual machine. Details can be found here.

IMPORTANT: The Windows Azure platform holds an infinite lease on all the page blobs that it considers disks in your storage account so that you don’t accidentally delete the underlying page blob, the container, or the storage account while the VM is using the VHD. If you want to delete the underlying page blob, the container it is within, or the storage account, you will need to detach the disk from the VM first, as shown here:

[Screenshot: detaching a disk from the virtual machine in the portal]

Then select the disk you want to detach and delete:

[Screenshot: selecting the disk to detach from the virtual machine]

Then you need to remove the disk from the portal:

[Screenshot: the Disks list in the portal]

and then you can select ‘delete disk’ from the bottom of the window:

[Screenshot: the Delete Disk command at the bottom of the window]

Note: when you delete the disk here you are not deleting the VHD page blob in your storage account. You are only disassociating it from the set of disks that can be used by Windows Azure Virtual Machines. After you have done all of the above, you will be able to delete the disk from your storage account, using the Windows Azure Storage REST APIs or storage explorers.

Images (IaaS)

Windows Azure uses the concept of an “Image” to describe a template VHD that can be used to create one or more Virtual Machines. Windows Azure and some partners provide images that can be used to create Virtual Machines. You can also create images for yourself by capturing an image of an existing Windows Azure Virtual Machine, or you can upload a sysprep’d image to your storage account. An image is also in the VHD format, but the platform will not write to the image. Instead, when you create a Virtual Machine from an image, the system will create a copy of that image’s page blob in your storage account, and that copy will be used for the Virtual Machine’s operating system disk.

IMPORTANT: Windows Azure holds an infinite lease on all the page blobs, the blob container and the storage account that it considers images in your storage account. Therefore, to delete the underlying page blob, you need to delete the image from the portal by going to the “Virtual Machines” section, clicking on “Images”:

[Screenshot: the Images list under the Virtual Machines section of the portal]

Then you select your image and press “Delete Image” at the bottom of the screen. This will disassociate the VHD from your set of registered images, but it does not delete the page blob from your storage account. At that point, you will be able to delete the image from your storage account.

Temporary Disk

There is another disk present in all web roles, worker roles, VM Roles, and Windows Azure Virtual Machines, called the temporary disk. This is a physical disk on the node that can be used for scratch space. Data on this disk will be lost when the VM is moved to another physical machine, which can happen during upgrades, patches, and when Windows Azure detects something is wrong with the node you are running on. The sizes offered for the temporary disk are defined here.

The temporary disk is the ideal place to store your operating system’s pagefile.

IMPORTANT: The temporary disk is not persistent. You should only write data onto this disk that you are willing to lose at any time.

Billing

Windows Azure Storage charges for Bandwidth, Transactions, and Storage Capacity. The per-unit costs of each can be found here.

Bandwidth

We recommend mounting drives from within the same location (e.g., US East) as the storage account they are stored in, as this offers the best performance, and also will not incur bandwidth charges. With disks, you are required to use them within the same location the disk is stored.

Transactions

When connected to a VM, disk IOs from both drives and disks will be satisfied from storage (unless one of the layers of cache described below can satisfy the request first). Small disk IOs will incur one Windows Azure Storage transaction per IO. Larger disk IOs will be split into smaller IOs, so they will incur more transaction charges. The breakdown for this is:

  • Drives
    • IO < 2 megabytes will be 1 transaction
    • IO >= 2 megabytes will be broken into transactions of 2MBs or smaller
  • Disks
    • IO < 128 kilobytes will be 1 transaction
    • IO >= 128 kilobytes will be broken into transactions of 128KBs or smaller

In addition, operating systems often perform a little read-ahead for small sequential IOs (typically less than 64 kilobytes), which may result in larger sized IOs to drives/disks than the IO size being issued by the application. If the prefetched data is used, then this can result in fewer transactions to your storage account than the number of IOs issued by your application.
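As a back-of-the-envelope illustration of the breakdown above, the small helper below (our own sketch, not part of any SDK) estimates how many billable transactions a single IO turns into; for example, a 1 MB IO counts as 1 transaction against a drive but 8 transactions against a disk.

// Sketch: estimate billable transactions per IO, using the split sizes above
// (2 MB for drives, 128 KB for disks). Rounds up for partially filled chunks.
public static class TransactionEstimator
{
    private const long DriveChunkBytes = 2 * 1024 * 1024;  // 2 MB
    private const long DiskChunkBytes = 128 * 1024;         // 128 KB

    public static long TransactionsForDriveIO(long ioSizeInBytes)
    {
        return (ioSizeInBytes + DriveChunkBytes - 1) / DriveChunkBytes;
    }

    public static long TransactionsForDiskIO(long ioSizeInBytes)
    {
        return (ioSizeInBytes + DiskChunkBytes - 1) / DiskChunkBytes;
    }
}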

Storage Capacity

Windows Azure Storage stores page blobs, and thus VHDs, in sparse format, and therefore only charges for data within the VHD that has actually been written to during the life of the VHD. Therefore, we recommend using ‘quick format’ because this will avoid storing large ranges of zeros within the page blob. When creating a VHD you can choose the quick format option as shown below:

[Screenshot: selecting the ‘Perform a quick format’ option when formatting the VHD]

It is also important to note that when you delete files within the file system used by the VHD, most operating systems do not clear or zero these ranges, so you can still be paying capacity charges within a blob for the data that you deleted via a disk/drive.

Caches, Caches, and more Caches

Drives and disks both support on-disk caching and some limited in-memory caching. Many layers of the operating system, as well as application libraries, also do in-memory caching. This section highlights some of the caching choices you have as an application developer.

Caching can be used to improve performance, as well as to reduce transaction costs. The following caches are available for use with disks and drives; each is described in more detail below.

  • FileStream – Type: memory. Purpose: improves performance and reduces IOs for sequential reads and writes. Data persistence: writes are not automatically persisted; call “Flush” to persist writes.
  • Operating System Caching – Type: memory. Purpose: improves performance and reduces IOs for random and sequential reads and writes. Data persistence: writes are not automatically persisted; use “write through” file handles, or “Flush”, to persist writes.
  • Windows Azure Drive Caches – Type: memory and disk. Purpose: reduces read transactions to storage; can improve performance for sequential IO, depending on workload. Data persistence: writes are automatically persisted; use “write through” file handles, or “Flush”, to know writes are persisted.
  • Windows Azure Virtual Machine Disk Caches – Type: memory and disk. Purpose: reduces transactions to storage; can improve performance for sequential IO, depending on workload; improves boot time. Data persistence: writes are automatically persisted; use “write through” file handles, or “Flush”, to know writes are persisted.
  • No Disk or Drive Cache – Type: N/A. Purpose: can improve performance for random and sequential IO, depending on workload. Data persistence: writes are automatically persisted; use “write through” file handles, or “Flush”, to know writes are persisted.

FileStream (applies to both disks and drives)

The .NET Framework’s FileStream class will cache reads and writes in memory to reduce IOs to the disk. Some of the FileStream constructors take a cache size, and others will choose the default 8k cache size for you. You cannot specify that the class use no memory cache, as the minimum cache size is 8 bytes. You can force the buffer to be written to disk by calling the FileStream.Flush(bool) API.
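For illustration, here is a short sketch of using FileStream with an explicit buffer size and Flush(bool); the path and sizes are placeholders.

// Sketch: FileStream with a 64 KB in-memory buffer on a mounted drive/disk.
// Writes accumulate in the buffer; Flush(true) pushes them to the file and
// asks the OS to flush its cache for this handle as well.
using (FileStream stream = new FileStream(
    @"F:\logs\events.dat",      // placeholder path on a mounted drive or disk
    FileMode.Append,
    FileAccess.Write,
    FileShare.Read,
    64 * 1024))                 // in-memory buffer size in bytes
{
    byte[] record = new byte[512];
    stream.Write(record, 0, record.Length);

    stream.Flush(true /* flushToDisk */);
}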

Operating System Caching (applies to both disks and drives)

The operating system itself will do in-memory buffering for both reads and writes, unless you explicitly turn it off when you open a file using FILE_FLAG_WRITE_THROUGH and/or FILE_FLAG_NO_BUFFERING. An in-depth discussion of the in-memory caching behavior of Windows is available here.
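As a sketch, the managed counterpart of FILE_FLAG_WRITE_THROUGH is FileOptions.WriteThrough; FILE_FLAG_NO_BUFFERING has no direct FileStream option and would require P/Invoking CreateFile, so it is omitted here. The path is a placeholder.

// Sketch: open a handle whose writes bypass the OS write cache.
using (FileStream stream = new FileStream(
    @"F:\data\journal.dat",     // placeholder path
    FileMode.OpenOrCreate,
    FileAccess.Write,
    FileShare.None,
    4096,                       // FileStream's own buffer size
    FileOptions.WriteThrough))  // maps to FILE_FLAG_WRITE_THROUGH
{
    byte[] block = new byte[4096];
    stream.Write(block, 0, block.Length);
    stream.Flush();             // drain the FileStream buffer; the OS write cache is bypassed
}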

Windows Azure Drive Caches

Drives allow you to choose whether to use the node’s local temporary disk as a read cache, or to use no cache at all. The space for a drive’s cache is allocated from your web role or worker role’s temporary disk. This cache is write-through, so writes are always committed immediately to storage. Reads will be satisfied either from the local disk, or from storage.

Using the drive local cache can improve sequential IO read performance when the reads ‘hit’ the cache. Sequential reads will hit the cache if:

  1. The data has been read before. The data is cached on the first time it is read, not on first write.
  2. The cache is large enough to hold all of the data.

Access to the blob can often deliver a higher rate of random IOs than the local disk. However, these random IOs will incur storage transaction costs. To reduce the number of transactions to storage, you can use the local disk cache for random IOs as well. For best results, ensure that your random writes to the disk are 8KB aligned, and the IO sizes are in multiples of 8KB.
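As a rough sketch of how the drive cache is wired up with the CloudDrive API (the local resource name, connection string, blob name, and cache size below are all placeholders):

// Sketch: initialize the local cache space and mount a drive with a read cache.
LocalResource cacheResource = RoleEnvironment.GetLocalResource("DriveCache");  // placeholder name

// Reserve space on the role's temporary disk for all drive caches in this instance.
CloudDrive.InitializeCache(cacheResource.RootPath, cacheResource.MaximumSizeInMegabytes);

CloudStorageAccount account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=mykey");  // placeholder
CloudDrive drive = account.CreateCloudDrive("drives/data.vhd");               // placeholder page blob

// Mount with a 500 MB read cache; passing 0 here mounts the drive with no cache.
string drivePath = drive.Mount(500 /* cache size in MB */, DriveMountOptions.None);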

Windows Azure Virtual Machine Disk Caches

When deploying a Virtual Machine, the OS disk has two host caching choices:

  1. Read/Write (Default) – write back cache
  2. Read - write through cache

When you setup a data disk on a virtual machine, you get three host caching choices:

  1. Read/Write – write back cache
  2. Read – write through cache
  3. None (Default)

The type of cache to use for data disks and the OS disk is not currently exposed through the portal. To set the type of host caching, you must either use the Service Management APIs (either Add Data Disk or Update Data Disk) or the Powershell commands (Add-AzureDataDisk or Set-AzureDataDisk).

The read cache is stored both on disk and in memory in the host OS. The write cache is stored in memory in the host OS.

WARNING: If your application does not use FILE_FLAG_WRITE_THROUGH, the write cache could result in data loss because the data could be sitting in the host OS memory waiting to be written when the physical machine crashes unexpectedly.

Using the read cache will improve sequential IO read performance when the reads ‘hit’ the cache. Sequential reads will hit the cache if:

  1. The data has been read before.
  2. The cache is large enough to hold all of the data.

The cache’s size for a disk varies based on instance size and the number of disks mounted. Caching can only be enabled for up to four data disks.

No Caching for Windows Azure Drives and VM Disks

Windows Azure Storage can provide a higher rate of random IOs than the local disk on your node that is used for caching. If your application needs to do lots of random IOs, and throughput is important to you, then you may want to consider not using the above caches. Keep in mind, however, that IOs to Windows Azure Storage do incur transaction costs, while IOs to the local cache do not.

To disable your Windows Azure Drive cache, pass ‘0’ for the cache size when you call the Mount() API.

For a Virtual Machine data disk the default behavior is to not use the cache. If you have enabled the cache on a data disk, you can disable it using the Update Data Disk service management API, or the Set-AzureDataDisk powershell command.

For a Virtual Machine operating system disk the default behavior is to use the cache. If your application will do lots of random IOs to data files, you may want to consider moving those files to a data disk which has the caching turned off.

 

Andrew Edwards and Brad Calder

Windows Azure Storage – 4 Trillion Objects and Counting


Windows Azure Storage has had an amazing year of growth. We have over 4 trillion objects stored, process an average of 270,000 requests per second, and reach peaks of 880,000 requests per second.

About a year ago we hit the 1 trillion object mark. Over the past 12 months, we have seen an impressive 4x increase in the number of objects stored, and a 2.7x increase in average requests per second.

The following graph shows the number of stored objects in Windows Azure Storage over the past year. The number of stored objects is counted on the last day of the month shown. The object count is the number of unique user objects stored in Windows Azure Storage, so the counts do not include replicas.

[Chart: total objects stored in Windows Azure Storage over the past year]

The following graph shows the average and peak requests per second. The average requests per second is the average over the whole month shown, and the peak requests per second is the peak for the month shown.

[Chart: average and peak requests per second over the past year]

We expect this growth rate to continue, especially since we just lowered the cost of requests to storage by 10x. It now costs $0.01 per 100,000 requests regardless of request type (same cost for puts and gets). This makes object puts and gets to Windows Azure Storage 10x to 100x cheaper than other cloud providers.
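To put the new transaction price in concrete terms, a quick sketch (the request volume is hypothetical):

// Sketch: transaction cost at $0.01 per 100,000 transactions.
const double pricePerTransaction = 0.01 / 100000;       // $0.0000001 per request

long monthlyTransactions = 2000000000;                  // e.g., 2 billion requests in a month
double monthlyTransactionCost =
    monthlyTransactions * pricePerTransaction;          // = $200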

In addition, we now offer two types of durability for your storage – Locally Redundant Storage (LRS) and Geo Redundant Storage (GRS). GRS is the default storage that we have always provided, and now we are offering a new type of storage called LRS. LRS is offered at a discount and provides locally redundant storage, where we maintain an equivalent of 3 copies of your data within a given location. GRS provides geo-redundant storage, where we maintain an equivalent of 6 copies of your data spread across 2 locations at least 400 miles apart from each other (3 copies are kept in each location). This allows you to choose the desired level of durability for your data. And of course, if your data does not require the additional durability of GRS, you can use LRS at a 23% to 34% discounted price (depending on how much data is stored). In addition, we also employ a sophisticated erasure coding scheme for storing data that provides higher durability than just storing 3 (for LRS) or 6 (for GRS) copies of your data, while at the same time keeping the storage overhead low, as described in our USENIX paper.

We are also excited about our recent release of Windows Azure Virtual Machines, where the persistent disks are stored as objects (blobs) in Windows Azure Storage. This allows the OS and data disks used by your VMs to leverage the same LRS and GRS durability provided by Windows Azure Storage. With that release we also provided access to Windows Azure Storage via easy to use client libraries for many popular languages (.NET, Java, Node.js, PHP, and Python), as well as REST.

Windows Azure Storage uses a unique approach of storing different object types (Blobs, Disks/Drives, Tables, Queues) in the same store, as described in our SOSP paper. The total number of blobs (disk/drives are stored as blobs), table entities, and queue messages stored account for the 4+ trillion objects in our unified store. By blending different types of objects across the same storage stack, we have a single stack for replicating data to keep it durable, a single stack for automatic load balancing and dealing with failures to keep data available, and we store all of the different types of objects on the same hardware, blending their workloads, to keep prices low. This allows us to have one simple pricing model for all object types (same cost in terms of GB/month, bandwidth, as well as transactions), so customers can focus on choosing the type of object that best fits their needs, instead of being forced to use one type of object over another due to price differences.

We are excited about the growth ahead and continuing to work with customers to provide a quality service. Please let us know if you have any feedback, questions or comments! If you would like to learn more about Windows Azure, click here.

Brad Calder


Introducing Windows Azure Storage Client Library 2.0 for .NET and Windows Runtime


Today we are releasing version 2.0 of the Windows Azure Storage Client Library. This is our largest update to our .NET library to date, and it includes new features, broader platform compatibility, and revisions to address the great feedback you’ve given us over time. The code is available on GitHub now. The libraries are also available through NuGet, and are included in the Windows Azure SDK for .NET - October 2012; for more information and links see below. In addition to the .NET 4.0 library, we are also releasing two libraries for Windows Store apps as a Community Technology Preview (CTP) that fully support the Windows Runtime platform and can be used to build modern Windows Store apps for both Windows RT (which supports ARM based systems) and Windows 8, in any of the languages supported by Windows Store apps (JavaScript, C++, C#, and Visual Basic). This blog post serves as an overview of these libraries and covers some of the implementation details that will be helpful to understand when developing cloud applications in .NET regardless of platform.

What’s New

We have introduced a number of new features in this release of the Storage Client Library including:

  • Simplicity and Usability - A greatly simplified API surface which will allow developers new to storage to get up and running faster while still providing the extensibility for developers who wish to customize the behavior of their applications beyond the default implementation.
  • New Table Implementation - An entirely new Table Service implementation which provides a simple interface that is optimized for low latency/high performance workloads, as well as providing a more extensible serialization model to allow developers more control over their data.
  • Rich debugging and configuration capabilities – One common piece of feedback we receive is that it’s too difficult to know what happened “under the covers” when making a call to the storage service. How many retries were there? What were the error codes? The OperationContext object provides rich debugging information, real-time status events for parallel and complex actions, and extension points allowing users to customize requests or enable end to end client tracing.
  • Windows Runtime Support - A Windows Runtime component with support for developing Windows Store apps using JavaScript, C++, C#, and Visual Basic; as well as a Strong Type Tables Extension library for C++, C#, and Visual Basic
  • Complete Sync and Asynchronous Programming Model (APM) implementation - A complete Synchronous API for .NET 4.0. Previous releases of the client implemented synchronous methods by simply surrounding the corresponding APM methods with a ManualResetEvent; this was not ideal as extra threads remained blocked during execution. In this release all synchronous methods will complete work on the thread in which they are called, with the notable exceptions of the stream implementations available via Cloud[Page|Block]Blob.Open[Read|Write] due to parallelism.
  • Simplified RetryPolicies - Easy and reusable RetryPolicies
  • .NET Client Profile – The library now supports the .NET Client Profile. For more on the .NET Client Profile see here.
  • Streamlined Authentication Model - There is now a single StorageCredentials type that supports Anonymous, Shared Access Signature, and Account and Key authentication schemes
  • Consistent Exception Handling - The library will immediately throw any exception encountered prior to making the request to the server. Any exception that occurs during the execution of the request will subsequently be wrapped inside a single StorageException type that wraps all other exceptions and provides rich information regarding the execution of the request.
  • API Clarity - All methods that make requests to the server are clearly marked with the [DoesServiceRequest] attribute
  • Expanded Blob API - Blob DownloadRange allows users to specify a given range of bytes to download rather than rely on a stream implementation
  • Blob download resume - A feature that will issue a subsequent range request(s) to download only the bytes not received in the event of a loss of connectivity
  • Improved MD5 - Simplified MD5 behavior that is consistent across all client APIs
  • Updated Page Blob Implementation - Full Page Blob implementation including read and write streams
  • Cancellation - Support for Asynchronous Cancellation via the ICancellableAsyncResult. Note, this can be used with .NET CancellationTokens via the CancellationToken.Register() method.
  • Timeouts - Separate client and server timeouts which support end to end timeout scenarios
  • Expanded Azure Storage Feature Support – It supports the 2012-02-12 REST API version with implementations for Blob & Container Leases; Blob, Table, and Queue Shared Access Signatures; and Asynchronous Cross-Account Copy Blob

Design

When designing the new Storage Client for .NET and Windows Runtime, we set up a series of design guidelines to follow throughout the development process. In addition to these guidelines, there are some unique requirements when developing for Windows Runtime, and specifically when projecting into JavaScript, that have driven some key architectural decisions.

For example, our previous RetryPolicy was based on a delegate that the user could configure; however, as this cannot be supported on all platforms, we have redesigned the RetryPolicy to be a simple and consistent implementation everywhere. This change has also allowed us to simplify the interface in order to address user feedback regarding the complexity of the previous implementation. Now a user who constructs a custom RetryPolicy can re-use that same implementation across platforms.
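
As a hedged sketch of what a cross-platform custom policy might look like (this assumes the IRetryPolicy interface shape in this release; the built-in ExponentialRetry and LinearRetry policies are usually the first choice):

// A simple custom retry policy: retry up to 3 times with a fixed 2 second delay.
public class SimpleLinearRetry : IRetryPolicy
{
    private readonly TimeSpan retryDelay = TimeSpan.FromSeconds(2);
    private const int MaxAttempts = 3;

    public IRetryPolicy CreateInstance()
    {
        return new SimpleLinearRetry();
    }

    public bool ShouldRetry(int currentRetryCount, int statusCode, Exception lastException,
        out TimeSpan retryInterval, OperationContext operationContext)
    {
        // A production policy would typically also inspect the status code and exception.
        retryInterval = this.retryDelay;
        return currentRetryCount < MaxAttempts;
    }
}

The same policy instance can then be assigned to a service client, for example blobClient.RetryPolicy = new SimpleLinearRetry();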

Windows Runtime

A key driver in this release was expanding platform support, specifically targeting the upcoming releases of Windows 8, Windows RT, and Windows Server 2012. As such, we are releasing the following two Windows Runtime components to support Windows Runtime as Community Technology Preview (CTP):

  • Microsoft.WindowsAzure.Storage.winmd - A fully projectable storage client that supports JavaScript, C++, C#, and VB. This library contains all core objects as well as support for Blobs, Queues, and a base Tables Implementation consumable by JavaScript
  • Microsoft.WindowsAzure.Storage.Table.dll – A table extension library that provides generic query support and strong type entities. This is used by non-JavaScript applications to provide strong type entities as well as reflection based serialization of POCO objects

Breaking Changes

With the introduction of Windows 8, Windows RT, and Windows Server 2012 we needed to broaden the platform support of our current libraries. To meet this requirement we have invested significant effort in reworking the existing Storage Client codebase to broaden platform support, while also delivering new features and significant performance improvements (more details below). One of the primary goals in this version of the client libraries was to maintain a consistent API across platforms so that developers’ knowledge and code could transfer naturally from one platform to another. As such, we have introduced some breaking changes from the previous version of the library to support this common interface. We have also used this opportunity to act on user feedback we have received via the forums and elsewhere regarding both the .NET library as well as the recently released Windows Azure Storage Client Library for Java. For existing users we will be posting an upgrade guide for breaking changes to this blog that describes each change in more detail.

Please note the new client is published under the same NuGet package as previous 1.x releases. As such, please check any existing projects as an automatic upgrade will introduce breaking changes.

Additional Dependencies

The new table implementation depends on three libraries (collectively referred to as ODataLib), which are resolved through the ODataLib (version 5.0.2) packages available through NuGet and not the WCF Data Services installer which currently contains 5.0.0 versions.  The ODataLib libraries can be downloaded directly or referenced by your code project through NuGet.  The specific ODataLib packages are:

http://nuget.org/packages/Microsoft.Data.OData/5.0.2

http://nuget.org/packages/Microsoft.Data.Edm/5.0.2

http://nuget.org/packages/System.Spatial/5.0.2

Namespaces

One particular breaking change of note is that the name of the assembly and root namespace has moved to Microsoft.WindowsAzure.Storage instead of Microsoft.WindowsAzure.StorageClient. In addition to aligning better with other Windows Azure service libraries, this change allows developers to use the legacy 1.x versions of the library and the 2.0 release side-by-side as they migrate their applications. Additionally, each Storage Abstraction (Blob, Table, and Queue) has now been moved to its own sub-namespace to provide a more targeted developer experience and cleaner IntelliSense experience. For example the Blob implementation is located in Microsoft.WindowsAzure.Storage.Blob, and all relevant protocol constructs are located in Microsoft.WindowsAzure.Storage.Blob.Protocol.
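
For example, a typical set of using statements for blob code under the new layout would look roughly like this:

// New root and per-service namespaces introduced in 2.0.
using Microsoft.WindowsAzure.Storage;        // CloudStorageAccount, StorageException, ...
using Microsoft.WindowsAzure.Storage.Auth;   // StorageCredentials
using Microsoft.WindowsAzure.Storage.Blob;   // CloudBlobClient, CloudBlockBlob, ...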

Testing, stability, and engaging the open source community

We are committed to providing a rock solid API that is consistent, stable, and reliable. In this release we have made significant progress in increasing test coverage as well as breaking apart large test scenarios into more targeted ones that are more consumable by the public.

Microsoft and Windows Azure are making great efforts to be as open and transparent as possible regarding the client libraries for our services. The source code for all the libraries can be downloaded via GitHub under the Apache 2.0 license. In addition we have provided over 450 new Unit Tests for the .NET 4.0 library alone. Now users who wish to modify the codebase have a simple and lightweight way to validate their changes. It is also important to note that most of these tests run against the Storage Emulator that ships via the Windows Azure SDK for .NET, allowing users to execute tests without incurring any usage on their storage accounts. We will also be providing a series of higher level scenarios and How-To’s to get users up and running with both simple and advanced topics relating to using Windows Azure Storage.

Summary

We have put a lot of work into providing a truly first class development experience for the .NET community to work with Windows Azure Storage. In addition to the content provided in these blog posts we will continue to release a series of additional blog posts which will target various features and scenarios in more detail, so check back soon. Hopefully you can see your past feedback reflected in this new library. We really do appreciate the feedback we have gotten from the community, so please keep it coming by leaving a comment below or participating on our forums.

Joe Giardino
Serdar Ozler
Justin Yu
Veena Udayabhanu

Windows Azure Storage

Resources

Get the Windows Azure SDK for .Net

Windows Azure Storage BUILD Talk - What’s Coming, Best Practices and Internals


At Microsoft’s Build conference we spoke about Windows Azure Storage internals, best practices and a set of exciting new features that we have been working on. Before we go ahead talking about the exciting new features in our pipeline, let us reminisce a little about the past year. It has been almost a year since we blogged about the number of objects and average requests per second we serve.

This past year once again has proven to be great for Windows Azure Storage, with many external customers and internal products like Xbox, Skype, SkyDrive, Bing, SQL Server, Windows Phone, etc., driving significant growth for Windows Azure Storage and making it their choice for storing and serving critical parts of their service. This has resulted in Windows Azure Storage hosting more than 8.5 trillion unique objects and serving over 900K requests/sec on average (that’s over 2.3 trillion requests per month). This is a 2x increase in the number of objects stored and a 3x increase in average requests/sec since we last blogged about it a year ago!

In the talk, we also spoke about a variety of new features in our pipeline. Here is a quick recap on all the features we spoke about.

  • Queue Geo-Replication: we are pleased to announce that all queues are now geo-replicated for Geo Redundant Storage accounts. This means that all data for Geo Redundant Storage accounts (Blobs, Tables and Queues) is now geo-replicated.

By end of CY ’13, we are targeting to release the following features:

  • Secondary read-only access: we will provide a secondary endpoint that can be utilized to read an eventually consistent copy of your geo-replicated data. In addition, we will provide an API to retrieve the current replication lag for your storage account. Applications will be able to access the secondary endpoint as another source for computing over the account’s data as well as a fallback option if the primary is not available.
  • Windows Azure Import/Export: we will preview a new service that allows customers to ship terabytes of data in/out of Windows Azure Blobs by shipping disks.
  • Real-Time Metrics: we will provide near real-time per-minute aggregates of storage metrics for Blobs, Tables and Queues. These metrics provide more granular information about your service, which hourly metrics tend to smooth out.
  • Cross Origin Resource Sharing (CORS): we will enable CORS for the Azure Blob, Table and Queue services. This enables our customers to use JavaScript in their web pages to access storage directly, avoiding the need for a proxy service to route storage requests around the cross-domain restrictions that browsers enforce.
  • JSON for Azure Tables: we will enable the OData v3 JSON protocol, which is much lighter weight and more performant than AtomPub. In particular, the JSON protocol has a NoMetadata option which is very efficient in terms of bandwidth.

If you missed the Build talk, you can now access it from here as it covers in more detail the above mentioned features in addition to best practices.

Brad Calder and Jai Haridas

Introducing Storage Client Library 2.1 RC for .NET and Windows Phone 8


We are pleased to announce the public availability of 2.1 Release Candidate (RC) build for the storage client library for .NET and Windows Phone 8. The 2.1 release includes expanded feature support, which this blog will detail.

Why RC?

We have spent significant effort in releasing the storage clients on a more frequent cadence as well as becoming more responsive to client feedback. As we continue that effort, we wanted to provide an RC of our next release, so that you can provide us feedback that we might be able to address prior to the “official” release. Getting your feedback is the goal of this release candidate, so please let us know what you think.

What’s New?

This release includes a number of new features, many of which have come directly from client feedback (so please keep it coming), which are detailed below.

Async Task Methods

Each public API now exposes an Async method that returns a task for a given operation. Additionally, these methods support pre-emptive cancellation via an overload which accepts a CancellationToken. If you are running under .NET 4.5, or using the Async Targeting Pack for .NET 4.0, you can easily leverage the async / await pattern when writing your applications against storage.
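
As a minimal sketch (assuming a CloudBlockBlob named blob, an existing CancellationTokenSource named cts, and a calling method marked async; exact overloads may vary), the new Task methods can be consumed with async / await as follows:

// Upload a blob using the Task-returning API and allow pre-emptive cancellation via a CancellationToken.
using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes("hello world")))
{
    await blob.UploadFromStreamAsync(stream, cts.Token);
}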

Table IQueryable

In 2.1 we are adding IQueryable support for the Table Service layer on desktop and phone. This will allow users to construct and execute queries via LINQ similar to WCF Data Services, however this implementation has been specifically optimized for Windows Azure Tables and NoSQL concepts. The snippet below illustrates constructing a query via the new IQueryable implementation:

 

var query = from ent in currentTable.CreateQuery<CustomerEntity>()
            where ent.PartitionKey == "users" && ent.RowKey == "joe"
            select ent;

 

 

The IQueryable implementation transparently handles continuations, and has support to add RequestOptions, OperationContext, and client side EntityResolvers directly into the expression tree. To begin using this please add a using statement for the Microsoft.WindowsAzure.Storage.Table.Queryable namespace and construct a query via the CloudTable.CreateQuery<T>() method. Additionally, since this makes use of the existing execution infrastructure, optimizations such as IBufferManager, Compiled Serializers, and Logging are fully supported.

Buffer Pooling

For high scale applications, Buffer Pooling is a great strategy to allow clients to re-use existing buffers across many operations. In a managed environment such as .NET, this can dramatically reduce the number of cycles spent allocating and subsequently garbage collecting semi-long lived buffers.

To address this scenario each Service Client now exposes a BufferManager property of type IBufferManager. This property allows clients to leverage a given buffer pool for any objects associated with that service client instance. For example, all CloudTable objects created via CloudTableClient.GetTableReference() would make use of the associated service client’s BufferManager. The IBufferManager is patterned after the BufferManager in System.ServiceModel.dll to allow desktop clients to easily leverage an existing implementation provided by the framework. (Clients running on other platforms such as WinRT or Windows Phone may implement a pool against the IBufferManager interface.)

Multi-Buffer Memory Stream

During the course of our performance investigations we have uncovered a few performance issues with the MemoryStream class provided in the BCL (specifically regarding Async operations, dynamic length behavior, and single byte operations). To address these issues we have implemented a new Multi-Buffer memory stream which provides consistent performance even when length of data is unknown. This class leverages the IBufferManager if one is provided by the client to utilize the buffer pool when allocating additional buffers. As a result, any operation on any service that potentially buffers data (Blob Streams, Table Operations, etc.) now consumes less CPU, and optimally uses a shared memory pool.

.NET MD5 is now default

Our performance testing highlighted a slight performance degradation when utilizing the FISMA compliant native MD5 implementation compared to the built in .NET implementation. As such, for this release the .NET MD5 is now used by default, any clients requiring FISMA compliance can re-enable it as shown below:

 

CloudStorageAccount.UseV1MD5 = false;

 

Client Tracing

The 2.1 release implements .NET tracing, allowing users to enable log information regarding request execution and REST requests (See below for a table of what information is logged). Additionally, Windows Azure Diagnostics provides a trace listener that can redirect client trace messages to the WADLogsTable if users wish to persist these traces to the cloud.

To enable tracing in .NET you must add a trace source for the storage client to the app.config and set the verbosity:

 

 

<system.diagnostics>
<sources>
<source name="Microsoft.WindowsAzure.Storage">
<listeners>
<add name="myListener"/>
</listeners>
</source>
</sources>
<switches>
<add name="Microsoft.WindowsAzure.Storage" value="Verbose" />
</switches>

 

 

The application is now set to log all trace messages created by the storage client up to the Verbose level. However, if a client wishes to enable logging only for specific clients or requests they can further configure the default logging level in their application by setting OperationContext.DefaultLogLevel and then opt-in any specific requests via the OperationContext object:

 

// Disable Default Logging
OperationContext.DefaultLogLevel = LogLevel.Off;

// Configure a context to track my upload and set logging level to verbose
OperationContext myContext = new OperationContext() { LogLevel = LogLevel.Verbose };

blobRef.UploadFromStream(stream, myContext);

 

New Blob APIs

In 2.1 we have added Blob Text, File, and Byte Array APIs based on feedback from clients. Additionally, Blob Streams can now be opened, flushed, and committed asynchronously via new Blob Stream APIs.
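
A quick sketch of these convenience APIs is shown below (assuming a CloudBlockBlob named blob; the exact overloads may differ slightly):

// Text, file, and byte array convenience methods added in 2.1.
blob.UploadText("hello world");
string text = blob.DownloadText();

blob.UploadFromFile(@"C:\data\source.txt", FileMode.Open);
blob.DownloadToFile(@"C:\data\copy.txt", FileMode.Create);

byte[] bytes = Encoding.UTF8.GetBytes("hello world");
blob.UploadFromByteArray(bytes, 0, bytes.Length);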

New Range Based Overloads

In 2.1, Blob upload APIs include an overload which allows clients to upload only a given range of a byte array or stream to the blob. This feature allows clients to avoid potentially pre-buffering data prior to uploading it to the storage service. Additionally, there are new download range APIs for both streams and byte arrays that allow efficient fault tolerant range downloads without the need to buffer any data on the client side.

IgnorePropertyAttribute

When persisting POCO objects to Windows Azure Tables in some cases clients may wish to omit certain client only properties. In this release we are introducing the IgnorePropertyAttribute to allow clients an easy way to simply ignore a given property during serialization and de-serialization of an entity. The following snippet illustrates how to ignore my FirstName property of my entity via the IgnorePropertyAttribute:

 

public class Customer : TableEntity
{
[IgnoreProperty]
public string FirstName { get; set; }
}

 

Compiled Serializers

When working with POCO types previous releases of the SDK relied on reflection to discover all applicable properties for serialization / de-serialization at runtime. This process was both repetitive and expensive computationally. In 2.1 we are introducing support for Compiled Expressions which will allow the client to dynamically generate a LINQ expression at runtime for a given type. This allows the client to do the reflection process once and then compile a lambda at runtime which can now handle all future read and writes of a given entity type. In performance micro-benchmarks this approach is roughly 40x faster than the reflection based approach computationally.

All compiled expressions for read and write are held in static concurrent dictionaries on TableEntity. If you wish to disable this feature simply set TableEntity.DisableCompiledSerializers = true;

Easily Serialize 3rd Party Objects

In some cases clients wish to serialize objects for which they do not control the source, for example framework objects or objects from 3rd party libraries. In previous releases clients were required to write custom serialization logic for each type they wished to serialize. In the 2.1 release we are exposing the core serialization and de-serialization logic for any CLR type via the static TableEntity.[Read|Write]UserObject methods. This allows clients to easily persist and read back entity objects for types that do not derive from TableEntity or implement the ITableEntity interface. This pattern can also be especially useful when exposing DTO types via a service, as the client will no longer be required to maintain two entity types and marshal between them.

Numerous Performance Improvements

As part of our ongoing focus on performance we have included numerous performance improvements across the APIs including parallel blob upload, table service layer, blob write streams, and more. We will provide more detailed analysis of the performance improvements in an upcoming blog post.

Windows Phone

The Windows Phone client is based on the same source code as the desktop client, however there are 2 key differences due to platform limitations. The first is that the Windows Phone library does not expose synchronous methods in order to keep applications fast and fluid. Additionally, the Windows Phone library does not provide MD5 support as the platform does not expose an implementation of MD5. As such, if your scenario requires it, you must validate the MD5 at the application layer. The Windows Phone library is currently in testing and will be published in the coming weeks. Please note that it will only be compatible with Windows Phone 8, not 7.x.

Summary

We have spent considerable effort in improving the storage client libraries in this release. We welcome any feedback you may have in the comments section below, the forums, or GitHub.

Joe Giardino

Resources

Getting the most out of Windows Azure Storage – TechEd NA ‘13

Nuget

Github

2.1 Complete Changelog

.NET Clients encountering Port Exhaustion after installing KB2750149 or KB2805227


A recent update for .NET 4.5 introduced a regression to HttpWebRequest that may affect high scale applications. This blog post will cover the details of this change, how it impacts clients, and mitigations clients may take to avoid this issue altogether.

What is the effect?

Clients would observe long latencies for their Blob, Queue, and Table storage requests and may find either that their requests to storage are dispatched after a delay, or that requests are not dispatched at all and a System.Net.WebException is thrown from the application when trying to access storage. The details of the exception are explained below. Running netstat as described in the next section would show that the process has consumed many ports, causing port exhaustion.

Who is affected?

Any client that is accessing Windows Azure Storage from a .NET platform with KB2750149 or KB2805227 installed that does not consume the entire response stream will be affected. This includes clients that are accessing the REST API directly via HttpWebRequest and HttpClient, the Storage Client for Windows RT, as well as the .NET Storage Client Library (2.0.6.0 and below provided via NuGet, GitHub, and the SDK). You can read more about the specifics of this update here.

In many cases the Storage Client Libraries do not expect a body to be returned from the server based on the REST API and subsequently do not attempt to read the response stream. Under previous behavior this “empty” response consisting of a single 0 length chunk would have been automatically consumed by the .NET networking layer allowing the socket to be reused. To address this change proactively we have added a fix to the .NET Client library in version 2.0.6.1 to explicitly drain the response stream.

A client can use the netstat utility to check for processes that are holding many ports open in the TIME_WAIT or ESTABLISHED states by issuing netstat -a -o (the -a option shows all connections, and the -o option displays the owning process ID).

 

[Figure: running netstat -a -o]

Running this command on an affected machine shows the following:

[Figure: netstat output showing a single process holding many connections open]

You can see above that a single process with ID 3024 is holding numerous connections open to the server.

Description

Users installing the recent update (KB2750149 or KB2805227) will observe slightly different behavior when leveraging the HttpWebRequest to communicate with a server that returns a chunked encoded response. (For more on Chunked encoded data see here).

When a server responds to an HTTP request with a chunked encoded response the client may be unaware of the entire length of the body, and therefore will read the body in a series of chunks from the response stream. The response stream is terminated when the server sends a zero length “chunk” followed by a CRLF sequence (see the article above for more details). When the server responds with an empty body, this entire payload will consist of a single zero-length chunk to terminate the stream.

Prior to this update the default behavior of the HttpWebRequest was to attempt to “drain” the response stream whenever the user closes the HttpWebResponse. If the request can successfully read the rest of the response, then the socket may be reused by another request in the application and is subsequently returned back to the shared pool. However, if a request still contains unread data then the underlying socket will remain open for some period of time before being explicitly disposed. This behavior will not allow the socket to be reused by the shared pool, causing additional performance degradation as each request will be required to establish a new socket connection with the service.

Client Observed Behavior

In some cases older versions of the Storage Client Library will not retrieve the response stream from the HttpWebRequest (e.g. for PUT operations), and therefore will not drain it, even though the server sends no body data (the terminating zero-length chunk still has to be consumed). Clients with KB2750149 or KB2805227 installed that leverage these libraries may begin to encounter TCP/IP port exhaustion. When TCP/IP port exhaustion does occur a client will encounter the following Web and Socket Exceptions:

System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send.

- or -

System.Net.WebException: Unable to connect to the remote server
System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted.

Note, if you are accessing storage via the Storage Client library these exceptions will be wrapped in a StorageException:

Microsoft.WindowsAzure.Storage.StorageException: Unable to connect to the remote server

System.Net.WebException: Unable to connect to the remote server
System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted

Mitigation

We have been working with the .NET team to address this issue. A permanent fix is now available which reinstates this read ahead semantic in a time bounded manner.

Install KB2846046 or .NET 4.5.1 Preview

Please consider installing the hotfix (KB2846046) from the .NET team to resolve this issue. However, please note that you need to contact Microsoft Customer Support Services to obtain the hotfix. For more information, please visit the corresponding KB article.

You can also install .NET 4.5.1 Preview that already contains this fix.

Upgrade to latest version of the Storage Client (2.0.6.1)

An update was made for the 2.0.6.1 (NuGet, GitHub) version of the Storage Client library to address this issue. If possible please upgrade your application to use the latest assembly.

Uninstall KB2750149 and KB2805227

We also recognize that some clients may be running applications that still utilize the 1.7 version of the storage client and may not be able to easily upgrade to the latest version without additional effort or install the hotfix. For such users, consider uninstalling the updates until the .NET team releases a publicly available fix for this issue. We will update this blog, once such fix is available.

Another alternative is to pin the Guest OS version for your Windows Azure cloud services, as this prevents the updates from being applied automatically. This involves explicitly setting your OS to a version released before 2013.

[Figure: pinning the Guest OS version in the portal]

More information on managing Guest OS updates can be found at Update the Windows Azure Guest OS from the Management Portal.

Update applications that leverage the REST API directly to explicitly drain the response stream

Any client application that directly references the Windows Azure REST API can be updated to explicitly retrieve the response stream from the HttpWebRequest via [Begin/End]GetResponseStream() and drain it manually, i.e. by calling the Read or BeginRead methods until the end of the stream.
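
A minimal sketch of draining a response (assuming an HttpWebRequest named request has already been created and signed against the storage REST API) might look like the following:

// Read and discard any remaining response body so the underlying socket can be reused by the connection pool.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
{
    byte[] drainBuffer = new byte[4 * 1024];
    while (responseStream.Read(drainBuffer, 0, drainBuffer.Length) > 0)
    {
        // Intentionally discard the data; the goal is only to consume the stream to its end.
    }
}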

Summary

We apologize for any inconvenience this may have caused. Please feel free to leave questions or comments below.

Joe Giardino, Serdar Ozler, Jean Ghanem, and Marcus Swenson

 

Resources

Windows Azure Storage Client library 2.0.6.1 (NuGet, GitHub)

Original KB article #1: http://support.microsoft.com/kb/2750149

Original KB article #2: http://support.microsoft.com/kb/2805227

Hotfix KB article: http://support.microsoft.com/kb/2846046

.NET 4.5.1 Preview: http://go.microsoft.com/fwlink/?LinkId=309499

Announcing Storage Client Library 2.1 RTM & CTP for Windows Phone


We are pleased to announce that the storage client for .NET 2.1 has RTM’d. This release includes several notable features such as Async Task methods, IQueryable for Tables, buffer pooling support, and much more. In addition, we are releasing the CTP of the storage client for Windows Phone 8. With the existing support for Windows Runtime, clients can now leverage Windows Azure Storage via a consistent API surface across multiple Windows platforms. As usual all of the source code is available via GitHub (see the resources section below). You can download the latest binaries via the following NuGet packages:

Nuget – 2.1 RTM

Nuget – 2.1 For Windows Phone and Windows RunTime (Preview)

Nuget – 2.1 Tables Extension library for Non-JavaScript Windows RunTime apps (Preview)

The remainder of this blog will cover some of the new features and scenarios in additional detail and provide supporting code samples. As always we appreciate your feedback, so please feel free to add comments below.

Fundamentals

For this release we focused heavily on fundamentals by dramatically expanding test coverage, and building an automated performance suite that let us benchmark performance behaviors across various high scale scenarios.

Here are a few highlights for the 2.1 release:

  • Over 1000 publicly available Unit tests covering every public API
  • Automated Performance testing to validate performance impacting changes
  • Expanded Stress testing to ensure data correctness under massive loads
  • Key performance improving features that target memory behavior and shared infrastructure (more details below)

Performance

We are always looking for ways to improve the performance of client applications by improving the storage client itself and by exposing new features that better allow clients to optimize their applications. In this release we have done both and the results are dramatic.

For example, below are the results from one of the test scenarios we execute where a single XL VM round trips 30 256MB Blobs simultaneously (7.5 GB in total). As you can see there are dramatic improvements in both latency and CPU usage compared to SDK 1.7 (CPU drops almost 40% while latency is reduced by 16.5% for uploads and 23.2% for downloads). Additionally, you may note the actual latency improvements between 2.0.5.1 and 2.1 are only a few percentage points. This is because we have successfully removed the client out of the critical path resulting in an application that is now entirely dependent on the network. Further, while we have improved performance in this scenario CPU usage has dropped another 13% on average compared to SDK 2.0.5.1.

 

[Figure: upload/download latency and CPU comparison across SDK 1.7, 2.0.5.1, and 2.1]

This is just one example of the performance improvements we have made, for more on performance as well as best practices please see the Tech Ed Presentation in the Resources section below.

Async Task Methods

Each public API now exposes an Async method that returns a task for a given operation. Additionally, these methods support pre-emptive cancellation via an overload which accepts a CancellationToken. If you are running under .NET 4.5, or using the Async Targeting Pack for .NET 4.0, you can easily leverage the async / await pattern when writing your applications against storage.

Buffer Pooling

For high scale applications, Buffer Pooling is a great strategy to allow clients to re-use existing buffers across many operations. In a managed environment such as .NET, this can dramatically reduce the number of cycles spent allocating and subsequently garbage collecting semi-long lived buffers.

To address this scenario each Service Client now exposes a BufferManager property of type IBufferManager. This property allows clients to leverage a given buffer pool for any objects associated with that service client instance. For example, all CloudTable objects created via CloudTableClient.GetTableReference() would make use of the associated service client’s BufferManager. The IBufferManager is patterned after the BufferManager in System.ServiceModel.dll to allow desktop clients to easily leverage an existing implementation provided by the framework. (Clients running on other platforms such as Windows Runtime or Windows Phone may implement a pool against the IBufferManager interface.)

For desktop applications to leverage the built in BufferManager provided by the System.ServiceModel.dll a simple adapter is required:

using Microsoft.WindowsAzure.Storage;
using System.ServiceModel.Channels;

public class WCFBufferManagerAdapter : IBufferManager
{
    private int defaultBufferSize = 0;

    public WCFBufferManagerAdapter(BufferManager manager, int defaultBufferSize)
    {
        this.Manager = manager;
        this.defaultBufferSize = defaultBufferSize;
    }

    public BufferManager Manager { get; internal set; }

    public void ReturnBuffer(byte[] buffer)
    {
        this.Manager.ReturnBuffer(buffer);
    }

    public byte[] TakeBuffer(int bufferSize)
    {
        return this.Manager.TakeBuffer(bufferSize);
    }

    public int GetDefaultBufferSize()
    {
        return this.defaultBufferSize;
    }
}

With this in place my application can now specify a shared buffer pool across any resource associated with a given service client by simply setting the BufferManager property.

BufferManager mgr = BufferManager.CreateBufferManager([MaxBufferPoolSize], [MaxBufferSize]);

serviceClient.BufferManager = new WCFBufferManagerAdapter(mgr, [MaxBufferSize]);

Multi-Buffer Memory Stream

During the course of our performance investigations we have uncovered a few performance issues with the MemoryStream class provided in the BCL (specifically regarding Async operations, dynamic length behavior, and single byte operations). To address these issues we have implemented a new Multi-Buffer memory stream which provides consistent performance even when length of data is unknown. This class leverages the IBufferManager if one is provided by the client to utilize the buffer pool when allocating additional buffers. As a result, any operation on any service that potentially buffers data (Blob Streams, Table Operations, etc.) now consumes less CPU, and optimally uses a shared memory pool.

.NET MD5 is now default

Our performance testing highlighted a slight performance degradation when utilizing the FISMA compliant native MD5 implementation compared to the built in .NET implementation. As such, for this release the .NET MD5 is now used by default, any clients requiring FISMA compliance can re-enable it as shown below:

CloudStorageAccount.UseV1MD5 = false;

New Range Based Overloads

In 2.1, Blob upload APIs include an overload which allows clients to upload only a given range of a byte array or stream to the blob. This feature allows clients to avoid potentially pre-buffering data prior to uploading it to the storage service. Additionally, there are new download range APIs for both streams and byte arrays that allow efficient fault tolerant range downloads without the need to buffer any data on the client side.
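
As a rough sketch (assuming a CloudBlockBlob named blob, a source byte array named buffer, and hypothetical startIndex, count, blobOffset, and length values; overload shapes may vary slightly):

// Upload only a slice of an existing buffer without pre-buffering a copy of it.
blob.UploadFromByteArray(buffer, startIndex, count);

// Download a specific byte range of the blob directly into a target buffer.
byte[] target = new byte[length];
int bytesRead = blob.DownloadRangeToByteArray(target, 0, blobOffset, length);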

Client Tracing

The 2.1 release implements .NET Tracing, allowing users to enable log information regarding request execution and REST requests (See below for a table of what information is logged). Additionally, Windows Azure Diagnostics provides a trace listener that can redirect client trace messages to the WADLogsTable if users wish to persist these traces to the cloud.

Logged Data

Each log line will include the following data:

  • Client Request ID: Per request ID that is specified by the user in OperationContext
  • Event: Free-form text

As part of each request the following data will be logged to make it easier to correlate client-side logs to server-side logs:

  • Request:
      • Request Uri
  • Response:
      • Request ID
      • HTTP status code

Trace Levels

Off – Nothing will be logged.

Error – If an exception cannot or will not be handled internally and will be thrown to the user, it will be logged as an error.

Warning – If an exception is caught and handled internally, it will be logged as a warning. The primary use case for this is the retry scenario, where an exception is not thrown back to the user so that the operation can be retried. It can also happen in operations such as CreateIfNotExists, where we handle the 404 error silently.

Informational – The following info will be logged:

  • Right after the user calls a method to start an operation, request details such as URI and client request ID will be logged.
  • Important milestones such as Sending Request Start/End, Upload Data Start/End, Receive Response Start/End, and Download Data Start/End will be logged to mark the timestamps.
  • Right after the headers are received, response details such as request ID and HTTP status code will be logged.
  • If an operation fails and the storage client decides to retry, the reason for that decision will be logged along with when the next retry is going to happen.
  • All client-side timeouts will be logged when the storage client decides to abort a pending request.

Verbose – The following info will be logged:

  • String-to-sign for each request
  • Any extra details specific to operations (this is up to each operation to define and use)


Enabling Tracing

A key concept is the opt-in / opt-out model that the client provides to tracing. In typical applications it is customary to enable tracing at a given verbosity for a specific class. This works fine for many client applications, however for cloud applications that are executing at scale this approach may generate much more data than what is required by the user. As such we have provided the ability for clients to work in either an opt-in model for tracing which allows clients to configure listeners at a given verbosity, but only log specific requests if and when they choose. Essentially this design provides the ability for users to perform “vertical” logging across layers of the stack targeted at specific requests rather than “horizontal” logging which would record all traffic seen by a specific class or layer.

To enable tracing in .NET you must add a trace source for the storage client to the app.config and set the verbosity:

<system.diagnostics>
  <sources>
    <source name="Microsoft.WindowsAzure.Storage">
      <listeners>
        <add name="myListener"/>
      </listeners>
    </source>
  </sources>
  <switches>
    <add name="Microsoft.WindowsAzure.Storage" value="Verbose"/>
  </switches>

Then add a listener to record the output; in this case we will simply record it to application.log
 
<sharedListeners>
  <add name="myListener"
       type="System.Diagnostics.TextWriterTraceListener"
       initializeData="application.log"/>
</sharedListeners>

The application is now set to log all trace messages created by the storage client up to the Verbose level. However, if a client wishes to enable logging only for specific clients or requests they can further configure the default logging level in their application by setting OperationContext.DefaultLogLevel and then opt-in any specific requests via the OperationContext object:
 
// Disable Default Logging
OperationContext.DefaultLogLevel = LogLevel.Off;

// Configure a context to track my upload and set logging level to verbose
OperationContext myContext = new OperationContext() { LogLevel = LogLevel.Verbose };
blobRef.UploadFromStream(stream, myContext);

With client side tracing used in conjunction with storage logging clients can now get a complete view of their application from both the client and server perspectives.

Blob Features

Blob Streams

In the 2.1 release, we improved the blob streams that are created by the OpenRead and OpenWrite APIs of CloudBlockBlob and CloudPageBlob. The write stream returned by OpenWrite can now upload much faster when the parallel upload functionality is enabled, by keeping the number of active writers at a certain level. Moreover, the return type has changed from Stream to a new type named CloudBlobStream, which is derived from Stream. CloudBlobStream offers the following new APIs:

public abstract ICancellableAsyncResult BeginCommit(AsyncCallback callback, object state);
public abstract ICancellableAsyncResult BeginFlush(AsyncCallback callback, object state);
public abstract void Commit();
public abstract void EndCommit(IAsyncResult asyncResult);
public abstract void EndFlush(IAsyncResult asyncResult);

Flush already exists in Stream itself, so CloudBlobStream only adds an asynchronous version. However, Commit is a completely new API that allows the caller to commit before disposing the Stream. This allows much easier exception handling during commit and also the ability to commit asynchronously.

The read stream returned by OpenRead does not have a new type, but it now has true synchronous and asynchronous implementations. Clients can now get the stream synchronously via OpenRead or asynchronously using [Begin|End]OpenRead. Moreover, after the stream is opened, all synchronous calls such as querying the length or the Read API itself are truly synchronous, meaning that they do not call any asynchronous APIs internally.
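
A minimal sketch of the new write stream (assuming a CloudBlockBlob named blob and a byte array named data):

// Open a write stream, write the data, and commit explicitly instead of relying on Dispose to do it.
using (CloudBlobStream writeStream = blob.OpenWrite())
{
    writeStream.Write(data, 0, data.Length);
    writeStream.Commit(); // BeginCommit / EndCommit can be used to commit asynchronously.
}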

Table Features

IgnorePropertyAttribute

When persisting POCO objects to Windows Azure Tables in some cases clients may wish to omit certain client only properties. In this release we are introducing the IgnorePropertyAttribute to allow clients an easy way to simply ignore a given property during serialization and de-serialization of an entity. The following snippet illustrates how to ignore my FirstName property of my entity via the IgnorePropertyAttribute:

public class Customer : TableEntity
{
    [IgnoreProperty]
    public string FirstName { get; set; }
}

Compiled Serializers

When working with POCO types previous releases of the SDK relied on reflection to discover all applicable properties for serialization / de-serialization at runtime. This process was both repetitive and expensive computationally. In 2.1 we are introducing support for Compiled Expressions which will allow the client to dynamically generate a LINQ expression at runtime for a given type. This allows the client to do the reflection process once and then compile a Lambda at runtime which can now handle all future read and writes of a given entity type. In performance micro-benchmarks this approach is roughly 40x faster than the reflection based approach computationally.

All compiled expressions for read and write are held in static concurrent dictionaries on TableEntity. If you wish to disable this feature simply set TableEntity.DisableCompiledSerializers = true;

Serialize 3rd Party Objects

In some cases clients wish to serialize objects for which they do not control the source, for example framework objects or objects from 3rd party libraries. In previous releases clients were required to write custom serialization logic for each type they wished to serialize. In the 2.1 release we are exposing the core serialization and de-serialization logic for any CLR type. This allows clients to easily persist and read back entity objects for types that do not derive from TableEntity or implement the ITableEntity interface. This pattern can also be especially useful when exposing DTO types via a service, as the client will no longer be required to maintain two entity types and marshal between them.

A general purpose adapter pattern can be used which will allow clients to simply wrap an object instance in generic adapter which will handle serialization for a given type. The example below illustrates this pattern:

public class EntityAdapter<T> : ITableEntity where T : new()
{
    public EntityAdapter()
    {
        // If you would like to work with objects that do not have a default Ctor you can use (T)Activator.CreateInstance(typeof(T));
        this.InnerObject = new T();
    }

    public EntityAdapter(T innerObject)
    {
        this.InnerObject = innerObject;
    }

    public T InnerObject { get; set; }

    /// <summary>
    /// Gets or sets the entity's partition key.
    /// </summary>
    /// <value>The partition key of the entity.</value>
    public string PartitionKey { /* TODO: Must implement logic to map PartitionKey to object here! */ get; set; }

    /// <summary>
    /// Gets or sets the entity's row key.
    /// </summary>
    /// <value>The row key of the entity.</value>
    public string RowKey { /* TODO: Must implement logic to map RowKey to object here! */ get; set; }

    /// <summary>
    /// Gets or sets the entity's timestamp.
    /// </summary>
    /// <value>The timestamp of the entity.</value>
    public DateTimeOffset Timestamp { get; set; }

    /// <summary>
    /// Gets or sets the entity's current ETag. Set this value to '*' in order to blindly overwrite an entity as part of an update operation.
    /// </summary>
    /// <value>The ETag of the entity.</value>
    public string ETag { get; set; }

    public virtual void ReadEntity(IDictionary<string, EntityProperty> properties, OperationContext operationContext)
    {
        TableEntity.ReadUserObject(this.InnerObject, properties, operationContext);
    }

    public virtual IDictionary<string, EntityProperty> WriteEntity(OperationContext operationContext)
    {
        return TableEntity.WriteUserObject(this.InnerObject, operationContext);
    }
}

The following example uses the EntityAdapter pattern to insert a DTO object directly to the table via the adapter:
 
table.Execute(TableOperation.Insert(new EntityAdapter<CustomerDTO>(customer)));
 
Further I can retrieve this entity back via:
 
testTable.Execute(TableOperation.Retrieve<EntityAdapter<CustomerDTO>>(pk, rk)).Result;

Note, the Compiled Serializer functionality will be utilized for any types serialized or deserialized via TableEntity.[Read|Write]UserObject.
Table IQueryable

In 2.1 we are adding IQueryable support for the Table Service layer on desktop and phone. This will allow users to construct and execute queries via LINQ similar to WCF Data Services, however this implementation has been specifically optimized for Windows Azure Tables and NoSQL concepts. The snippet below illustrates constructing a query via the new IQueryable implementation:

var query = from ent in currentTable.CreateQuery<CustomerEntity>()
            where ent.PartitionKey == "users" && ent.RowKey == "joe"
            select ent;

The IQueryable implementation transparently handles continuations, and has support to add RequestOptions, OperationContext, and client side EntityResolvers directly into the expression tree. Additionally, since this makes use of the existing execution infrastructure, optimizations such as IBufferManager, Compiled Serializers, and Logging are fully supported.

Note, to support IQueryable projections the type constraint on TableQuery of ITableEntity, new() has been removed. Instead, any TableQuery objects not created via the new CloudTable.CreateQuery<T>() method will enforce this constraint at runtime.

Conceptual model

We are committed to backwards compatibility; as such, we strive to make sure we introduce as few breaking changes as possible for existing clients. Therefore, in addition to supporting the new IQueryable mode of execution, we continue to support the 2.x “fluent” mode of constructing queries via the Where, Select, and Take methods. However, these modes are not strictly interoperable while constructing queries as they store data in different forms.

Aside from query construction, a key difference between the two modes is that the IQueryable interface requires that the query object be able to execute itself, as compared to the previous model of executing queries via a CloudTable object. A brief summary of these two modes of execution is listed below:

Fluent Mode (2.0.x)

  • Queries are created by directly calling a constructor
  • Queries are executed against a CloudTable object via ExecuteQuery[Segmented] methods
  • EntityResolver specified in execute overload
  • Fluent methods Where, Select, and Take are provided

IQueryable Mode (2.1+)

  • Queries are created by an associated table, i.e. CloudTable.CreateQuery<T>()
  • Queries are executed by enumerating the results, or by Execute[Segmented] methods on TableQuery
  • EntityResolver specified via LINQ extension method Resolve
  • IQueryable Extension Methods provided : WithOptions, WithContext, Resolve, AsTableQuery

The table below illustrates various scenarios between the two modes

 

Construct Query

  • Fluent Mode: TableQuery<ComplexEntity> stringQuery = new TableQuery<ComplexEntity>();
  • IQueryable Mode: TableQuery<ComplexEntity> query = (from ent in table.CreateQuery<ComplexEntity>() select ent);

Filter

  • Fluent Mode: q.Where(TableQuery.GenerateFilterConditionForInt("val", QueryComparisons.GreaterThanOrEqual, 50));
  • IQueryable Mode: TableQuery<ComplexEntity> query = (from ent in table.CreateQuery<ComplexEntity>() where ent.val >= 50 select ent);

Take

  • Fluent Mode: q.Take(5);
  • IQueryable Mode: TableQuery<ComplexEntity> query = (from ent in table.CreateQuery<ComplexEntity>() select ent).Take(5);

Projection

  • Fluent Mode: q.Select(new List<string>() { "A", "C" });
  • IQueryable Mode: TableQuery<ProjectedEntity> query = (from ent in table.CreateQuery<ComplexEntity>() select new ProjectedEntity() { a = ent.a, b = ent.b, c = ent.c, … });

Entity Resolver

  • Fluent Mode: currentTable.ExecuteQuery(query, resolver)
  • IQueryable Mode: TableQuery<ComplexEntity> query = (from ent in table.CreateQuery<ComplexEntity>() select ent).Resolve(resolver);

Execution

  • Fluent Mode: currentTable.ExecuteQuery(query)
  • IQueryable Mode: foreach (ProjectedPOCO ent in query) < OR > query.AsTableQuery().Execute(options, opContext)

Execution Segmented

  • Fluent Mode: TableQuerySegment<Entity> seg = currentTable.ExecuteQuerySegmented(query, continuationToken, options, opContext);
  • IQueryable Mode: TableQuery<ComplexEntity> query = (from ent in table.CreateQuery<ComplexEntity>() select ent).AsTableQuery().ExecuteSegmented(token, options, opContext);

Request Options

  • Fluent Mode: currentTable.ExecuteQuery(query, options, null)
  • IQueryable Mode: TableQuery<ComplexEntity> query = (from ent in table.CreateQuery<ComplexEntity>() select ent).WithOptions(options); < OR > query.AsTableQuery().Execute(options, null)

Operation Context

  • Fluent Mode: currentTable.ExecuteQuery(query, null, opContext)
  • IQueryable Mode: TableQuery<ComplexEntity> query = (from ent in table.CreateQuery<ComplexEntity>() select ent).WithContext(opContext); < OR > query.AsTableQuery().Execute(null, opContext)

 

Complete Query

The query below illustrates many of the supported extension methods and returns an enumerable of string values corresponding to the “Name” property on the entities.

var nameResults = (from ent in currentTable.CreateQuery<POCOEntity>()
where ent.Name == "foo"
select ent)
.Take(5)
.WithOptions(new TableRequestOptions())
.WithContext(new OperationContext())
.Resolve((pk, rk, ts, props, etag) => props["Name"].StringValue);

Note the three extension methods which allow a TableRequestOptions, an OperationContext, and an EntityResolver to be associated with a given query. These extensions are available by including a using statement for the Microsoft.WindowsAzure.Storage.Table.Queryable namespace.

The extension .AsTableQuery() is also provided; however, unlike the WCF implementation it is no longer mandatory. It simply allows clients more flexibility in query execution by providing additional methods for execution such as Task, APM, and segmented execution methods.

Projection

In traditional LINQ providers, projection is handled via the select new keywords, which essentially perform two separate actions. The first is to analyze any properties that are accessed and send them to the server so that it returns only the desired columns; this is considered server side projection. The second is to construct a client side action which is executed for each returned entity, essentially instantiating it and populating its properties with the data returned by the server; this is considered client side projection. In the implementation released in 2.1 we have allowed clients to separate these two different types of projections by allowing them to be specified separately in the expression tree. (Note, you can still use the traditional approach via select new if you prefer.)

Server Side Projection Syntax

For a simple scenario where you simply wish to filter the properties returned by the server, a convenient helper is provided. This does not provide any client side projection functionality; it simply limits the properties returned by the service. Note that by default PartitionKey, RowKey, Timestamp, and ETag are always requested to allow for subsequent updates to the resulting entity.

IQueryable<POCOEntity> projectionResult = from ent in currentTable.CreateQuery<POCOEntity>()
select TableQuery.Project(ent, "a", "b");

This has the same effect as writing the following, but with improved performance and simplicity:

IQueryable<POCOEntity> projectionResult = from ent in currentTable.CreateQuery<POCOEntity>()
select new POCOEntity()
{
PartitionKey = ent.PartitionKey,
RowKey = ent.RowKey,
Timestamp = ent.Timestamp,
a = ent.a,
b = ent.b
};

Client Side Projection Syntax with resolver

For scenarios where you wish to perform custom client processing during deserialization, the EntityResolver is provided to allow the client to inspect the data prior to determining its type or return value. This essentially provides an open ended hook for clients to control deserialization in any way they wish. The example below performs both a server side and a client side projection, projecting into a concatenated string of the “FirstName” and “LastName” properties.

IQueryable<string> fullNameResults = (from ent in currentTable.CreateQuery<POCOEntity>()
                                      select TableQuery.Project(ent, "FirstName", "LastName"))
                                      .Resolve((pk, rk, ts, props, etag) => props["FirstName"].StringValue + props["LastName"].StringValue);

The EntityResolver can read the data directly off of the wire which avoids the step of de-serializing the data into the base entity type and then selecting out the final result from that “throw away” intermediate object. Since EntityResolver is a delegate type any client side projection logic can be implemented here (See the NoSQL section here for a more in depth example).
Type-Safe DynamicTableEntity Query Construction

The DynamicTableEntity type allows for clients to interact with schema-less data in a simple straightforward way via a dictionary of properties. However constructing type-safe queries against schema-less data presents a challenge when working with the IQueryable interface and LINQ in general as all queries must be of a given type which contains relevant type information for its properties. So for example, let’s say I have a table that has both customers and orders in it. Now if I wish to construct a query that filters on columns across both types of data I would need to create some dummy CustomerOrder super entity which contains the union of properties between the Customer and Order entities.

This is not ideal, and this is where the DynamicTableEntity comes in. The IQueryable implementation has provided a way to check for property access via the DynamicTableEntity Properties dictionary in order to provide for type-safe query construction. This allows the user to indicate to the client the property it wishes to filter against and its type. The sample below illustrates how to create a query of type DynamicTableEntity and construct a complex filter on different properties:

TableQuery<DynamicTableEntity> res = from ent in table.CreateQuery<DynamicTableEntity>()
where ent.Properties["customerid"].StringValue == "customer_1" ||
ent.Properties["orderdate"].DateTimeOffsetValue > startDate
select ent;

In the example above, the IQueryable implementation was smart enough to infer that the client is filtering on the “customerid” property as a string and on “orderdate” as a DateTimeOffset, and it constructed the query accordingly.

Windows Phone Known Issue

The current CTP release contains a known issue where in some cases calling HttpWebRequest.Abort() may not result in the HttpWebRequest’s callback being called. As such, it is possible when cancelling an outstanding request the callback may be lost and the operation will not return. This issue will be addressed in a future release.

Summary

We are continuously making improvements to the developer experience for Windows Azure Storage and very much value your feedback. Please feel free to leave comments and questions below,

 

Joe Giardino

 

Resources

Getting the most out of Windows Azure Storage – TechEd NA ‘13

Nuget – 2.1 RTM

Nuget – 2.1 For Windows Phone and Windows RunTime (Preview)

Nuget – 2.1 Tables Extension library for Non-JavaScript Windows RunTime apps (Preview)

Github

2.1 Complete Changelog


AzCopy – Transfer data with re-startable mode and SAS Token


Recently, we released a new version of AzCopy with a set of new features.

You can download it from here or from the Microsoft Download Center (click the ‘Download’ button and choose “WindowsAzureStorageTools.MSI”).

New features added

  • /DestSAS and /SourceSAS: These options allow access to storage containers and blobs with a SAS (Shared Access Signature) token. A SAS token, which is generated by the storage account owner, grants access to specific containers and blobs with specific permissions and for a specified period of time. Permissions include LIST, READ, WRITE or DELETE.
    Currently, we support SAS tokens when you upload data to Azure Storage or download data from Azure Storage. For blob copy, we support a SAS token for the source location (an additional example is shown after Example 2 below).
    Please refer to this blog post for more information on how to create a Shared Access Signature.
  • Enhancement of re-startable mode: In the previous version, re-startable mode handled interruptions caused by network or other issues during file transfer, but it only supported restarting the transfer from the beginning of the interrupted file. Imagine the interrupted file is a large file (e.g. a VHD file) and most of it has already been transferred to Azure Storage; it would be time-consuming to restart the transfer of the entire file. To improve this scenario, we enhanced re-startable mode so that it restarts the transfer from the point of interruption. For block blobs, we chose 4 MB as the size of the data chunk.

Example 1: Upload all files from a local directory to a container using a SAS token that grants list and write permissions

AzCopy C:\blobData https://xyzaccount.blob.core.windows.net/xyzcontainer /DestSAS:”?sr=c&si=mypolicy&sig=XXXXX” /s

/DestSAS specifies the SAS token used to access the storage container; it should be enclosed in quotes.

Example 2: Download all blobs from a container to a local directory and then delete them from the source (by using /MOV). The SAS token grants list, read and delete permissions.

AzCopy https://xyzaccount.blob.core.windows.net/xyzcontainer C:\blobData /SourceSAS:”?sr=c&si=mypolicy&sig=XXXXX” /MOV /s
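
For the blob copy scenario mentioned in the feature list, only a source SAS is currently supported. A hypothetical sketch (the account names, container names and destination key are placeholders; the flags follow the same pattern as the examples in this post):

AzCopy https://xyzaccount.blob.core.windows.net/xyzcontainer https://abcaccount.blob.core.windows.net/abccontainer /SourceSAS:”?sr=c&si=mypolicy&sig=XXXXX” /destkey:key /s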

Example 3: Upload all files from a local directory to a container in re-startable mode.

AzCopy C:\blobData https://myaccount.blob.core.windows.net/mycontainer /destkey:key /Z:restart.log /s

/Z is the parameter that turns on re-startable mode. ‘restart.log’ is the customer-defined name of the journal file, which is saved locally.

For instance, suppose the “C:\blobData” folder contains the four files below and the size of file2.vhd is 1 GB.

C:\blobData\file1.docx

C:\blobData\file2.vhd

C:\blobData\file3.txt

C:\blobData\file4.txt

Assume an interruption occurs while copying “file2.vhd”, after 90% of the file has already been transferred to Azure. Then upon restarting, we will only transfer the remaining 10% of “file2.vhd” and the two remaining files.

 

Jason Tang

Announcing Windows Azure Import/Export Service Preview


The Windows Azure Storage team is excited to announce the preview of the Windows Azure Import/Export service, which provides an efficient solution for importing large amounts of on-premises data into Windows Azure Blobs or exporting your Windows Azure Blobs back to you!

In this blog post, we will walk you through information regarding high level capabilities, how to enroll, getting started and when to use this service.

What is Windows Azure Import/Export service?

As described in our Getting Started Guide, Windows Azure Import/Export enables you to move large amounts of data into and out of the Windows Azure Blobs in your Windows Azure Storage account. You can ship TBs of encrypted data on hard disk drives via FedEx to our data centers, where Microsoft’s high-speed internal network is used to transfer the data to or from your blob storage account.

The following requirements must be met:

  • Drives must be 3.5 inch SATA II internal hard drives of up to 4 TB. Note that it is easy to attach an off-the-shelf SATA II drive to almost any machine using a USB adaptor, allowing you to easily transfer data between your machine and the SATA II drive. See example adaptors in the table below.
  • Drives shipped must be encrypted with a BitLocker key.

To make drive preparation easy, we have provided a tool called WAImportExport.exe. More information on how to use the tool can be found here. Once the drives are prepared, you can log in to the Windows Azure Management Portal to:

  1. Create Import/Export jobs
  2. Obtain the shipping address of the data center to which the disks should be shipped
  3. Update the job with tracking numbers once the drives have been shipped via FedEx to the location provided in step (2) above
  4. Manage the import/export jobs and monitor progress

The following MSDN article talks in depth about the steps involved and also answers some of the frequently asked questions. In addition to the management portal, you can also use the REST interface to create or manage your import/export jobs.

  • Management Interface: Users can choose either the Windows Azure Management Portal or the REST interface to manage their Import/Export jobs.
  • Encryption: It is mandatory to encrypt the data on the drive with a BitLocker key.
  • Supported Device: 3.5 inch SATA II hard drives. (Note: you can easily transfer your data from your machine via USB to a SATA II drive by using one of these SATA to USB adaptors: Anker 68UPSATAA-02BU, Anker 68UPSHHDS-BU, Startech SATADOCK22UE.)
  • Supported Maximum Disk Capacity: 4 TB
  • Maximum Number of Jobs per Azure Subscription: 20
  • Maximum Number of Drives per Job: 10
  • Supported File Format: NTFS
  • Shipping: Package(s) for an import job can be shipped with either FedEx Express or FedEx Ground; return shipping is free and provided via FedEx Ground.

Table 1: Quick requirement overview

When to use Windows Azure Import/Export Service?

If you have TBs of data, Windows Azure Import/Export can move data in and out of Windows Azure Blobs much faster than uploading and downloading data over the Internet.

The factors that come into play when choosing whether to use the Import/Export service or to transfer the data over the internet are:

  1. How much data needs to be imported or exported?
  2. What is the network bandwidth to the region I want to copy data into or from?
  3. What is the cost of bandwidth?
  4. Does the preview support my region?

Large data sets take a very long time to upload or download over the internet. 10 TB could take years over T1 (1.5 Mbps) or a month over T3 (44.7 Mbps). Customers can ship their drives using the Windows Azure Import/Export service and significantly cut down the data upload or download time. The Windows Azure Import/Export service would take only a handful of days, in addition to shipping time, to transfer the data versus weeks, months, or years for transferring the data over the internet.

Customers can calculate the time it would take to transfer their data over the internet. If it is more than a handful of days and the Import/Export service is available for their region, then they should consider using the Windows Azure Import/Export service. However, customers who already have good network peering, or who have smaller amounts of data, could instead use the Windows Azure Copy (AzCopy) tool to efficiently import/export data from/to on-premises.
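
As a rough check of those figures: 10 TB is about 8 × 10^13 bits, which works out to roughly 5.3 × 10^7 seconds (about 20 months) at 1.5 Mbps and roughly 1.8 × 10^6 seconds (about 21 days) at 44.7 Mbps, ignoring protocol overhead and retries, which in practice push these times even higher.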

    Regions Supported

    The Import/Export service can only accept shipments that originate from U.S. locations during preview release, and can return packages only to U.S. addresses. The service supports importing data to and exporting data from storage accounts in the following regions:

    • East US
    • West US
    • North Central US
    • South Central US
    • North Europe
    • West Europe
    • East Asia
    • Southeast Asia

    Note, the Import/Export service currently runs in only a few of the U.S. regions for the preview. This means you may need to ship your drives to a region different from the one in which your storage account resides. If the region of the storage account is different from the region to which you ship the drives, you will see “The selected datacenter region and storage account region are not the same” in the portal when creating the job.

    If your storage account resides in a European or Asian data center, you must ship your drive to one of the supported regions in the U.S., and the shipment must originate from within the U.S. The Import/Export service will then copy the data to or from your storage account in Europe or Asia.

    Pricing

    The Windows Azure Import/Export service charges a drive handling fee of $80 per drive; during the preview, the drive handling fee is discounted to $40 per drive. Regular storage transaction charges apply for putting (import) and getting (export) the blobs in your storage account.

    For import jobs, there is no ingress charge for the copy operation.

    For export jobs, there will be data transfer fees for copying data between Windows Azure regions. For example, if your storage account resides in West Europe and you ship your drive to the East US data center, you will incur egress charges for moving the data from West Europe to East US in order to export it. In contrast, if your storage account resides in West US and you are told to ship your disks to West US, then there are no egress charges for your export job.

    More details on pricing can be found here.

    How to enroll in the preview release?

    This preview release is available only to users with a token. If you are interested in using this service, please send an email to waimportexport@microsoft.com with the following information:

    1. The regions you are interested in importing data to or exporting data from
    2. The amount of data you would like to import or export
    3. The name of the storage account you want to import data to or export data from

    We will review the information before approving your request to allow you to use the service.

    Summary

    We are continuously making improvements to make this service better and very much value your feedback. Please feel free to leave comments and questions below or send an email to waimportexport@microsoft.com.

    Please make sure to review legal supplemental terms of use for Windows Azure Import/Export Service prior to using this service.

     

    Aung Oo, Jai Haridas and Brad Calder

     

    Resources


    Windows Azure Tables Breaking Changes (November 2013)


    In preparation for adding JSON support to Windows Azure Tables, we are pushing an update that introduces a few breaking changes for Windows Azure Tables. We strive hard to preserve backward compatibility and these changes were introduced due to dependencies we have on WCF Data Services.

    There are some changes in the WCF Data Services libraries which should not break XML parsers and HTTP readers written to the standards. However, custom parsers may have taken dependencies on our previous formatting of the responses, and the following breaking changes might impact them. Our recommendation is to treat XML content according to the standard, as valid parsers do, and not to take a strong dependency on line breaks, whitespace, ordering of elements, etc.

    Here is the list of changes:

    • The AtomPub XML response in the new release does not have line breaks or whitespace between the XML elements; it is in a compact form, which helps reduce the amount of data transferred while staying equivalent to the XML generated prior to the service update. Standard XML parsers are not impacted by this, but customers have reported breaks in custom parsing logic. We recommend that clients that roll their own parsers follow the XML specification, which handles such changes seamlessly.
    • The ordering of XML elements (title, id, etc.) in the AtomPub response can change. Parsers should not take any dependency on the ordering of elements.
    • A “type” parameter has been added to the Content-Type HTTP header. For example, for a query response (not a point query) the content type will have “type=feed” in addition to the charset and application/atom+xml (a tolerant parsing sketch follows this list).
      • Previous version: Content-Type: application/atom+xml;charset=utf-8
      • New version: Content-Type: application/atom+xml;type=feed;charset=utf-8
    • A new response header is returned: X-Content-Type-Options: nosniff to reduce MIME type security risks.
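
    The following is a minimal sketch (not from the original post) of parsing the new Content-Type value without depending on parameter order, whitespace, or the presence of the new “type” parameter; it uses the standard MediaTypeHeaderValue parser from the System.Net.Http.Headers namespace:

    using System;
    using System.Linq;
    using System.Net.Http.Headers;

    // Works for both the previous and the new header values.
    string header = "application/atom+xml;type=feed;charset=utf-8";
    MediaTypeHeaderValue parsed = MediaTypeHeaderValue.Parse(header);

    bool isAtom = string.Equals(parsed.MediaType, "application/atom+xml", StringComparison.OrdinalIgnoreCase);
    string typeParameter = parsed.Parameters
        .Where(p => string.Equals(p.Name, "type", StringComparison.OrdinalIgnoreCase))
        .Select(p => p.Value)
        .FirstOrDefault();

    Console.WriteLine("IsAtom: {0}, type parameter: {1}", isAtom, typeParameter ?? "(absent)");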

    Please reach out to us via forums or this blog if you have any concerns.

    Windows Azure Storage Team

    Windows Azure Storage Known Issues (November 2013)


    In preparation for a major feature release (CORS, JSON, etc.), we pushed an update to production that introduced some bugs. We were notified recently about these bugs and plan to address them in an upcoming hotfix. We will update this blog once the fixes are pushed out.

    Windows Azure Blobs, Tables and Queue Shared Access Signature (SAS)

    One of our customers reported an issue in which a SAS with version 2012-02-12 failed with HTTP status code 400 (Bad Request). Upon investigation, the issue was caused by a change in how our service interprets “//” when such a sequence of characters appears before the container name.

    Example: http://myaccount.blob.core.windows.net//container/blob?sv=2012-02-12&si=sasid&sx=xxxx

    Whenever it received a SAS request with version 2012-02-12 or prior, the previous version of our service collapsed the ‘//’ into ‘/’ and hence things worked fine. However, the new service update returns 400 (Bad Request) because it interprets the above URI as if the container name were null, which is invalid. We will be fixing our service to revert to the old behavior and collapse ‘//’ into ‘/’ for the 2012-02-12 version of SAS. In the meantime, we advise our customers to refrain from sending ‘//’ at the start of the container name portion of the URI.
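
    Until the fix is rolled out, here is a minimal client-side sketch (this helper is hypothetical and is not part of the service or the client library) that normalizes the path before appending a SAS token, so an accidental “//” never reaches the service:

    // Hypothetical helper: trims extra slashes so exactly one separator is used.
    static string BuildBlobUri(string accountEndpoint, string containerAndBlobPath, string sasToken)
    {
        return accountEndpoint.TrimEnd('/') + "/" + containerAndBlobPath.TrimStart('/') + sasToken;
    }

    // BuildBlobUri("http://myaccount.blob.core.windows.net", "//container/blob", "?sv=2012-02-12&si=sasid&sx=xxxx")
    // returns http://myaccount.blob.core.windows.net/container/blob?sv=2012-02-12&si=sasid&sx=xxxx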

    Windows Azure Tables

    Below are 2 known issues that we intend to hotfix either on the service side or as part of our client library as noted below:

    1. When clients define DataServiceContext.ResolveName and provide a type name other than <Account Name>.<Table Name>, CUD operations will return 400 (Bad Request). This is because, as part of the new update, the ATOM “category” element’s “term” must either be omitted or be equal to <Account Name>.<Table Name>. Previous versions of the service ignored any type name that was sent. We will be fixing this to again ignore what is being sent, but until then client applications need to consider the workaround below. ResolveName is not required for Azure Tables, and client applications can remove it to ensure that OData does not send the “category” element.

    Here is an example of a code snippet that would generate a request that fails on the service side:

    CloudTableClient cloudTableClient = storageAccount.CreateCloudTableClient();
    TableServiceContext tableServiceContext = cloudTableClient.GetDataServiceContext();
    tableServiceContext.ResolveName = delegate(Type entityType)
    {
        // This causes the class name to be sent as the value of "term" in the category element, and the service returns Bad Request.
        return entityType.FullName;
    };

    SimpleEntity entity = new SimpleEntity("somePK", "someRK");
    tableServiceContext.AddObject("sometable", entity);
    tableServiceContext.SaveChanges();

    To mitigate the issue on the client side, please remove the “tableServiceContext.ResolveName” delegate assignment shown above.
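
    For clarity, here is the same snippet with the workaround applied (a sketch using the same sample objects as above; ResolveName is simply never set, so OData does not send the “category” element):

    CloudTableClient cloudTableClient = storageAccount.CreateCloudTableClient();
    TableServiceContext tableServiceContext = cloudTableClient.GetDataServiceContext();

    // No ResolveName delegate is assigned, so no "category" element is emitted.
    SimpleEntity entity = new SimpleEntity("somePK", "someRK");
    tableServiceContext.AddObject("sometable", entity);
    tableServiceContext.SaveChanges();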

    We would like to thank restaurant.com for bringing this to our attention and helping us in investigating this issue.

    2. The new .NET WCF Data Services library used on the server side as part of the service update rejects empty “cast” as part of the $filter query with 400 (Bad Request) whereas the older .NET framework library did not. This impacts Windows Azure Storage Client Library 2.1 since the IQueryable implementation (see this post for details) sends the cast operator in certain scenarios.

    We are working on fixing the client library to match .NET’s DataServiceContext behavior which does not send the cast operator and this should be available in the next couple of weeks. In the meantime we advise our customers to consider the following workaround.

    This client library issue can be avoided by ensuring you do not constrain the type of enumerable to the ITableEntity interface but to the exact type that needs to be instantiated.

    The current behavior is described by the following example:

    static IEnumerable<T> GetEntities<T>(CloudTable table) where T : ITableEntity, new()
    {
        IQueryable<T> query = table.CreateQuery<T>().Where(x => x.PartitionKey == "mypk");
        return query.ToList();
    }

    With the 2.1 storage client library’s IQueryable implementation, the above code dispatches a query whose URI looks like the one below, which is rejected by the new service update with 400 (Bad Request).

    http://myaccount.table.core.windows.net/invalidfiltertable?$filter=cast%28%27%27%29%2FPartitionKey%20eq%20%27mypk%27&timeout=90 HTTP/1.1

    As a mitigation, consider replacing the above code with the below query. In this case the cast operator will not be sent.

        IQueryable<SimpleEntity> query = table.CreateQuery<SimpleEntity>().Where(x => x.PartitionKey == "mypk");
    return query.ToList();

    The Uri for the request looks like the following and is accepted by the service.

    http://myaccount.table.core.windows.net/validfiltertable?$filter=PartitionKey%20eq%20%27mypk%27&timeout=90

    We apologize for these issues and we are working on a hotfix to address them.

    Windows Azure Storage Team

    Windows Azure Storage Release - Introducing CORS, JSON, Minute Metrics, and More


    We are excited to announce the availability of a new storage version 2013-08-15 that provides various new functionalities across Windows Azure Blobs, Tables and Queues. With this version, we are adding the following major features:

    1. CORS (Cross Origin Resource Sharing): Windows Azure Blobs, Tables and Queues now support CORS to enable users to access/manipulate resources from within the browser serving a web page in a different domain than the resource being accessed. CORS is an opt-in model which users can turn on using Set/Get Service Properties. Windows Azure Storage supports both CORS preflight OPTIONS request and actual CORS requests. Please see http://msdn.microsoft.com/en-us/library/windowsazure/dn535601.aspx for more information.

    2. JSON (JavaScript Object Notation): Windows Azure Tables now supports OData 3.0’s JSON format. The JSON format enables efficient wire transfer as it eliminates transferring predictable parts of the payload which are mandatory in AtomPub.

    JSON is supported in 3 forms:

    • No Metadata – This format is the most efficient transfer format, which is useful when the client knows how to interpret the data types of custom properties.
    • Minimal Metadata – This format contains data type information for custom properties of certain types that cannot be implicitly interpreted. This is useful for queries when the client is unaware of the data types, such as general tools or Azure Table browsers.
    • Full Metadata – This format is useful for generic OData readers that require type definitions even for system properties and OData information like edit link, id, etc.

    More information about JSON for Windows Azure Tables can be found at http://msdn.microsoft.com/en-us/library/windowsazure/dn535600.aspx

    3. Minute Metrics in Windows Azure Storage Analytics: Until now, Windows Azure Storage supported hourly aggregates of metrics, which are very useful for monitoring service availability, errors, ingress, egress, API usage and access patterns, and for improving client applications; we blogged about it here. In this new 2013-08-15 version, we are introducing Minute Metrics, where data is aggregated at a minute level and is typically available within five minutes. Minute level aggregates allow users to monitor client applications in a more real-time manner than hourly aggregates and allow users to recognize trends like spikes in requests per second. With the introduction of minute level metrics, we now have the following tables in your storage account where Hour and Minute Metrics are stored:

    • $MetricsHourPrimaryTransactionsBlob
    • $MetricsHourPrimaryTransactionsTable
    • $MetricsHourPrimaryTransactionsQueue
    • $MetricsMinutePrimaryTransactionsBlob
    • $MetricsMinutePrimaryTransactionsTable
    • $MetricsMinutePrimaryTransactionsQueue

    Please note the change in table names for hourly aggregated metrics. Though the names have changed, your old data will still be available via the new table name too.

    To configure minute metrics, please use the Set Service Properties REST API for Windows Azure Blob, Table and Queue with the 2013-08-15 version. The Windows Azure Portal does not allow configuring minute metrics at this time, but this capability will be available in the future.
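
    As a rough configuration sketch for features 1 and 3 above using the 3.0 .NET client library (this assumes an existing CloudStorageAccount named storageAccount and the ServiceProperties, CorsRule and MetricsLevel types from Microsoft.WindowsAzure.Storage.Shared.Protocol; the origin, retention and max-age values are placeholders):

    // Enable a CORS rule and minute metrics on the Blob service via Set Service Properties.
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    ServiceProperties serviceProperties = blobClient.GetServiceProperties();

    // Opt in to CORS for GET requests from a single origin.
    serviceProperties.Cors.CorsRules.Add(new CorsRule()
    {
        AllowedOrigins = new List<string>() { "http://www.contoso.com" },
        AllowedMethods = CorsHttpMethods.Get,
        AllowedHeaders = new List<string>() { "*" },
        ExposedHeaders = new List<string>() { "*" },
        MaxAgeInSeconds = 3600
    });

    // Turn on minute metrics with API-level detail and a 7 day retention.
    serviceProperties.MinuteMetrics.MetricsLevel = MetricsLevel.ServiceAndApi;
    serviceProperties.MinuteMetrics.RetentionDays = 7;

    blobClient.SetServiceProperties(serviceProperties);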

    In addition to the major features listed above, we have made the following additions to our service with this release. A more detailed list of changes in the 2013-08-15 version can be found at http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx:

    • Copy blob now allows Shared Access Signature (SAS) to be used for the destination blob if the copy is within the same storage account.
    • Windows Azure Blob service now supports Content-Disposition and the ability to control response headers like cache-control and content-disposition via query parameters included in a SAS (see the sketch after this list). Content-Disposition can also be set statically through Set Blob Properties.
    • Windows Azure Blob service now supports multiple HTTP conditional headers for Get Blob and Get Blob Properties; this feature is particularly useful for access from web-browsers which are going through proxies or CDN servers which may add additional headers.
    • Windows Azure Blob service now allows the Delete Blob operation on an uncommitted blob (a blob created using the Put Block operation but not yet committed using the Put Block List API). Previously, the blob needed to be committed before it could be deleted.
    • List Containers, List Blobs and List Queues starting with the 2013-08-15 version will no longer return the URL address field for the resource. This was done to reduce fields that can be reconstructed on the client side.
    • Lease Blob and Lease Container starting with the 2013-08-15 version will return ETag and Last Modified Time response headers, which can be used by the lease holder to easily check whether the resource has changed since it was last tracked (e.g., if the blob or its metadata was updated). The ETag value does not change for blob lease operations. Starting with the 2013-08-15 version, the container lease operation will not change the ETag either.
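
    Here is a rough sketch of the SAS response-header override mentioned above, assuming the 3.0 client library’s SharedAccessBlobHeaders type and an existing CloudBlobContainer named container (the blob name, file name and expiry are placeholders):

    // Generate a blob SAS that forces a download file name via Content-Disposition.
    SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy()
    {
        Permissions = SharedAccessBlobPermissions.Read,
        SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
    };

    SharedAccessBlobHeaders headers = new SharedAccessBlobHeaders()
    {
        ContentDisposition = "attachment; filename=report.pdf",
        CacheControl = "no-cache"
    };

    CloudBlockBlob blob = container.GetBlockBlobReference("report.pdf");
    string sasToken = blob.GetSharedAccessSignature(policy, headers);
    string downloadUri = blob.Uri + sasToken;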

    We are also releasing an updated Windows Azure Storage Client Library here that supports the features listed above and can be used to exercise the new features. In the next couple of months, we will also release an update to the Windows Azure Storage Emulator for Windows Azure SDK 2.2. This update will support “2013-08-15” version and the new features.

    In addition to the above changes, please also read the two blog posts above, “Windows Azure Tables Breaking Changes (November 2013)” and “Windows Azure Storage Known Issues (November 2013)”, which discuss known issues and breaking changes for this release.

    Please let us know if you have any further questions either via forum or comments on this post.

    Jai Haridas and Brad Calder

    Windows Azure Tables: Introducing JSON


    The Windows Azure Storage team is excited to announce the release of JSON support for Windows Azure Tables as part of version “2013-08-15”. JSON is an alternate OData payload format to AtomPub that significantly reduces the size of the payload and results in lower latency. To reduce the payload size even further, we are providing a way to turn off the payload echo during inserts. Both of these new features are now the default behavior in the newly released Windows Azure Storage Client 3.0 Library.

    What is JSON

    JSON, JavaScript Object Notation, is a lightweight text format for serializing structured data. Similar to AtomPub, OData extends the JSON format by defining general conventions for entities and properties. Unlike AtomPub, parts of the response payload in OData JSON are omitted to reduce the payload size. To reconstitute this data on the receiving end, expressions are used to compute missing links, type and control data. OData supports multiple formats for JSON:

    • nometadata – As the name suggests, this format excludes metadata that is used to interpret the data types. It is the most efficient format for transfers, which is useful when the client knows how to interpret the data types of custom properties.
    • minimalmetadata – This format contains data type information for custom properties of certain types that cannot be implicitly interpreted. This is useful for queries when the client is unaware of the data types, such as general tools or Azure Table browsers. However, it still excludes type information for system properties and certain additional information such as edit link and id, which can be reconstructed by the client. This is the default level utilized by the Windows Azure Storage Client 3.0 Library.
    • fullmetadata – This format is useful for generic OData readers that require type definitions even for system properties and OData information like edit link and id. In most cases for the Azure Table service, fullmetadata is unnecessary.

    For more information regarding the details of JSON payload format and REST API details see Payload Format for Table Service Operation.

    To take full advantage of JSON and the additional performance improvements, consider upgrading to Windows Azure Storage Client 3.0 which uses JSON and turns off echo on Insert by default. Older versions of the library do not support JSON.

    AtomPub vs JSON format

    As mentioned earlier, using JSON results in a significant reduction in payload size compared to AtomPub. As a result, JSON is the recommended format, and the newly released Windows Azure Storage Client 3.0 Library uses JSON with minimal metadata by default. To get a feel for how JSON requests and responses look compared to AtomPub, please refer to the Payload Format for Table Service Operations MSDN documentation, where payload examples are provided for both the AtomPub and JSON formats. We have also provided a sample of the JSON payload at the end of this blog.

    To compare JSON and AtomPub, we ran the example provided at the end of the blog in both JSON and AtomPub and compared the payloads for both minimal and no metadata. The example generates 6 requests, which include checking for table existence, creating the table, inserting 3 entities, and querying all entities in that table. The following table summarizes the amount of data transferred back and forth, in bytes.

    Format               | Request Header Size | Request Body Size | Response Header Size | Response Body Size | % Savings in HTTP Body Size only vs. AtomPub | % Savings in total HTTP transfer vs. AtomPub
    AtomPub              | 3,611               | 2,861             | 3,211                | 8,535              | N/A                                          | N/A
    JSON MinimalMetadata | 3,462               | 771               | 3,360                | 2,529              | 71%                                          | 44%
    JSON NoMetadata      | 3,432               | 771               | 3,330                | 1,805              | 77%                                          | 49%

    As you can see, for this example both the minimal and no metadata flavors of the JSON format provide noticeable savings, with over a 75% reduction in the case of no metadata when compared to AtomPub. This significantly increases the responsiveness of applications, since they spend less time generating and parsing requests, in addition to the reduced network transfer time.

    Other benefits of JSON can be summarized as follows:

    1. Other than the performance benefits described above, JSON would reduce your cost as you will be transferring less data.
    2. Combining JSON, CORS and SAS features will enable you to build scalable applications where you can access and manipulate your Windows Azure Table data from the web browser directly through JavaScript code.
    3. Another benefit of JSON over AtomPub is the fact that some applications may already be using JSON format as the internal object model in which case using JSON with Windows Azure Tables will be a natural transition as it avoids transformations.

    Turning off Insert Entity Response Echo Content

    In this release, users can further reduce bandwidth usage by turning off the echo of the payload in the response during entity insertion. Through the OData wire protocol, echo content can be turned off by specifying the HTTP header “Prefer: return-no-content”. More information can be found in the Setting the Prefer Header to Manage Response Echo on Insert Operations MSDN documentation. On the client library front, no echo is the default behavior of the Windows Azure Storage Client Library 3.0. Note that content echo can still be turned on for any legacy reasons by setting echoContent to true on the TableOperation.Insert method (an example is provided in a subsequent section).

    The comparison data provided in the above table was collected with content echo enabled. On a re-run with echo disabled, an additional 30% saving can be seen over the JSON NoMetadata payload size. This is very beneficial if the application makes a lot of entity insertions: apart from the reduction in network transfer, the application will see a great reduction in IO and CPU usage.

    Using JSON with Windows Azure Storage Client Library 3.0

    The Windows Azure Storage Client Library 3.0 supports JSON as part of the Table Service layer and the WCF Data Services layer. We highly recommend that our customers use the Table Service layer, as it is optimized for Azure Tables, has better performance as described here, and supports all flavors of the JSON format.

    Table Service Layer

    The Table Service Layer supports all flavors of JSON formats in addition to AtomPub. The format can be set on the CloudTableClient object as seen below:

    CloudTableClient tableClient = new CloudTableClient(baseUri, cred)
    {
        // Values supported can be AtomPub, Json, JsonFullMetadata or JsonNoMetadata
        PayloadFormat = TablePayloadFormat.JsonNoMetadata
    };

    Note that the default value is Json, i.e. JSON minimal metadata. You can also decide which format to use per request by passing in a TableRequestOptions with your choice of PayloadFormat to the CloudTable.Execute method.
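
    As a quick sketch of that per-request override (this assumes the CustomerEntity type and the table object from the example later in this post):

    // Override the payload format for a single retrieve operation.
    TableRequestOptions jsonNoMetadataOptions = new TableRequestOptions()
    {
        PayloadFormat = TablePayloadFormat.JsonNoMetadata
    };

    TableResult result = table.Execute(TableOperation.Retrieve<CustomerEntity>("Walter", "Harp"), jsonNoMetadataOptions);
    CustomerEntity walter = (CustomerEntity)result.Result;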

    To control the content echo for Insert Entity, pass the appropriate value to the TableOperation.Insert method; note that by default the client library requests that no content be echoed back.

    Example:

    // Note that the default value for echoContent is already false
    table.Execute(TableOperation.Insert(customer, echoContent: false));

    As you can see, the echoContent is set at the individual entity operation level and therefore this would also be applicable to batch operations when using table.ExecuteBatch.

    JSON NoMetadata client side type resolution

    When using JSON No metadata via the Table Service Layer the client library will “infer” the property types by inspecting the type information on the POCO entity type provided by the client as shown in the JSON example at the end of the blog. (Note, by default the client will inspect an entity type once and cache the resulting information. This cache can be disabled by setting TableEntity.DisablePropertyResolverCache = true;) Additionally, in some scenarios clients may wish to provide the property type information at runtime such as when querying with the DynamicTableEntity or doing complex queries that may return heterogeneous entities. To support this scenario the client can provide a PropertyResolver Func on the TableRequestOptions which allows clients to return an EdmType enumeration for each property based on the data received from the service. The sample below illustrates a PropertyResolver that would allow a user to query the customer data in the example below into DynamicTableEntities.

    TableRequestOptions options = new TableRequestOptions()
    {
        PropertyResolver = (partitionKey, rowKey, propName, propValue) =>
        {
            if (propName == "CustomerSince")
            {
                return EdmType.DateTime;
            }
            else if (propName == "Rating")
            {
                return EdmType.Int32;
            }
            else
            {
                return EdmType.String;
            }
        }
    };

    TableQuery<DynamicTableEntity> query = (from ent in complexEntityTable.CreateQuery<DynamicTableEntity>()
    select ent).WithOptions(options);

    WCF Data Services Layer

    As mentioned before, we recommend using the Table Service layer to access Windows Azure Tables. However, if that is not possible for legacy reasons, you might find this section useful.

    The WCF Data Services layer payload format, as part of the Windows Azure Storage Client Library 3.0, defaults to JSON minimal metadata, which is the most concise JSON format supported by .NET WCF Data Services (there is no support for nometadata). In fact, if your application uses projections (i.e. $select), WCF Data Services will revert to using JSON fullmetadata, which is the most verbose JSON format.

    If you wish to use AtomPub, you can set such payload format by calling the following method:

    // tableDataContext is of TableServiceContext type that inherits from the WCF DataServiceContext class
    tableDataContext.Format.UseAtom();

    In case you decide you want to switch back to JSON (i.e. minimalmetadata or fullmetadata), you can do so by calling the following method:

    tableDataContext.Format.UseJson(new TableStorageModel(tableClient.Credentials.AccountName));

    In order to turn off echoing back content on all Insert Operations, you can set the following property:

    // Default value is None which would result in echoing back the content
    tableDataContext.AddAndUpdateResponsePreference = DataServiceResponsePreference.NoContent;

    Note that both the JSON and no-content echo settings apply to all operations.

    JSON example using Windows Azure Storage Client 3.0

    In this example, we create a simple address book by storing a few customers’ information in a table, and then we query the contents of the Customers table using the Table Service layer of the Storage Client library. We will also use JSON nometadata and make sure that content echo on insert is turned off.

    Here are some code excerpts for the example:

    const string customersTableName = "Customers";
    const string connectionString = "DefaultEndpointsProtocol=https;AccountName=[ACCOUNT NAME];AccountKey=[ACCOUNT KEY]";
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);

    CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

    // Values supported can be AtomPub, Json, JsonFullMetadata or JsonNoMetadata with Json being the default value
    tableClient.PayloadFormat = TablePayloadFormat.JsonNoMetadata;

    // Create the Customers table
    CloudTable table = tableClient.GetTableReference(customersTableName);
    table.CreateIfNotExists();

    // Insert a couple of customers into the Customers table
    foreach (CustomerEntity customer in GetCustomersToInsert())
    {
        // Note that the default value for echoContent is already false
        table.Execute(TableOperation.Insert(customer, echoContent: false));
    }

    // Query all customers with the first letter of their FirstName in between [I-X] and
    // with a rating bigger than or equal to 2.
    // The response has a payload format of JSON no metadata and the
    // client library will map the properties returned back to the CustomerEntity object
    IQueryable<CustomerEntity> query = from customer in table.CreateQuery<CustomerEntity>()
                                       where string.Compare(customer.PartitionKey, "I") >= 0 &&
                                             string.Compare(customer.PartitionKey, "X") <= 0 &&
                                             customer.Rating >= 2
                                       select customer;

    CustomerEntity[] customers = query.ToArray();

    Here is the CustomerEntity class definition and the GetCustomersToInsert() method that initializes 3 CustomerEntity objects.

    public class CustomerEntity : TableEntity
    {
        public CustomerEntity() { }

        public CustomerEntity(string firstName, string lastName)
        {
            this.PartitionKey = firstName;
            this.RowKey = lastName;
        }

        [IgnoreProperty]
        public string FirstName
        {
            get { return this.PartitionKey; }
        }

        [IgnoreProperty]
        public string LastName
        {
            get { return this.RowKey; }
        }

        public string Address { get; set; }
        public string Email { get; set; }
        public string PhoneNumber { get; set; }
        public DateTime? CustomerSince { get; set; }
        public int? Rating { get; set; }
    }

    private static IEnumerable<CustomerEntity> GetCustomersToInsert()
    {
        return new[]
        {
            new CustomerEntity("Walter", "Harp")
            {
                Address = "1345 Fictitious St, St Buffalo, NY 98052",
                CustomerSince = DateTime.Parse("01/05/2010"),
                Email = "Walter@contoso.com",
                PhoneNumber = "425-555-0101",
                Rating = 4
            },
            new CustomerEntity("Jonathan", "Foster")
            {
                Address = "1234 SomeStreet St, Bellevue, WA 75001",
                CustomerSince = DateTime.Parse("01/05/2005"),
                Email = "Jonathan@fourthcoffee.com",
                PhoneNumber = "425-555-0101",
                Rating = 3
            },
            new CustomerEntity("Lisa", "Miller")
            {
                Address = "4567 NiceStreet St, Seattle, WA 54332",
                CustomerSince = DateTime.Parse("01/05/2003"),
                Email = "Lisa@northwindtraders.com",
                PhoneNumber = "425-555-0101",
                Rating = 2
            }
        };
    }

    JSON Payload Example

    Here are the 3 different response payloads corresponding to AtomPub, JSON minimalmetadata and JSON nometadata for the query request generated as part of the previous example. Note that the payloads have been formatted for readability purposes; the actual wire payloads do not have any indentation or newline breaks.

    AtomPub

    <?xml version="1.0" encoding="utf-8"?>
    <feed xml:base="http://someaccount.table.core.windows.net/" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml">
    <id>http://someaccount.table.core.windows.net/Customers</id>
    <title type="text">Customers</title>
    <updated>2013-12-03T06:37:21Z</updated>
    <link rel="self" title="Customers" href="Customers"/>
    <entry m:etag="W/&quot;datetime'2013-12-03T06%3A37%3A20.9709094Z'&quot;">
    <id>http://someaccount.table.core.windows.net/Customers(PartitionKey='Jonathan',RowKey='Foster')</id>
    <category term="someaccount.Customers" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
    <link rel="edit" title="Customers" href="Customers(PartitionKey='Jonathan',RowKey='Foster')"/>
    <title/>
    <updated>2013-12-03T06:37:21Z</updated>
    <author>
    <name/>
    </author>
    <content type="application/xml">
    <m:properties>
    <d:PartitionKey>Jonathan</d:PartitionKey>
    <d:RowKey>Foster</d:RowKey>
    <d:Timestamp m:type="Edm.DateTime">2013-12-03T06:37:20.9709094Z</d:Timestamp>
    <d:Address>1234 SomeStreet St, Bellevue, WA 75001</d:Address>
    <d:Email>Jonathan@fourthcoffee.com</d:Email>
    <d:PhoneNumber>425-555-0101</d:PhoneNumber>
    <d:CustomerSince m:type="Edm.DateTime">2005-01-05T00:00:00Z</d:CustomerSince>
    <d:Rating m:type="Edm.Int32">3</d:Rating>
    </m:properties>
    </content>
    </entry>
    <entry m:etag="W/&quot;datetime'2013-12-03T06%3A37%3A21.1259249Z'&quot;">
    <id>http://someaccount.table.core.windows.net/Customers(PartitionKey='Lisa',RowKey='Miller')</id>
    <category term="someaccount.Customers" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
    <link rel="edit" title="Customers" href="Customers(PartitionKey='Lisa',RowKey='Miller')"/>
    <title/>
    <updated>2013-12-03T06:37:21Z</updated>
    <author>
    <name/>
    </author>
    <content type="application/xml">
    <m:properties>
    <d:PartitionKey>Lisa</d:PartitionKey>
    <d:RowKey>Miller</d:RowKey>
    <d:Timestamp m:type="Edm.DateTime">2013-12-03T06:37:21.1259249Z</d:Timestamp>
    <d:Address>4567 NiceStreet St, Seattle, WA 54332</d:Address>
    <d:Email>Lisa@northwindtraders.com</d:Email>
    <d:PhoneNumber>425-555-0101</d:PhoneNumber>
    <d:CustomerSince m:type="Edm.DateTime">2003-01-05T00:00:00Z</d:CustomerSince>
    <d:Rating m:type="Edm.Int32">2</d:Rating>
    </m:properties>
    </content>
    </entry>
    <entry m:etag="W/&quot;datetime'2013-12-03T06%3A37%3A20.7628886Z'&quot;">
    <id>http://someaccount.table.core.windows.net/Customers(PartitionKey='Walter',RowKey='Harp')</id>
    <category term="someaccount.Customers" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
    <link rel="edit" title="Customers" href="Customers(PartitionKey='Walter',RowKey='Harp')"/>
    <title/>
    <updated>2013-12-03T06:37:21Z</updated>
    <author>
    <name/>
    </author>
    <content type="application/xml">
    <m:properties>
    <d:PartitionKey>Walter</d:PartitionKey>
    <d:RowKey>Harp</d:RowKey>
    <d:Timestamp m:type="Edm.DateTime">2013-12-03T06:37:20.7628886Z</d:Timestamp>
    <d:Address>1345 Fictitious St, St Buffalo, NY 98052</d:Address>
    <d:Email>Walter@contoso.com</d:Email>
    <d:PhoneNumber>425-555-0101</d:PhoneNumber>
    <d:CustomerSince m:type="Edm.DateTime">2010-01-05T00:00:00Z</d:CustomerSince>
    <d:Rating m:type="Edm.Int32">4</d:Rating>
    </m:properties>
    </content>
    </entry>
    </feed>

    JSON minimalmetadata

    {
    "odata.metadata":"http://someaccount.table.core.windows.net/$metadata#Customers",
    "value":[
    {
    "PartitionKey":"Jonathan",
    "RowKey":"Foster",
    "Timestamp":"2013-12-03T06:39:56.6443475Z",
    "Address":"1234 SomeStreet St, Bellevue, WA 75001",
    "Email":"Jonathan@fourthcoffee.com",
    "PhoneNumber":"425-555-0101",
    "CustomerSince@odata.type":"Edm.DateTime",
    "CustomerSince":"2005-01-05T00:00:00Z",
    "Rating":3
    },
    {
    "PartitionKey":"Lisa",
    "RowKey":"Miller",
    "Timestamp":"2013-12-03T06:39:56.7943625Z",
    "Address":"4567 NiceStreet St, Seattle, WA 54332",
    "Email":"Lisa@northwindtraders.com",
    "PhoneNumber":"425-555-0101",
    "CustomerSince@odata.type":"Edm.DateTime",
    "CustomerSince":"2003-01-05T00:00:00Z",
    "Rating":2
    },
    {
    "PartitionKey":"Walter",
    "RowKey":"Harp",
    "Timestamp":"2013-12-03T06:39:56.4743305Z",
    "Address":"1345 Fictitious St, St Buffalo, NY 98052",
    "Email":"Walter@contoso.com",
    "PhoneNumber":"425-555-0101",
    "CustomerSince@odata.type":"Edm.DateTime",
    "CustomerSince":"2010-01-05T00:00:00Z",
    "Rating":4
    }
    ]
    }

    JSON nometadata

    "value":[
    {
    "PartitionKey":"Jonathan",
    "RowKey":"Foster",
    "Timestamp":"2013-12-03T06:45:00.7254269Z",
    "Address":"1234 SomeStreet St, Bellevue, WA 75001",
    "Email":"Jonathan@fourthcoffee.com",
    "PhoneNumber":"425-555-0101",
    "CustomerSince":"2005-01-05T00:00:00Z",
    "Rating":3
    },
    {
    "PartitionKey":"Lisa",
    "RowKey":"Miller",
    "Timestamp":"2013-12-03T06:45:00.8834427Z",
    "Address":"4567 NiceStreet St, Seattle, WA 54332",
    "Email":"Lisa@northwindtraders.com",
    "PhoneNumber":"425-555-0101",
    "CustomerSince":"2003-01-05T00:00:00Z",
    "Rating":2
    },
    {
    "PartitionKey":"Walter",
    "RowKey":"Harp",
    "Timestamp":"2013-12-03T06:45:00.5384082Z",
    "Address":"1345 Fictitious St, St Buffalo, NY 98052",
    "Email":"Walter@contoso.com",
    "PhoneNumber":"425-555-0101",
    "CustomerSince":"2010-01-05T00:00:00Z",
    "Rating":4
    }
    ]
    }

    Resources

    Windows Azure Storage Client Library (3.0) Binary - http://www.nuget.org/packages/WindowsAzure.Storage
    Windows Azure Storage Client Library (3.0) Source - https://github.com/WindowsAzure/azure-storage-net

    Please let us know if you have any further questions either via forum or comments on this post,
    Sam Merat, Jean Ghanem, Joe Giardino, and Jai Haridas

    Windows Azure Storage Redundancy Options and Read Access Geo Redundant Storage


    We are excited to announce the ability for customers to achieve higher read availability for their data. This preview feature, called “Read Access - Geo Redundant Storage (RA-GRS)”, allows you to read an eventually consistent copy of your geo-replicated data from the storage account’s secondary region in case of any unavailability of the storage account’s primary region.

    Before we dive into the details of this new ability, we will briefly summarize the available redundancy options in Windows Azure Storage. We will then cover in detail each of the options available including the new option of Read Access – Geo Redundant Storage (RA-GRS) and how one can sign up for this limited preview. We will also cover the storage client library changes that one can use to achieve higher read availability using RA-GRS.

    Redundancy Options in Windows Azure Storage

    Windows Azure Storage provides the following redundancy options for Blobs, Tables and Queues:

    1. Locally Redundant Storage (LRS): All data in the storage account is made durable by replicating transactions synchronously to three different storage nodes within the same region. The section below covers more details on LRS, including how to select it.

    2. Geo Redundant Storage (GRS): This is the default option for redundancy when a storage account is created. Like LRS, transactions are replicated synchronously to three storage nodes within the primary region chosen for creating the storage account. However, the transaction is also queued for asynchronous replication to another secondary region (hundreds of miles away from the primary) where data is again made durable by replicating it to three more storage nodes there. The below section will cover in depth the asynchronous replication process, information on region pairings and the failover process.

    3. Read Access - Geo Redundant Storage (RA-GRS): For a GRS storage account, we have now introduced, in limited preview, the ability to turn on read-only access to a storage account’s data in the secondary region. Since replication to the secondary region is done asynchronously, this provides an eventually consistent version of the data to read from. The section below covers more details on RA-GRS, how to enable it in preview mode, and details on storage analytics.

    Locally Redundant Storage (LRS)

    What is LRS?

    Locally redundant storage stores multiple copies of your data synchronously within a region for durability. To ensure durability, we replicate the transaction synchronously across three different storage nodes across different fault domains and upgrade domains. A fault domain (FD) is a group of nodes that represent a physical unit of failure and can be considered as nodes belonging to the same physical rack. An upgrade domain (UD) is a group of nodes that will be upgraded together during the process of service upgrade (rollout). The three replicas are spread across UDs and FDs to ensure that data is available even if hardware failure impacts a single rack and when nodes are upgraded during a rollout.

    In addition to returning success only when all three replicas are persisted, we store CRCs of the data to ensure correctness and periodically read and validate the CRCs to detect bit rot (random errors occurring on the disk media over a period of time). In addition, Windows Azure Storage erasure codes data which further improves durability. More details on how data is made durable can be found in our SOSP paper.

    Scenarios for LRS

    LRS costs less than GRS. Based on current price structure, the reduction in price compared to GRS is around 23% to 34% depending on how much data is stored. Here are some reasons why one may choose LRS over GRS.

    1. Applications that store data which can be easily reconstructed may choose not to geo-replicate that data, not only for cost reasons but also because they get higher throughput for the storage account. LRS accounts get 10 Gbps ingress and 15 Gbps egress, as compared to 5 Gbps ingress and 10 Gbps egress for a GRS account.

    2. Some customers want their data only replicated within a single region due to application’s data governance requirements.

    3. Some applications may have built their own geo replication strategy and not require geo replication to be managed by Windows Azure Storage service.

    How to configure LRS

    GRS is the default redundancy option when creating a storage account and is included in the current pricing for Azure Storage. To configure LRS using the Windows Azure Portal, choose “Locally Redundant” for replication on the “configure” page for the selected storage account; only then does the discounted pricing apply. When you select LRS, data will be deleted from the secondary location. It is important to note that after you select LRS, changing back to GRS (i.e. Geo Redundant) again will incur an additional charge for the egress involved in copying existing data from the primary location to the secondary location. Once the initial data is copied, there is no further egress charge for geo-replicating the data from the primary to the secondary location for GRS. The details for bandwidth charges can be found here.

    Geo Redundant Storage (GRS):

    What is GRS?

    A geo redundant storage account has its blob, table and queue data replicated to a secondary region hundreds of miles away from the primary region. So even in the case of a complete regional outage or a regional disaster in which the primary location is not recoverable, your data is still durable. As explained above in LRS, updates to your storage account are synchronously replicated to three storage nodes in the primary region and success is returned only once three copies are persisted there. For GRS, after the updates are committed to the primary region they are asynchronously replicated to the secondary region. On the secondary, the updates are again committed to a three replica set before returning success back to the primary.

    Our goal is to keep the data completely durable at both the primary location and secondary location. This means we keep three replicas in each of the locations (i.e. total of 6 copies) to ensure that each location can recover by itself from common failures (e.g., disk, node, rack, TOR failing), without having to communicate with the other location. The two locations only have to talk to each other to geo-replicate the recent updates to storage accounts. This is important, because it means that if we had to failover a storage account from the primary to the secondary, then all the data that had been committed to the secondary location via geo-replication will already be durable there. Since transactions are replicated asynchronously, it is important to note that opting for GRS does not impact latency of transactions made to the primary location. However, since there is a delay in the geo replication, in the event of a regional disaster it is possible that delta changes that have not yet been replicated to the secondary region may be lost if the data cannot be recovered from the primary region.

    What are the secondary locations?

    When a storage account is created, the customer chooses the primary location for their storage account. However, the secondary location for the storage account is fixed and customers do not have the ability to change this. The following table shows the current primary and secondary location pairings:

    Primary           | Secondary
    ------------------|------------------
    North Central US  | South Central US
    South Central US  | North Central US
    East US           | West US
    West US           | East US
    North Europe      | West Europe
    West Europe       | North Europe
    South East Asia   | East Asia
    East Asia         | South East Asia
    East China        | North China
    North China       | East China

    What transactional consistency can be expected with geo replication?

    To understand transactional consistency with geo replication, it is important to understand that Windows Azure Storage uses a range-based partitioning system in which every object has a property called Partition Key, which is the unit of scale. All objects with the same Partition Key value will be served by the same Partition Server (see the SOSP paper for details). The Partition Keys for the different object types are:

    • Blob: Account name, Container name and Blob name
    • Table Entity: Account name, Table name and app defined PartitionKey
    • Queue Message: Account name and Queue name

    More details on the scalability targets of a storage account and these objects can be found here.

    Geo-replication ensures that transactions to objects with same Partition Key value are committed in the same order at the secondary location as at the primary location. That said, it is also important to note that there are no geo-replication ordering guarantees across objects with different Partition Key values. This means that different partitions can be geo-replicating at different speeds. Once all the updates have been geo-replicated and committed at the secondary location, the secondary location will have the exact same state as the primary location.

    For example, consider the case where we have two blobs, foo and bar, in our storage account (for blobs, the complete blob name is the Partition Key). Now say we execute transactions A and B on blob foo, and then execute transactions X and Y against blob bar. It is guaranteed that transaction A will be geo-replicated before transaction B, and that transaction X will be geo-replicated before transaction Y. However, no other guarantees are made about the respective timings of geo-replication between the transactions against foo and the transactions against bar. If a disaster happened and caused recent transactions to not be geo-replicated, it would be possible for transactions A and X to be geo-replicated while transactions B and Y are lost, or for transactions A and B to have been geo-replicated while neither X nor Y made it to the secondary. The same holds true for operations involving Tables and Queues, where for Tables the partitions are determined by the application-defined PartitionKey of the entity instead of the blob name, and for Queues the Queue name is the Partition Key.

    Because of this, to best leverage geo-replication, one best practice is to avoid cross-Partition Key relationships whenever possible. This means you should try to restrict relationships for Tables to entities that have the same Partition Key value. All transactions within a single Partition Key value are committed on the secondary in the same order as on the primary. However, for high-scale scenarios, it is not advisable to have all entities share the same Partition Key value, since the scalability target for a single partition is much lower than that of a single storage account.

    The only multiple-object transaction supported by Windows Azure Storage is Entity Group Transactions for Windows Azure Tables, which allow clients to commit a batch of entities, all within the same Partition Key, together as a single atomic transaction. Geo-replication also treats this batch as an atomic operation. Therefore, the whole batch transaction is committed atomically on the secondary.
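
    As a minimal sketch of such a batch using the .NET Storage Client Library (the table name, entity values, and the cxnString connection string are illustrative placeholders), note that every entity in the batch shares the same PartitionKey, so the Entity Group Transaction is committed atomically on the primary and, via geo-replication, atomically on the secondary as well:

    CloudStorageAccount account = CloudStorageAccount.Parse(cxnString);
    CloudTableClient tableClient = account.CreateCloudTableClient();
    CloudTable ordersTable = tableClient.GetTableReference("orders"); // hypothetical table name
    ordersTable.CreateIfNotExists();

    // Both entities share the same PartitionKey ("customer1"), so they can be batched together.
    DynamicTableEntity order1 = new DynamicTableEntity("customer1", "order-001");
    order1.Properties["Amount"] = new EntityProperty(100);
    DynamicTableEntity order2 = new DynamicTableEntity("customer1", "order-002");
    order2.Properties["Amount"] = new EntityProperty(250);

    // The batch is sent as a single atomic Entity Group Transaction.
    TableBatchOperation batch = new TableBatchOperation();
    batch.Insert(order1);
    batch.Insert(order2);
    ordersTable.ExecuteBatch(batch);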

    What is the Geo-Failover Process?

    Geo failover is the process of configuring a storage account’s secondary location as the new primary location. At present, failover is done at the stamp level, and we do not have the ability to fail over a single storage account. We plan to provide an API to allow customers to trigger a failover at the account level, but this is not available yet. Given that failover is at the stamp level, in the event of a major disaster that affects the primary location, we will first try to restore the data in the primary location. Restoring the primary is given precedence because failing over to the secondary may result in recent delta changes being lost (since replication is asynchronous), and not all applications may prefer failing over if availability of the primary can be restored.

    If we needed to perform a failover, affected customers will be notified via their subscription contact information. As part of the failover, the customer’s “account.<service>.core.windows.net” DNS entry would be updated to point from the primary location to the secondary location. Once this DNS change is propagated, the existing Blob, Table, and Queue URIs will work. This means that you do not need to change your application’s URIs – all existing URIs will work the same before and after a geo-failover. For example, if the primary location for a storage account “myaccount” was North Central US, then the DNS entry for myaccount.<service>.core.windows.net would direct traffic to North Central US. If a geo-failover became necessary, the DNS entry for myaccount.<service>.core.windows.net would be updated so that it would then direct all traffic for the storage account to South Central US. After the failover occurs, the location that is accepting traffic is considered the new primary location for the storage account. Once the new primary is up and accepting traffic, we will bootstrap to a new secondary to get the data geo redundant again.

    What is the RPO and RTO with GRS?

    Recovery Point Objective (RPO): In GRS and RA-GRS the storage service asynchronously geo-replicates the data from the primary to the secondary location. If there were a major regional disaster and a failover had to be performed, then recent delta changes that had not been geo-replicated could be lost. The amount of recent data (measured in time) that could be lost is the RPO, i.e., the point in time to which data can be recovered. We typically have an RPO of less than 15 minutes, though there is currently no SLA on how long geo-replication takes.

    Recovery Time Objective (RTO): The other metric to know about is the RTO. This is a measure of how long it takes us to perform the failover and get the storage account back online. The time to do the failover includes the following:

    • The time it takes us to investigate and determine whether we can recover the data at the primary location or if we should do the failover
    • Failing over the account by changing the DNS entries

    We take the responsibility of preserving your data very seriously, so if there is any chance of recovering the data, we will hold off on doing the failover and focus on recovering the data in the primary location. In the future, we plan to provide an API to allow customers to trigger a failover at an account level, which would then allow customers to control the RTO themselves, but this is not available yet.

    Scenarios for GRS

    GRS is chosen by customers requiring the highest level of durability for Business Continuity Planning (BCP) by keeping their data durable in two regions hundreds of miles apart from each other in case of a regional disaster.

    Introducing Read-only Access to Geo Redundant Storage (RA-GRS):

    RA-GRS allows you to have higher read availability for your storage account by providing “read only” access to the data replicated to the secondary location. Once you enable this feature, the secondary location may be used to achieve higher availability in the event the data is not available in the primary region. This is an “opt-in” feature that requires the storage account to be geo-replicated.

    How to enable RA-GRS?

    During the limited preview, you will need to sign up on the Windows Azure Preview page, which places your subscription ID in a queue for approval. Once approved, you will receive an email notification, and you can then enable RA-GRS for any of the storage accounts associated with that subscription. You can enable read-only access to your secondary region via the Service Management REST APIs for Create and Update Storage Account or via the Windows Azure Portal. When using the APIs, ensure that the properties GeoReplicationEnabled and SecondaryReadEnabled are both set to true in the request payload. Via the portal, set the storage account’s replication property to “Read Access Geo-Redundant” storage. Note that during this preview RA-GRS is not yet available in North China and East China, but it is available everywhere else. We will update this blog post once RA-GRS becomes available in China.
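
    As a rough, non-authoritative sketch only, an Update Storage Account call made with a management certificate might look like the following; the management endpoint URI, the x-ms-version header value, and the exact XML payload shape are assumptions based on common Service Management API conventions, so please verify them against the Create/Update Storage Account documentation before use:

    // Assumptions: subscriptionId, accountName, and managementCertificate (X509Certificate2) are already set up.
    string uri = string.Format(
        "https://management.core.windows.net/{0}/services/storageservices/{1}",
        subscriptionId, accountName);

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = "PUT";
    request.ContentType = "application/xml";
    request.Headers["x-ms-version"] = "2013-10-01";      // assumed API version for SecondaryReadEnabled
    request.ClientCertificates.Add(managementCertificate);

    // GeoReplicationEnabled and SecondaryReadEnabled must both be true for RA-GRS.
    string payload =
        "<UpdateStorageServiceInput xmlns=\"http://schemas.microsoft.com/windowsazure\">" +
        "<GeoReplicationEnabled>true</GeoReplicationEnabled>" +
        "<SecondaryReadEnabled>true</SecondaryReadEnabled>" +
        "</UpdateStorageServiceInput>";

    using (StreamWriter writer = new StreamWriter(request.GetRequestStream()))
    {
        writer.Write(payload);
    }

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        Console.WriteLine("Update Storage Account returned {0}", response.StatusCode);
    }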

    How does RA-GRS work?

    When you enable read-only access to your secondary region, you get a secondary endpoint in addition to the primary endpoint for accessing your storage account. This secondary endpoint is similar to the primary endpoint except for the suffix “-secondary”. For example: if the primary endpoint is myaccount.<service>.core.windows.net, the secondary endpoint is myaccount-secondary.<service>.core.windows.net. The secret keys used to access the primary endpoint are the same ones used to access the secondary endpoint. Using the same keys enables the same Shared Access Signature to work for both the primary and secondary endpoints. This means that the canonicalization of the resource used for signing to access both the primary and secondary needs to remain the same. Therefore, the account name used in the canonicalized resource should exclude the “-secondary” suffix for the canonicalization (note, existing storage explorers that use the DNS to extract the account name may not exclude it and hence may not be able to read from the secondary endpoint). The secondary endpoint obtained can then be used to dispatch read requests when the primary is not available to achieve higher availability. Please note that any put/delete requests to this secondary endpoint will automatically be rejected with HTTP status code 403.
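
    As a small illustrative sketch (the account name, key variable, and container name below are placeholders), an application can point a blob client directly at the secondary endpoint by constructing the “-secondary” URI with the same account credentials; the Storage Client Library support described later in this post can also do this for you automatically via LocationMode:

    string accountName = "myaccount";
    StorageCredentials credentials = new StorageCredentials(accountName, accountKey);

    // The secondary endpoint differs from the primary only by the "-secondary" suffix and
    // accepts the same account key (and the same Shared Access Signatures).
    Uri secondaryBlobEndpoint = new Uri(
        string.Format("https://{0}-secondary.blob.core.windows.net", accountName));
    CloudBlobClient secondaryClient = new CloudBlobClient(secondaryBlobEndpoint, credentials);

    // Reads work against the secondary; any put/delete request is rejected with HTTP 403.
    CloudBlobContainer container = secondaryClient.GetContainerReference("mycontainer");
    foreach (IListBlobItem item in container.ListBlobs())
    {
        Console.WriteLine(item.Uri);
    }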

    Let us revisit the geo replication process explained above. A transaction on the primary is replicated asynchronously to the secondary region. However, since transactions across Partition Keys can be replicated out of order, we introduce a term called “Last Sync Time” which acts as the conservative RPO time. All primary updates preceding the Last Sync Time (defined in UTC) are guaranteed to be available for read operations at the secondary. Primary updates after this point in time may or may not be available for reads. A separate Last Sync Time value is provided for Blobs, Tables and Queues for a storage account. The Last Sync Time is calculated by tracking the geo-replicated sync time for each partition and then reporting the minimum time for blobs, tables and queues. Let’s use an example to better illustrate the concept. The table below lists a timeline of operations on blob “myaccount.blob.core.windows.net/mycontainer/blob1.txt” and blob “myaccount.blob.core.windows.net/mycontainer/blob2.txt”:

    UTC Time                  | User Action                                                  | Replication                                        | Last Sync                 | Read request on secondary
    --------------------------|--------------------------------------------------------------|-----------------------------------------------------|---------------------------|-------------------------------------------------------
    Wed, 23 Oct 2013 22:00:00 | User uploaded blob1 with contents “Hello” – Transaction # 1  |                                                     | Wed, 23 Oct 2013 21:58:00 |
    Wed, 23 Oct 2013 22:01:00 | User uploaded blob2 with contents “Cheers” – Transaction # 2 |                                                     | Wed, 23 Oct 2013 21:58:00 |
    Wed, 23 Oct 2013 22:02:00 | User issues read requests for blob1 and blob2 on secondary   |                                                     | Wed, 23 Oct 2013 21:58:00 | A read on blob1 & blob2 returns 404
    Wed, 23 Oct 2013 22:03:00 |                                                              | Upload transaction # 2 is replicated to secondary   | Wed, 23 Oct 2013 21:58:00 |
    Wed, 23 Oct 2013 22:04:00 | User updated blob1 with contents “Adios” – Transaction # 3   |                                                     | Wed, 23 Oct 2013 21:58:00 |
    Wed, 23 Oct 2013 22:05:00 | User issues read requests for blob1 and blob2 on secondary   |                                                     | Wed, 23 Oct 2013 21:58:00 | A read on blob1 returns 404 and blob2 returns “Cheers”
    Wed, 23 Oct 2013 22:05:30 |                                                              | Upload transaction # 1 is replicated to secondary   | Wed, 23 Oct 2013 22:01:00 |
    Wed, 23 Oct 2013 22:06:00 | User issues a read request for blob1 on secondary            |                                                     | Wed, 23 Oct 2013 22:01:00 | A read on blob1 returns “Hello”
    Wed, 23 Oct 2013 22:07:00 |                                                              | Upload transaction # 3 is replicated to secondary   | Wed, 23 Oct 2013 22:04:00 |
    Wed, 23 Oct 2013 22:08:00 | User issues a read request for blob1 on secondary            |                                                     | Wed, 23 Oct 2013 22:04:00 | A read on blob1 returns “Adios”

    A few things to note here:

    • At Wed, 23 Oct 2013 22:03:00, though transaction 2 has been replicated and blob2 is available for read, the Last Sync time is still “Wed, 23 Oct 2013 21:58:00” since transaction 1 has not been replicated yet. The Last Sync Time is a conservative RPO time and guarantees that all transactions up to that time are available for read access on the secondary across all blobs for the storage account.
    • At Wed, 23 Oct 2013 22:05:30, once transaction 1 has been replicated, the Last Sync Time moves to 22:01:00 (since transaction 2 has already been replicated). A read on blob1 would return the contents “Hello” set by transaction 1, since the change made by transaction 3 has not yet been replicated.
    • At Wed, 23 Oct 2013 22:07:00, once transaction 3 has been replicated, the Last Sync Time moves to 22:04:00, signifying that all changes to blob1 and blob2 are available for read on the secondary. At that point, a read on blob1 reflects the updated contents “Adios”.

    How to find the Last Sync Time using RA-GRS?

    Starting with REST version 2013-08-15, a new REST API, “Get Service Stats”, is available for the Blob, Table and Queue services. This API is available only on the secondary endpoint and provides the geo replication stats for the service. The stats include two pieces of information maintained for each service:

    1. The status of geo replication: This can be one of the following three values

    a. Live: Indicates that geo replication is enabled, active and operational

    b. Bootstrap: Indicates the initialization phase of bootstrapping the data from the primary to the secondary when the storage account is changed from LRS to GRS. During this phase, the secondary endpoint may not be available for reads.

    c. Unavailable: The system cannot compute the Last Sync Time due to an outage or has not yet computed the Last Sync Time.

    2. Last Sync Time: This indicates the replication lag for the service, as explained above. It is empty if the status is Bootstrap or Unavailable; when the status is “Live”, it is a valid UTC time.

    What is the RA-GRS SLA and Pricing?

    The benefit of using RA-GRS is higher read availability: 99.99+% for the storage account, compared to 99.9+% for GRS. With RA-GRS the write availability continues to be 99.9+% (same as GRS today), and the 99.99+% read availability assumes that data is read from the secondary if the primary is unavailable. In terms of pricing, the capacity (GB) charge is slightly higher for RA-GRS than for GRS, whereas the transaction and bandwidth charges are the same for GRS and RA-GRS. See the Windows Azure Storage pricing page here for more details about the SLA and pricing.

    Storage Analytics for RA-GRS?

    The Windows Azure Storage service provides storage analytics data that can be used to monitor the usage of the storage service. With RA-GRS, storage metrics for transactions made against the secondary endpoint are also available, provided metrics are enabled via Set Service Properties on the primary endpoint for the Windows Azure Blob, Table and Queue services. To keep it simple, the metrics data for transactions issued against the secondary endpoint is made available only on the primary endpoint, in the following tables (a query sketch follows the table lists below):

    • $MetricsHourSecondaryTransactionsBlob
    • $MetricsHourSecondaryTransactionsTable
    • $MetricsHourSecondaryTransactionsQueue
    • $MetricsMinuteSecondaryTransactionsBlob
    • $MetricsMinuteSecondaryTransactionsTable
    • $MetricsMinuteSecondaryTransactionsQueue

    Metrics for transactions made to primary endpoints are still available at:

    • $MetricsHourPrimaryTransactionsBlob
    • $MetricsHourPrimaryTransactionsTable
    • $MetricsHourPrimaryTransactionsQueue
    • $MetricsMinutePrimaryTransactionsBlob
    • $MetricsMinutePrimaryTransactionsTable
    • $MetricsMinutePrimaryTransactionsQueue

    With the preview release of RA-GRS, logs are not yet made available for transactions against the secondary endpoint. Please refer to MSDN for details on analytics.
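
    As an illustrative sketch of reading these metrics, the secondary-transaction metrics tables can be queried with the Table service client like any other table; the PartitionKey filter below assumes the usual hourly metrics key format (e.g. “20131023T2200”), which you should verify against the storage analytics documentation:

    CloudStorageAccount account = CloudStorageAccount.Parse(cxnString);
    CloudTableClient tableClient = account.CreateCloudTableClient();

    // Metrics for secondary-endpoint transactions are exposed through the primary endpoint.
    CloudTable secondaryBlobMetrics = tableClient.GetTableReference("$MetricsHourSecondaryTransactionsBlob");

    // Assumed PartitionKey format: an hourly timestamp such as "20131023T2200".
    TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>().Where(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, "20131023T2200"));

    foreach (DynamicTableEntity entity in secondaryBlobMetrics.ExecuteQuery(query))
    {
        Console.WriteLine("{0} / {1}", entity.PartitionKey, entity.RowKey);
    }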

    Scenarios

    Read-only access to the secondary enables higher read availability. Applications that require higher availability and can handle eventually consistent reads can issue secondary reads when the primary for a storage account is unavailable.

    RA-GRS support in the Storage Client Library

    The Storage Client Library 3.0, which uses REST version 2013-08-15, provides new capabilities for RA-GRS. It provides the ability to query the Last Sync Time for Blobs, Tables and Queues, and it provides library support for automatically retrying reads against the secondary if the request to the primary fails with a retryable error (such as a timeout).

    1. GetServiceStats API for CloudBlobClient, CloudTableClient and CloudQueueClient: This API allows applications to easily retrieve the replication status and LastSyncTime for each service.

    Example:

    // Namespaces assumed: Microsoft.WindowsAzure.Storage, Microsoft.WindowsAzure.Storage.Table,
    // Microsoft.WindowsAzure.Storage.RetryPolicies (LocationMode) and Microsoft.WindowsAzure.Storage.Shared.Protocol (ServiceStats).
    CloudStorageAccount account = CloudStorageAccount.Parse(cxnString);
    CloudTableClient client = account.CreateCloudTableClient();

    // Note that Get Service Stats is supported only on the secondary endpoint
    client.LocationMode = LocationMode.SecondaryOnly;
    ServiceStats stats = client.GetServiceStats();
    string lastSyncTime = stats.GeoReplication.LastSyncTime.HasValue
        ? stats.GeoReplication.LastSyncTime.Value.ToString()
        : "empty";
    Console.WriteLine("Replication status = {0} and LastSyncTime = {1}", stats.GeoReplication.Status, lastSyncTime);
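
    As a follow-up usage sketch (the 15-minute tolerance is an arbitrary application choice, not a service guarantee, and GeoReplicationStatus.Live is assumed to be the enum value the library exposes for the “Live” state), an application could use the Last Sync Time to decide whether the secondary is fresh enough for its reads:

    TimeSpan maxAcceptableLag = TimeSpan.FromMinutes(15); // application-specific staleness tolerance
    if (stats.GeoReplication.Status == GeoReplicationStatus.Live
        && stats.GeoReplication.LastSyncTime.HasValue
        && DateTimeOffset.UtcNow - stats.GeoReplication.LastSyncTime.Value <= maxAcceptableLag)
    {
        // The secondary is close enough to the primary for this application's read scenario.
    }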

    2. LocationMode property: This property allows the secondary endpoint to be used when the primary is not available. The important values for this are:

    a. PrimaryOnly: All read requests should be issued only to the primary endpoint.

    b. PrimaryThenSecondary: Read requests are first issued to the primary and, if a request fails with a retryable error, subsequent retries alternate between the secondary and the primary. If access to the secondary returns 404 (Not Found) for the object, then subsequent retries remain on the primary.

    c. SecondaryOnly: The read will be issued to the secondary endpoint.

    Note that all LocationMode options except for “SecondaryOnly” will continue to send write requests only to the primary endpoint. Using the “SecondaryOnly” option for write requests will cause a StorageException to be thrown.

    This property can be set either on:

    • Cloud[Blob|Table|Queue]Client: The LocationMode option is used for all requests issued via objects associated with this client.
    • [Blob|Table|Queue]RequestOptions: The LocationMode option can be overridden at the API level using the same client object.

    Code Example for using PrimaryThenSecondary for download blob request:

    CloudStorageAccount account = CloudStorageAccount.Parse(cxnString);
    CloudBlobClient client = account.CreateCloudBlobClient();
    CloudBlobContainer container = client.GetContainerReference(containerName);
    CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

    // Set the location mode for the request using request options. This request will first try
    // to download the blob using the primary and, if that fails, it will try the secondary location for subsequent retries.
    blob.DownloadToFile(
        localFileName,
        FileMode.OpenOrCreate,
        null /* access condition */,
        new BlobRequestOptions()
        {
            LocationMode = LocationMode.PrimaryThenSecondary,
            ServerTimeout = TimeSpan.FromMinutes(3)
        });
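
    The same option can also be set once at the client level as the default for all requests issued through that client (a minimal sketch reusing the objects from the example above):

    // Applies to every request made via this client unless overridden by per-request options.
    client.LocationMode = LocationMode.PrimaryThenSecondary;
    blob.DownloadToFile(localFileName, FileMode.OpenOrCreate);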

    3. A new retry policy interface, IExtendedRetryPolicy, has been introduced to allow users to extend retry logic so that the target location for subsequent retries can be changed. This interface provides a new method, Evaluate, which replaces ShouldRetry. Note that ShouldRetry is still retained in this interface for backward compatibility but is unused.

    The Evaluate method allows users to return RetryInfo which contains the location to use on subsequent retry in addition to the RetryInterval.

    In this example retry policy implementation, we change the target location to the secondary only on the last attempt.

    public RetryInfo Evaluate(RetryContext retryContext, OperationContext operationContext)
    {
        // this.maximumAttempts and EvaluateBackoffTime() are members of the enclosing retry policy class (not shown).
        int statusCode = retryContext.LastRequestResult.HttpStatusCode;

        if (retryContext.CurrentRetryCount >= this.maximumAttempts
            || ((statusCode >= 300 && statusCode < 500 && statusCode != 408)
                || statusCode == 501 // Not Implemented
                || statusCode == 505 // Version Not Supported
               ))
        {
            // Do not retry
            return null;
        }

        RetryInfo info = new RetryInfo();
        info.RetryInterval = EvaluateBackoffTime();
        if (retryContext.CurrentRetryCount == this.maximumAttempts - 1)
        {
            // Make the last attempt against the secondary
            info.TargetLocation = StorageLocation.Secondary;
        }

        return info;
    }
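
    To use such a policy (here assuming it lives in a hypothetical class named LastAttemptSecondaryRetryPolicy that implements IExtendedRetryPolicy), assign it through the RetryPolicy property of the request options:

    BlobRequestOptions options = new BlobRequestOptions()
    {
        // LastAttemptSecondaryRetryPolicy is a hypothetical class wrapping the Evaluate method shown above.
        RetryPolicy = new LastAttemptSecondaryRetryPolicy(maximumAttempts: 3)
    };
    blob.DownloadToFile(localFileName, FileMode.OpenOrCreate, null /* access condition */, options);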

    We hope you enjoy this new feature. Please provide us feedback using the comments on this blog or via the Windows Azure Storage forums.

    Jai Haridas and Brad Calder
