Channel: Microsoft Azure Storage Team Blog

Windows Azure Storage Client for Java Overview

We have released the Storage Client for Java with support for Windows Azure Blobs, Queues, and Tables. Our goal is to continue to improve the development experience when writing cloud applications using Windows Azure Storage. This release is a Community Technology Preview (CTP) and will be supported by Microsoft. We have incorporated feedback from customers and forums on the current .NET libraries to help create a more seamless API that is both powerful and simple to use. This blog post serves as an overview of the library and covers some of the implementation details that are helpful to understand when developing cloud applications in Java. We have also published two further blog posts that cover some of the unique features and programming models for the Blob and Table services.

Packages

The Storage Client for Java is distributed in the Windows Azure SDK for Java jar (see below for locations). For the optimal development experience, import the client subpackage directly (com.microsoft.windowsazure.services.[blob|queue|table].client). This blog post refers to this client layer.

The relevant packages are broken up by service:

Common

com.microsoft.windowsazure.services.core.storage – This package contains all storage primitives such as CloudStorageAccount, StorageCredentials, Retry Policies, etc.

Services

com.microsoft.windowsazure.services.blob.client – This package contains all the functionality for working with the Windows Azure Blob service, including CloudBlobClient, CloudBlob, etc.

com.microsoft.windowsazure.services.queue.client – This package contains all the functionality for working with the Windows Azure Queue service, including CloudQueueClient, CloudQueue, etc.

com.microsoft.windowsazure.services.table.client – This package contains all the functionality for working with the Windows Azure Table service, including CloudTableClient, TableServiceEntity, etc.

Services

While this document describes the common concepts for all of the above packages, it’s worth briefly summarizing the capabilities of each client library. Blob and Table each have some interesting features that warrant further discussion, and for those we’ve provided additional blog posts linked below. The client API surface has been designed to be easy to use and approachable; to accommodate more advanced scenarios, we have provided optional extension points where necessary.

Blob

The Blob API supports all of the normal blob operations (upload, download, snapshot, set/get metadata, and list), as well as the normal container operations (create, delete, list blobs). We have also gone a step further and provided some additional conveniences such as Download Resume, Sparse Page Blob support, simplified MD5 scenarios, and simplified access conditions.

To better explain these unique features of the Blob API, we have published an additional blog post which discusses these features in detail. You can also see additional samples in our article How to Use the Blob Storage Service from Java.

Sample – Upload a File to a Block Blob

// You will need these imports
import java.io.File;
import java.io.FileInputStream;

import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.blob.client.CloudBlobContainer;
import com.microsoft.windowsazure.services.blob.client.CloudBlockBlob;
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;

// Initialize Account
CloudStorageAccount account = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the blob client
CloudBlobClient blobClient = account.createCloudBlobClient();

// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.getContainerReference("mycontainer");

// Create or overwrite the "myimage.jpg" blob with contents from a local
// file
CloudBlockBlob blob = container.getBlockBlobReference("myimage.jpg");
File source = new File("c:\\myimages\\myimage.jpg");
blob.upload(new FileInputStream(source), source.length());

(Note: It is best practice to always provide the length of the data being uploaded if it is available; alternatively a user may specify -1 if the length is not known)

Table

The Table API provides a minimal client surface that is incredibly simple to use but still exposes enough extension points to allow for more advanced “NoSQL” scenarios. These include built in support for POJO, HashMap based “property bag” entities, and projections. Additionally, we have provided optional extension points to allow clients to customize the serialization and deserialization of entities which will enable more advanced scenarios such as creating composite keys from various properties etc.

Due to some of the unique scenarios listed above, the Table service has some requirements and capabilities that differ from the Blob and Queue services. To better explain these capabilities and to provide a more comprehensive overview of the Table API, we have published an in-depth blog post which includes the overall design of Tables, the relevant best practices, and code samples for common scenarios. You can also see more samples in our article How to Use the Table Storage Service from Java.

Sample – Upload an Entity to a Table

// You will need these imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;

// Retrieve storage account from connection-string
CloudStorageAccount storageAccount = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
         
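// Create a new customer entity. CustomerEntity is an application-defined
// entity class (for example, one that extends TableServiceEntity).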
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.setEmail("Walter@contoso.com");
customer1.setPhoneNumber("425-555-0101");

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.insert(customer1);

// Submit the operation to the table service.
tableClient.execute("people", insertCustomer1);

Queue

The Queue API includes convenience methods for all of the functionality available through REST: creating, modifying, and deleting queues; adding, peeking, getting, deleting, and updating messages; and getting the message count. Here is a sample of creating a queue and adding a message. You can also read How to Use the Queue Storage Service from Java.

Sample – Create a Queue and Add a Message to it

// You will need these imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.queue.client.CloudQueue;
import com.microsoft.windowsazure.services.queue.client.CloudQueueClient;
import com.microsoft.windowsazure.services.queue.client.CloudQueueMessage;

// Retrieve storage account from connection-string
CloudStorageAccount storageAccount = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the queue client
CloudQueueClient queueClient = storageAccount.createCloudQueueClient();

// Retrieve a reference to a queue
CloudQueue queue = queueClient.getQueueReference("myqueue");

// Create the queue if it doesn't already exist
queue.createIfNotExist();

// Create a message and add it to the queue
CloudQueueMessage message = new CloudQueueMessage("Hello, World");
queue.addMessage(message);

 

Design

When designing the Storage Client for Java, we set up a series of design guidelines to follow throughout the development process. In order to reflect our commitment to the Java community working with Azure, we decided to design an entirely new library from the ground up that would feel familiar to Java developers. While the basic object model is somewhat similar to our .NET Storage Client Library there have been many improvements in functionality, consistency, and ease of use which will address the needs of both advanced users and those using the service for the first time.

Guidelines

  • Convenient and performant – The default implementation is simple to use, while still supporting performance-critical scenarios. For example, Blob upload APIs require the length of the data for authentication purposes. If this is unknown, a user may pass -1 and the library will calculate it on the fly. However, for performance-critical applications it is best to pass in the correct number of bytes.
  • Users own their requests – We have provided mechanisms that will allow users to determine the exact number of REST calls being made, the associated request ids, HTTP status codes, etc. (See OperationContext in the Object Model discussion below for more). We have also annotated every method that will potentially make a REST request to the service with the @DoesServiceRequest annotation. This all ensures that you, the developer, are able to easily understand and control the requests made by your application, even in scenarios like Retry, where the Java Storage Client library may make multiple calls before succeeding.
  • Look and feel –
    • Naming is consistent. Logical antonyms are used for complementary actions (e.g. upload and download, create and delete, acquire and release).
    • get/set prefixes follow Java conventions and are reserved for local client-side “properties”.
    • Minimal overloads per method: one with the minimum set of required parameters, and one including all optional parameters, which may be null. The one exception is that listing methods have two minimal overloads to accommodate the common scenario of listing with a prefix.
  • Minimal API Surface – In order to keep the API surface smaller we have reduced the number of extraneous helper methods. For example, Blob contains a single upload and download method that use Input / OutputStreams. If a user wishes to handle data in text or byte form, they can simply pass in the relevant stream.
  • Provide advanced features in a discoverable way – In order to keep the core API simple and understandable advanced features are exposed via either the RequestOptions or optional parameters.
  • Consistent Exception Handling – The library will immediately throw any exception encountered prior to making the request to the server. Any exception that occurs during the execution of the request will subsequently be wrapped inside a StorageException.
  • Consistency – Objects are consistent in their exposed API surface and functionality. For example, a Blob, Container, or Queue all expose an exists() method.
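
To illustrate the stream-based guideline above, here is a minimal sketch of how text or raw bytes might be turned into a stream with a known length before being handed to a stream-based upload method. The TextStreams class and its method names are hypothetical helpers written for this post, not part of the library. Only standard java.io and java.nio classes are used, and the library call itself appears only in a comment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Storage Client library.
final class TextStreams {
    private TextStreams() {}

    /** Encodes text as UTF-8 so that the byte length is known before uploading. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps already-encoded bytes in an in-memory stream. */
    static ByteArrayInputStream asStream(byte[] data) {
        return new ByteArrayInputStream(data);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = toUtf8("caf\u00e9");
        try (ByteArrayInputStream in = asStream(payload)) {
            // With the library, the stream and its length would be passed together,
            // along the lines of: blob.upload(in, payload.length);
            System.out.println("bytes to upload: " + payload.length); // prints 5
        }
    }
}
```

Encoding to a byte array first means the length is known up front, which fits the advice to pass the real length rather than -1 whenever it is available.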

Object Model

The Storage Client for Java uses local client side objects to interact with objects that reside on the server. There are additional features provided to help determine if an operation should execute, how it should execute, as well as provide information about what occurred when it was executed. (See Configuration and Execution below)

Objects

StorageAccount

The logical starting point is a CloudStorageAccount which contains the endpoint and credential information for a given storage account. This account then creates logical service clients for each appropriate service: CloudBlobClient, CloudQueueClient, and CloudTableClient. CloudStorageAccount also provides a static factory method to easily configure your application to use the local storage emulator that ships with the Windows Azure SDK.

A CloudStorageAccount can be created by parsing an account string which is in the format of:

"DefaultEndpointsProtocol=http[s];AccountName=<account name>;AccountKey=<account key>"

Optionally, if you wish to specify a non-default DNS endpoint for a given service you may include one or more of the following in the connection string.

"BlobEndpoint=<endpoint>", "QueueEndpoint=<endpoint>", "TableEndpoint=<endpoint>"

Sample – Creating a CloudStorageAccount from an account string

// Initialize Account
CloudStorageAccount account = CloudStorageAccount.parse([ACCOUNT_STRING]);
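
To make the account string format more concrete, the following sketch splits such a string into its individual settings. It is an illustrative stand-in only and is not how CloudStorageAccount.parse is implemented. The class name AccountStringSketch and the sample account name and key are placeholders invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only; the library's own parsing and validation are not shown here.
final class AccountStringSketch {
    private AccountStringSketch() {}

    /** Splits "Key=Value;Key=Value" into an ordered map of settings. */
    static Map<String, String> parse(String accountString) {
        Map<String, String> settings = new LinkedHashMap<String, String>();
        for (String part : accountString.split(";")) {
            if (part.trim().isEmpty()) {
                continue; // tolerate a trailing or doubled semicolon
            }
            // Split on the FIRST '=' only: Base64 account keys often end in '='.
            int eq = part.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed setting: " + part);
            }
            settings.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
        }
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder values, not real credentials.
        Map<String, String> settings = parse(
            "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=bXlrZXk=");
        System.out.println(settings.keySet());
    }
}
```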

ServiceClients

Any service wide operation resides on the service client. Default configuration options such as timeout, retry policy, and other service specific settings that objects associated with the client will reference are stored here as well.

For example:

  • To turn on Storage Analytics for the blob service a user would call CloudBlobClient.uploadServiceProperties(properties)
  • To list all queues a user would call CloudQueueClient.listQueues()
  • To set the default timeout to 30 seconds for objects associated with a given client a user would call Cloud[Blob|Queue|Table]Client.setTimeoutInMs(30 * 1000)

Cloud Objects

Once a user has created a service client for the given service, it’s time to start working directly with the Cloud Objects of that service. A CloudObject is a CloudBlockBlob, CloudPageBlob, CloudBlobContainer, or CloudQueue, each of which contains methods to interact with the resource it represents in the service.

Below are basic samples showing how to create a Blob Container, a Queue, and a Table. See the samples in the Services section for examples of how to interact with CloudObjects.

Blobs

// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.getContainerReference("mycontainer");

// Create the container if it doesn't already exist
container.createIfNotExist();

Queues

// Retrieve a reference to a queue
CloudQueue queue = queueClient.getQueueReference("myqueue");

// Create the queue if it doesn't already exist
queue.createIfNotExist();

Tables

Note: You may notice that, unlike blob and queue, the table service does not use a CloudObject to represent an individual table. This is due to the unique nature of the table service, which is covered in more depth in the Tables deep dive blog post. Instead, table operations are performed via the CloudTableClient object:

// Create the table if it doesn't already exist
tableClient.createTableIfNotExists("people");

 

Configuration and Execution

The maximum overload of each method provided in the library includes two or three extra optional parameters, depending on the service. All of them accept null, so users can utilize just the subset of features they require. For example, to utilize only RequestOptions, simply pass in null for AccessCondition and OperationContext. The objects passed for these optional parameters give the user an easy way to determine if an operation should execute, control how it executes, and retrieve additional information about how it was executed when it completes.

AccessCondition

An AccessCondition’s primary purpose is to determine if an operation should execute, and is supported when using the Blob service. Specifically, AccessCondition encapsulates Blob leases as well as the If-Match, If-None-Match, If-Modified-Since, and If-Unmodified-Since HTTP headers. An AccessCondition may be reused across operations as long as the given condition is still valid. For example, a user may only wish to delete a blob if it hasn’t been modified since last week. By using an AccessCondition, the library will send the HTTP "If-Unmodified-Since" header to the server, which will not process the operation if the condition is not true. Additionally, blob leases can be specified through an AccessCondition so that only operations from users holding the appropriate lease on a blob may succeed.

AccessCondition provides convenient static factory methods to generate an AccessCondition instance for the most common scenarios (IfMatch, IfNoneMatch, IfModifiedSince, IfNotModifiedSince, and Lease). It is also possible to combine these by calling the appropriate setter on the condition you are using.

The following example illustrates how to use an AccessCondition to only upload the metadata on a blob if it is a specific version.

blob.uploadMetadata(AccessCondition.generateIfMatchCondition(currentETag), null /* RequestOptions */, null/* OperationContext */);

Here are some examples:

//Perform Operation if the given resource is not a specified version:
AccessCondition.generateIfNoneMatchCondition(eTag)

//Perform Operation if the given resource has been modified since a given date:
AccessCondition.generateIfModifiedSinceCondition(lastModifiedDate)

//Perform Operation if the given resource has not been modified since a given date:
AccessCondition.generateIfNotModifiedSinceCondition(date)

//Perform Operation with the given lease id (Blobs only):
AccessCondition.generateLeaseCondition(leaseID)

//Perform Operation with the given lease id if the resource has not been modified since a given date:
AccessCondition condition = AccessCondition.generateLeaseCondition(leaseID);
condition.setIfUnmodifiedSinceDate(date);
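
To show what a combined condition means in practice, here is a small local model of how an If-Match plus If-Unmodified-Since pair might be evaluated against a resource's current state. The real evaluation happens in the service, not in the client. ConditionSketch is a hypothetical class written for this post and is unrelated to the library's AccessCondition implementation.

```java
import java.util.Date;

// Hypothetical model of the checks; not the library's AccessCondition class.
final class ConditionSketch {
    private final String ifMatchETag;     // null means "not set"
    private final Date ifUnmodifiedSince; // null means "not set"

    ConditionSketch(String ifMatchETag, Date ifUnmodifiedSince) {
        this.ifMatchETag = ifMatchETag;
        this.ifUnmodifiedSince = ifUnmodifiedSince;
    }

    /** Returns true only when every condition that is set holds for the resource. */
    boolean allows(String currentETag, Date lastModified) {
        if (ifMatchETag != null && !ifMatchETag.equals(currentETag)) {
            return false; // the resource is a different version
        }
        if (ifUnmodifiedSince != null && lastModified.after(ifUnmodifiedSince)) {
            return false; // the resource changed after the cutoff
        }
        return true;
    }

    public static void main(String[] args) {
        long weekMs = 7L * 24 * 60 * 60 * 1000;
        Date lastWeek = new Date(System.currentTimeMillis() - weekMs);
        ConditionSketch deleteIfStale = new ConditionSketch(null, lastWeek);
        // A blob modified just now fails the "not modified since last week" check.
        System.out.println(deleteIfStale.allows("\"0x1\"", new Date())); // prints false
    }
}
```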

RequestOptions

Each Client defines a service specific RequestOptions (i.e. BlobRequestOptions, QueueRequestOptions, and TableRequestOptions) that can be used to modify the execution of a given request. All service request options provide the ability to specify a different timeout and retry policy for a given operation; however some services may provide additional options. For example the BlobRequestOptions includes an option to specify the concurrency to use when uploading a given blob. RequestOptions are not stateful and may be reused across operations. As such, it is common for applications to design RequestOptions for different types of workloads. For example an application may define a BlobRequestOptions for uploading large blobs concurrently, and a BlobRequestOptions with a smaller timeout when uploading metadata.

The following example illustrates how to use BlobRequestOptions to upload a blob using up to 8 concurrent operations with a timeout of 30 seconds each.

BlobRequestOptions options = new BlobRequestOptions();

// Set ConcurrentRequestCount to 8
options.setConcurrentRequestCount(8);

// Set timeout to 30 seconds
options.setTimeoutIntervalInMs(30 * 1000); 

blob.upload(new ByteArrayInputStream(buff),
     blobLength,
     null /* AccessCondition */,
     options,
     null /* OperationContext */);

OperationContext

The OperationContext is used to provide relevant information about how a given operation executed. This object is by definition stateful and should not be reused across operations. Additionally the OperationContext defines an event handler that can be subscribed to in order to receive notifications when a response is received from the server. With this functionality, a user could start uploading a 100 GB blob and update a progress bar after every 4 MB block has been committed.

Perhaps the most powerful function of the OperationContext is to provide the ability for the user to inspect how an operation executed. For each REST request made against a server, the OperationContext stores a RequestResult object that contains relevant information such as the HTTP status code, the request ID from the service, start and stop date, ETag, and a reference to any exception that may have occurred. This can be particularly helpful to determine if the retry policy was invoked and an operation took more than one attempt to succeed. Additionally, the Service Request ID and start/end times are useful when escalating an issue to Microsoft.

The following example illustrates how to use OperationContext to print out the HTTP status code of the last operation.

OperationContext opContext = new OperationContext();
queue.createIfNotExist(null /* RequestOptions */, opContext);
System.out.println(opContext.getLastResult().getStatusCode());

 

Retry Policies

Retry Policies have been engineered so that the policies can evaluate whether to retry on various HTTP status codes. Although the default policies will not retry 400 class status codes, a user can override this behavior by creating their own retry policy. Additionally, RetryPolicies are stateful per operation which allows greater flexibility in fine tuning the retry policy for a given scenario.

The Storage Client for Java ships with 3 standard retry policies which can be customized by the user. The default retry policy for all operations is an exponential backoff with up to 3 additional attempts as shown below:

new RetryExponentialRetry(  
    3000 /* minBackoff in milliseconds */,
    30000 /* deltaBackoff in milliseconds */,
    90000 /* maxBackoff in milliseconds */,
    3 /* maxAttempts */);

With the above default policy, retries will occur after approximately 3,000 ms, 35,691 ms, and 90,000 ms.

If the number of attempts should be increased, one can use the following:

new RetryExponentialRetry(  
    3000 /* minBackoff in milliseconds */,
    30000 /* deltaBackoff in milliseconds */,
    90000 /* maxBackoff in milliseconds */,
    6 /* maxAttempts */);

With the above policy, retries will occur after approximately 3,000 ms, 28,442 ms, 80,000 ms, 90,000 ms, 90,000 ms, and 90,000 ms.

NOTE: the time provided is an approximation because the exponential policy introduces a +/-20% random delta as described below.

NoRetry - Operations will not be retried

LinearRetry - Represents a retry policy that performs a specified number of retries, using a specified fixed time interval between retries.

ExponentialRetry (default) - Represents a retry policy that performs a specified number of retries, using a randomized exponential backoff scheme to determine the interval between retries. This policy introduces a +/- 20% random delta to even out traffic in the case of throttling.
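
The exact backoff formula is not spelled out in this post. The sketch below assumes the delay before retry n (counting from 0) is min(minBackoff + (2^n - 1) * deltaBackoff * r, maxBackoff), where r is a random factor between 0.8 and 1.2. That assumption is consistent with the approximate delays quoted above, but it should be read as an illustration rather than the library's actual code. BackoffSketch and its method names are hypothetical.

```java
import java.util.Random;

// Illustrative model of a randomized exponential backoff; the formula is an assumption.
final class BackoffSketch {
    private BackoffSketch() {}

    /** Delay with no randomization applied (random factor fixed at 1.0). */
    static long nominalDelayMs(int retryCount, long minBackoffMs,
                               long deltaBackoffMs, long maxBackoffMs) {
        long increment = ((1L << retryCount) - 1) * deltaBackoffMs;
        return Math.min(minBackoffMs + increment, maxBackoffMs);
    }

    /** Delay with a +/- 20% random factor applied to the delta term. */
    static long jitteredDelayMs(int retryCount, long minBackoffMs,
                                long deltaBackoffMs, long maxBackoffMs, Random random) {
        double factor = 0.8 + 0.4 * random.nextDouble();
        long increment = (long) (((1L << retryCount) - 1) * deltaBackoffMs * factor);
        return Math.min(minBackoffMs + increment, maxBackoffMs);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int retry = 0; retry < 3; retry++) {
            System.out.println("retry " + retry
                + ": nominal " + nominalDelayMs(retry, 3000, 30000, 90000) + " ms"
                + ", jittered " + jitteredDelayMs(retry, 3000, 30000, 90000, random) + " ms");
        }
    }
}
```

Under this assumption the nominal delays for the default settings are 3,000 ms, 33,000 ms, and 90,000 ms, and the jittered second delay falls between 27,000 ms and 39,000 ms, a range that contains both 35,691 ms and 28,442 ms.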

A user can configure the retry policy for all operations directly on a service client, or specify one in the RequestOptions for a specific method call. The following illustrates how to configure a client to use a linear retry with a 3 second backoff between attempts and a maximum of 3 additional attempts for a given operation.

serviceClient.setRetryPolicyFactory(new RetryLinearRetry(3000, 3));

Or

TableRequestOptions options = new TableRequestOptions();
options.setRetryPolicyFactory(new RetryLinearRetry(3000, 3));

Custom Policies

There are two aspects of a retry policy: the policy itself and an associated factory. To implement a custom policy, a user must derive from the abstract base class RetryPolicy and implement the relevant methods. Additionally, an associated factory class must be provided that implements the RetryPolicyFactory interface to generate unique instances for each logical operation. For simplicity's sake, the policies mentioned above implement the RetryPolicyFactory interface themselves; however, it is possible to use two separate classes.
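
The policy-plus-factory arrangement described above can be sketched with stand-in types. SketchRetryPolicy, SketchRetryPolicyFactory, and StatusCodeRetry are hypothetical names defined only for this example; the library's own RetryPolicy and RetryPolicyFactory have their own signatures, which are not reproduced here. The point of the sketch is why a factory is useful: each logical operation gets a fresh policy instance, so per-operation state such as the attempt count is never shared.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins; these are NOT the library's RetryPolicy types.
interface SketchRetryPolicy {
    /** Returns the delay in ms before the next attempt, or -1 to stop retrying. */
    long nextDelayMs(int httpStatusCode);
}

interface SketchRetryPolicyFactory {
    /** Creates a fresh policy instance for one logical operation. */
    SketchRetryPolicy createInstance();
}

// Like the built-in policies described above, this class acts as its own factory.
final class StatusCodeRetry implements SketchRetryPolicy, SketchRetryPolicyFactory {
    private final long intervalMs;
    private final int maxAttempts;
    private final Set<Integer> retryableCodes;
    private int attemptsSoFar = 0; // per-operation state

    StatusCodeRetry(long intervalMs, int maxAttempts, Set<Integer> retryableCodes) {
        this.intervalMs = intervalMs;
        this.maxAttempts = maxAttempts;
        this.retryableCodes = new HashSet<Integer>(retryableCodes);
    }

    public SketchRetryPolicy createInstance() {
        return new StatusCodeRetry(intervalMs, maxAttempts, retryableCodes);
    }

    public long nextDelayMs(int httpStatusCode) {
        if (attemptsSoFar >= maxAttempts || !retryableCodes.contains(httpStatusCode)) {
            return -1;
        }
        attemptsSoFar++;
        return intervalMs;
    }

    public static void main(String[] args) {
        // Retry on two server errors and, unlike the defaults, on one 400-class code.
        Set<Integer> codes = new HashSet<Integer>(Arrays.asList(500, 503, 408));
        SketchRetryPolicyFactory factory = new StatusCodeRetry(3000, 2, codes);
        SketchRetryPolicy perOperation = factory.createInstance();
        System.out.println(perOperation.nextDelayMs(408)); // prints 3000
    }
}
```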

Note about .NET Storage Client

During the development of the Java library we have identified many substantial improvements in the way our API can work. We are committed to bringing these improvements back to .NET while keeping in mind that many clients have built and deployed applications on the current API, so stay tuned.

Summary

We have put a lot of work into providing a truly first class development experience for the Java community to work with Windows Azure Storage. We very much appreciate all the feedback we have gotten from customers and through the forums; please keep it coming. Feel free to leave comments below.

Joe Giardino
Developer
Windows Azure Storage

Resources

Get the Windows Azure SDK for Java

Learn more about the Windows Azure Storage Client for Java

Learn more about Windows Azure Storage

