Zanopia – Stateless application, database & storage architecture

Objects and the cloud.

Accessing S3 buckets with libdroplet

Amazon Simple Storage Service (S3) was one of the first services that allowed any user to store and access their files and documents securely and durably on the Internet.

Because the S3 protocol includes a comprehensive set of features guaranteeing the security, integrity and durability of storage, and because it has been made publicly available, it has been widely adopted by open source and proprietary client tools, becoming a de facto standard.

The S3 protocol also enabled an interesting model of charging users for data transfer and data storage. As a result, it has been adopted by the hosting industry, which now offers services compatible with the S3 protocol.

Droplet

The idea is that every user owns a set of "buckets"; Amazon currently allows a maximum of 100 per user. Each bucket can be viewed as a directory containing files. By default a bucket is private and only its owner can access it, but a bucket can be made public, or finer-grained permissions can be set on it, e.g. allowing other users to view its files. Users can issue all kinds of requests: listing buckets, putting files into buckets, getting files, modifying the ACLs of files, attaching metadata to files, etc.
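To make the ownership model concrete, here is a toy in-memory sketch of one user's buckets. It is only an illustration, not how S3 or libdroplet store anything: the types, `create_bucket` and `can_read` are hypothetical, and real S3 ACLs are much richer (and bucket names are unique across all users). Only the 100-bucket limit and the private-by-default rule come from the paragraph above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BUCKETS_PER_USER 100 /* the per-user limit quoted above */

/* Hypothetical ACL settings, modeled loosely on S3's canned ACLs. */
typedef enum { ACL_PRIVATE, ACL_PUBLIC_READ } toy_acl_t;

typedef struct {
    char name[64];
    toy_acl_t acl;
} toy_bucket_t;

typedef struct {
    char owner[64];
    toy_bucket_t buckets[MAX_BUCKETS_PER_USER];
    int nbuckets;
} toy_user_t;

/* Add a bucket for user; returns 0, or -1 once the per-user limit is reached. */
int create_bucket(toy_user_t *user, const char *name)
{
    toy_bucket_t *b;

    if (user->nbuckets >= MAX_BUCKETS_PER_USER)
        return -1;
    b = &user->buckets[user->nbuckets++];
    snprintf(b->name, sizeof b->name, "%s", name);
    b->acl = ACL_PRIVATE; /* new buckets are private by default */
    return 0;
}

/* A private bucket is readable only by its owner; a public one by anyone. */
int can_read(const toy_user_t *owner, const toy_bucket_t *b, const char *requester)
{
    if (b->acl == ACL_PUBLIC_READ)
        return 1;
    return strcmp(owner->owner, requester) == 0;
}
```

Making a bucket public in this model is just a matter of switching its `acl` field; the access check itself does not change.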

The S3 protocol is roughly an improved REST protocol that adds strong authentication: first of all it guarantees the identity of the user, but it also enables useful features such as access delegation to other users, temporary access, and so on. The protocol can also guarantee the integrity of the content, using an MD5 checksum computed before the transfer and recomputed when the file is finally stored in the hosting provider's cloud. File transfers can be encrypted by using HTTPS instead of the default HTTP.
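Both mechanisms show up in the request headers. In the signing scheme S3 documented at the time (Signature Version 2), the client builds a "string to sign" from the HTTP verb, the Content-MD5 header (the base64-encoded MD5 digest of the body), the Content-Type, the Date and the canonicalized resource, and then sends `Authorization: AWS <access_key>:<signature>`, where the signature is the base64-encoded HMAC-SHA1 of that string keyed with the secret key. The sketch below covers only the pieces that need no crypto library: base64 encoding (used for Content-MD5) and assembling the string to sign. It assumes there are no x-amz-* headers to canonicalize, leaves MD5 and HMAC-SHA1 to a real crypto library, and its function names are hypothetical, not libdroplet's.

```c
#include <stdio.h>
#include <stddef.h>

/* Standard base64 alphabet (RFC 4648) with '=' padding. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes of in; out must hold 4 * ((len + 2) / 3) + 1 bytes. */
void base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;

    for (i = 0; i + 2 < len; i += 3) {
        unsigned v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = b64_alphabet[(v >> 6) & 63];
        out[o++] = b64_alphabet[v & 63];
    }
    if (i < len) { /* one or two trailing bytes need padding */
        unsigned v = in[i] << 16;
        if (i + 1 < len)
            v |= in[i + 1] << 8;
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? b64_alphabet[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
}

/* Build the Signature Version 2 string to sign (no x-amz-* headers).
 * resource is the canonicalized resource, e.g. "/bucket/key".
 * Returns 0, or -1 if out is too small. */
int build_string_to_sign(char *out, size_t outlen, const char *verb,
                         const char *content_md5, const char *content_type,
                         const char *date, const char *resource)
{
    int n = snprintf(out, outlen, "%s\n%s\n%s\n%s\n%s",
                     verb, content_md5, content_type, date, resource);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For an empty body, the MD5 digest is d41d8cd98f00b204e9800998ecf8427e, which `base64_encode` turns into the Content-MD5 value "1B2M2Y8AsgTpgAmY7PhCfg==". If the stored data does not hash to that value, the service can reject the upload.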

Libdroplet is a C library that implements the S3 protocol and makes it easier to write tools that interact with S3 services.

Libdroplet comes with a set of features that enhance the S3 protocol:

  • Multi-profile system
  • Fully multi-threaded (efficient in a data center environment)
  • Virtual directories with true absolute and relative path support
  • On-the-fly encryption/decryption and buffered I/O
  • Manages storage pricing
  • Simplified metadata management

It also includes a small shell tool, dplsh, which lets you browse buckets with file and directory completion.

First, download the latest version of libdroplet.

Untar the archive and compile it:

$ tar zxvf scality-Droplet-2a678dd.tar.gz
$ cd scality-Droplet-2a678dd
$ make
$ sudo make install

Configure it:

$ mkdir ~/.droplet
$ cp doc/default.profile ~/.droplet
$ cp doc/AWS_US-Standard_Storage.pricing ~/.droplet
$ edit ~/.droplet/default.profile

(set access_key and secret_key to the credentials supplied by your hosting provider)
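A profile is a small text file of settings. As a rough sketch, assuming one `key = value` setting per line with `#` starting a comment (check doc/default.profile for the actual syntax), a tool could split each line into a key and a value like this. `parse_profile_line` is a hypothetical helper, not part of libdroplet.

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Split a "key = value" line in place (only the first '=' separates).
 * Returns 1 if a setting was found, 0 for blank lines, comments and
 * lines without '='. */
int parse_profile_line(char *line, char **keyp, char **valuep)
{
    char *eq;

    line = trim(line);
    if (*line == '\0' || *line == '#')
        return 0;
    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';
    *keyp = trim(line);
    *valuep = trim(eq + 1);
    return 1;
}
```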

Additional help on the configuration file is available, and if you want to enable accounting you can also have a look at the pricing file syntax.
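Accounting boils down to multiplying the storage and transfer volumes by per-unit rates, which is the charging model described at the start of this post. A minimal sketch, assuming one flat rate per GB-month of storage and one per GB transferred (real price lists, like the one in the .pricing file copied above, are usually tiered); `price_list_t` and `estimate_monthly_cost` are hypothetical names, and any rates you plug in are your own.

```c
/* Hypothetical flat rates; real provider price lists are usually tiered. */
typedef struct {
    double storage_per_gb_month; /* cost of keeping 1 GB stored for a month */
    double transfer_per_gb;      /* cost of transferring 1 GB */
} price_list_t;

/* Estimated monthly charge for the given storage and transfer volumes. */
double estimate_monthly_cost(const price_list_t *p, double stored_gb,
                             double transferred_gb)
{
    return stored_gb * p->storage_per_gb_month
         + transferred_gb * p->transfer_per_gb;
}
```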

Test your configuration with dplsh:

$ dplsh
:/> la
bucket1
bucket2
...
:/> bucket1:
bucket1:/> mkdir foo
bucket1:/> cd foo
bucket1:/foo/> ls -l
bucket1:/foo/> put /etc/hosts
bucket1:/foo/> ls -l
.
hosts
bucket1:/foo/> ^D

It is possible to create new profiles in the ~/.droplet directory, e.g. foo.profile. A profile can then be selected by setting the DPLPROFILE environment variable. Additional help on dplsh is also available.
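The profile lookup can be pictured as follows. This is a minimal sketch, assuming the library reads `~/.droplet/<name>.profile`, with `<name>` taken from DPLPROFILE and falling back to `default` (matching the default.profile copied earlier); `profile_path_from` and `profile_path` are hypothetical helpers, and libdroplet's own lookup may differ in its details.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format "<home>/.droplet/<name>.profile" into buf, using "default" when no
 * profile name is given. Returns 0, or -1 if home is NULL or buf is too small. */
int profile_path_from(char *buf, size_t buflen, const char *home, const char *name)
{
    int n;

    if (home == NULL)
        return -1;
    if (name == NULL || *name == '\0')
        name = "default";
    n = snprintf(buf, buflen, "%s/.droplet/%s.profile", home, name);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Resolve the profile path from the HOME and DPLPROFILE environment variables. */
int profile_path(char *buf, size_t buflen)
{
    return profile_path_from(buf, buflen, getenv("HOME"), getenv("DPLPROFILE"));
}
```

Keeping the environment lookup in a thin wrapper makes the path logic easy to test without touching the process environment.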

The Droplet library API is split into three layers:

  • S3 request builder API
  • S3 convenience API
  • Vdir high-level API

The S3 request builder API allows the caller to create new types of requests:
void dpl_req_free(dpl_req_t *req);
dpl_req_t *dpl_req_new(dpl_ctx_t *ctx);
void dpl_req_set_method(dpl_req_t *req, dpl_method_t method);
dpl_status_t dpl_req_set_bucket(dpl_req_t *req, char *bucket);
dpl_status_t dpl_req_set_resource(dpl_req_t *req, char *resource);
dpl_status_t dpl_req_set_subresource(dpl_req_t *req, char *subresource);
void dpl_req_add_behavior(dpl_req_t *req, u_int flags);
void dpl_req_rm_behavior(dpl_req_t *req, u_int flags);
void dpl_req_set_location_constraint(dpl_req_t *req, dpl_location_constraint_t location_constraint);
void dpl_req_set_canned_acl(dpl_req_t *req, dpl_canned_acl_t canned_acl);
void dpl_req_set_storage_class(dpl_req_t *req, dpl_storage_class_t storage_class);
etc
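Whatever builder is used, the result is an ordinary HTTP request. The sketch below is not libdroplet's code: it assembles the request line and Host header from the same ingredients the setters above take (method, bucket, resource, subresource), assuming virtual-hosted-style addressing, where the bucket becomes part of the host name, and a resource that already starts with `/`. The `toy_req_t` type and `format_request` are hypothetical, and a real request also needs Date and Authorization headers and URL-encoded keys.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical request description mirroring the builder's setters. */
typedef struct {
    const char *method;      /* "GET", "PUT", "DELETE", "HEAD", ... */
    const char *bucket;      /* e.g. "bucket1" */
    const char *resource;    /* object path starting with '/', e.g. "/foo/hosts" */
    const char *subresource; /* e.g. "acl", or NULL for none */
    const char *endpoint;    /* service host, e.g. "s3.amazonaws.com" */
} toy_req_t;

/* Format the request line and Host header into buf.
 * Returns 0, or -1 if buf is too small. */
int format_request(const toy_req_t *r, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s %s%s%s HTTP/1.1\r\nHost: %s.%s\r\n",
                     r->method, r->resource,
                     r->subresource ? "?" : "",
                     r->subresource ? r->subresource : "",
                     r->bucket, r->endpoint);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With bucket "bucket1", resource "/foo/hosts" and subresource "acl", a GET becomes `GET /foo/hosts?acl HTTP/1.1` sent to host `bucket1.s3.amazonaws.com`, i.e. a request for that file's ACL.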

The S3 convenience API allows the caller to perform basic "raw" S3 operations, such as listing buckets, creating buckets, creating files, etc.:

#include <droplet.h>
//operations on buckets
dpl_status_t dpl_list_all_my_buckets(dpl_ctx_t *ctx, dpl_vec_t **vecp);
dpl_status_t dpl_make_bucket(dpl_ctx_t *ctx, char *bucket, dpl_location_constraint_t location_constraint, dpl_canned_acl_t canned_acl);
dpl_status_t dpl_list_bucket(dpl_ctx_t *ctx, char *bucket, char *prefix, char *delimiter, dpl_vec_t **objectsp, dpl_vec_t **common_prefixesp);
//operations on S3 files
dpl_status_t dpl_put(dpl_ctx_t *ctx, char *bucket, char *resource, char *subresource, dpl_dict_t *metadata, dpl_canned_acl_t canned_acl, char *data_buf, u_int data_len);
dpl_status_t dpl_put_buffered(dpl_ctx_t *ctx, char *bucket, char *resource, char *subresource, dpl_dict_t *metadata, dpl_canned_acl_t canned_acl, u_int data_len, dpl_conn_t **connp);
dpl_status_t dpl_get(dpl_ctx_t *ctx, char *bucket, char *resource, char *subresource, dpl_condition_t *condition, char **data_bufp, u_int *data_lenp, dpl_dict_t **metadatap);
dpl_status_t dpl_get_buffered(dpl_ctx_t *ctx, char *bucket, char *resource, char *subresource, dpl_condition_t *condition, dpl_header_func_t header_func, dpl_buffer_func_t buffer_func, void *cb_arg);
dpl_status_t dpl_delete(dpl_ctx_t *ctx, char *bucket, char *resource, char *subresource);
dpl_status_t dpl_head(dpl_ctx_t *ctx, char *bucket, char *resource, char *subresource, d
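The `prefix` and `delimiter` arguments of `dpl_list_bucket` follow S3's listing semantics. Among the keys that start with `prefix`, any key that contains the delimiter after the prefix is rolled up into a "common prefix" (everything up to and including the first delimiter), and the remaining keys are returned as objects. This is what makes a flat key space look like directories. The sketch below applies that rule to an in-memory array of sorted keys. `listing_t` and `list_keys` are hypothetical, and the fixed capacities are only there to keep the example short.

```c
#include <string.h>

#define MAX_RESULTS 16 /* illustrative capacity limits */
#define MAX_KEY 128

typedef struct {
    const char *objects[MAX_RESULTS];     /* keys listed as plain objects */
    int nobjects;
    char prefixes[MAX_RESULTS][MAX_KEY]; /* rolled-up "common prefixes" */
    int nprefixes;
} listing_t;

/* Emulate a prefix/delimiter listing over keys, which must be sorted
 * lexicographically (as S3 returns them). Excess results are dropped. */
void list_keys(const char **keys, int nkeys, const char *prefix,
               const char *delimiter, listing_t *out)
{
    size_t plen = strlen(prefix);
    size_t dlen = delimiter ? strlen(delimiter) : 0;
    int i;

    out->nobjects = 0;
    out->nprefixes = 0;
    for (i = 0; i < nkeys; i++) {
        const char *d = NULL;
        size_t len;

        if (strncmp(keys[i], prefix, plen) != 0)
            continue; /* key is outside the requested prefix */
        if (dlen > 0)
            d = strstr(keys[i] + plen, delimiter);
        if (d == NULL) {
            /* no delimiter after the prefix: a plain object */
            if (out->nobjects < MAX_RESULTS)
                out->objects[out->nobjects++] = keys[i];
            continue;
        }
        /* roll the key up to and including the first delimiter */
        len = (size_t)(d - keys[i]) + dlen;
        if (len >= MAX_KEY)
            continue;
        /* in sorted input, keys sharing a common prefix are adjacent */
        if (out->nprefixes > 0 &&
            strncmp(out->prefixes[out->nprefixes - 1], keys[i], len) == 0 &&
            out->prefixes[out->nprefixes - 1][len] == '\0')
            continue;
        if (out->nprefixes < MAX_RESULTS) {
            memcpy(out->prefixes[out->nprefixes], keys[i], len);
            out->prefixes[out->nprefixes][len] = '\0';
            out->nprefixes++;
        }
    }
}
```

With keys "foo/a.txt", "foo/bar/b.txt" and "top.txt", an empty prefix and delimiter "/" yield the object "top.txt" and the common prefix "foo/"; listing again with prefix "foo/" yields "foo/a.txt" and "foo/bar/", which is one level of a directory walk.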

The Vdir high-level API lets the caller specify files by absolute or relative paths (e.g. "../foo/bar"). To do so it relies on the delimiter feature of S3 bucket listing. It also enables features like on-the-fly encryption, buffered I/O, etc.:

//manipulate virtual directories
dpl_status_t dpl_opendir(dpl_ctx_t *ctx, char *path, void **dir_hdlp);
dpl_status_t dpl_readdir(void *dir_hdl, dpl_dirent_t *dirent);
int dpl_eof(void *dir_hdl);
void dpl_closedir(void *dir_hdl);
dpl_status_t dpl_chdir(dpl_ctx_t *ctx, char *path);
dpl_status_t dpl_mkdir(dpl_ctx_t *ctx, char *path);
dpl_status_t dpl_rmdir(dpl_ctx_t *ctx, char *path);

//manipulate vfiles
dpl_status_t dpl_openwrite(dpl_ctx_t *ctx, char *path, u_int flags, dpl_dict_t *metadata, dpl_canned_acl_t canned_acl, u_int data_len, dpl_vfile_t **vfilep);
dpl_status_t dpl_write(dpl_vfile_t *vfile, char *buf, u_int len);
dpl_status_t dpl_openread(dpl_ctx_t *ctx, char *path, u_int flags, dpl_condition_t *condition, dpl_buffer_func_t buffer_func, void *cb_arg);
dpl_status_t dpl_unlink(dpl_ctx_t *ctx, char *path);
dpl_status_t dpl_getattr(dpl_ctx_t *ctx, char *path, dpl_condition_t *condition, dpl_dict_t **metadatap);
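Resolving a relative path such as "../foo/bar" against the current directory is plain string work that happens before any S3 request is sent. The following is a minimal sketch of that step, assuming `/` as the delimiter and an absolute current directory; `resolve_path` is a hypothetical helper, not the function libdroplet uses internally. The resolved path, minus its leading `/`, is what would then serve as an object key or listing prefix.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PARTS 64 /* illustrative limit on path depth */

/* Resolve path against cwd (absolute, e.g. "/foo/"), collapsing "." and "..".
 * Writes an absolute path without a trailing '/' into out.
 * Returns 0, or -1 if a buffer or the depth limit would overflow. */
int resolve_path(const char *cwd, const char *path, char *out, size_t outlen)
{
    char tmp[512];
    char *parts[MAX_PARTS];
    char *tok;
    int nparts = 0, i, n;
    size_t used;

    /* absolute paths ignore cwd; relative ones are appended to it */
    if (path[0] == '/')
        n = snprintf(tmp, sizeof tmp, "%s", path);
    else
        n = snprintf(tmp, sizeof tmp, "%s/%s", cwd, path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* strtok skips empty components, so "//" behaves like "/" */
    for (tok = strtok(tmp, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        if (strcmp(tok, ".") == 0)
            continue;
        if (strcmp(tok, "..") == 0) {
            if (nparts > 0)
                nparts--; /* ".." at the root stays at the root */
            continue;
        }
        if (nparts == MAX_PARTS)
            return -1;
        parts[nparts++] = tok;
    }

    if (outlen < 2)
        return -1;
    out[0] = '/';
    out[1] = '\0';
    used = 1;
    for (i = 0; i < nparts; i++) {
        size_t len = strlen(parts[i]);

        if (used + len + 2 > outlen) /* component, separator, NUL */
            return -1;
        memcpy(out + used, parts[i], len);
        used += len;
        if (i + 1 < nparts)
            out[used++] = '/';
        out[used] = '\0';
    }
    return 0;
}
```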

A C example (look at examples/recurse.c):

/*
 * simple example which recurses a directory tree
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <droplet.h>

dpl_status_t
recurse(dpl_ctx_t *ctx,
        char *dir,
        int level)
{
  void *dir_hdl;
  dpl_dirent_t dirent;
  dpl_status_t ret;

  //vfs style call to change directory
  ret = dpl_chdir(ctx, dir);
  if (DPL_SUCCESS != ret)
    return ret;

  //vfs style call to open a directory
  ret = dpl_opendir(ctx, ".", &dir_hdl);
  if (DPL_SUCCESS != ret)
    return ret;

  while (!dpl_eof(dir_hdl))
    {
      //vfs style readdir
      ret = dpl_readdir(dir_hdl, &dirent);
      if (DPL_SUCCESS != ret)
        {
          dpl_closedir(dir_hdl); //don't leak the handle on error
          return ret;
        }

      //skip the current directory (and, defensively, the parent one)
      if (strcmp(dirent.name, ".") && strcmp(dirent.name, ".."))
        {
          int i;

          //indent one space per level of depth
          for (i = 0; i < level; i++)
            printf(" ");

          printf("%s\n", dirent.name);

          if (DPL_FTYPE_DIR == dirent.type)
            {
              ret = recurse(ctx, dirent.name, level + 1);
              if (DPL_SUCCESS != ret)
                {
                  dpl_closedir(dir_hdl);
                  return ret;
                }
            }
        }
    }

  dpl_closedir(dir_hdl); //close a directory

  if (level > 0)
    {
      //vfs like functions manipulate relative paths
      ret = dpl_chdir(ctx, "..");
      if (DPL_SUCCESS != ret)
        return ret;
    }

  return DPL_SUCCESS;
}

int
main(int argc,
     char **argv)
{
  dpl_status_t ret;
  dpl_ctx_t *ctx;
  char *bucket = NULL;

  if (2 != argc)
    {
      fprintf(stderr, "usage: recurse bucket\n");
      exit(1);
    }

  bucket = argv[1];

  //initialize the lib
  ret = dpl_init();
  if (DPL_SUCCESS != ret)
    {
      fprintf(stderr, "dpl_init failed\n");
      exit(1);
    }

  //create a droplet context
  ctx = dpl_ctx_new(NULL, NULL);
  if (NULL == ctx)
    {
      fprintf(stderr, "dpl_ctx_new failed\n");
      dpl_free();
      exit(1);
    }

  ctx->cur_bucket = bucket; //set current bucket

  ret = recurse(ctx, "/", 0);
  if (DPL_SUCCESS != ret)
    {
      fprintf(stderr, "error recursing\n");
      dpl_ctx_free(ctx);
      dpl_free();
      exit(1);
    }

  dpl_ctx_free(ctx); //free the droplet context
  dpl_free(); //terminates the library

  return 0;
}

That's all folks!



Written by Giorgio Regni

October 1, 2010 at 4:21 pm

Posted in Storage
