h5serv -- The HDF server -- Notes and examples

Author: Dave Kuhlman
Contact: dkuhlman (at) davekuhlman (dot) org
Address: http://www.davekuhlman.org
Revision: 1.0a
Date: July 06, 2015
Copyright: Copyright (c) 2015 Dave Kuhlman. All Rights Reserved. This software is subject to the provisions of the MIT License http://www.opensource.org/licenses/mit-license.php.
Abstract: This document provides hints, guidance, and sample code for access to an h5serv server.

1   More info

h5serv, the HDF Server, serves both information about HDF5 data files and the data stored in them. Clients talk to it by making REST requests over HTTP.

2   Start up

I installed h5serv under the Anaconda Python distribution from Continuum. See this for more information: https://store.continuum.io/cshop/anaconda/.

Instructions on installing h5serv under Anaconda and setting up your environment are included with the h5serv distribution. See the file ../docs/Installation/ServerSetup.rst in the h5serv distribution.

Installation -- Do this under Linux:

$ conda create -n h5serv python=2.7 h5py twisted tornado requests pytz

Set up your environment -- Depending on where you have installed Anaconda, do something like the following:

$ source ~/a1/Python/Anaconda/Anaconda01/envs/h5serv/bin/activate h5serv

If and when you need to deactivate this environment, use:

$ source deactivate

Server startup -- Go to the server sub-directory in your h5serv installation, and run app.py. For example:

$ cd ~/a1/Python/Anaconda/H5serv/Git/h5serv/server
$ python app.py
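
Once app.py is running, it can be handy to confirm that something is actually accepting connections before trying the examples below. The following is a minimal sketch, not part of h5serv; the function name is_listening is my own invention, and the port to check is whatever your server is configured to use (the examples in this document happen to use 5000).

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolvable name, or timed out.
        return False
    conn.close()
    return True
```

For example, is_listening('localhost', 5000) should return True on the machine where the server was just started. It only shows that the port is open; it does not prove that the process behind it is h5serv.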

3   Examples

3.1   cURL

The curl command line tool is an easy way to make REST requests to an h5serv server. Some examples:

$ curl -X GET -H "host:testdata04.hdfgroup.org" http://crow:5000

Here is a bash shell script that makes several requests (I've added echo after each command so that each response ends with a newline):

#!/bin/bash

# get info about an HDF5 data file.
curl -X GET -H "host: testdata04.hdfgroup.org" http://crow:5000 ; echo
# get the IDs of the datasets in the file.
curl -X GET -H "host: testdata04.hdfgroup.org" http://crow:5000/datasets ; echo
# get info about one specific dataset.
curl -X GET -H "host: testdata04.hdfgroup.org" http://crow:5000/datasets/f416d15c-2114-11e5-81d4-0019dbe2bd89 ; echo
# get the data values from a specific dataset.
curl -X GET -H "host: testdata04.hdfgroup.org" http://crow:5000/datasets/f416d15c-2114-11e5-81d4-0019dbe2bd89/value ; echo

3.2   Python

You will need to install the requests package. You can find it here: https://pypi.python.org/pypi/requests. For this testing, I used the Anaconda distribution of Python, which, I believe, includes requests by default. You can learn about Anaconda here: https://store.continuum.io/cshop/anaconda/.

Using IPython:

In [1]: import requests
In [2]: req = 'http://crow:5000/'
In [3]: hdrs = {'host': 'testdata04.hdfgroup.org'}
In [4]: rsp = requests.get(req, headers=hdrs)
In [5]: rsp
Out[5]: <Response [200]>
In [6]: print rsp.text
{"lastModified": "2015-07-02T23:49:18.303330Z", "hrefs": [{"href": "http://testdata04.hdfgroup.org/", "rel": "self"}, {"href": "http://testdata04.hdfgroup.org/datasets", "rel": "database"}, {"href": "http://testdata04.hdfgroup.org/groups", "rel": "groupbase"}, {"href": "http://testdata04.hdfgroup.org/datatypes", "rel": "typebase"}, {"href": "http://testdata04.hdfgroup.org/groups/f416d152-2114-11e5-81d4-0019dbe2bd89", "rel": "root"}], "root": "f416d152-2114-11e5-81d4-0019dbe2bd89", "created": "2015-07-02T23:49:18.303330Z"}
In [7]:
In [7]: print rsp.json()
{u'lastModified': u'2015-07-02T23:49:18.303330Z', u'hrefs': [{u'href': u'http://testdata04.hdfgroup.org/', u'rel': u'self'}, {u'href': u'http://testdata04.hdfgroup.org/datasets', u'rel': u'database'}, {u'href': u'http://testdata04.hdfgroup.org/groups', u'rel': u'groupbase'}, {u'href': u'http://testdata04.hdfgroup.org/datatypes', u'rel': u'typebase'}, {u'href': u'http://testdata04.hdfgroup.org/groups/f416d152-2114-11e5-81d4-0019dbe2bd89', u'rel': u'root'}], u'root': u'f416d152-2114-11e5-81d4-0019dbe2bd89', u'created': u'2015-07-02T23:49:18.303330Z'}
In [8]:
In [8]: req = 'http://crow:5000/groups'
In [9]: rsp = requests.get(req, headers=hdrs)
In [10]: rsp
Out[10]: <Response [200]>
In [11]: print rsp.json()
{u'hrefs': [{u'href': u'http://testdata04.hdfgroup.org/groups', u'rel': u'self'}, {u'href': u'http://testdata04.hdfgroup.org/groups/f416d152-2114-11e5-81d4-0019dbe2bd89', u'rel': u'root'}, {u'href': u'http://testdata04.hdfgroup.org/', u'rel': u'home'}], u'groups': [u'f416d155-2114-11e5-81d4-0019dbe2bd89', u'f416d158-2114-11e5-81d4-0019dbe2bd89', u'f416d15b-2114-11e5-81d4-0019dbe2bd89']}
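
Each response in the session above carries an hrefs list of related links, each tagged with a rel name. Note that those links name the data file's host (testdata04.hdfgroup.org), not the machine we send requests to (crow), so to follow one we would keep only its path and send it to the machine with the same host header as before. Here is a rough sketch of that idea. The helper names links_by_rel and link_path are hypothetical, and the sample dictionary is an abbreviated copy of the response printed above.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def links_by_rel(payload):
    """Map each 'rel' name in a response to its 'href' URL."""
    return dict((link['rel'], link['href'])
                for link in payload.get('hrefs', []))


def link_path(payload, rel):
    """Return only the path part of the link named rel, e.g. '/groups'."""
    return urlparse(links_by_rel(payload)[rel]).path


# Abbreviated copy of the domain response shown in the session above.
sample = {
    'root': 'f416d152-2114-11e5-81d4-0019dbe2bd89',
    'hrefs': [
        {'href': 'http://testdata04.hdfgroup.org/', 'rel': 'self'},
        {'href': 'http://testdata04.hdfgroup.org/datasets', 'rel': 'database'},
        {'href': 'http://testdata04.hdfgroup.org/groups', 'rel': 'groupbase'},
    ],
}

print(links_by_rel(sample)['groupbase'])  # http://testdata04.hdfgroup.org/groups
print(link_path(sample, 'database'))      # /datasets
```

With a live server, the path returned by link_path could be appended to 'http://crow:5000' and passed to requests.get along with the hdrs dictionary used in the session.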

And here is a Python script containing examples of several requests like those above:

#!/usr/bin/env python

import requests

def test():
    # get info about the HDF5 data file.
    rsp = requests.get(
        'http://crow:5000',
        headers={'host': 'testdata04.hdfgroup.org'})
    print(rsp.text)
    print(rsp.json())
    # get the IDs of the groups in the file.
    rsp = requests.get(
        'http://crow:5000/groups',
        headers={'host': 'testdata04.hdfgroup.org'})
    print(rsp.json())
    # get info about one specific group.
    rsp = requests.get(
        'http://crow:5000/groups/f416d155-2114-11e5-81d4-0019dbe2bd89',
        headers={'host': 'testdata04.hdfgroup.org'})
    print(rsp.json())
    # get the IDs of the datasets in the file.
    rsp = requests.get(
        'http://crow:5000/datasets',
        headers={'host': 'testdata04.hdfgroup.org'})
    print(rsp.json())
    # get info about one specific dataset.
    rsp = requests.get(
        'http://crow:5000/datasets/f416d154-2114-11e5-81d4-0019dbe2bd89',
        headers={'host': 'testdata04.hdfgroup.org'})
    print(rsp.json())
    # get the data values from that dataset.
    rsp = requests.get(
        'http://crow:5000/datasets/f416d154-2114-11e5-81d4-0019dbe2bd89/value',
        headers={'host': 'testdata04.hdfgroup.org'})
    print(rsp.json())
    value = rsp.json()['value']
    print('value: {}'.format(value))
    return rsp.json()

def main():
    test()

if __name__ == '__main__':
    main()

And, the following is a Python script that is functionally equivalent to the previous one, but that attempts to hide some of the repetition and messiness in a class:

#!/usr/bin/env python

import requests

class H5servRequest(object):
    def __init__(self, host, machine, port):
        self.host = host
        self.machine = machine
        self.port = port
        self.location = "{}:{}".format(machine, port)

    def get(self, path):
        rsp = requests.get(
            self.location + path,
            headers={'host': self.host})
        return rsp.json()

def test():
    req = H5servRequest(
        'testdata04.hdfgroup.org',
        'http://crow',
        5000)
    data = req.get('')
    print('-----\n{}'.format(data))
    data = req.get('/groups')
    print('-----\n{}'.format(data))
    data = req.get('/datasets')
    print('-----\n{}'.format(data))
    data = req.get('/datasets/f416d154-2114-11e5-81d4-0019dbe2bd89')
    print('-----\n{}'.format(data))
    data = req.get('/datasets/f416d154-2114-11e5-81d4-0019dbe2bd89/value')
    print('-----\n{}'.format(data))

def main():
    test()

if __name__ == '__main__':
    main()

Notes:

  • The class H5servRequest captures and reuses the host, the machine (or node) to which we make our requests, and the port.
  • Each call to H5servRequest.get uses the requests module to send the request, then returns the JSON payload.
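
Because H5servRequest.get takes a path and returns a decoded payload, it is easy to build small helpers on top of it. The following sketch lists the datasets and then fetches the values of each one. The function name dataset_values is hypothetical. It also assumes that the /datasets response lists its IDs under a 'datasets' key, by analogy with the 'groups' key in the /groups response shown earlier; check a real response before relying on that.

```python
def dataset_values(get):
    """Return {dataset_id: value} for every dataset in the file.

    `get` is any callable that takes a path and returns the decoded
    JSON payload, for example the bound method H5servRequest.get.
    """
    listing = get('/datasets')
    values = {}
    # Assumption: the IDs are listed under a 'datasets' key, by analogy
    # with the 'groups' key in the /groups response.
    for dset_id in listing.get('datasets', []):
        reply = get('/datasets/{}/value'.format(dset_id))
        values[dset_id] = reply['value']
    return values
```

With the class above, this could be called as dataset_values(req.get), where req is the H5servRequest instance created in test(). Passing the getter in as an argument also means the helper can be tried out against a plain dictionary lookup without a running server.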

3.3   JavaScript/Node.js

Here is a similar example written in Node.js:

#!/usr/bin/env node

var http = require('http');
var log = console.log;

function do_request(path, cb) {
  var opt = {};
  opt.hostname = 'crow';
  opt.port = 5000;
  opt.method = 'GET';
  opt.headers = {host: 'testdata04.hdfgroup.org'};
  opt.path = path;
  log('opt: ' + JSON.stringify(opt));
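  var req = http.request(opt, function (response) {
    var body = '';
    // The body may arrive in several chunks; collect them all first.
    response.on('data', function (chunk) {
      body += chunk;
    });
    // Report and hand off the body once the response is complete.
    response.on('end', function () {
      log('-----\nbody: ' + body);
      if (cb !== null) {
        cb(body);
      }
    });
  });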
  req.on('error', function(e) {
    log('request error: ' + e.message);
  });
  req.end();
}

function test() {
  do_request('/', null);
  do_request('/groups', null);
  do_request('/datasets', null);
  do_request('/datasets/f416d15c-2114-11e5-81d4-0019dbe2bd89', null);
  do_request(
    '/datasets/f416d15c-2114-11e5-81d4-0019dbe2bd89/value',
    function(data) {
      var content, values;
      content = JSON.parse(data);
      values = content.value;
      log('-----\nvalues: ' + values);
  });
}

test();

The HTTP requests in the above example are asynchronous, and, therefore, the results may not come out in the same order as our calls to do_request. Here is an example that uses recursion to execute these operations serially, starting each request only after the previous response has arrived:

#!/usr/bin/env node

var http = require('http');
var log = console.log;

var args = [
  ['/', null],
  ['/groups', null],
  ['/datasets', null],
  ['/datasets/f416d15c-2114-11e5-81d4-0019dbe2bd89', null],
  ['/datasets/f416d15c-2114-11e5-81d4-0019dbe2bd89/value', function(data) {
    var content, values;
    content = JSON.parse(data);
    values = content.value;
    log('-----\nvalues: ' + values);
  }],
];

function do_request(args, idx) {
  if (idx < args.length) {
    var path = args[idx][0],
      cb = args[idx][1],
      opt = {};
    opt.hostname = 'crow';
    opt.port = 5000;
    opt.method = 'GET';
    opt.headers = {host: 'testdata04.hdfgroup.org'};
    opt.path = path;
    log('opt: ' + JSON.stringify(opt));
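    var req = http.request(opt, function (response) {
      var body = '';
      // The body may arrive in several chunks; collect them all first.
      response.on('data', function (chunk) {
        body += chunk;
      });
      // Start the next request only after this response is complete.
      response.on('end', function () {
        log('-----\nbody: ' + body);
        if (cb !== null) {
          cb(body);
        }
        do_request(args, idx + 1);
      });
    });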
    req.on('error', function(e) {
      log('request error: ' + e.message);
    });
    req.end();
  }
}

function test() {
  do_request(args, 0);
}

test();

Notice that, in the example above, do_request does not call itself to start the next request until the response callback for the current request has run.