The Command-Line RESTafarian

Wednesday, 15 Jul 2015 Tags: hacks terminal

Almost any modern-day service or application provides an HTTP endpoint to work with. Whether it provides metrics, allows remote administration, or accepts complex requests, a system administrator will spend a lot of time in the terminal accessing and updating such APIs.

There are many, many tools to help us, but today we’re going to look at just four key ones: curl, jq, yajl, and httpie.

curl

curl is probably available out of the box on every non-Windows system these days, except minimal builds. It supports almost any protocol you can name, and many you’ve likely never heard of. For most people, curl is both the reference tool for testing an HTTP API’s compliance with the HTTP specification (which in 2014 was split into multiple RFCs, 7230 through 7235) and the workhorse used in many applications, either directly or through libcurl, the library embedded in the application.

Particularly useful options are:

-v to produce verbose output, including headers, with > and < marking the direction of each line
-s to run silently, with no progress meter or error messages (the response body is still written out)
-X to specify the HTTP verb used, for example -XPUT or -XPOST
-H to add arbitrary HTTP headers, for example -H 'Content-Type: application/json;charset=utf-8' (quote it, or the shell treats ; as a command separator)
-o to write the response body to a file; verbose output such as headers still goes to stderr
-# to show a simple progress bar of # characters instead of the default progress meter
-L to follow redirects, for example when checking a 301 redirect
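To show how a few of these options combine, here is a minimal sketch of a helper that saves a response body to a file while following redirects. The function name fetch_json is hypothetical, and the extra -S flag (still show errors when -s is set) is not in the list above.

```shell
#!/bin/sh
# Hypothetical helper: fetch_json URL OUTFILE
# Saves the response body to OUTFILE, following any redirects.
fetch_json() {
  # -s  hide the progress meter (swap in -v to see headers on stderr)
  # -S  still print an error message if the transfer fails
  # -L  follow 3xx redirects, such as a 301 Moved Permanently
  # -H  send an extra request header
  # -o  write the body to a file instead of stdout
  curl -sSL -H 'Accept: application/json' -o "$2" "$1"
}

# Usage, for example against a local CouchDB:
#   fetch_json http://localhost:5984/ welcome.json
```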

Uploading large files

When uploading large files, it’s important to consider the data format (binary, ASCII, UTF-8, base64-encoded) and whether the data is streamed or held in memory. Most people use the -d | --data option, which loads the entire file into memory, strips out carriage returns and newlines, and sends the result with Content-Type: application/x-www-form-urlencoded. That is usually not what you expected to happen: binary or line-oriented data gets altered on the way, and the whole file has to fit in memory.

“if you start the data with the letter @, the rest should be a file name to read the data from, or - if you want curl to read the data from stdin. The contents of the file must already be URL-encoded. Multiple files can also be specified. Posting data from a file named ‘foobar’ would thus be done with --data-binary @foobar”.
— http://curl.haxx.se/docs/manpage.html

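The --data-binary option is normally more appropriate, as it sends the file’s bytes exactly as they are, without stripping newlines. However, it still loads the entire file into memory, and it still defaults to Content-Type: application/x-www-form-urlencoded unless you set the header yourself with -H.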

I recommend using -T | --upload-file pretty much all the time. It streams the data, so memory usage stays low, and it sends the bytes unmodified; for HTTP URLs it issues a PUT request. Most of the time it just Does What You Mean.
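A minimal sketch of a streaming upload with -T. The helper name put_file and the CouchDB attachment URL in the usage comment are assumptions for illustration; CouchDB expects the document’s current revision in the rev parameter when you add an attachment, shown here only as a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: put_file FILE URL
# -T streams FILE from disk instead of loading it into memory, and sends
# its bytes unmodified. For http:// URLs curl uses the PUT method.
# Passing "-" as FILE streams from stdin instead.
put_file() {
  curl -sS -T "$1" -H 'Content-Type: application/octet-stream' "$2"
}

# Usage, e.g. attaching a file to an existing CouchDB document
# (<current-rev> is a placeholder for the document's revision):
#   put_file backup.tar.gz \
#     'http://localhost:5984/testy/kv/backup.tar.gz?rev=<current-rev>'
```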

jq

jq is one of those tools you wonder how you ever did without. It’s a pipe-friendly terminal tool for reformatting, filtering, and selecting parts of streaming JSON data, in the same way you might use grep or wc on arbitrary text.

Let’s take a look at a simple example, piping the output of curl directly into jq. The . filter is the identity function, and since jq pretty-prints JSON by default, it takes the single-line curl response and gives it pretty colours and indentation. This alone makes me happy.

$ curl -s skunkwerks.cloudant.com | jq .

{
  "couchdb": "Welcome",
  "version": "1.0.2",
  "cloudant_build": "2409"
}

But let’s say we are only interested in the version number of Cloudant’s API. Maybe this is part of a script or cron job, and we want to confirm that the version number is compatible with some operation we want to perform. Easy-peasy.

$ curl -s skunkwerks.cloudant.com | jq .version
"1.0.2"

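For the script or cron-job case, a sketch might look like this. The check_version function and its rule that any 1.x version is compatible are hypothetical; -r makes jq print the raw string without JSON quotes, which is easier to compare in the shell.

```shell
#!/bin/sh
# Hypothetical check for a script or cron job: reads the server's welcome
# JSON on stdin and succeeds only for a 1.x version.
check_version() {
  # -r prints the raw string (1.0.2) instead of the JSON string ("1.0.2")
  version=$(jq -r .version)
  case $version in
    1.*) echo "compatible: $version" ;;
    *)   echo "unsupported version: $version" >&2; return 1 ;;
  esac
}

# Usage:
#   curl -s skunkwerks.cloudant.com | check_version || exit 1
```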
Perhaps we need to transform that JSON in some fashion. In this case, we’ll destructure the JSON object and produce a new one with different keys from the original. This is useful when, for example, you migrate JSON data out of a system that uses id as a document’s key and into CouchDB, which wants _id.

$ curl -s skunkwerks.cloudant.com | \
  jq '{message: .couchdb, release: .version}'
{
  "message": "Welcome",
  "release": "1.0.2"
}
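The id to _id migration mentioned above could be sketched like this, assuming each input document is a JSON object with an id field. The function name rename_id and the sample document are made up for illustration.

```shell
#!/bin/sh
# Rename "id" to "_id" for import into CouchDB, keeping all other fields.
# del(.id) drops the old key; {_id: .id} still reads .id from the original
# input, because both sides of + receive the same input object.
rename_id() {
  jq -c 'del(.id) + {_id: .id}'
}

# Sample document, made up for illustration.
echo '{"id": "south_face", "grade": 5}' | rename_id
# prints {"grade":5,"_id":"south_face"}
```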

Here’s a more complex example to whet your appetite. We’ll query a database view, _all_docs (because I’m extremely lazy), and pull out a subset of the returned data.

$ curl -s skunkwerks.cloudant.com/aspiring/_all_docs | \
  jq '.rows[0].id'
"south_face"

$ curl -s skunkwerks.cloudant.com/aspiring/_all_docs | \
  jq '[.rows[] | {_id: .id, revision: .value.rev}]'
[
  {
    "_id": "south_face",
    "revision": "3-6b27b124e8669007fbf1a6222974ae7f"
  },
  {
    "_id": "west_face",
    "revision": "3-b6bb3256d343aa05bb79818ecc8f75ca"
  }
]

The second filter iterates over every row, pulls out each document id along with the current revision from the value object, and finally wraps the results in a convenient array. Neat.
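To experiment with that filter without a server, you can run it over a saved response. The sample below is hand-written in the shape of a CouchDB _all_docs response, reusing the ids and revisions above; it also uses map, which jq defines as shorthand for [.[] | f], as an equivalent spelling of the same filter.

```shell
#!/bin/sh
# Hand-written sample in the shape of a CouchDB _all_docs response.
cat > all_docs.json <<'EOF'
{"total_rows": 2, "offset": 0, "rows": [
  {"id": "south_face", "key": "south_face",
   "value": {"rev": "3-6b27b124e8669007fbf1a6222974ae7f"}},
  {"id": "west_face", "key": "west_face",
   "value": {"rev": "3-b6bb3256d343aa05bb79818ecc8f75ca"}}
]}
EOF

# Same result as [.rows[] | {...}], written with map; -c prints one line.
jq -c '.rows | map({_id: .id, revision: .value.rev})' all_docs.json
```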

unbuffered

The final feature of jq that I love is how well it copes with streaming APIs. When its output goes to a pipe rather than a terminal, jq (like most C programs) buffers what it writes and hands it on to the next command in the pipeline only when the buffer fills up or jq exits. That’s fine for a one-off request, but some sources deliver data slowly and continuously, and then results can sit in jq’s buffer long after the events that produced them arrived.

The --unbuffered flag tells jq to flush its output after every JSON value it prints, so each result reaches the next stage of the pipeline as soon as it is ready.

Our example is a bit contrived: jq writes straight to the terminal here, and terminal output is flushed line by line anyway, so it would work perfectly well without --unbuffered. In this case we receive a continual stream of updated metrics from a Riemann server that is monitoring a number of Erlang servers.

$ wsc 'ws://lolcat:5556/index?subscribe=true&query=(service =~ "vmstats%")' | \
  jq --unbuffered .

{
  "host": "icouch.wintermute.skunkwerks.at",
  "service": "vmstats memory_atoms",
  "state": "ok",
  "description": null,
  "metric": 256480,
  "tags": [
    "katja_vmstats",
    "instance: icouch",
    "couch",
    "beam"
  ],
  "time": "2015-07-15T12:58:22.000Z",
  "ttl": 60
}
{
  "host": "icouch.wintermute.skunkwerks.at",
  "service": "vmstats error_logger_message_queue",
  "state": "ok",
  "description": null,
  "metric": 0,
  "tags": [
    "katja_vmstats",
    "instance: icouch",
    "couch",
    "beam"
  ],
  "time": "2015-07-15T12:58:22.000Z",
  "ttl": 60
}
...
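A case where --unbuffered does matter is when jq’s output feeds another command. This sketch uses a hypothetical slow_source function as a stand-in for a real streaming feed, emitting one JSON event per second.

```shell
#!/bin/sh
# Hypothetical stand-in for a slow streaming source such as the Riemann
# websocket above: emits one small JSON event per second.
slow_source() {
  for n in 1 2 3; do
    printf '{"service": "vmstats demo", "metric": %d}\n' "$n"
    sleep 1
  done
}

# jq's stdout is a pipe here, not a terminal, so without --unbuffered its
# output could sit in a buffer until the buffer fills or jq exits. With
# the flag, each metric is flushed as soon as it is printed.
slow_source | jq --unbuffered -c '.metric' | while read -r metric; do
  echo "$(date +%T) metric=$metric"
done
```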

yajl

yajl is actually the first JSON library I encountered, and it’s still one of my favourites. It’s extremely fast, and it’s available both as a library and as command-line tools (json_verify and json_reformat) on every platform I’ve used recently.

I tend to use yajl for validating JSON and for reformatting it from pretty-printed to packed form. The latter can of course be done with jq as well (jq -c), so let’s look only at validating piped data. Often somebody will ask why CouchDB is rejecting their valid JSON, and I point them to yajl so that they can check for themselves. The culprit, by the way, is almost always invalid UTF-8.

$ echo '{"foo": schmnoo }' | json_verify
lexical error: invalid char in json text.
                                        {"foo": schmnoo }
                     (right here) ------^
JSON is invalid

$ echo '{"foo": true }' | json_verify
JSON is valid

httpie

HTTPie is another very powerful tool, written in Python and using the well-known requests library under the hood. I use it a lot when talking to JSON web services, to build up JSON objects on the command line rather than copying and pasting text into the shell from elsewhere. This way, HTTPie takes care of ensuring I’m supplying valid JSON.

$ http --verbose --style fruity \
  PUT http://localhost:5984/testy/kv \
  'content-type:application/json;charset=utf-8' \
  key=value \
  foo:=true

PUT /testy/kv HTTP/1.1
Accept: application/json
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 29
Host: localhost:5984
User-Agent: HTTPie/0.9.2
content-type: application/json;charset=utf-8

{
    "foo": true,
    "key": "value"
}

HTTP/1.1 201 Created
Cache-Control: must-revalidate
Content-Length: 65
Content-Type: application/json
Date: Wed, 15 Jul 2015 13:45:37 GMT
ETag: "1-158688661a13aa0a0a25849e7ed78da4"
Location: http://localhost:5984/testy/kv
Server: CouchDB/1.6.1 (Erlang OTP/18)

{
    "id": "kv",
    "ok": true,
    "rev": "1-158688661a13aa0a0a25849e7ed78da4"
}

HTTPie is capable of much more, but the example above already shows a few things:

Using colon-separated fields like content-type:application/json to specify HTTP headers. In this case, we could have used the built-in --json flag instead.
Using = for plain JSON string values
Using := to embed raw JSON; otherwise "foo": true would have been sent as the string "true" rather than the native JSON value true.
All HTTP headers sent and received are visible via the --verbose flag, once again using colour for terminal happiness.

Other Tools

Please let me know if you have other tools you use and love! I’m aware that there’s an equivalent of httpie for most programming languages, for example:

Ruby: http://johnnunemaker.com/httparty/