Using JQ to extract info from AEM

JQ is a JSON query tool. It is useful for parsing JSON from the command line and, combined with cURL, it makes the default .json extensions for AEM scriptable.

The syntax is explained well in the manual, but it does take some getting used to. Let’s start with a simple example before looking at a script with AEM.

Using the Network tools in Chrome and Firefox, you can “Copy as HAR” or “Save as HAR”.

Then when you have that file you can use jq to extract all the requested URLs on the page:

$ jq '.log.entries[] | .request.url' www.tbscg.com.har | head
"http://www.tbscg.com/"
"http://www.tbscg.com/resources/themes/tbscg-theme/css/tbscg-pack.min.css"
"http://www.tbscg.com/dam/jcr:3773445b-6f79-457b-b15a-b605a9b1eaf0/TBSCG_Logo_Horizontal(white).2015-10-29-10-22-35.png"
"http://www.tbscg.com/dam/jcr:e87afa96-d276-4462-a94b-8cab9b51ae0d/Doosan.2015-08-05-18-11-13.2016-01-28-10-36-27.png"
"http://www.tbscg.com/dam/jcr:35e6cd9d-32d2-4d36-8858-62c5665b13ae/uni_20Southampton.2015-08-05-18-11-06.2016-01-28-10-36-32.png"
"http://www.tbscg.com/dam/jcr:c8ee450c-1dfd-49b0-bd46-8f0c730adc5d/Electrabel_GDF_Suez.2015-08-05-18-11-12.2016-01-28-10-36-27.png"
"http://www.tbscg.com/dam/jcr:a68d2861-5905-449b-b3aa-a72348a6185b/American%20Airlines.2016-01-28-10-36-23.png"
"http://www.tbscg.com/dam/jcr:582b3e19-ead1-4be3-871b-52826d30d15d/Renfe.2015-08-05-18-11-09.2016-01-28-10-36-30.png"
"http://www.tbscg.com/dam/jcr:413a7dea-245a-4dc2-98eb-676e2233833a/Philips.2016-01-28-10-36-24.png"
"http://www.tbscg.com/dam/jcr:3c539ce7-57ad-4043-83e4-97395bec2424/Nationwide.2015-08-05-18-11-10.2016-01-28-10-36-29.png"
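
The same pattern extends to the other fields of each entry. The sketch below assumes the standard HAR layout, where every entry also carries request.method and response.status. The tiny sample.har file and its example.com URLs are invented stand-ins, so the commands can be tried without a browser export.

```shell
# sample.har is a made-up, minimal stand-in for a real browser export.
cat > sample.har <<'EOF'
{
  "log": {
    "entries": [
      {"request": {"method": "GET", "url": "http://example.com/"},
       "response": {"status": 200}},
      {"request": {"method": "GET", "url": "http://example.com/missing.png"},
       "response": {"status": 404}}
    ]
  }
}
EOF

# -r prints raw strings (no surrounding quotes); \( ... ) interpolates
# a value into the output string.
jq -r '.log.entries[] | "\(.response.status) \(.request.method) \(.request.url)"' sample.har

# Only the requests that did not come back with a 200:
jq -r '.log.entries[] | select(.response.status != 200) | .request.url' sample.har
```

The select step used here is the same one the AEM example relies on further down.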

OK, great. So useful, but not super exciting.

The other day I had some problems with user profile synchronisation on an AEM 6.1 instance with AEM Communities (using Sling Distribution to sync the user profiles). Essentially, I had seen some inconsistencies between a couple of the publish nodes and I wanted to check them out.

All the accounts I was interested in were stored under sub-folders of /home/users/community (in randomly assigned sub-folders).

With 10k+ users I was not able to use the infamous “-1” JSON depth selector, so I had to walk the tree one level at a time:

http://localhost:4502/home/users/community.1.json

{
  "jcr:primaryType": "rep:AuthorizableFolder",
  "jcr:mixinTypes": [
    "rep:AccessControllable",
    "mix:lockable"
  ],
  "jcr:createdBy": "admin",
  "jcr:created": "Wed Jul 29 2015 15:24:55 GMT+0200",
  "k": {
    "jcr:primaryType": "rep:AuthorizableFolder",
    "jcr:mixinTypes": ["mix:lockable"]
  },
  "p": {
    "jcr:primaryType": "rep:AuthorizableFolder",
    "jcr:mixinTypes": ["mix:lockable"]
  },
  "rep:policy": {"jcr:primaryType": "rep:ACL"},
  ....
}

I saved this response as aem-users.json and ran the following filter over it:

$ jq -r 'to_entries[] | select(.value["jcr:primaryType"]? == "rep:AuthorizableFolder") | .key' < aem-users.json
k
6
p
....

So let’s break this down. Think of it like the command line: JQ treats each part of the filter (the parts separated by | inside the single quotes above) as a separate command that takes whole JSON values as input and passes its results on to the next part. We start with to_entries[].

to_entries takes an object and creates an array of objects, each with the two properties “key” and “value”:

$ jq -r 'to_entries' < aem-users.json  | head -n30
[
  {
    "key": "jcr:primaryType",
    "value": "rep:AuthorizableFolder"
  },
  {
    "key": "jcr:mixinTypes",
    "value": [
      "rep:AccessControllable",
      "mix:lockable"
    ]
  },
  {
    "key": "jcr:createdBy",
    "value": "admin"
  },
  {
    "key": "jcr:created",
    "value": "Wed Jul 29 2015 15:24:55 GMT+0200"
  },
  {
    "key": "k",
    "value": {
      "jcr:primaryType": "rep:AuthorizableFolder",
      "jcr:mixinTypes": [
        "mix:lockable"
      ]
    }
  },
  {

Adding the square brackets converts the array into a stream of objects instead (notice the lack of a comma between the objects):

$ jq -r 'to_entries[]' < aem-users.json | head -n30
{
  "key": "jcr:primaryType",
  "value": "rep:AuthorizableFolder"
}
{
  "key": "jcr:mixinTypes",
  "value": [
    "rep:AccessControllable",
    "mix:lockable"
  ]
}
{
  "key": "jcr:createdBy",
  "value": "admin"
}
{
  "key": "jcr:created",
  "value": "Wed Jul 29 2015 15:24:55 GMT+0200"
}
{
  "key": "k",
  "value": {
    "jcr:primaryType": "rep:AuthorizableFolder",
    "jcr:mixinTypes": [
      "mix:lockable"
    ]
  }
}
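
The difference between the two forms is easier to see on a tiny input with compact output (-c). The object used below is an invented example, not AEM data.

```shell
# One array containing every key/value pair:
echo '{"a": 1, "b": {"c": 2}}' | jq -c 'to_entries'
# [{"key":"a","value":1},{"key":"b","value":{"c":2}}]

# The trailing [] unpacks that array into separate values, one per line:
echo '{"a": 1, "b": {"c": 2}}' | jq -c 'to_entries[]'
# {"key":"a","value":1}
# {"key":"b","value":{"c":2}}
```

Everything after the next | in a filter then runs once for each of those separate values.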

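And with a stream of objects we can use select to keep only the objects we want (in this case those of type “rep:AuthorizableFolder”).
The ? at the end of .value["jcr:primaryType"]? suppresses the error JQ would otherwise raise when .value is not an object (the string and array values above cannot be indexed by a property name), so those entries are simply skipped: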

$ jq -r 'to_entries[] | select(.value["jcr:primaryType"]? == "rep:AuthorizableFolder")' < aem-users.json  | head -n30
{
  "key": "k",
  "value": {
    "jcr:primaryType": "rep:AuthorizableFolder",
    "jcr:mixinTypes": [
      "mix:lockable"
    ]
  }
}
{
  "key": "6",
  "value": {
    "jcr:primaryType": "rep:AuthorizableFolder"
  }
}
{
  "key": "p",
  "value": {
    "jcr:primaryType": "rep:AuthorizableFolder"
  }
}
{
  "key": "x",
  "value": {
    "jcr:primaryType": "rep:AuthorizableFolder"
  }
}
....
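
To see what the ? is doing, compare the filter with and without it on a small input. The object below is an invented stand-in that mixes a string value and an object value, as the AEM response does.

```shell
# "title" holds a string, "child" holds an object.
INPUT='{"title": "text", "child": {"jcr:primaryType": "rep:AuthorizableFolder"}}'

# Without ?, jq exits with an error as soon as it tries to index the string:
echo "$INPUT" \
  | jq -r 'to_entries[] | select(.value["jcr:primaryType"] == "rep:AuthorizableFolder") | .key' \
  || echo "jq failed"

# With ?, the entry that cannot be indexed is skipped and only the match is printed:
echo "$INPUT" \
  | jq -r 'to_entries[] | select(.value["jcr:primaryType"]? == "rep:AuthorizableFolder") | .key'
```

The second command prints only child, which is why the filter in this post keeps the ? in place.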

Then after that we only want the value of the key from the original object, so we extract the key property:

$ jq -r 'to_entries[] | select(.value["jcr:primaryType"]? == "rep:AuthorizableFolder") | .key' < aem-users.json
k
6
p
....

From there I took each line ($D below) and ran another curl to get the contents of that folder (knowing that the users are at most one level deep):

$ curl -ks -u "$AEMUSER" "$HOST/home/users/community/$D.1.json" > "aem-users-$D.json"
$ # from now on we will use the first $D in the list, "k"
$ jq '.' < aem-users-k.json | head -n 20
{
  "jcr:primaryType": "rep:AuthorizableFolder",
  "jcr:mixinTypes": [
    "mix:lockable"
  ],
  "j_ZpYfJvXt6Pe6G_EtTd": {
    "jcr:primaryType": "rep:User",
    "jcr:mixinTypes": [
      "rep:AccessControllable"
    ],
    "jcr:createdBy": "admin",
    "rep:password": "{SHA-256}dac40c901dac879d-1000-9ac833605d2152dab160a88634f3dd4fb698edeed3c96eb197e8c4a4f8907677",
    "jcr:created": "Fri Feb 05 2016 05:21:15 GMT+0100",
    "rep:principalName": "kelly.brett",
    "jcr:uuid": "302413f2-604d-30e0-8c15-17f53c9fdff3",
    "rep:authorizableId": "kelly.brett"
  },
  "kv7kjXcI-d3JD-xdQqXo": {
    "jcr:primaryType": "rep:User",
    "jcr:mixinTypes": [

Note: jq '.' on its own simply pretty-prints your JSON.
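
The per-folder fetch can be wrapped in a loop over the folder names. This is a minimal sketch, not the exact script used for this post: list_folders and fetch_folders are hypothetical helper names, and AEMUSER and HOST are assumed to be set the same way as for the curl call above.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the name of every child node that is a rep:AuthorizableFolder.
# $1 is a file holding the .1.json response of the parent folder.
list_folders() {
  jq -r 'to_entries[]
         | select(.value["jcr:primaryType"]? == "rep:AuthorizableFolder")
         | .key' "$1"
}

# Fetch each folder one level deep into aem-users-<name>.json.
# Assumes AEMUSER (user:password) and HOST are already exported.
fetch_folders() {
  list_folders "$1" | while IFS= read -r D; do
    curl -ks -u "$AEMUSER" "$HOST/home/users/community/$D.1.json" > "aem-users-$D.json"
  done
}
```

With both functions defined, running fetch_folders aem-users.json would write one aem-users-$D.json file per folder.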

From here we can produce a list of JSON objects with just the path and username. Passing $D in with --arg (available inside the filter as $d) keeps the filter in single quotes, so none of the inner quotes need escaping:

$ D=k
$ jq --arg d "$D" 'to_entries[] | select(.value["jcr:primaryType"]? == "rep:User") | {path: ("/home/users/community/" + $d + "/" + .key), username: .value["rep:authorizableId"]}' < aem-users-k.json | head -n 20
{
  "path": "/home/users/community/k/j_ZpYfJvXt6Pe6G_EtTd",
  "username": "kelly.brett"
}
{
  "path": "/home/users/community/k/kv7kjXcI-d3JD-xdQqXo",
  "username": "k.j.001"
}
{
  "path": "/home/users/community/k/s8O1ee7bVhNXP3j51j7r",
  "username": "kepner.christopher"
}
{
  "path": "/home/users/community/k/F-Nv0PITaFIvl5gE4FSF",
  "username": "kamran.muhammad"
}
{
  "path": "/home/users/community/k/JzsrDYmGePItpv46X2Wq",
  "username": "kostadinov.miro"
}
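
For comparing publish nodes, one line per user is easier to sort and diff than pretty-printed objects. The following is a minimal sketch under the same folder layout: it passes the folder name in with --arg and joins the two fields with a tab. The file sample-users-k.json is a trimmed stand-in built from the excerpt above, and users-k.tsv is just an illustrative output name.

```shell
# Trimmed stand-in for a real folder listing (two users from the excerpt above).
cat > sample-users-k.json <<'EOF'
{
  "jcr:primaryType": "rep:AuthorizableFolder",
  "jcr:mixinTypes": ["mix:lockable"],
  "j_ZpYfJvXt6Pe6G_EtTd": {
    "jcr:primaryType": "rep:User",
    "rep:authorizableId": "kelly.brett"
  },
  "kv7kjXcI-d3JD-xdQqXo": {
    "jcr:primaryType": "rep:User",
    "rep:authorizableId": "k.j.001"
  }
}
EOF

D=k
# --arg d "$D" exposes the shell variable as the jq variable $d.
# join("\t") glues path and username together with a tab; -r prints the
# result as a raw line instead of a quoted JSON string.
jq -r --arg d "$D" '
  to_entries[]
  | select(.value["jcr:primaryType"]? == "rep:User")
  | [("/home/users/community/" + $d + "/" + .key), .value["rep:authorizableId"]]
  | join("\t")
' sample-users-k.json | sort > users-k.tsv

cat users-k.tsv
```

Running the same command against the file fetched from each publish node gives sorted files that diff can compare directly.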

Hope this intro to JQ helps. There are a lot of options for JQ, but the easiest way to handle it is to build up your filter one part at a time, like I did with the middle example.

Published by

Blair Robertson

I am a Senior Solution Architect here at TBSCG, specialising in getting projects done. I work on projects around two enterprise CMSs, Adobe AEM and HP Teamsite, but I get involved in all parts of the technical landscape.