AEM6 and Archiva Servlet gotcha

I am working on a project that uses AEM6 and an AEM FeaturePack for Adobe Campaign. As this is a FeaturePack, its APIs are not available through the normal public repositories. I needed to get the project to build, so the solution was to use the Archiva servlet. This worked great: the project compiled and installed into AEM fine.

I then needed to use HttpClient to make some external web service calls, so I looked at what was exposed by the Archiva servlet, found that AEM already uses 4.3.3, added the dependency, ran mvn clean install, the code compiled, happy days.

Until I went to install it into AEM. Then I started getting weird bundle dependency issues:

21.04.2015 16:56:53.374 *INFO* [OsgiInstallerImpl] Unable to start bundle [396] : Unresolved constraint in bundle [396]: Unable to resolve 396.9: missing requirement [396.9] osgi.wiring.package; (osgi.wiring.package=com.carrotsearch.hppc)

CarrotSearch? Very strange. A quick look at the MANIFEST.MF revealed a huge number of dependencies now on my bundle:

$ cat ./bundle/target/classes/META-INF/MANIFEST.MF
Manifest-Version: 1.0
Bnd-LastModified: 1429628261535
Build-Jdk: 1.7.0_67
Built-By: brobertson
Bundle-ClassPath: .,httpcore-4.3.2.jar,httpclient-4.3.2.jar
Bundle-Description: Maven Multimodule project.
Bundle-ManifestVersion: 2
Bundle-Name: ST Test Integration Bundle
Bundle-Version: 0.1.1.SNAPSHOT
Created-By: Apache Maven Bundle Plugin
Embed-Dependency: httpcore;httpclient;
Embedded-Artifacts: httpcore-4.3.2.jar;g="org.apache.httpcomponents";a="
Import-Package: com.carrotsearch.hppc,com.carrotsearch.hppc.cursors,com.
Service-Component: OSGI-INF/serviceComponents.xml
Tool: Bnd-1.50.0

And when I removed the HTTPClient dependency, they all disappeared.

What I found was that in ~/.m2/repository/org/apache/httpcomponents/httpclient/4.3.3 the JAR was suspiciously large, at over 10MB. And when I looked at the MANIFEST.MF of this file:

Manifest-Version: 1.0
Bnd-LastModified: 1399600663141
Build-Jdk: 1.7.0_40
Built-By: jzitting
Bundle-Category: oak
Bundle-Description: Oak Solr OSGi support
Bundle-ManifestVersion: 2
Bundle-Name: Oak Solr OSGi
Bundle-SymbolicName: org.apache.jackrabbit.oak-solr-osgi
Bundle-Vendor: The Apache Software Foundation
Bundle-Version: 1.0.0
Created-By: Apache Maven Bundle Plugin
Embed-Dependency: *;scope=runtime;inline=true
Import-Package: org.apache.lucene.expressions;resolution:=optional,org.a
Service-Component: OSGI-INF/org.apache.jackrabbit.oak.plugins.index.solr
Tool: Bnd-

So the Archiva servlet had served up the Oak Solr OSGi bundle in place of the real HttpClient JAR. In the end I moved the Archiva servlet to be the last repository in the repositories list, so that genuine artifacts are resolved from the public repositories first, and this made everything work.
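Maven consults repositories in the order they are declared, so the fix boils down to declaration order. A sketch of the idea follows; the repository ids and URLs here are illustrative assumptions, not taken from the actual project:

```xml
<!-- Illustrative only: ids and URLs are assumptions, not the real project config -->
<repositories>
  <repository>
    <id>central</id>
    <url>https://repo.maven.apache.org/maven2</url>
  </repository>
  <repository>
    <id>adobe-public</id>
    <url>https://repo.adobe.com/nexus/content/groups/public</url>
  </repository>
  <!-- The AEM Archiva servlet goes last, so it is only consulted for
       artifacts (like the FeaturePack APIs) the public repositories lack -->
  <repository>
    <id>aem-archiva</id>
    <url>http://localhost:4502/maven/repository</url>
  </repository>
</repositories>
```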

Hope this helps someone!

JMX Monitoring in a hurry

I have used this twice within one month, on two different clients with two different application servers, so it must be worth sharing.

Both times I had a need to monitor some values available to me from JMX, and then chart those results afterwards. And in both cases having to make requests to the SysAdmins to setup the monitoring in their normal tools would have taken days/weeks. And this needed to be running by that afternoon…you know how it is.

The first case was when we were handling a set of load testing for a client whom we host. The applications were hosted on JBoss EAP, and the developers needed to monitor some of the core features (Infinispan caches, datasources, JMS) as well as some custom MBeans they had developed.

The second case was an AEM6 application where we wanted to watch some basics like thread count, heap usage and CPU load while we were running JMeter tests.

What does it do?

It basically outputs raw text that looks like this:

2015-02-27 16:00:11.736,thread-pool=default.queueSize 0
2015-02-27 16:00:11.739,thread-pool=default.rejectedCount 0
2015-02-27 16:00:11.742,thread-pool=default.completedTaskCount 53588
2015-02-27 16:00:11.744,thread-pool=default.currentThreadCount 10
2015-02-27 16:00:11.747,thread-pool=default.maxThreads 10

This is TAB-separated output with the format:

Timestamp <TAB> MBeanObjectName.AttributeName <TAB> AttributeValue
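As a rough sketch of how one of these lines gets produced (my reconstruction, not the actual JMXClient source), reading an attribute through the standard javax.management API looks like this. For simplicity it reads from the local platform MBean server; a remote server would be reached via JMXConnectorFactory.connect instead:

```java
import java.lang.management.ManagementFactory;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxLineSketch {
    public static void main(String[] args) throws Exception {
        // Local platform server keeps the sketch self-contained;
        // the real client connects remotely.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("java.lang:type=Threading");
        Object value = server.getAttribute(name, "ThreadCount");

        String timestamp = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
                .format(new Date());
        // Timestamp <TAB> MBeanObjectName.AttributeName <TAB> AttributeValue
        System.out.println(timestamp + "\t" + name + ".ThreadCount\t" + value);
    }
}
```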

I have it outputting to STDOUT because that allows me to use the rotatelogs tool that comes with Apache HTTPD to break the logs into 4-hour blocks easily.

This output format is obviously really easy to work with, so as a separate process I have a Perl script that takes these raw logs and turns them into some nice, basic Google Charts that can be viewed locally.


A quick change and a tiny bit of jQuery allowed generation of JSON files with the data that could be loaded into a parent page. This allowed multiple data points to be graphed together:



At the moment the JMXClient can only read attributes, but for AEM I wanted to log the workflow count during an Author performance test. Luckily, with AEM, JMX is also exposed over HTTP.

So with a simple shell script and curl I was able to output the workflow count in the same format as the JMXClient output and append it to the logs, allowing the same scripts to generate the graphs:



The MBean attributes to watch are defined via a TAB-separated configuration file in this format:

MBeanObjectName <TAB> AttributeName1 <TAB> AttributeName2 <TAB> ...

Like this one I was using for AEM:

java.lang:type=Threading ThreadCount PeakThreadCount
java.lang:type=Memory HeapMemoryUsage NonHeapMemoryUsage
java.lang:type=OperatingSystem ProcessCpuLoad SystemCpuLoad
Q>org.apache.jackrabbit.oak:name="Consolidated Cache statistics",type="ConsolidatedCacheStats",id=* CacheStats

The line starting with “Q>” tells the JMXClient to perform a query to resolve the real MBean object name. This is useful when the name can change: in the example above, the id is an integer that might be different between servers. On the JBoss configurations I also used it to avoid hardcoding application versions under the deployment sub-area, where they would normally be my-app-1.0.3.war, for example:

Q>*,subsystem=ejb3,stateless-session-bean=* poolCurrentSize waitTime executionTime peakConcurrentInvocations
Q>*,subsystem=ejb3,message-driven-bean=* poolCurrentSize waitTime executionTime peakConcurrentInvocations
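My reading of how a “Q>” line can be handled, as a sketch rather than the actual JMXClient code: treat the name as an ObjectName pattern and let the MBean server expand it with queryNames. The class below and its example lines are illustrative (it queries the local platform MBean server, not JBoss):

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ConfigLineSketch {
    /** Expands one TAB-separated config line into the real ObjectNames to poll. */
    static Set<ObjectName> resolve(MBeanServer server, String line) throws Exception {
        String[] parts = line.split("\t");
        String name = parts[0];
        if (name.startsWith("Q>")) {
            // Pattern query: the server returns every MBean matching the pattern
            return server.queryNames(new ObjectName(name.substring(2)), null);
        }
        // Plain line: the name is already exact
        return new LinkedHashSet<>(Arrays.asList(new ObjectName(name)));
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // An exact name and a pattern query, in the config file's format
        System.out.println(resolve(server, "java.lang:type=Threading\tThreadCount"));
        System.out.println(resolve(server, "Q>java.lang:type=GarbageCollector,name=*\tCollectionCount"));
    }
}
```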

Running JMXClient

Running the client is straightforward:

java -jar jmxclient.jar type \
  ip port jmx_username jmx_password \
  config-file.tsv sleep_time \
  refresh_mbean_names_every \
  perform_gc_logging perform_thread_logging_every
  • type – used to define the JMX connection string. The only known value is “jboss”; for non-JBoss monitoring a standard connection is used (supply any other value)
  • ip – IP of the JMX server
  • port – Port of the JMX server
  • jmx_username / jmx_password – connection details
  • config-file.tsv – the MBean attribute configuration file
  • sleep_time – pause for this number of seconds between monitoring requests
  • refresh_mbean_names_every – for performance, the queried MBean names are only refreshed every X loop iterations. Multiply by sleep_time to get the real interval
  • perform_gc_logging – log Garbage Collection notifications
  • perform_thread_logging_every – log thread state counts every X loop iterations

For example

java -jar jmxclient.jar jboss \
  9998 admin admin \
  config-cluster.tsv \
  60 10 true 10 > jmx-log.txt
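Internally, those loop-count parameters presumably drive something like the following (a sketch of my understanding, not the real client code; the sleep is scaled down to milliseconds to keep the example quick):

```java
public class PollLoopSketch {
    /** True when the queried MBean names should be re-resolved on this iteration. */
    static boolean shouldRefresh(int iteration, int refreshEvery) {
        return iteration % refreshEvery == 0;
    }

    public static void main(String[] args) throws Exception {
        long sleepMillis = 100;   // stands in for sleep_time (seconds in the real client)
        int refreshEvery = 3;     // refresh_mbean_names_every
        for (int i = 0; i < 6; i++) {
            if (shouldRefresh(i, refreshEvery)) {
                // Re-run the Q> queries so changed MBean names are picked up;
                // real wall-clock interval = refreshEvery * sleep_time
                System.out.println("refreshing queried MBean names");
            }
            System.out.println("polling attributes, iteration " + i);
            Thread.sleep(sleepMillis);
        }
    }
}
```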

Creating the Charts

Then, once you have your log output, it becomes a simple process of running the Perl script to generate the charts.

cat jmx-log.txt | perl destination-dir

I have a shell script wrapper that performs a bit of clean up etc around it, but that is essentially it.

The source for this will be up on GitHub shortly (once I remove a few customer specific configs etc).

Hope this helps someone!

First steps with enterprise search

Since my first day at TBSCG I have been dealing with search engines. In this article I would like to share some of the knowledge I have acquired in this (at least for me 😛) exciting area of IT.

Data, data, data… big one

I don’t think there is anyone in the IT world who hasn’t heard about Big Data yet. I am going to use it to introduce you to this world.

As we know, there is a lot of data out there, plenty of it: created by publishers, by companies, by users, plus a lot of self-generated logs, traces and so on. And it is growing every day. How do we manage it? Trying to solve this problem is what Big Data is about, but the concept is still too broad to be useful on its own. Today, though, there are three well-defined growing areas that exist only because of the big data problem: Business Intelligence, Cloud Systems (including those huge NoSQL databases), and, in the present case, Enterprise Search. Bear in mind it is still a niche market, but one with great expectations of future growth.

A search engine is a software product made to index, or organize, information. It allows you not only to find a document (like a web page, a Word document, or a row in a database) but also to retrieve information from it, such as entities (or language concepts; i.e. names, addresses, dates, etc.), sentiments (positive/negative, at least) or anything else data-mining science and data scientists can extract. And of course it is not only text that can be indexed. Remember: we are talking about data! So images, videos and audio streams can also be indexed, and then found, analyzed and connected to other sources.

In the enterprise world we want to do the same, but with the data of a specific company. When a system like this is implemented, the intention is to do what Google does with the WWW. But, of course, we don’t have Google’s budget. Besides, let’s be fair: the web is much easier to index and structure (having standards for both document structure and link definition makes life much easier, indeed).

There are a number of commercial and open-source solutions available. Google has its own, of course, but there are also offerings from HP, IBM, Microsoft, Oracle and many others. On the open-source side, the most famous, almost standard, option is Apache Solr, based on the Lucene engine. A newer and fancier alternative is ElasticSearch.

You should know something: there is not much secret in what a search engine can and cannot do (as I said, it is more a problem for mathematicians and data scientists). All the alternatives are quite similar and do mostly the same things, with small differences. What makes the difference between good and bad search results is not the product itself, but the on-site implementation and how well it is adapted to the company's structure and needs.

Here at TBSCG we work with HP Autonomy's solution, called IDOL (Intelligent Data Operating Layer), and even though you can’t have a working copy at home, there is a much nicer way to try it without installing anything: in the cloud 🙂

IDOL OnDemand

Now let’s get started and do something useful! Go to and Sign Up for an account in order to try the free version of IDOL.

Once you are registered, please go to . This will give you a general idea of what you can do with IDOL. The most important concepts are Connectors and Indexes.

Conceptually, a Connector is a software module that “connects” to a source and “sends” the information from that source to an Index. Connectors are necessary to retrieve information from unstructured places, and that is why every kind of source has a different connector. On the other hand, Indexes are the containers of that information, and there are also different types of indexes.

It is possible to connect many connectors to one index, or each connector to a different index. Queries are then sent to a specific index, or to all of them. Everything depends on what you want to do; on your needs.

I have drawn a simple example schema where 3 connectors are connected to 2 indexes that receive 2 types of queries:


Nowadays IDOL OnDemand supports two types of connectors: web, for indexing web pages, and file system, for indexing files. There is only one type of index: text.

If you check any IDOL API, you will notice that the connection is made through a simple REST interface. That means you can connect from any application, in any language, just by using REST, which is great 😀

There are plenty of tutorials on the community site, but I am going to show a small sync interface written in Java with the Unirest library that can be used for anything. You can download the full code here.

First, set your API key at the top of the file. Get yours from

private final String apikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX"; //Own key

Now, you can check the most important functions for creating an index and a connector:

public String createIndex(String name, String description) {
    return runQueryStringGET("createtextindex/v1?index=" + name + "&flavor=standard&description=" + description);
}

public String createWebConnector(String name, String URL, String index, String description) {
    return runQueryStringGET("createconnector/v1?connector=" + name
            + "&type=web&config=%7B%22url%22%3A%22http%3A%2F%2F" + URL + "%22%7D&"
            + "destination=%7B%22action%22%3A%22addtotextindex%22,%22index%22%3A%22" + index + "%22%7D&"
            + "description=%22" + description + "%22");
}
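Those %7B%22… escapes are just a URL-encoded JSON config. Rather than hand-encoding it, the same string can be produced with java.net.URLEncoder (the URL here is a made-up example; the JSON shape follows the createWebConnector call above):

```java
import java.net.URLEncoder;

public class ConfigEncodingSketch {
    public static void main(String[] args) throws Exception {
        // The web connector config is JSON: {"url":"http://<URL>"}
        String json = "{\"url\":\"http://www.example.com\"}";
        String encoded = URLEncoder.encode(json, "UTF-8");
        System.out.println("config=" + encoded);
        // → config=%7B%22url%22%3A%22http%3A%2F%2Fwww.example.com%22%7D
    }
}
```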

In order to start doing something, a simple query:

public String queryText(String query) {
    return runQueryStringGET("querytextindex/v1?text=" + query);
}

Calling the client is easy:

IDOLOnDemandClientSync cl1 = new IDOLOnDemandClientSync();
cl1.createIndex("tbscg_web", "");
cl1.createWebConnector("tbscg_connector", "", "tbscg_web", "");

Remember there is a limit on the number of connectors and indexes you can have. You can check the usage of your quotas in

In the future I will go deeper into the IDOL OnDemand configuration and functionalities. HP is always adding new ones and improving the platform.

Here you can find more examples for other programming languages:

And here more tutorials: