Druid is a data store designed for high-performance slice-and-dice analytics (“OLAP”-style) on large data sets. Druid is most often used as a data store for powering GUI analytical applications, or as a backend for highly-concurrent APIs that need fast aggregations. Common application areas for Druid include:
- Clickstream analytics
- Network flow analytics
- Server metrics storage
- Application performance metrics
- Digital marketing analytics
- Business intelligence / OLAP and more
Druid Three Node Cluster Setup
In this post we will learn how to install Druid as a three-node cluster, sized to your hardware. Druid can also be set up on a single machine, but for production we highly recommend the cluster setup, as it is more resilient and efficient than a single machine.
By the end of this blog, you will be able to set up your own Druid cluster and load data into it.
You will need:
- Java 8
- Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
- On Mac OS X, you can use Oracle’s JDK 8 to install Java.
- On Linux, your OS package manager should be able to help you install Java. If your Ubuntu-based OS does not have a recent version of Java, WebUpd8 offers packages for those OSes.
Download Druid :-
Download the latest version of Druid (this guide uses apache-druid-0.13.0-incubating).
Extract Druid by running the following commands in your terminal.
In the package, you should find:
- bin/* – scripts useful for this quickstart
- conf/* – template configurations for a clustered setup
- extensions/* – core Druid extensions
- hadoop-dependencies/* – Druid Hadoop dependencies
- lib/* – libraries and dependencies for core Druid
- quickstart/* – configuration files, sample data, and other files for the quickstart tutorials
Download Zookeeper :-
Druid has a dependency on Apache ZooKeeper for distributed coordination. You’ll need to download and run Zookeeper.
In the package root, run the following commands:
The startup scripts for the tutorial expect the contents of the ZooKeeper tarball to be located at zk under the apache-druid-0.13.0-incubating package root.
Select hardware for the three-node configuration

| Node | Services | Memory | CPU cores | Disk |
| --- | --- | --- | --- | --- |
| 1 | Coordinator and Overlord | 8GB | 4 | Min of 100GB |
| 2 | Broker | 8GB | 4 | Min of 100GB |
| 3 | Historicals and MiddleManagers | 8GB | 4 | Min of 100GB |
Note :- The Broker needs more memory for query processing, so we leave it on its own node. The higher the memory, the faster the queries.
If you're running a data-pushing service on any node (Tranquility, Kafka, etc.), it can occupy more than 8GB/16GB.
Configure addresses for Druid coordination
In this simple cluster, you will deploy a single Druid Coordinator, a single Druid Overlord, a single ZooKeeper instance, and an embedded Derby metadata store on the same server.
In conf/druid/_common/common.runtime.properties, set druid.zk.service.host to the address of the machine that runs your ZooKeeper instance, and set the druid.metadata.storage.* properties to the address of the machine that you will use as your metadata store.
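A minimal sketch of the coordination section of common.runtime.properties, assuming master.host.ip is a placeholder for your Coordinator/Overlord machine, which also hosts the embedded Derby metadata store:

```properties
# ZooKeeper address (replace with the host running your ZK instance)
druid.zk.service.host=master.host.ip

# Metadata store: embedded Derby, hosted on the Coordinator machine
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://master.host.ip:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=master.host.ip
druid.metadata.storage.connector.port=1527
```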
Tune Druid Coordinator and Overlord
In <Druid/path/>/conf/druid/coordinator you will find two configuration files: jvm.config and runtime.properties.
In jvm.config, change -Xms and -Xmx according to your system's hardware configuration.
The -Xmx flag specifies the maximum memory allocation pool for the Java virtual machine (JVM), while -Xms specifies the initial memory allocation pool. This means your JVM will be started with -Xms of memory and will be able to use a maximum of -Xmx.
For our hardware, the heap can take up to:
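As an example, on the 8GB Coordinator/Overlord node above, a jvm.config along these lines is reasonable (the heap sizes are illustrative, not prescriptive):

```
-server
-Xms3g
-Xmx3g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
```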
The runtime.properties can be left mostly at the defaults. The only things you need to take care of are druid.service and druid.port, and you can add druid.host:
- druid.service : druid/coordinator / druid/overlord
- druid.port : 8081 / 8090
- druid.host : coordinatorHost / overlordHost
Note :- It is recommended to add the druid.host variable when you set up a cluster; provide the specific IP of the node. Do not use “localhost” in druid.host. You can change the services, ports, and hosts of the Coordinator and Overlord.
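Putting the above together, a hedged sketch of the two files (replace the host placeholders with your nodes' actual IPs):

```properties
# conf/druid/coordinator/runtime.properties
druid.service=druid/coordinator
druid.port=8081
druid.host=<coordinator-node-IP>

# conf/druid/overlord/runtime.properties
druid.service=druid/overlord
druid.port=8090
druid.host=<overlord-node-IP>
```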
Tune Druid Broker
As with the Coordinator and Overlord, set -Xms and -Xmx in jvm.config according to the node's memory.
For our hardware, the heap can take up to:
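For instance, on the 8GB Broker node, conf/druid/broker/jvm.config might look like this (heap and direct-memory sizes are illustrative; the direct-memory limit must be large enough to hold the processing buffers configured below):

```
-server
-Xms4g
-Xmx4g
-XX:MaxDirectMemorySize=2g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
```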
The most important properties for the Broker are the processing buffer size and the cache size.
The processing buffer (druid.processing.buffer.sizeBytes) is the off-heap buffer each processing thread uses to hold intermediate query results; a larger buffer lets more rows be processed per pass, at the cost of direct memory.
The Broker's cache stores per-segment query results in memory, so repeated queries can be answered without recomputing them on the Historicals; like a CPU cache in front of main memory, it is a smaller, faster store for frequently used results.
Typically, for our hardware, we can take:
- druid.processing.buffer.sizeBytes = 25600000
Increasing the cache and buffer sizes speeds up Druid queries.
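A hedged sketch of conf/druid/broker/runtime.properties for this hardware (the cache size and host placeholder are examples, not required values):

```properties
druid.service=druid/broker
druid.port=8082
druid.host=<broker-node-IP>

# Off-heap processing buffer per thread
druid.processing.buffer.sizeBytes=25600000

# Local result cache on the Broker (example: 256MB in-heap caffeine cache)
druid.cache.type=caffeine
druid.cache.sizeInBytes=268435456
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
```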
Tune Druid Historical and Middlemanager
The Historical and MiddleManager services can run on a single node or on different nodes. If they share a node, split the memory between them in each service's jvm.config, giving the Historical a larger -Xmx than the MiddleManager, because once an ingestion task succeeds and hands its segments over, the Historical is the process that loads and serves them.
The buffer size and cache size for both services are the same as on the Broker node.
For the MiddleManager, we can increase task running speed simply by raising the heap in the per-task JVM options:
druid.indexer.runner.javaOpts=-Xms1g -Xmx2g
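For instance, conf/druid/middleManager/runtime.properties might look like this (the task capacity, peon heap, and host placeholder are illustrative):

```properties
druid.service=druid/middleManager
druid.port=8091
druid.host=<data-node-IP>

# Number of ingestion tasks this MiddleManager can run at once
druid.worker.capacity=3

# JVM options for each task peon; raise -Xmx to speed up tasks
druid.indexer.runner.javaOpts=-Xms1g -Xmx2g
```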
If you are using different hardware, we recommend adjusting configurations for your specific hardware. The most commonly adjusted configurations are:
- -Xmx and -Xms
- druid.server.maxSize and druid.segmentCache.locations on Historical Nodes
- druid.worker.capacity on MiddleManagers
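As an illustration of the Historical properties above, on a node with roughly 150GB of disk set aside for segments you might use something like this (the path and sizes are assumptions to adjust for your disks):

```properties
# conf/druid/historical/runtime.properties
# Maximum total size of segments this Historical will serve
druid.server.maxSize=130000000000

# Where segments are cached locally, with a per-location size cap
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":130000000000}]
```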
If you’re using a firewall or some other system that only allows traffic on specific ports, allow inbound connections on the following:
- 1527 (Derby on your Coordinator; not needed if you are using a separate metadata store like MySQL or PostgreSQL)
- 2181 (ZooKeeper; not needed if you are using a separate ZooKeeper cluster)
- 8081 (Coordinator)
- 8082 (Broker)
- 8083 (Historical)
- 8084 (Standalone Realtime, if used)
- 8088 (Router, if used)
- 8090 (Overlord)
- 8091, 8100–8199 (Druid Middle Manager; you may need higher than port 8199 if you have a very high druid.worker.capacity)
- 8200 (Tranquility Server, if used)
Start Coordinator And Overlord
On your coordination server, cd into the distribution and start up the coordination services (you should do this in different windows or pipe the log to a file):
- java `cat conf/druid/coordinator/jvm.config | xargs` -cp conf/druid/_common:conf/druid/coordinator:lib/* org.apache.druid.cli.Main server coordinator
- java `cat conf/druid/overlord/jvm.config | xargs` -cp conf/druid/_common:conf/druid/overlord:lib/* org.apache.druid.cli.Main server overlord
You should see a log message printed out for each service that starts up. You can view detailed logs for any service by looking in the var/log/druid directory using another terminal.
Start Historicals and Middle Managers
Copy the Druid distribution and your edited configurations to your servers set aside for the Druid Historicals and MiddleManagers.
On each one, cd into the distribution and run this command to start a Data server:
- java `cat conf/druid/historical/jvm.config | xargs` -cp conf/druid/_common:conf/druid/historical:lib/* org.apache.druid.cli.Main server historical
- java `cat conf/druid/middleManager/jvm.config | xargs` -cp conf/druid/_common:conf/druid/middleManager:lib/* org.apache.druid.cli.Main server middleManager
You can add more servers with Druid Historicals and MiddleManagers as needed.
Start Druid Broker
- java `cat conf/druid/broker/jvm.config | xargs` -cp conf/druid/_common:conf/druid/broker:lib/* org.apache.druid.cli.Main server broker
You can add more Brokers as required based on query load.
-Blog by Sai Chandra