If your queries suddenly started doing much worse than before, check for potential bugs in your application code. Test results are recorded accurately and systematically. To do this, check the CPU usage on client machines.Ĭlient-side configurations, like the number of threads, are tuned appropriately to saturate client bandwidth.
The client machines don't become a bottleneck. If you're using benchmarks such as Yahoo! Cloud Serving Benchmark, JMeter, or Pherf to test and tune performance, make sure that: Expected latency for scans averages approximately 100 milliseconds, compared to 10 milliseconds for point gets in HBase.
Compaction is disabled because, even though it is a resource-intensive process, customers have full flexibility to schedule it according to their workloads. By default, major compaction is disabled on HDInsight HBase clusters. CompactionĬompaction is another potential bottleneck that is fundamentally agreed upon in the community. This option is possible only if the WAL feature is enabled. To gain significant improvement in read operations, use Premium Block Blob Storage Account as your remote storage. This tremendously benefits write performance, and it helps many issues faced by some of the write-intensive workloads. It writes the WAL to Azure Premium SSD-managed disks. The Accelerated Writes feature is designed to solve this problem. In HDInsight, this behavior amplified this bottleneck. Until recently, the WAL was also written to Azure Storage. Data is stored remotely on Azure Storage, even though virtual machines host the region servers. HDInsight HBase has a separated storage-compute model. The top bottleneck in most HBase workloads is the Write Ahead Log (WAL). Before you apply configuration changes in a production environment, test them thoroughly.
Many of these tips depend on the particular workload and read/write/scan pattern.
This article describes various Apache HBase performance tuning guidelines and tips for getting optimal performance on Azure HDInsight.