Tuesday, August 12, 2014

Secure JDBC and ODBC Clients’ Access to HiveServer2

Secure JDBC and ODBC Clients’ Access to HiveServer2

By Vinay Shukla
August 12th, 2014

Introduction

HDP 2.1 ships with Apache Knox 0.4.0. This release of Apache Knox supports WebHDFS, WebHCAT, Oozie, Hive, and HBase REST APIs.
Hive is a popular component used for SQL access to Hadoop, and the Hive Server 2 with Thrift supports JDBC access over HTTP. The following steps show the configuration to enable a JDBC client to talk to Hive Server 2 via Knox (Beeline > JDBC over HTTPS > Knox > HTTP > Hive Server2). The picture describes the scenario that the blog covers.
jdbc_knox
This blog focuses on Beeline as the JDBC client; however, a screenshot of Simba ODBC Client configuration for ODBC access is attached at the bottom of the blog.

Step 1

Using Ambari, navigate to Hive > Config. In custom hive-site section set the the following:
  • hive.server2.thrift.http.path to cliservice
  • hive.server2.thrift.http.port
  • hive.server2.transport.mode to http
Note: If you get errors upon Hive Server 2 restart with port 10001, please use 10000 as the value for hive.server2.thrift.http.port and use 10000 in your Knox topology (sandbox.xml file is the topology in this blog )
Here is an example:
knox_1

Step 2

Save these Hive settings and restart HiveServer with Ambari.

Step 3

Ensure that the values you specified match the value for HIVE service. For example, here is my HIVE service values in /etc/knox/conf/topology/sandbox.xml that works against the values I specified in steps 1-3.
[code language=”javaScript”]
HIVE
http://sandbox:10001/cliservice
[/code]

Step 4

Run beeline client.
/usr/bin/beeline
Use the connection string specified in step 5. Note the BOLD values need to be customized for your environment. The values below work in the HDP 2.1 Sandbox and match what I specified in steps 1-3

Step 5

Connect to the Hive Server 2
beeline> !connect jdbc:hive2://sandbox:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive
Enter username and password that Beeline will send to Knox over HTTPS to authenticate the user. The Knox included with HDP 2.1 Sandbox has the account with the username “guest” and the password “guest-password.” Knox also supports using LDAP/AD for authentication, and once you configure Knox against LDAP, you can use any LDAP user to authenticate instead of guest account used in this example.
Note: The value for trustStorePassword is the Knox’s master password when Knox was setup. The default value of this password is “knox” only in HDP Sandbox.
Enter username for
jdbc:hive2://sandbox:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive: guest
Enter password for
jdbc:hive2://sandbox:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive: **************
Connected to: Apache Hive (version 0.13.0.2.1.1.0-237)
In the Beeline connection string, a trust store for HTTPS connection to Knox is specified. This truststore (and its password) is needed only when Knox is not configured to use a well-known SSL certificate. For example, out-of-box, Knox Gateway uses a Self-Signed certificate for SSL, and that certificate needs to be exported and put into a file that the client can use.
However, in a production environment, Knox should be configured to use a CA authorized SSL certificate, and on the JDBC client, you need not configure a truststore and truststore password.

Step 5 with HDP 2.2

With HDP 2.2 the beeline connection URL can be simplified to
beeline> !connect jdbc:hive2://red1:8443/;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox
Note the sslTrustStore and it’s password are needed only when Knox is configured to use Self-Signed SSL Certificate.
Note
When SSL is used between Beeline & Knox and SSL certificate is either self-signed or less well known one needs to import the certificate into trust store of the JVM used by Knox.
Without this step you will get an SSL error like
“Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:397)
at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:126)
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:437) ”
To avoid the error import the SSL certificate as shown below.
Simplifying Beeline Connection String:
1. Export the Knox Gateway’s certificate to a file
2. Copy the file to the host running Beeline
3. Import the certificate from the file into the JDK’s CACert used to run beeline
Here is an example code, note change the paths for your setup.
sudo /usr/jdk64/jdk1.7.0_67/bin/keytool -exportcert -alias gateway-identity -keystore/var/lib/knox/data/security/keystores/gateway.jks -file knoxred1.crt
Copy knoxred1.crt to a host running Beeline
On the host running Beeline, Import Knox’s cert into CACerts
sudo /usr/jdk64/jdk1.7.0_67/bin/keytool -import -trustcacerts -file knoxred1.crt -alias gateway-identity -keystore /usr/jdk64/jdk1.7.0_67/jre/lib/security/cacerts
After doing this the beeline connection string can be simplified as
beeline> !connect jdbc:hive2://red1:8443/;transportMode=http;httpPath=gateway/default/hive;ssl=true;

Step 6

Then issue any SQL query, and the request will follow the path from Beeline > JDBC over HTTPS to Knox > over HTTP to Hive Server 2
For example, the show tables query results in the following output in HDP 2.1 Sandbox.
knox_2

Hive ODBC Client Configuration

The following screenshot illustrates the ODBC client side configuration needed to get  ODBC > HTTP > Knox > HTTP > Hive Server 2 setup working. You can download the ODBC driver from http://hortonworks.com/hdp/addons/
knox3

Optional : Connection to Knox without SSL

Apache Knox comes with SSL enabled out of box and the instructions so far in the blog provide instructions for connection from a JDBC client to Knox over SSL enabled channel.
To make JDBC/ODBC connection to Knox with out SSL you first need to disable SSL at  Knox  and then change the JDBC URL to not be SSL enabled. Note, with out  SSL connection to Knox, passwords sent over the wire can be sniffed.
Stop Knox
su -l knox -c /usr/lib/knox/bin/gateway.sh stop
Edit Knox Gateway config at /etc/knox/conf/gateway-site.xml (or another location if you install Knox at some non-default location)
Put “ssl.enabledfalse>”  in the gateway-site.xml and re-start Apache Knox
su -l knox -c /usr/lib/knox/bin/gateway.sh start
Run Beeline

beeline> !connect jdbc:hive2://sandbox:8443/;hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive
beeline> show tables;

Conclusion

Apache Knox provides the ability to secure Hadoop’s REST API centrally. With Apache Knox, the REST/HTTP access to Hadoop benefits from centralized authentication, authorization, audit, identity management integration and SSO. In this blog we covered the configuration and steps needed for JDBC & ODBC clients to connect via Knox to Hive Server2.
Knox supports Hadoop cluster access both with & with out Kerberos enabled. Refer to Knox Admin guide for steps to configure Knox access to a Kerberos enabled Hadoop cluster.
If you have any comments or feedback, please email me at vshukla@hortonworks.com or post your questions on our Community Forums under Security.