ENTRADA data analysis is performed with standard SQL syntax which follows the SQL-92 standard.
There are multiple ways to submit queries to ENTRADA. Below are four examples that serve different personal preferences and use cases.
The Hue web interface provides an easy access to the data stored in ENTRADA. It enables the user to submit SQL queries directly in a query editor and allows exploring the data interactvely. Based on the results, Hue can create simple charts for a quick analysis. Additionally, data can be exported to other formats like csv or xls.
You can submit commands for setting up tables, inserting data and querying tables through a shell as well. Therefore, the impala-shell is provided. Besides submitting SQL commands, the shell also lets you run shell-only commands for tuning the performance and running diagnostics. impala-shell can be invoked from inside a shell script as well.
A more comfortable way to submit SQL commands to impala from a python script is using the library impyla. Below, there is an example on how to connect to impala, submit a query and handle the response:
from impala.dbapi import connect conn = connect(host='myhost.example.nl', port=21050) impala_cursor = conn.cursor() impala_cursor.execute(''' select qname, src from dns.staging limit 100 ''') rows = impala_cursor.fetchall() for row in rows: print rows
In case you want to authenticate at your server using Kerberos, add the auth_mechanism parameter to the connect function:
conn = connect(host='my.host.nl', port=21050, auth_mechanism="GSSAPI")
Impyla further supports python pandas which is a data structure to handle and analyze large data sets easily. If you want to load the results of a query directly into a panda data structure, modify the code as follows:
from impala.dbapi import connect from impala.util import as_pandas import pandas as pd conn = connect(host='myhost.example.nl', port=21050) impala_cursor = conn.cursor() impala_cursor.execute(''' select qname, src from dns.staging limit 100 ''') df = as_pandas(impala_cursor)
With the help of a JDBC driver, ENTRADA can be queried directly from Java programs as well. After following the instructions here, one can connect to Impala using:
With Kerberos authentication use: