From charlesreid1

Revision as of 18:43, 15 September 2017 by Admin (talk | contribs) (→‎Resources)

In Greek mythology, Cassandra was the daughter of King Priam and Queen Hecuba, of Illiad fame. Apollo granted her the power of prophecy in order to seduce her, and cursed her to never be believed when she rejected him.

It's also software by Apache designed to handle large amounts of data in a scalable way. Intended for cloud and cluster solutions/applications. Used by companies like Netflix.

About Cassandra

Cassandra is a scalable database tool. It is designed for extremely large data sets.

NoSQL

Cassandra is a NoSQL database technology. It was originally intended to implement the Bigtable concept (published by Google in 2006). The Bigtable concept goes beyond relational databases, which are difficult to scale to extremely large data sets, and instead is built on the idea of a "sparse, distributed, persistent multidimensional map".

The fact that Cassandra is NoSQL means the database cannot be modified using SQL. However, Cassandra implements a "language" for querying the database called CQL; it is intended to lower the barrier for SQL developers to understand and interact with the Cassandra database. CQL can access/read the database, but cannot change or manipulate its contents.

CQL

The Cassandra Query Language (CQL) is a query language specific to Cassandra that looks a lot like SQL.

Link to CQL documentation: https://cassandra.apache.org/doc/latest/cql/index.html#cql

Difference between CQL and SQL

In spirit, CQL and SQL are designed to be similar. They both use similar syntax - "select things where condition". The entire point of adding CQL to Cassandra was to make it easier for developers familiar with SQL to be able to pick up CQL and deal with Cassandra databases more easily.

However, they are fundamentally different because CQL sits on top of Cassandra, a NoSQL "database" that is actually designed as a big key-value map, and SQL sits on top of a traditional relational database.

They are also different in practice - there are some operations that CQL does not support. CQL does not support the following:

  • No joins
  • No subqueries
  • No transactions
  • WHERE conditions can only be applied to columns where an index has been created on that column (unless the column is a primary key)

The following operators:

  • No OR operator
  • No NOT operators
  • The AND operator is the only operator available

Types

Cassandra supports several data types:

  • native types
  • collection types
  • user-defined types
  • tuple types
  • custom types

Basic Usage

Installing

The documentation covers installation instructions: https://cassandra.apache.org/doc/latest/getting_started/installing.html

These look exactly like you'd expect - extract and build from source, or use a pre-built package from an aptitude repository.

Cassandra requires Java and Python.

Cassandra works a bit like Hadoop - you set a configuration file that sets options (such as, how much replication of data should Cassandra use, and where are the various data nodes going to live, etc.).

Once you start Cassandra, it uses something called the "snitch" to figure out how to efficiently route requests over all of the nodes that comprise the Cassandra "data universe". Like Hadoop, Cassandra divides nodes by data centers and racks.

Link to configuration file options: https://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html

Link to info about snitch: https://cassandra.apache.org/doc/latest/operating/snitch.html

Command Line Client

Cassandra can be used from the command line via the cqlsh tool, which provides a Cassandra shell from which CQL commands can be executed.

Like MySQL, you start up the command line program, which starts a client that tries to connect to a server instance. To connect to a local instance, specify localhost as the host:

$ bin/cqlsh localhost
Connected to Test Cluster at localhost:9042.
[cqlsh 5.0.1 | Cassandra 3.8 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh>

You can specify various command line options (e.g., username/password, script file, startup file, etc.)

A simple hello world query in CQL:

cqlsh> SELECT cluster_name, listen_address FROM system.local;

 cluster_name | listen_address
--------------+----------------
 Test Cluster |      127.0.0.1

(1 rows)

Full cqlsh documentation: https://cassandra.apache.org/doc/latest/tools/cqlsh.html#cqlsh

Using Other Languages

You can interface with Cassandra from a number of different languages (Java, Python, Node, Scala, Go, Rust, etc.). Full list of client drivers: https://cassandra.apache.org/doc/latest/getting_started/drivers.html

DataStax (company that contributes to Cassandra development) provides several good drivers in various languages:

Using on Google Cloud

Google Cloud offers several pre-configured virtual machine instances that run Cassandra, including one created by Google, and one created by Bitnami. See Google Cloud/Cassandra

Resources

DataStax provides managed cloud solutions involving Cassandra (also involved in developing/contributing to Cassandra): https://www.datastax.com/

DataStax training on Cassandra: https://academy.datastax.com/