Understanding the Cassandra database management system

Cassandra is a free, open source distributed database management system. It is a wide-column NoSQL store designed to handle massive amounts of data across many servers, providing high availability with no single point of failure. It is written in Java and developed under the Apache Software Foundation.

Avinash Lakshman and Prashant Malik originally developed Cassandra at Facebook to power the Inbox Search feature. Facebook released Cassandra as an open source project on Google Code in July 2008. In March 2009 it became an Apache Incubator project, and in February 2010 it graduated to a top-level Apache project. Its technical characteristics have since made it widely popular.

Apache Cassandra is used to manage large quantities of structured data spread across the globe. It provides a highly available service with no single point of failure. Below are a few key points about Apache Cassandra:

It is scalable, fault-tolerant, and reliable.
It is a column-oriented database.
Its distributed design is based on Amazon’s Dynamo, and its data model is based on Google’s Bigtable.
It was created at Facebook and differs sharply from relational database management systems.

Cassandra uses a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model. Cassandra is used by some of the largest companies, including Facebook, Twitter, Cisco, Rackspace, eBay, Netflix, and many more.

The primary goal of Cassandra is to handle large data workloads across multiple nodes without a single point of failure. Cassandra uses a peer-to-peer distributed design across its nodes, and data is distributed among all nodes in the cluster.

All nodes in a Cassandra cluster play the same role. Each node is independent, yet interconnected with the other nodes. Every node in the cluster can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read and write requests can be served by other nodes in the network.
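
The behaviour described above can be illustrated with a toy simulation. This is only a minimal sketch, not how Cassandra is implemented: the `Node` and `Cluster` classes, the replica placement rule, and the node names are all hypothetical, and a real cluster adds consistency levels, failure detection, and repair on top.

```python
class Node:
    """A toy node: every node has the same role and its own local storage."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class Cluster:
    """Hypothetical peer-to-peer cluster with no master node."""

    def __init__(self, names, replication_factor=3):
        self.nodes = [Node(name) for name in names]
        self.rf = replication_factor

    def replicas_for(self, key):
        # Toy placement (an assumption): pick a start index from the key,
        # then take the next `rf` nodes around the list.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        # Any live node may coordinate; it forwards the write to live replicas.
        if not coordinator.alive:
            raise RuntimeError(f"{coordinator.name} is down; use another node")
        live = [n for n in self.replicas_for(key) if n.alive]
        if not live:
            raise RuntimeError("no live replica for this key")
        for node in live:
            node.data[key] = value
        return [n.name for n in live]

    def read(self, coordinator, key):
        # The coordinator asks the replicas in turn and skips any that are down.
        if not coordinator.alive:
            raise RuntimeError(f"{coordinator.name} is down; use another node")
        for node in self.replicas_for(key):
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


cluster = Cluster(["N1", "N2", "N3", "N4", "N5"])
written_to = cluster.write(cluster.nodes[0], "user:42", "Ada")

# Take one replica down, then read through any node that is still alive.
cluster.replicas_for("user:42")[0].alive = False
coordinator = next(n for n in cluster.nodes if n.alive)
value = cluster.read(coordinator, "user:42")
print(written_to, value)
```

The read still succeeds after one replica fails because the value was written to several nodes, and the request can enter the cluster through any live node.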

Characteristics of Cassandra:

Cassandra has gained popularity because of its technical characteristics. Here are a few of them:

Easy data distribution –
Cassandra gives you the flexibility to place data where you need it by replicating data across multiple data centers.
Example:
Suppose there are 5 nodes: N1, N2, N3, N4, and N5. Using the partitioning algorithm, we determine a token range for each node and distribute data according to those ranges. Each node owns a distinct token range, and a piece of data is stored on the node whose range contains its token.
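
The five-node example can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Cassandra's code: the token space is shrunk to 0..99, the token assigned to each node is made up, SHA-256 stands in for the real hash (Cassandra's default partitioner uses a Murmur3 hash), and the function names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 100  # toy token space 0..99; real token spaces are vastly larger

# Assumed assignment: each node holds one token and owns the range
# (previous node's token, its own token].
NODE_TOKENS = [(19, "N1"), (39, "N2"), (59, "N3"), (79, "N4"), (99, "N5")]
TOKENS = [token for token, _ in NODE_TOKENS]


def token_for_key(key):
    """Hash a partition key to a token in the toy token space."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_for_token(token):
    """Return the node whose range contains the token, wrapping around the ring."""
    index = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    return NODE_TOKENS[index][1]


def owner_for_key(key):
    """Map a partition key to the node that owns it."""
    return owner_for_token(token_for_key(key))


for key in ["alice", "bob", "carol"]:
    print(key, token_for_key(key), owner_for_key(key))
```

Because the owner is derived from the hash of the key alone, any node can work out where a piece of data lives without asking a central directory.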

Flexible data storage –
Cassandra accommodates structured, semi-structured, and unstructured data. It can dynamically adapt to changes in your data structures as your business requirements change.

Elastic scalability –
Cassandra is highly scalable: you can add more hardware to accommodate more customers and more data as requirements grow.

Fast writes –
Cassandra was designed to run on cheap commodity hardware. It performs very fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.

Always-on architecture –
Cassandra has no single point of failure and is continuously available for business-critical applications that cannot afford downtime.

Fast linear-scale performance –
Cassandra scales linearly: throughput increases as you add nodes to the cluster, so it maintains a quick response time as load grows.
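
As a back-of-the-envelope illustration of what linear scaling means, the sketch below assumes an idealized cluster in which every node contributes the same throughput. The per-node figure is invented for the example and the function names are hypothetical; real numbers depend on hardware, data model, and workload.

```python
import math


def cluster_throughput(nodes, ops_per_node):
    """Idealized linear model: total throughput grows in proportion to node count."""
    return nodes * ops_per_node


def nodes_needed(target_ops, ops_per_node):
    """Smallest node count that meets a target throughput under the same model."""
    return math.ceil(target_ops / ops_per_node)


PER_NODE = 10_000  # assumed operations per second per node, illustrative only

for n in (3, 6, 12):
    print(n, "nodes:", cluster_throughput(n, PER_NODE), "ops/sec")

print("nodes for 45,000 ops/sec:", nodes_needed(45_000, PER_NODE))
```

Under this model, doubling the number of nodes doubles the throughput, which is the property the paragraph above describes.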

Transaction support –
Cassandra offers atomic, isolated, and durable writes at the partition level, together with tunable consistency. This differs from the full ACID (Atomicity, Consistency, Isolation, Durability) transactions of a relational database.
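
The consistency side of this is chosen per request in Cassandra rather than fixed. A commonly cited rule is that a read is guaranteed to see the latest write when the number of replicas contacted on read (R) plus the number contacted on write (W) exceeds the replication factor (RF). The sketch below only checks that arithmetic; the function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when read and write replica sets must overlap: R + W > RF."""
    return read_replicas + write_replicas > replication_factor


RF = 3
q = quorum(RF)

print("quorum for RF=3:", q)
print("quorum reads + quorum writes:", reads_see_latest_write(q, q, RF))
print("one replica each way:", reads_see_latest_write(1, 1, RF))
```

Contacting fewer replicas lowers latency at the cost of possibly reading stale data, which is the trade-off that tunable consistency exposes.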