Cassandra production scenarios/issues


Production issue:

When we tried to run a SELECT query with 8 lakh (800,000) IDs in the IN condition, the query failed with an exception.

To resolve the exception, we used distributed calls in the Java client instead of a single large query.
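One way to read "distributed calls" is to split the ID list into small chunks and run one small query per chunk concurrently. The original fix was made in the Java client and its code is not reproduced here, so the sketch below only illustrates the idea in Python. `fetch_chunk`, the chunk size and the worker count are hypothetical placeholders, not values from the production setup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(ids, size):
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_chunk(chunk):
    """Hypothetical stand-in for one small query, e.g. SELECT ... WHERE id IN (<chunk>).

    It only echoes the ids so that the sketch runs without a cluster.
    """
    return [{"id": i} for i in chunk]


def fetch_all(ids, chunk_size=100, workers=8):
    """Run one small query per chunk concurrently and merge the rows."""
    rows = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the chunk order, so rows come back in input order.
        for result in pool.map(fetch_chunk, chunked(ids, chunk_size)):
            rows.extend(result)
    return rows
```

With a real driver, `fetch_chunk` would execute a prepared statement for its chunk. Keeping each IN list small spreads the work over several coordinators instead of loading a single one.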

A few production configurations in Cassandra

RetryPolicy

A retry policy lets you control what happens in three scenarios:

  • Read timeout: The coordinator received the request and sent the read to the replica(s), but the replica(s) did not respond in time.
  • Write timeout: As above, but for writes.
  • Unavailable: The coordinator knows that not enough replicas are available, so it fails the request without sending the read/write on to any replica.

Types of retry policies:

DefaultRetryPolicy

  • Read timeout: Retries when enough replicas responded but the data did not come back within the configured read timeout.
  • Write timeout: Retries only if the initial phase of a batch write times out.
  • Unavailable: Never retries.
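The three rules above can be written as small decision functions. This is a simplified model, not the driver's own code. The "retry only on the first attempt" guard, the `data_retrieved` flag and the `"BATCH_LOG"` write-type label are assumptions about how such a policy is usually expressed.

```python
from enum import Enum


class Decision(Enum):
    RETRY = "retry"
    RETHROW = "rethrow"


def on_read_timeout(required, received, data_retrieved, retry_num):
    """Retry once if enough replicas answered but the data itself is missing."""
    if retry_num == 0 and received >= required and not data_retrieved:
        return Decision.RETRY
    return Decision.RETHROW


def on_write_timeout(write_type, retry_num):
    """Retry once only if the initial (batch log) phase of a batch timed out."""
    if retry_num == 0 and write_type == "BATCH_LOG":
        return Decision.RETRY
    return Decision.RETHROW


def on_unavailable(retry_num):
    """Following the list above, an unavailable error is never retried."""
    return Decision.RETHROW
```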

DowngradingConsistencyRetryPolicy

  • Read: If at least one replica responded, the read is retried at a lower consistency level.
  • Write: Retries unlogged batch queries when at least one replica responded.
  • Unavailable: If at least one replica is available, the query is retried at a lower consistency level.
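"Retried at a lower consistency" means picking a level that the replicas which did respond can still satisfy. The sketch below shows one plausible mapping. The level names are standard Cassandra consistency levels, but the exact mapping is an assumption and may differ between driver versions.

```python
def pick_lower_consistency(responses):
    """Map the number of responding replicas to the consistency level to retry at.

    Returns None when no replica responded, meaning the error is not retried.
    """
    if responses >= 3:
        return "THREE"
    if responses == 2:
        return "TWO"
    if responses == 1:
        return "ONE"
    return None
```

For example, a QUORUM read on a keyspace with replication factor 3 needs two replicas. If only one responded, this mapping retries the read at ONE. The trade-off is that the retry may return stale data.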

Reconnection Policy

The reconnection policy determines how often a reconnection to a dead node is attempted. There are 2 types of reconnection policies.

ConstantReconnectionPolicy

  • Uses a fixed delay between reconnection attempts.
  • delay (configurable property) is a floating point number of seconds to wait between attempts.
  • max_attempts (configurable property) is the total number of attempts to make before giving up, or None to keep trying forever. The default is 64.
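A constant reconnection schedule is simply the same delay repeated. The sketch below models the `delay` and `max_attempts` properties described above with the standard library only. The function name `constant_schedule` is hypothetical.

```python
from itertools import repeat


def constant_schedule(delay, max_attempts=64):
    """Return an iterator of delays in seconds, one per reconnection attempt.

    max_attempts=None yields the same delay forever.
    """
    if delay < 0:
        raise ValueError("delay must not be negative")
    if max_attempts is None:
        return repeat(delay)
    return repeat(delay, max_attempts)
```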

ExponentialReconnectionPolicy (default)

  • Exponentially increases the length of the delay in between each reconnection attempt up to a set maximum delay.
  • base_delay and max_delay should be in floating point units of seconds.
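The exponential schedule can be sketched as a generator that doubles the delay after every attempt until it reaches the cap. This is a minimal model that assumes a doubling factor of 2. A real driver may also add random jitter or limit the number of attempts, which is not modeled here.

```python
def exponential_schedule(base_delay, max_delay):
    """Yield delays in seconds: base_delay, then doubling, capped at max_delay."""
    if base_delay <= 0 or max_delay < base_delay:
        raise ValueError("need 0 < base_delay <= max_delay")
    delay = base_delay
    while True:
        yield delay
        # Double the delay for the next attempt, but never exceed the cap.
        delay = min(delay * 2, max_delay)
```

With `base_delay=1.0` and `max_delay=10.0`, the first six delays are 1, 2, 4, 8, 10 and 10 seconds.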

Load Balancing Policy

The load balancing policy determines which node a query is executed on. The main load balancing policies are described below.

RoundRobinPolicy

Distributes queries evenly across all nodes in the cluster, regardless of which datacenter the nodes are in.
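A round-robin policy can be pictured as a query plan that lists every host and starts one position later for each new query. The class below is an illustrative model with hypothetical names, not the driver's implementation.

```python
class RoundRobin:
    """Rotate the starting host of the query plan on every call."""

    def __init__(self, hosts):
        self._hosts = list(hosts)
        self._position = 0

    def query_plan(self):
        """Return all hosts in the order they should be tried for one query."""
        if not self._hosts:
            return []
        start = self._position % len(self._hosts)
        self._position += 1
        return self._hosts[start:] + self._hosts[:start]
```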

DCAwareRoundRobinPolicy

Similar to RoundRobinPolicy, but prefers hosts in the local datacenter and only uses nodes in remote datacenters as a last resort.
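The datacenter-aware variant can be sketched the same way: rotate through the local hosts first and append the remote hosts as a fallback. The host addresses and datacenter names used with it are made-up examples. How many remote hosts a real driver includes is configurable and is not modeled here.

```python
def dc_aware_plan(hosts, local_dc, start=0):
    """Build a query plan from (address, datacenter) pairs.

    Local hosts come first, rotated by `start`. Remote hosts follow as a last resort.
    """
    local = [addr for addr, dc in hosts if dc == local_dc]
    remote = [addr for addr, dc in hosts if dc != local_dc]
    if local:
        offset = start % len(local)
        local = local[offset:] + local[:offset]
    return local + remote
```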

TokenAwarePolicy (default)

A LoadBalancingPolicy wrapper that adds token awareness to a child policy. It alters the child policy's behavior so that queries are first sent to LOCAL replicas (as determined by the child policy) of the data identified by the Statement's routing_key. Once those hosts are exhausted, the remaining hosts in the child policy's query plan are used.
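The reordering done by a token-aware wrapper can be sketched as follows. The function takes the replicas for the routing key as an input, because computing them from the token ring is outside the scope of this sketch. All names are hypothetical.

```python
def token_aware_plan(child_plan, replicas, local_hosts):
    """Put local replicas of the routing key first, then the rest of the child plan.

    child_plan:  host order proposed by the wrapped (child) policy
    replicas:    hosts that own the data for the statement's routing key
    local_hosts: hosts the child policy considers LOCAL
    """
    local = set(local_hosts)
    first = [host for host in replicas if host in local]
    rest = [host for host in child_plan if host not in first]
    return first + rest
```

A statement without a routing key has no known replicas, so the plan falls back to the child policy's order unchanged.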


About Siva

Senior Hadoop developer with 4 years of experience designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


