Cassandra deletion best practice


We have real-time data coming into our system, and we have online queries that we need to serve. To serve these online queries faster, we need to pre-process the data. My question is how to pre-process this online real-time data. There should be a way for me to figure out whether a piece of data has been processed or not. To tell the difference, I have the following approaches:

  • I can have a flag that says whether the data is processed or unprocessed, and based on that I can decide whether to process it or not.
  • I can have a column family into which I insert the data with a TTL, and a topic in the message bus (Kafka) that gives me the row identifier in Cassandra, so that I can process that row.
  • I can have a column family per day, and a topic in the message bus (Kafka) that gives me the row identifier in the corresponding column family.
  • I can have a keyspace per day, and a topic in the message bus (Kafka) that gives me the row identifier in the corresponding column family.

I read that as the number of deletions increases, the number of tombstones increases, which results in slow query times. I am confused about which approach to choose among the four above. Or is there a better way to solve this?

According to the DataStax blog post on Cassandra anti-patterns, the third option might be a better fit.
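
To make the third option concrete, here is a minimal sketch of how I imagine the per-day column family naming and cleanup could work. It only builds CQL strings and does not talk to a cluster. The keyspace name `events`, the table prefix `raw_data`, the two-column schema, and the helper names are all hypothetical, chosen just for illustration. The idea, as far as I understand it, is that an expired day is removed by dropping its whole table instead of deleting rows one by one, so no per-row tombstones are written.

```python
from datetime import date, timedelta

KEYSPACE = "events"        # hypothetical keyspace name
TABLE_PREFIX = "raw_data"  # hypothetical table (column family) prefix


def table_for(day: date) -> str:
    """Return the fully qualified name of the table holding one day's rows."""
    return f"{KEYSPACE}.{TABLE_PREFIX}_{day:%Y%m%d}"


def create_table_cql(day: date) -> str:
    """Build CQL that creates the table for a given day (assumed schema)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table_for(day)} "
        "(row_id text PRIMARY KEY, payload blob)"
    )


def drop_expired_cql(today: date, keep_days: int) -> str:
    """Build CQL that drops the table that just left the retention window."""
    expired = today - timedelta(days=keep_days)
    return f"DROP TABLE IF EXISTS {table_for(expired)}"


if __name__ == "__main__":
    today = date(2024, 1, 5)
    print(create_table_cql(today))
    print(drop_expired_cql(today, keep_days=3))
```

With this layout, the Kafka message would need to carry both the day and the row identifier, so that the consumer can call `table_for` to find the right column family before reading the row.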

