...
- To avoid scanning the HBase too much, the messaging service should cache recent scans in memory. This is based on the assumption that consumers shouldn't be falling too far behind, which is one of the reason why we want a messaging system to provide faster reaction time.
- Multiple instances of messaging service instances (not part of the initial scope)
- When we reached the point where we need to scale the messaging service to multiple instances, there are couple choices on how it looks like
- All instances are equal. This mean all instances can handle publishing and consumption from all topics
- Different instances need to generate different
seq_id
for the same timestamp to avoid key collision. We potentially can use the instance ID as the first byte in the sequence id.
- (Pro) Easy to implement as there is no coordination needed
- (Con) The consumer scan cache can take up a lot of spaces because all instances would be caching for all topics
- Different instances need to generate different
- Perform resource assignment through the
ResourceCoordinator
such that a given topic would only be handled by a sub-set of instances only- Still need to handle sequence id generation to avoid key collision
- (Pro) Most flexible to have a trade between scalablity vs resource usage (memory cache) and can be tuned per topic based on traffic
- (Con) Implementation can be complicated as it involves
- Usage of
ResourceCoordinator
Usage of DistributedLock
for the topic transition to have a guaranteed leader set for a topic- Traffic forwarding / retry needed during topic transition from one instance to another
- Usage of
- All instances are equal. This mean all instances can handle publishing and consumption from all topics
- When we reached the point where we need to scale the messaging service to multiple instances, there are couple choices on how it looks like
Security
TBA
API
REST
Publish messagesTBA
Programmatic
TBA