/
...
/
/
20. Cash App’s real-time ranking ML system, Meenal & Austin, Block Inc.
Search
Duplicate
Try Notion

20. Cash App’s real-time ranking ML system, Meenal & Austin, Block Inc.

Cash App is an easy way to send and save money
Use-cases
Use-case
Rewards ranking
Support article ranking
Infra
The intention is to take customer context during the sort phase, to make it more specific
We are going to focus on ranking
Requirements
Each event leads to n item being scored: show relevant reward when the user clicks on the ad
Customer-facing: fast and reliable, target is scoring 50 items < 100ms
Configurable system to support multi use-cases
Basic ML system features
A/B test support
Shadow mode to test in prod background
Logging metrics and results
Options to work on a solution:
A library: business logic is embedded in the use-case itself
pro: reduce latency
cons: tricky for management
Side-car: business logic run in a container along with the product service
same pros and cons
Standalone Service (Weaver) that the product service makes a call to
pro:
easier to bootstrap
easy to manage
easy to onboard
cons
availability
latency
Architecture
Multiple feature stores and model servers, depending on the use-case several are accessible to the user
Also caching metadata for making quick requests in an online environment
Challenges
Feature store service latency
impacted by the backend (is it DynamoDB, Cache)
Model serving
impacted by the network and model evaluation cost
Fanout problem
parallel calls lead to a higher proba of more latency
Testing and optimizations
Quick and dirty solution: slow and unstable
We couldn’t scale, need to adapt all podes
Later on, achieved stability
Latency was optimized
Hardening the network path
Latency had spikes without obvious causes, turn out that the bottleneck was the egress gateway traffic flow
Just avoided this bottleneck to bring latency down to 200ms
⇒ You want to understand to whole network path
Serving from disk vs memory
Switching from DynamoDB to Redis (feature store support both), avg latency drop from 10ms to 5ms
Ready critical when doing fane-out
Hedging request: reduce the impact of tail latency
Idea:
send a request
if no response after a delay, send it again
return the response that completes first while cancelling the other
If the system is overloaded, you might have unexpected bottlenecks
Impact: 1% hedging request bring down the p99 by 20%
The issue with edging: latency can spiral out of control. a safety mechanism is needed to avoid these loops.
Warming up the connection pool
The initial traffic needs to first connect, helpful to improve the stability of the latency
Concurrency tuning
Tricky to find but is key to maximising performances
Are you doing mostly I/O or CPU? (multi-threading vs multi-processing)
Here, removing threadpool reduced CPU usage by 5% and p99 latency by 5ms
Further enhancements
More efficient data structures
Less data to transmit and process, avoid duplication of keys or values
Produce good enough results quickly as opposed to best results slowly
Use a timeout to drop scores that are too slow
Q&A
Did you replace DynamoDB with Redis or was it another layer?
Replaced. We’re also thinking about having a caching layer
Request hedging: how do you determine the timeout?
We monitor the latency of requests in the system so that we can chart an histogram.
Chose the delay with the distribution.
Redis vs DynamoDB operational cost (Redis is not hosted)?
Yes, it required some additional investment