20. Cash App’s real-time ranking ML system, Meenal & Austin, Block Inc.

https://www.youtube.com/watch?v=8KsPtzj1wa8&ab_channel=Tecton

Cash App is an easy way to send and save money

Use-cases

Use-case

Rewards ranking

Support article ranking

Infra

The intention is to use customer context during the sort phase, so that the ranking is more specific to each customer

We are going to focus on ranking

Requirements

Each event leads to n items being scored, e.g. showing a relevant reward when the user clicks on the ad

Customer-facing: fast and reliable, the target is scoring 50 items in under 100ms

Configurable system to support multiple use-cases

Basic ML system features

A/B test support

Shadow mode to test in prod background

Logging metrics and results

Options to work on a solution:

A library: business logic is embedded in the use-case itself

pro: reduces latency

cons: tricky to manage

Side-car: business logic run in a container along with the product service

same pros and cons as the library

Standalone Service (Weaver) that the product service makes a call to

pro:

easier to bootstrap

easy to manage

easy to onboard

cons

availability

latency

Architecture

Multiple feature stores and model servers; depending on the use-case, several of them are accessible to the user

Metadata is also cached so that requests stay fast in the online environment

Challenges

Feature store service latency

impacted by the backend (e.g. DynamoDB or a cache)

Model serving

impacted by the network and model evaluation cost

Fanout problem

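the more parallel calls a request fans out to, the higher the probability that at least one of them is slow, which raises the latency of the whole request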
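
To make the fanout effect concrete, here is a small sketch. It assumes that each backend call is slow independently with the same probability (an illustrative assumption, not a figure from the talk), so a request is slow as soon as any one of its parallel calls is slow. The function name and the 1% figure are hypothetical.

```python
def prob_any_slow(n_calls: int, p_slow: float) -> float:
    """Probability that at least one of n independent parallel calls is slow."""
    return 1.0 - (1.0 - p_slow) ** n_calls


if __name__ == "__main__":
    # Illustrative numbers only: assume 1% of individual calls are slow.
    for n in (1, 10, 50):
        print(f"{n:>3} parallel calls -> {prob_any_slow(n, 0.01):.1%} of requests hit a slow call")
```

Under these assumptions, a call that is slow only 1% of the time still affects roughly 40% of requests once 50 calls are made in parallel, which is why per-call tail latency matters so much here.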

Testing and optimizations

Quick and dirty solution: slow and unstable

We couldn’t scale, we needed to adapt all pods

Later on, achieved stability

Latency was optimized

Hardening the network path

Latency had spikes without obvious causes; it turned out that the bottleneck was the egress gateway traffic flow

Avoiding this bottleneck brought latency down to 200ms

⇒ You want to understand the whole network path

Serving from disk vs memory

Switching from DynamoDB to Redis (the feature store supports both): avg latency dropped from 10ms to 5ms

Really critical when doing fan-out

Request hedging: reduces the impact of tail latency

Idea:

send a request

if no response after a delay, send it again

return the response that completes first while cancelling the other

If the system is overloaded, you might have unexpected bottlenecks

Impact: hedging 1% of requests brought the p99 down by 20%

The issue with hedging: latency can spiral out of control, because the extra requests add load to an already slow system. A safety mechanism is needed to avoid these loops.

Warming up the connection pool

The initial traffic first has to establish connections; opening them ahead of time helps keep latency stable

Concurrency tuning

The right setting is tricky to find but is key to maximising performance

Are you doing mostly I/O or CPU? (multi-threading vs multi-processing)

Here, removing the thread pool reduced CPU usage by 5% and p99 latency by 5ms

Further enhancements

More efficient data structures

Less data to transmit and process, avoid duplication of keys or values
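
One way to avoid repeating keys, sketched below, is to send features in a columnar layout instead of one dict per item. This is an illustration of the general idea, not the data structure used in the talk; the feature names (`item_id`, `ctr_7d`, `category`) and the helper `to_columnar` are hypothetical.

```python
import json
from typing import Any, Dict, List


def to_columnar(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of per-item feature dicts into one dict of per-feature lists.

    Assumes every row has the same keys, so each key is stored once
    instead of once per item.
    """
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


# 50 items, matching the scoring target mentioned in the requirements.
rows = [{"item_id": i, "ctr_7d": 0.1, "category": "food"} for i in range(50)]
columnar = to_columnar(rows)

print(len(json.dumps(rows)), "bytes row-oriented")
print(len(json.dumps(columnar)), "bytes columnar")
```

The columnar payload carries each feature name once rather than 50 times, so there is less to serialize, transmit and parse.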

Produce good enough results quickly as opposed to best results slowly

Use a timeout to drop scores that are too slow

Q&A

Did you replace DynamoDB with Redis or was it another layer?

Replaced. We’re also thinking about having a caching layer

Request hedging: how do you determine the timeout?

We monitor the latency of requests in the system so that we can chart a histogram.

We choose the delay based on that distribution.
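
As an illustration of picking the delay from the latency distribution, the sketch below takes a high percentile of observed latencies as the hedging delay. The mapping is an assumption, not something stated in the answer: if about 1% of requests should be hedged, the delay is set near the 99th percentile. `hedge_delay_from_latencies` and `hedge_fraction` are hypothetical names.

```python
import statistics
from typing import Sequence


def hedge_delay_from_latencies(latencies_ms: Sequence[float], hedge_fraction: float = 0.01) -> float:
    """Pick a delay such that roughly hedge_fraction of requests are still pending when it fires."""
    # 99 cut points: cut_points[k - 1] is the k-th percentile.
    cut_points = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    percentile = round((1.0 - hedge_fraction) * 100)  # e.g. 0.01 -> 99th percentile
    percentile = min(max(percentile, 1), 99)  # clamp to the available cut points
    return cut_points[percentile - 1]


if __name__ == "__main__":
    # Synthetic latencies for illustration only.
    observed = [5.0] * 95 + [20.0, 40.0, 60.0, 80.0, 100.0]
    print(hedge_delay_from_latencies(observed), "ms")
```

Recomputing the delay periodically from fresh measurements would keep it in line with the current distribution.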

Redis vs DynamoDB operational cost (Redis is not hosted)?

Yes, it required some additional investment