Wasting time on reddit
So bored and on reddit reading about myki
I guess it’s time to do some hypothetical over-engineering.
First the challenge: that a reasonably well designed system could not handle frequent updates from the number of buses and trams in Melbourne (trains are a different question because updating the stations is easier and harder.)
- Reduce the time it takes to update the card balance.
And some assumptions (some of which are there to make the maths easier for me some to over engineer and some because I just felt they were reasonable.)
Assumptions on operation:
- Everything is fine asynchronous. This allows touch on\ off operations and card charges to be applied from local data removing slow network connections.
- We have less than 5000 buses or trams in Melbourne.
- We don’t care about people who have touched off a tram or people who have failed to touch off more than 90 minutes ago.
- The database is always correct (if the central db isn’t always correct other things are a problem)
- We want the product to respond for trips of more than 1 minute
- We want to aim to be able to respond to 75% of 1 to 5 minutes trips.
- 5+ minutes 99% of the time.
- Buses trains and trams operate 24 hours a day 365.26 days per year. (this is one of the ones for the maths)
Assumptions on tech:
- 8 bits to a byte
- b = bits
- B = Bytes
- each field gets a 16 bit label
- 64 bits for card id 1.84467 x 10^19 total cards
- 24 bits for number of cents ( Allows it to fit in 3 bytes and make maths easier)
- 64 bits for bus\ tram\ station ID (never reused)
- 216 bits for Bus location information. This is overkill by design (it’s roughly enough to assign a few thousands of values to every atom in the solar system) (includes 64 bit time, lat, long fields 8 bits for speed (I doubt buses\ trams would exceed 128 KMH (or 256 if you wanted to just record speed and ignore reverse)) 16 bits for direction)
- We don’t see more than 100 people board a bus or tram in a minute. (this will be the baseline number for all future calculations)
- ignore IP / TCP / UDP overheads
OK. Now what? let’s see if we are going to saturate networks with this amount of data flowing in and out.
Each packet would contain at most (assuming 100 people per min) the following
From the source to central:
376 bits of header 16 bits of array id 16 bits of array length 100 objects 80 bit length 64 bits UUID 16 bits of separator / label Raw Size 8408 bits 1051 Bytes add encryption and assume it doubles the file size 2102 Bytes per request 5000 Sources requesting once per minute just over 10MB/minute 1Mb/s average incoming to central vol
From central to source:
64 bits of source 64 bits of time 100 objects each 120 bits long. 64 bits of UUID 24 bits of balance 2 16 bit separator / labels Raw size 12192 bits 1524 Bytes encryption and assumptions 3048B / response 15MB/minute of traffic 2Mb/s outgoing from central average
So from a connection point of view it’s rather possible to provision that much bandwidth to the server and provide that much bandwidth to the entire network (remember each bus or tram is only going to be needing < 5 KB/s for 1 or 2 seconds per minute)
Now RPM(Request Per <inute) wise is another question.
Taking the earlier assumption of 5000 concurrent devices over the entire network and assuming 1 Request Per Minute (RPM)/source You get 5000 RPM (maths is fun!!). Now 5k RPM is a reasonably difficult target to meet if you are running a wonderful enterprisy app hitting an ACID DB with no caching. But if you split the stuff up into something that looks kinda similar to the below:
Message-queues are used heavily in this they are distinguished by the names in parentheses.
Entry High CPU requirements but other than fairly minimal receives request decrypts header information passed into a message queue (ext) for other systems to use. UUIDs and header parsed and placed in a message queue (ResponseGen) checks for SourceId on message-queue (Responded) encrypts and returns if present if none return 200 (or similar) Response Returner moderate CPU and Memory reads (ResponseGen) passes UUIDs to message-queue (CardLookup) and holds for response formats response once returned \ timeout writes response to (ResPonded) Card Lookup High Memory and High Memory IO reads (CardLookup) and performs a look up on UUID into a memory resident key / value store (100 million records should fit with 0 compression in about 2 GB of ram (actually raw it's just over 1GB but we round up ) ) writes response to (CardLookup)
This system written in a interpretative language such as ruby or python would easily hit the 5k RPM required.
So now on about $20,000 worth of servers we can hit around way more RPM than we require. Of course developer time is going to be a more significant cost as will the required hardware or network links in each of your sources.
TL;DR Doable easily. the only hard part will be getting the feed from the current system.