1 April 2013

883 words 5 mins read

Wasting time on reddit

So bored and on reddit reading about myki

I guess it’s time to do some hypothetical over-engineering.

First, the challenge: the claim that a reasonably well designed system could not handle frequent updates from the number of buses and trams in Melbourne (trains are a different question because updating the stations is both easier and harder).

Goals:

  • Reduce the time it takes to update the card balance.

And some assumptions (some of which are there to make the maths easier for me, some to over-engineer, and some because I just felt they were reasonable).

Assumptions on operation:

  • Everything being asynchronous is fine. This allows touch on/off operations and card charges to be applied from local data, removing slow network connections from the critical path.
  • We have less than 5000 buses or trams in Melbourne.
  • We don’t care about people who have touched off a tram or people who have failed to touch off more than 90 minutes ago.
  • The database is always correct (if the central db isn’t always correct other things are a problem)
  • We want the system to respond for trips of more than 1 minute.
  • We want to aim to be able to respond to 75% of 1-to-5-minute trips.
  • For 5+ minute trips, 99% of the time.
  • Buses, trains and trams operate 24 hours a day, 365.26 days per year (this is one of the ones there for the maths).

Assumptions on tech:

  • 8 bits to a byte
  • b = bits
  • B = Bytes
  • each field gets a 16 bit label
  • 64 bits for card id 1.84467 x 10^19 total cards
  • 24 bits for the number of cents (allows it to fit in 3 bytes and makes the maths easier)
  • 64 bits for bus/tram/station ID (never reused)
  • 216 bits for bus location information. This is overkill by design (it's roughly enough to assign a few thousand values to every atom in the solar system). It includes 64-bit time, lat and long fields, 8 bits for speed (I doubt buses/trams would exceed 128 km/h, or 256 if you wanted to just record speed and ignore reverse) and 16 bits for direction; the totals are sketched just after this list.
  • We don’t see more than 100 people board a bus or tram in a minute. (this will be the baseline number for all future calculations)
  • ignore IP / TCP / UDP overheads
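
As a quick sanity check of those widths, here's a rough Python sketch using the field sizes assumed above (the constant names are just mine):

    # Rough field widths from the assumptions above, in bits.
    CARD_ID = 64          # 2**64 is about 1.84 x 10^19 possible cards
    BALANCE_CENTS = 24    # fits in 3 bytes
    SOURCE_ID = 64        # bus / tram / station ID, never reused
    FIELD_LABEL = 16      # every field gets a 16-bit label

    # Location record: time + lat + long + speed + direction
    LOCATION = 64 + 64 + 64 + 8 + 16
    assert LOCATION == 216  # the deliberately over-engineered 216-bit record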

OK. Now what? Let's see if we are going to saturate any networks with this amount of data flowing in and out.

Each packet would contain at most (assuming 100 people per minute) the following:

From the source to central:

376 bits of header
16 bits of array id
16 bits of array length
100 objects each
    80 bits long
        64 bits UUID
        16 bits of separator / label

Raw size 8408 bits = 1051 Bytes
Add encryption and assume it doubles the payload size
2102 Bytes per request
5000 sources requesting once per minute
just over 10MB/minute
about 1.4Mb/s average incoming volume to central
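
A quick check of that arithmetic, as a rough Python sketch built from the layout and assumptions above (variable names are mine):

    # Source -> central packet, per the layout above (sizes in bits).
    HEADER = 376
    ARRAY_ID = 16
    ARRAY_LENGTH = 16
    PER_CARD = 64 + 16                 # UUID + separator / label = 80 bits
    CARDS = 100

    raw_bits = HEADER + ARRAY_ID + ARRAY_LENGTH + CARDS * PER_CARD   # 8408
    raw_bytes = raw_bits // 8                                        # 1051
    encrypted_bytes = raw_bytes * 2    # assume encryption doubles the size

    SOURCES = 5000                     # buses + trams, one request a minute
    per_minute = SOURCES * encrypted_bytes           # ~10.5 MB/minute
    average_mbps = per_minute * 8 / 60 / 1e6          # ~1.4 Mb/s into central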

From central to source:

64 bits of source
64 bits of time
100 objects each
    120 bits long.
        64 bits of UUID
        24 bits of balance
        two 16-bit separator / labels

Raw size 12128 bits = 1516 Bytes
Add the same encryption assumption
3032B per response
just over 15MB/minute of traffic
about 2Mb/s outgoing from central on average
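
And the same rough check for the return direction:

    # Central -> source packet, per the layout above (sizes in bits).
    SOURCE_ID = 64
    TIMESTAMP = 64
    PER_CARD = 64 + 24 + 2 * 16        # UUID + balance + two labels = 120 bits
    CARDS = 100

    raw_bits = SOURCE_ID + TIMESTAMP + CARDS * PER_CARD   # 12128
    raw_bytes = raw_bits // 8                              # 1516
    encrypted_bytes = raw_bytes * 2    # same doubling assumption

    SOURCES = 5000
    per_minute = SOURCES * encrypted_bytes            # ~15.2 MB/minute
    average_mbps = per_minute * 8 / 60 / 1e6          # ~2 Mb/s out of central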

So from a connection point of view it's entirely possible to provision that much bandwidth to the server and provide that much bandwidth to the entire network (remember each bus or tram is only going to need < 5 KB/s for 1 or 2 seconds per minute).

Now requests per minute (RPM) is another question.

Taking the earlier assumption of 5000 concurrent devices over the entire network and assuming 1 request per minute (RPM) per source, you get 5000 RPM (maths is fun!!). Now 5k RPM (around 83 requests per second) is a reasonably difficult target to meet if you are running a wonderful enterprisey app hitting an ACID DB with no caching. But it becomes much easier if you split the work up into something that looks kinda like the below:

Message queues are used heavily in this design; they are distinguished by the names in parentheses.

Entry
    High CPU requirements but otherwise fairly minimal
    receives the request and decrypts it
        header information is passed into a message queue (ext) for other systems to use
        UUIDs and header are parsed and placed on a message queue (ResponseGen)
        checks for the SourceId on message-queue (Responded)
        encrypts and returns the response if present; if none, returns 200 (or similar)

Response Returner
    moderate CPU and Memory
    reads (ResponseGen)
        passes UUIDs to message-queue (CardLookup)
        and holds for response
    formats the response once returned (or on timeout)
    writes the response to (Responded)

Card Lookup
    High Memory and High Memory IO
    reads (CardLookup) and
        performs a lookup of each UUID in a memory-resident key / value store
            (100 million records should fit with zero compression in about 2 GB of RAM
                (actually raw it's just over 1GB but we round up))
        writes the response back onto (CardLookup) for the Response Returner to pick up
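
As a very rough sketch of the Card Lookup piece in Python: the queue objects here are stand-ins for whatever message-queue the rest of the system uses, and I've split the reply onto a separate queue purely for illustration:

    import queue

    # Stand-ins for the (CardLookup) request and reply queues.
    lookup_requests = queue.Queue()
    lookup_replies = queue.Queue()

    # Memory-resident key / value store: card UUID -> balance in cents.
    # 100 million entries at ~11 bytes each (64-bit UUID + 24-bit balance)
    # is a little over 1 GB raw, hence rounding up to 2 GB of RAM above.
    balances = {}

    def card_lookup_worker():
        # Take batches of UUIDs off the request queue, reply with balances.
        while True:
            msg = lookup_requests.get()   # e.g. {"source": 42, "uuids": [...]}
            reply = {
                "source": msg["source"],
                "balances": {u: balances.get(u) for u in msg["uuids"]},
            }
            lookup_replies.put(reply)

The Response Returner would sit on the other side of those queues, matching each reply back to the request it is holding.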

This system, written in an interpreted language such as ruby or python, would easily hit the 5k RPM required.

So for about $20,000 worth of servers we can hit way more RPM than we require. Of course developer time is going to be a more significant cost, as will the required hardware and network links in each of your sources.

TL;DR: Easily doable. The only hard part will be getting the feed from the current system.