Monzo, BigQuery, Looker and “Fintech”: The Other London Tech Startup Scene

I’ve written a number of blogs over the past few months about bringing my fitness, smart device and other digital data sources into Google BigQuery and Looker to do interesting analysis with the combined dataset, whilst also keeping my tech skills up to date and relevant for my day job as Product Manager for Analytics at Qubit, a London tech startup in the hot new MarTech (“Marketing Technology”) space.

At Qubit we also use Google BigQuery and Looker to enable our customers to run fast, interactive queries against petabytes of consumer behavioural data: BigQuery abstracts away the hassle of running a production data warehouse, whilst Looker abstracts away the complexity of SQL, joining datasets together and trying to visualize it all in D3 or RStudio.

One big gap in the data I’d managed to collect up until recently was anything to do with what I actually spent each day. Whilst my current bank, along with most others, provides a way to manually download statement data in Excel or CSV format, or supports commercial services that load that data into your personal finance software for a monthly fee, I wanted to access my spending data the same way I access my health, timekeeping and cycling workout data: through a REST API, cURL and a cron job that loads new data into BigQuery on the hour, every day.

So that’s why I smiled when I heard about Monzo, another tech startup just a mile or so from my office in London, this one disrupting a different industry: consumer banking. Monzo, Starling and other new app-only banks are taking advantage of regulations such as the EU’s new Payment Services Directive (PSD2), which opens up existing bank accounts to authorised third-parties looking to provide a better user experience (UX) than your existing bank offers; or, as in Monzo’s case, actually obtaining a banking license and offering a complete end-to-end service that focuses on solving customer problems, rather than treating you as a cheap source of money for the bank to lend.

Along with other technology-led startups working in the financial services space, Monzo are the best example yet of a compelling new financial services company, with strong appeal to consumers more familiar today with Facebook than chequebooks.

Image Courtesy of Medici, “Awe-Inspiring London FinTech Industry Is Firing on All Cylinders”

Me, I was more interested in the fact that Monzo also use Looker and Google BigQuery to run their analytics infrastructure, and that they offer a developer-orientated REST API: exactly what I’d been looking for to let me, and increasingly others in the developer community, download, analyze and visualize our spending data, and start building the applications that, when the full Monzo API comes out, will revolutionize how we do our banking in the coming years.


To start analyzing my banking data through Monzo I first needed to do a one-off batch download of all my historic transactions and top-ups into a BigQuery table, and then set up a trigger on Monzo’s systems that sends across each subsequent transaction as it happens, keeping my data in sync with Monzo’s record of my bank account.

To set up the initial transaction history transfer I first registered at https://developers.monzo.com/ and then generated an access token for my original pre-paid Monzo card and the current account card that recently replaced it. Note that this access token only lasts for a short time, so you can’t generate it once and use it in a script forever; more importantly, the developer API is meant for personal use only and can’t be used to access other people’s accounts, so you can’t build one of the new apps made possible by the UK’s new open banking regulations just yet (though this is coming soon through their interim AIS API, now in testing with a small number of third-party developers).

To find out the account IDs for your Monzo accounts, either use the API Playground web app as I’ve done in the screenshot above, or make your first call to the Developer API and get them yourself via the cURL tool:

curl -G "https://api.monzo.com/accounts" -H "Authorization: Bearer <<YOUR_ACCESS_TOKEN_HERE>>" > ./accounts.json

and then retrieve your account IDs from the JSON output using a tool such as jsonpp (brew install jsonpp if you’re on a Mac; similar tools exist for other OSs):

{
    "accounts": [
        {
            "id": "acc_<<MY_PREPAID_CARD_ID>>",
            "created": "2017-04-06T07:07:15.175Z",
            "description": "Mark Rittman",
            "type": "uk_prepaid"
        },
        {
            "id": "acc_<<MY_CURRENT_ACCOUNT_ID>>",
            "created": "2017-11-28T08:35:28.323Z",
            "description": "Mark Robert Rittman",
            "account_number": "<<MY_AC_NUMBER>>",
            "sort_code": "<<MY_SORTCODE>>",
            "type": "uk_retail"
        }
    ]
}

Then you can retrieve all your account transactions with another cURL request, like this one for my current account:

curl -G "https://api.monzo.com/transactions" -d account_id=acc_<<MY_CURRENT_ACCOUNT_ID>> -H "Authorization: Bearer <<YOUR_ACCESS_TOKEN_HERE>>" > /transactions.json

and then you can use tools such as “sed” (to strip the enclosing {"transactions": [ … ]} wrapper from the JSON output, leaving just the array) and one of the many json2csv converters out there (for example this one, which has npm as a prerequisite, something I’m guessing most people attempting this sort of project will already have; if not, first install npm via brew and you’re ready):

sed -i -e 's/{"transactions"://g' ./transactions.json
sed -i -e 's/]}/]/g' ./transactions.json
npm install -g json2csv
json2csv -i ./transactions.json -f created,amount,description,currency,category,include_in_spending,<<ANY_OTHER_REQUIRED_FIELDS>> -o transactions.csv

There are many more fields available to extract from the JSON documents downloaded via the developer API, depending on whether you’re accessing a Monzo prepaid card or current account; the developer docs on the Monzo site cover some of them, but you’re best going through a handful of transactions to see what’s available and then creating your own custom -f extract list.
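
A quick way to do that survey, assuming you have jq installed (brew install jq on a Mac), is to list the keys present in one of the downloaded transactions:

# after the sed clean-up above, transactions.json is a plain JSON array;
# list the field names present in the first transaction
jq '.[0] | keys' ./transactions.json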

Then, as the final step, I use the bq command-line tool to load the data in the CSV file output by the previous step into a BigQuery table, like this:

bq load --source_format=CSV --replace=true --skip_leading_rows=1 --autodetect dw_staging.monzo_transactions_stg ./transactions.csv

and then go into the BigQuery Web UI to check my transaction data has come in as expected.
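
If you’d rather stay at the command line, bq can preview the table directly; a minimal check, assuming your default project is already set:

# preview the first five rows of the newly-loaded staging table
bq head -n 5 dw_staging.monzo_transactions_stg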

In practice, for my analysis exercise I brought in a lot more fields, including merchant name and map X/Y location, Foursquare metadata and other fields that are only available for the current account product, so check through the JSON elements available for your particular Monzo product and extract the ones that interest you most.


So what about capturing transactions after this date? How do we do that without some complicated process that downloads all the transactions again and applies just the new ones to the BigQuery table? The way I bring all my other app, smart device and other event data into BigQuery is either to post it directly to a fluentd server running on Google Compute Engine, or to use services such as IFTTT and Zapier that interface with my apps and then post to fluentd through their outgoing webhook services (or “Maker Channel”, as IFTTT called theirs when I wrote about it in a blog post back in 2016).

The Monzo Developer API also comes with a webhook feature that lets you register a URL, for example the one I use when posting all my other event data to fluentd, with all subsequent transaction data then being posted to that server URL and on into the various BigQuery tables that store my data for analysis.
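
Registering that URL is a single call against the Developer API; a minimal sketch, with the target URL below just a placeholder for wherever your fluentd or Zapier endpoint lives:

# register a webhook so Monzo POSTs each new transaction to your endpoint
curl https://api.monzo.com/webhooks \
  -H "Authorization: Bearer <<YOUR_ACCESS_TOKEN_HERE>>" \
  -d account_id=acc_<<MY_CURRENT_ACCOUNT_ID>> \
  -d url="https://example.com/monzo/webhook"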

In the case of the Monzo transactions I added an intermediate step and had the Monzo webhook post to a Zapier “zap”, which made it easier to decode the nested JSON elements Monzo sends over holding merchant and other payment processor-related information.

Zapier then sends just the elements of each transaction I’m interested in over to the fluentd server, which streams them into BigQuery using the fluentd BigQuery plugin. From there I can write SQL to analyze the data in BigQuery’s web UI, remembering to divide the transaction amount by 100 (amounts are stored as pence, in integer datatypes, within the Monzo system) and to remove top-ups, declines and other non-spend transactions from the results by filtering out records where the “include_in_spending” column is “false”.
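
A minimal sketch of that sort of query, assuming the transactions land in a table shaped like the staging table loaded earlier (swap in your own table name; column types depend on what autodetect inferred):

-- monthly spend by category, converting Monzo's integer pence into pounds
SELECT
  FORMAT_TIMESTAMP('%Y-%m', created) AS spend_month,
  category,
  ROUND(SUM(ABS(amount)) / 100, 2) AS spend_gbp
FROM dw_staging.monzo_transactions_stg
WHERE include_in_spending = true -- drops top-ups, declines and other non-spend records
GROUP BY spend_month, category
ORDER BY spend_month, spend_gbp DESC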


Analyzing the data in a tool such as Looker makes things much more interesting as I can start to make use of the locational metadata for merchants that Monzo provide for current account transactions, showing me for example where I took out cash from ATMs on the way back home from the pub and then forgot I’d done so the morning after.

Or I can see how my spend on essentials like entertainment and eating out rises in the summer, especially when I appear to do away with non-essentials like food and other groceries.

I can see where it was I spent the most money when out on cycling trips in the summer, mostly in pubs around the New Forest and up near Bath, Frome and Warminster (and how I wish winter would end so I can get back out on my bike again).

I can even work out how my choice of restaurant and entertainment changed from lots of spend on music in August through to lots of eating out in October.

And more importantly, I can see I’m spending far too much money on coffees and breakfast on the way into work each day, but that’s the downside I guess of working just off Covent Garden in what must be one of the nicest work locations I’ve ever had the privilege to base myself out of.

The beauty of Looker is that I can define a data model in LookML that takes table structures and SQL expressions and turns them into more user-friendly measure and dimension definitions, like this:
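
Something along these lines, as a rough sketch (the view, table and field names here are illustrative assumptions rather than my actual model):

# an illustrative LookML view over the Monzo staging table;
# view, table and field names are assumptions, not the actual model
view: monzo_transactions {
  sql_table_name: dw_staging.monzo_transactions_stg ;;

  dimension_group: created {
    type: time
    timeframes: [date, week, month]
    sql: ${TABLE}.created ;;
  }

  dimension: category {
    type: string
    sql: ${TABLE}.category ;;
  }

  measure: total_spend_gbp {
    type: sum
    sql: ABS(${TABLE}.amount) / 100 ;;
    value_format_name: gbp
  }
}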

These LookML “views” can then be joined to other views that share a common join key, in my case the truncated “date” part of a date and time timestamp field, which makes it possible for me to analyze spend data in the context of where I was on that day and at that time: whether I was out on the bike and, if so, whether I’d had a good workout or a not so good one; what the weather was like that day and what I’d been chatting about on Slack at the time; whether the spend was for an Uber ride and, if so, where to.
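
In LookML terms that join looks something like the explore below; again a sketch, with the daily_activity view and its fields assumed for illustration:

# an illustrative explore joining transactions to a daily activity view
# on the shared truncated date; daily_activity and its fields are assumptions
explore: monzo_transactions {
  join: daily_activity {
    type: left_outer
    relationship: many_to_one
    sql_on: ${monzo_transactions.created_date} = ${daily_activity.activity_date} ;;
  }
}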

Analysis that shows me, now I’m finally able to join my financial and activity dots together for the first time, where all my money’s gone since the nights got darker and I’m clearly in the pub rather than out on my bike. Luckily spring is around the corner, or I’ll be applying for Monzo’s new overdraft service that’s just gone into review rather than registering new Strava personal records next month.