We’ve been working hard behind the scenes to improve the merchant categorisation models used by our Enrich API. As a result, transactions analysed by the API now benefit from wider coverage (i.e. the number of merchants recognised) and increased accuracy (i.e. the percentage correctly categorised).

ML algorithms enhanced with NER

For those unfamiliar with Enrich, we apply advanced Machine Learning (ML) algorithms to payment transaction data to identify the merchant’s name, location and detailed industry classification (e.g. ANZSIC Level 4).

We have recently added NER (Named Entity Recognition), sometimes known as “entity chunking”, to our machine learning engine. This analyses each transaction and enhances it with multiple attributes such as category, merchant and location. NER is used to identify and classify individual words or groups of words that refer to the same thing (e.g. a company name or location) in large volumes of text.

A unique strength of Basiq’s categorisation engine is that we first standardise and aggregate transaction data from each institution the customer has connected to, before sending it to the model. By implementing NER, we are further improving the quality of the search data before it is submitted to the main classification model.

This approach is key to maintaining our market-leading accuracy, as each institution describes transactions in its own way, so a purchase from the same cafe could look completely different depending on the card used. The growing number of payment providers and “buy now, pay later” options is compounding this.

BPAY Biller codes added to our data sources

We’ve also improved our ability to classify and categorise BPAY transactions. Our models are supported by a number of data sources that validate and build confidence in the results. Rather than relying on transaction strings alone to train our models, we also use additional datasets (e.g. merchant IDs from a large number of institutions).

By adding over 17,000 BPAY biller codes to our engine, we now have an authoritative data source to validate any potential code identified in a transaction and compare it to any other text in the description. For example, if a transaction description included “BPAY ENRGAUHS 115667”, the model can validate the biller as “Energy Australia Home Services” by first identifying the six-digit number “115667” as a potential BPAY code, then searching the BPAY database and comparing the results against “ENRGAUHS”.
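To make the validation steps concrete, here is a minimal Python sketch of the idea. It is an illustration only, not the Enrich implementation: the `BILLER_CODES` table is a hypothetical stand-in for the biller-code dataset (containing just the one entry from the example above), the six-digit pattern is an assumption taken from that example, and the in-order letter check in `is_abbreviation` is a simple technique we have swapped in to show how a code lookup can be compared against the remaining text.

```python
import re

# Hypothetical stand-in for the biller-code dataset.
# The single entry is the example given in the text above.
BILLER_CODES = {"115667": "Energy Australia Home Services"}


def is_abbreviation(token, name):
    """Return True if the token's letters appear, in order, within the name."""
    letters = iter(c for c in name.upper() if c.isalpha())
    # `ch in letters` advances the iterator, so order is enforced.
    return all(ch in letters for ch in token.upper())


def validate_biller(description):
    """Return the biller name if a candidate code and the other text agree, else None."""
    # Step 1: find six-digit numbers that could be biller codes (assumed length).
    candidates = re.findall(r"\b\d{6}\b", description)
    # Step 2: collect the other words in the description, ignoring "BPAY" itself.
    words = [w for w in re.findall(r"[A-Za-z]+", description) if w.upper() != "BPAY"]
    for code in candidates:
        # Step 3: look the candidate up in the biller-code table.
        name = BILLER_CODES.get(code)
        # Step 4: accept only if some other word plausibly abbreviates the name.
        if name and any(is_abbreviation(w, name) for w in words):
            return name
    return None


print(validate_biller("BPAY ENRGAUHS 115667"))
```

In this sketch a match is accepted only when both signals agree: the number exists in the table, and another word in the description is consistent with the biller name found there. A production model would likely use fuzzier matching and confidence scores rather than a yes/no check.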

These are just two examples of the continuous improvements we are making to our ML engines that are delivering greater insights to our partners and their customers.

Find out more by visiting our Enrich page, or try it for yourself by signing up for a free trial.