Stuck on String Matching? UniFrame (uniframe.io) can help

4 minute read

UniFrame (uniframe.io): a String-Matching-as-a-Service

TL;DR

UniFrame (https://uniframe.io, https://github.com/uniframe-io) is a String-Matching-as-a-Service product. It provides a ready-to-go string matching solution and can be used in scenarios such as data enrichment, KYC, and data deduplication in data-intensive organisations. UniFrame makes string matching much easier and saves users a lot of time and effort. Its key features are:

  • A ready-to-go string matching product with user, data and task management
  • Supports both a web UI and a RESTful API, for business users and developers. It can be easily integrated with Python, Excel and other industry software.
  • Accurate results on varied datasets, optimised algorithms for efficient computation, and highly customisable configuration
  • All-round data security
  • Can be hosted on your enterprise infrastructure

What is fuzzy string matching and why is it useful?

Fuzzy string matching means matching a string against a list of strings and finding the most similar one(s). In essence, it compares strings pairwise, computes a similarity score for each pair, and picks the most similar string(s). This animation illustrates how string matching can help to link entities.

Visual illustration of fuzzy string matching (Source: author)
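
The pairwise compare, score and pick procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard library's difflib.SequenceMatcher as the similarity measure; it is not UniFrame's algorithm, and the company names and the helper functions similarity and best_matches are made up for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score in [0, 1] for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(query: str, candidates: list[str], top_n: int = 1) -> list[tuple[str, float]]:
    """Score the query against every candidate and return the top_n most similar."""
    scored = [(candidate, similarity(query, candidate)) for candidate in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical reference list and query, for illustration only.
reference_names = ["Acme Corporation", "Globex Inc.", "Initech LLC"]
match, score = best_matches("ACME Corp", reference_names)[0]
print(match, round(score, 2))
```

Comparing every query against every candidate is fine for small lists, but the cost grows with the product of the two list sizes, which is one reason dedicated matching tools rely on faster algorithms for large datasets.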

When there is no unique key for an exact match, fuzzy string matching becomes very useful. Here are some examples from different industries:

  • Cyber-security department in banking: banks are obliged to monitor the names on all transactions to see whether they match international watch lists. String matching can be used to analyse the names in a transaction and link each party to entries on those watch lists.
  • Supply chain department in the fast-moving consumer goods industry: match external datasets with internal customer data.
  • Data management department in the geo-information industry: addresses of points of interest are represented differently in different data sources. String matching can help with data aggregation.

How can UniFrame help?

String matching is useful in many enterprise scenarios. However, technical complexity and limited budgets make it difficult to deploy a production-ready string matching solution that delivers value to the business. With UniFrame (https://uniframe.io), a String-Matching-as-a-Service product, users can simply spin up dedicated string matching tasks and match their data in the cloud or on their own enterprise infrastructure. Here we pick 4 typical personas whom UniFrame can help.

4 typical personas in enterprise who need string matching. (Source: author)

Business analyst: spin up a string matching task to save half a day of Excel work

I am a junior business analyst. Every week I spend one day connecting and integrating external marketing data into our CRM system. After Excel VLOOKUP matching, there are still tens of thousands of rows that cannot be linked because the customer names are not always consistent. I have to search manually to find the best matches, which costs me a whole day. UniFrame offers a configurable and accurate string matching solution that adapts to varied datasets. Data integration becomes much simpler.

Data warehouse developer: integrate string matching into data pipeline

I am a lead software developer. We need string matching to improve the data quality of our product. String matching is not my field, and I don't know how to do it properly. I just want an existing, robust solution that I can integrate into my data workflow programmatically. UniFrame offers a RESTful API with bearer token authentication, so developers can do string matching with UniFrame from any programming language.
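
To give a feel for what bearer token authentication looks like from a data pipeline, here is a minimal Python sketch that builds an authenticated HTTP request with the standard library. The endpoint URL, the token value, the JSON payload shape and the helper build_match_request are all hypothetical placeholders, not UniFrame's documented API; only the Authorization: Bearer header pattern is the point of the example.

```python
import json
import urllib.request

# All values below are hypothetical placeholders, not a documented UniFrame API.
API_URL = "https://api.example.com/v1/match"  # hypothetical endpoint
API_TOKEN = "YOUR_API_TOKEN"  # placeholder; a real token would come from your account


def build_match_request(url: str, token: str, queries: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries a bearer token."""
    body = json.dumps({"queries": queries}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_match_request(API_URL, API_TOKEN, ["ACME Corp"])
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Because the token travels in a plain HTTP header, the same pattern carries over to any language or tool that can set request headers.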

Data scientist: let UniFrame do the heavy lifting so that I can focus on other value-adding tasks

I am a senior data scientist. My manager wants me to pilot a string matching solution for the business team. I face 3 challenges: 1) the data is large-scale and the popular string matching algorithms are slow; 2) different datasets have different characteristics and need flexible configuration; 3) I am more interested in ML-based solutions and find literal string matching boring. UniFrame offers multiple matching algorithms to balance accuracy and speed. You can use the UniFrame string matching result as a feature and feed it into your ML model.
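
As a rough illustration of using string matching scores as model features, the sketch below turns one candidate pair of names into a small numeric feature vector. It assumes locally computed scores (difflib and a token-level Jaccard overlap) as stand-ins for the scores a matching service would return; the functions token_jaccard and pair_features and the chosen features are illustrative, not part of UniFrame.

```python
from difflib import SequenceMatcher


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the lower-cased, whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pair_features(a: str, b: str) -> list[float]:
    """Turn one candidate pair into numeric features for a downstream classifier."""
    char_ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    length_gap = abs(len(a) - len(b)) / max(len(a), len(b), 1)
    return [char_ratio, token_jaccard(a, b), length_gap]


# Hypothetical pair of customer names, for illustration only.
features = pair_features("Acme Corp", "ACME Corporation")
```

A vector like this can be concatenated with other attributes (country, postcode and so on) and passed to any classifier that decides whether two records refer to the same entity.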

Enterprise data department manager: reduce time-to-market to build a centralised string matching service for the organisation

I am the senior manager of the Enterprise Data Analytics team. Our team wants to build a centralised string matching solution, since it has been requested across multiple organisations. Building a functional, stable and secure string matching system with sufficient user, data and task management functionality is not an easy task. I need to invest at least 4 FTE (0.5 UX + 1 frontend + 1 backend + 1 data scientist + 0.5 DevOps) for half a year. UniFrame offers a ready-to-go SaaS solution that can be deployed in an enterprise environment, reducing time-to-market.

How to use UniFrame

Users can upload datasets and create and run string matching tasks via the website UI, as shown in the videos below.

We also support interaction via the RESTful API. Here is an example using Python and Excel VBA.

Summary

UniFrame makes string matching easier than ever. It provides an accurate, fast, configurable, ready-to-go string matching product that can be accessed via the web UI or the RESTful API. The application is secured all-round and can be deployed on enterprise cloud infrastructure. Free users can upload a dataset of up to 5 MB and create a small task with 0.2 CPU and 150 MB of memory at no cost. Interested? Start with the demo account info+demo@uniframe.io (password 123456) to run string matching against a dataset of 1M companies, or create a free account to use the full UniFrame string matching functionality!

Update on Nov 30th

I no longer maintain the website. Most of the code is open-sourced at https://github.com/uniframe-io. It is subject to a Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) licence. Please contact me via info@uniframe.io for any commercial usage.