Feature engineering for thin transactional data
There are over 250 million people in the world who live outside their country of birth. Many of these people rely heavily on mobile financial services, and according to the GSMA, person-to-person (P2P) transfer, airtime top-up and international remittance are the most commonly used mobile money services worldwide. Enabling these transfers and top-ups globally is a core part of DT One’s business. We are integrated into the payments systems of over 600 telecom companies and Mobile Network Operators in 160 countries. Our customers are businesses around the world (telecom companies, remittance companies, retail shops, and others) who integrate with our platform so that their users can make these transfers.
One of the critical advantages of this service for many of our end-users is that the bar for accessing and using it is extremely low. Unlike most other digital financial services, many of the channels through which our service is reached require nothing more than a prepaid mobile phone number. There is no lengthy Know Your Customer (KYC) process, no difficult identity verification and no financial transaction authorisation. Accessibility is a core value proposition of airtime remittance top-ups.
On the other hand, this also means that we often have low identity resolution for our end-users. We are dealing with transactional data, and often the only attributes that identify a user are his or her own phone number and the recipient phone number he or she is sending value to. Our customers usually offer these users additional services and value propositions beyond airtime top-ups, and ideally these offerings would be personalised for different segments. For example, women from Southeast Asia working as domestic helpers in the Gulf states probably have different needs and interests than men from South Asia working in the construction industry in the Gulf. We work diligently with our partners on campaigns and incentives that encourage our end-users to transfer value, and on identifying other effective value propositions for them, but such endeavours are challenging without a good sense of the user profile. In other words, we need to help our customers identify what kinds of personas populate our shared user base so that they can create and manage more relevant and effective campaigns.
Fortunately, we can use creative feature engineering to overcome some of the limitations of thin data and to enable better services for our customers and end-users. That’s what this post is about.
One way of generating implicit information from our data is one-hot encoding. Most attributes of a user’s transaction are drawn from a defined set of possibilities. Let’s say we’re working with a country in the Gulf, where we have two telco customers, Telco A and Telco B. A user of either telco could send a particular top-up to any of ~60 destination countries. A user might also send to different countries over time (perhaps indicating a reseller, or a recycled number). We could summarise this with a single ‘top receiving country count’ across all of a user’s transactions, but that throws away information about the other countries the user has sent to, which could eventually become important. A more comprehensive approach is to record, for each user, a count for every possible value. Consider the following two users and their transactions:
We can expand all possibilities on the receiving country of the transaction the following way:
With ~60 countries, each user is described by ~60 columns, one per country, each holding the number of transactions that user sent there. How is this useful? Perhaps we might find through further analysis that there are correlations in the countries that multi-destination users send to: a user who sends to India and Nepal also tends to send to Bangladesh, for example. This is valuable information that potentially enables new decision rules on what sort of incentives might be suitable for a subset of users in a campaign.
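As a rough illustration of the per-country count columns, here is a minimal sketch in plain Python. The user IDs, countries and transactions are made up for the example, and the helper name `country_count_features` is hypothetical; it is not taken from our production pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical transactions: (sender_id, receiving_country).
transactions = [
    ("user_1", "India"),
    ("user_1", "India"),
    ("user_1", "Nepal"),
    ("user_2", "Philippines"),
    ("user_2", "Philippines"),
    ("user_2", "Philippines"),
]

def country_count_features(rows):
    """Return the column order and one list of per-country counts per user."""
    # One column per country seen in the data, in a stable (sorted) order.
    countries = sorted({country for _, country in rows})
    per_user = defaultdict(Counter)
    for user, country in rows:
        per_user[user][country] += 1
    # Counter returns 0 for countries a user never sent to.
    table = {
        user: [counts[c] for c in countries]
        for user, counts in per_user.items()
    }
    return countries, table

columns, features = country_count_features(transactions)
print(columns)
for user, row in features.items():
    print(user, row)
```

With real data the column list would grow to roughly 60 countries, and most entries in each row would be zero.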
Now, taking this approach we can get creative and expand many more features. Here is an example:
Here we have created new features based on a combination of country and product (the kind and amount of top-up the user sent). The motivation is a hypothesis that users who top up different amounts actually have different profiles, even when they are sending to the same country. A user who sends smaller amounts every time could be an opportunistic user, who might be more responsive to promotional campaigns. This hypothesis may turn out to be incorrect, but we won’t know if we ignore or overlook the data. So we suspend judgement on defining a user’s most important attributes and instead characterize them in as many ways as possible. Then we let the results of campaigns and other interventions show us which attributes are actually important. We want to learn from as much of our data as possible.
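The country and product combination can be sketched in the same way, by counting on a joined key. The product labels (`topup_5`, `topup_20`), the `__` separator and the helper name `crossed_count_features` are assumptions made for this example only.

```python
from collections import Counter, defaultdict

# Hypothetical transactions: (sender_id, receiving_country, product).
crossed_transactions = [
    ("user_1", "India", "topup_5"),
    ("user_1", "India", "topup_5"),
    ("user_1", "India", "topup_20"),
    ("user_2", "India", "topup_20"),
]

def crossed_count_features(rows):
    """Count transactions per user for every country/product combination."""
    per_user = defaultdict(Counter)
    for user, country, product in rows:
        per_user[user][f"{country}__{product}"] += 1
    # One column per combination that actually occurs in the data.
    columns = sorted({name for counts in per_user.values() for name in counts})
    table = {
        user: [counts[name] for name in columns]
        for user, counts in per_user.items()
    }
    return columns, table

crossed_columns, crossed_features = crossed_count_features(crossed_transactions)
print(crossed_columns)
print(crossed_features)
```

Both users send to the same country, yet their rows differ once the product is part of the key, which is the distinction the hypothesis relies on.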
Another particularly useful feature for describing a user’s pattern of transactions is entropy, which describes the amount of uncertainty in a given attribute of the user’s transactions. For example, there are 7 days in a week and we might label each transaction from a user according to the day of the week it falls on. If a user has 10 transactions in total and all of them fall on a Sunday, the user has low entropy on ‘day of the week’ because, knowing nothing else, our best bet is that the next transaction will also fall on a Sunday. However, if the pattern of transactions looks more like the following:
Mon: 2, Tue: 2, Wed: 2, Thur: 2, Fri: 2, Sat: 0, Sun: 0
Such a user has high entropy on day of the week because we are more uncertain about which day the next transaction is likely to fall on. Now if we were to group the transactions according to weekday vs. weekend, the user reverts to low entropy, as all past transactions fell on a weekday. Note that this is different from just capturing the total count of unique attributes. For example, for a user with the following pattern of transactions:
Mon: 100, Tue: 1, Wed: 1, Thur: 1, Fri: 1, Sat: 0, Sun: 0
Such a user also has 5 unique labels on day of week, but in fact has low entropy on day of week because most of the transactions are happening on Monday. In short, entropy is an efficient way to capture regularities in a user’s pattern of transactions, something that a summary measure like total or mean cannot achieve.
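The three day-of-week examples above can be checked with a few lines of Python. This sketch uses Shannon entropy in bits (log base 2); the choice of base is our assumption for the example and only changes the scale, not the ordering of users.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Empty categories contribute nothing to the sum.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

# Counts ordered Mon..Sun, taken from the examples in the text.
all_sunday = [0, 0, 0, 0, 0, 0, 10]
spread_weekdays = [2, 2, 2, 2, 2, 0, 0]
monday_heavy = [100, 1, 1, 1, 1, 0, 0]

print(shannon_entropy(all_sunday))       # 0.0: no uncertainty
print(shannon_entropy(spread_weekdays))  # log2(5), about 2.32 bits
print(shannon_entropy(monday_heavy))     # roughly 0.31 bits despite 5 distinct days

# Regrouping the second user into [weekday, weekend] removes the uncertainty.
print(shannon_entropy([sum(spread_weekdays[:5]), sum(spread_weekdays[5:])]))  # 0.0
```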
Temporal and Switch Entropy
Two really useful measures of entropy that result from this approach are:
- temporal entropy: how evenly distributed are the user’s transactions across time? For each user, transactions are binned into weeks up to the present and the entropy of this distribution is computed. Higher entropy indicates that transactions are spread more evenly across weeks rather than concentrated in a few of them.
- switch entropy: how often does the user switch recipients? For each user, we look at a sliding window of two transactions at a time in sequence, measure the entropy across recipients within this window and average this entropy across all of the user’s transaction windows.
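A minimal sketch of the two measures, under a few assumptions of ours: weeks are 7-day bins counted back from a reference date `today`, entropy is measured in bits, and a user with fewer than two transactions gets a switch entropy of 0. The function names and sample data are hypothetical.

```python
import math
from collections import Counter
from datetime import date

def shannon_entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

def temporal_entropy(tx_dates, today):
    """Entropy of a user's transaction counts per 7-day bin, counted back from today."""
    weeks_ago = Counter((today - d).days // 7 for d in tx_dates)
    return shannon_entropy(list(weeks_ago.values()))

def switch_entropy(recipients):
    """Mean recipient entropy over sliding windows of two consecutive transactions."""
    windows = list(zip(recipients, recipients[1:]))
    if not windows:
        return 0.0
    per_window = [shannon_entropy(list(Counter(w).values())) for w in windows]
    return sum(per_window) / len(per_window)

today = date(2019, 9, 1)
weekly = [date(2019, 8, 4), date(2019, 8, 11), date(2019, 8, 18), date(2019, 8, 25)]
one_burst = [date(2019, 8, 26), date(2019, 8, 27), date(2019, 8, 28), date(2019, 8, 29)]

print(temporal_entropy(weekly, today))     # one transaction in each of four weeks
print(temporal_entropy(one_burst, today))  # all four transactions in the same week

print(switch_entropy(["A", "A", "A", "A"]))  # never switches recipient
print(switch_entropy(["A", "B", "A", "C"]))  # switches on every transaction
```

With a window of two transactions, each window contributes either 0 bits (same recipient) or 1 bit (different recipients), so the average is simply the fraction of consecutive transactions that change recipient. Note also that raw temporal entropy grows with the number of active weeks, so comparing users with very different histories may call for some normalisation.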
Figure 1 shows a sample of 100 users from a certain customer who started their top-ups in May 2019 and have between 2 and 20 transactions to date, grouped by categories of temporal and switch entropy.
Figure 1. Sample of 100 users grouped by levels of temporal and switch entropy. Each dot is a top-up and each color is a unique recipient phone number.
There appears to be a natural segmentation of users according to these two dimensions of entropy, as they capture two important attributes related to each user’s pattern of transactions: regularity over time and consistency of receiving number. Going clockwise, from upper left:
- Users low on both temporal and switch entropy appear to be intermittent or irregular users, whose numbers are sometimes recycled to other intermittent users. Most users (~50%) in the sample belong to this category.
- Users low on temporal entropy but high on switch entropy appear to be similar to the first group but on a compressed timescale, meaning the receiving numbers churn much more quickly. These are potentially our lowest-value users in terms of durability of transactions.
- Users high on both temporal and switch entropy appear to include resellers: frequent transactions but to multiple receiving numbers repeatedly across time.
- Users high on temporal entropy and low on switch entropy appear to be users who transact regularly, including numbers that are recycled to other regular users. These are potentially our highest-value users, but also the rarest.
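One simple way to place users into these four groups is to split each entropy measure at a cut-off. The sketch below uses the sample median as the cut-off, which is purely an assumption for illustration; the entropy values are made up, and the grouping behind Figure 1 may have used different thresholds.

```python
from statistics import median

# Hypothetical (temporal_entropy, switch_entropy) values per user.
user_entropies = {
    "user_1": (0.4, 0.1),
    "user_2": (0.6, 0.9),
    "user_3": (2.1, 0.8),
    "user_4": (2.4, 0.0),
}

# Assumed cut-offs: the median of each measure across the sample.
temporal_cut = median(t for t, _ in user_entropies.values())
switch_cut = median(s for _, s in user_entropies.values())

def entropy_quadrant(temporal, switch):
    """Label a user as low/high on each entropy measure relative to the cut-offs."""
    t_level = "high" if temporal > temporal_cut else "low"
    s_level = "high" if switch > switch_cut else "low"
    return f"{t_level} temporal / {s_level} switch"

quadrants = {user: entropy_quadrant(t, s) for user, (t, s) in user_entropies.items()}
for user, label in quadrants.items():
    print(user, label)
```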
Many promising lines of inquiry can arise from this approach. We gain some clarity about what our highest-value users look like. More importantly, we can begin to ask what might induce users from the other quadrants to move towards the highest-value quadrant. We also gain a quick way to identify potential resellers, who might be less responsive to marketing campaigns. Even better, we might be able to run experiments to test which marketing strategy best improves the transaction rates for each type of user.
We have attempted to describe ways to meaningfully characterize our users using feature engineering techniques. While sometimes transactional data is initially thin, there are ways to be creative in feature engineering that allow for more insightful and productive uses of the data than first appeared possible. Equipped with these features, we gain a deeper view of our users, enabling us to work better with our customers and partners to identify the most promising segments for a campaign or new product/value initiative.