Football Analytics - Association Rule Mining (ARM)
Overview
Association Rule Mining (ARM) is a machine learning technique used to uncover relationships between items in large datasets. It is widely applied in market basket analysis, recommendation systems, and other data-driven decision-making processes.
A simple view of clustering and ARM.
Key measures used in ARM include:
- Support: Measures how frequently an item or itemset appears in the dataset.
- Confidence: Indicates the likelihood that an item Y appears in a transaction given that item X is already present.
- Lift: Compares how often X and Y occur together with how often they would occur together if they were independent.
  - Lift > 1: X and Y occur together more often than expected (positive association).
  - Lift = 1: X and Y are independent.
  - Lift < 1: X and Y occur together less often than expected (negative association).
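For reference, the three measures can be written compactly as follows. These are the standard textbook definitions; the symbols N (number of transactions) and t (a single transaction) are introduced here for illustration only.

```latex
\[
\operatorname{supp}(X) = \frac{\lvert \{\, t : X \subseteq t \,\} \rvert}{N},
\qquad
\operatorname{conf}(X \to Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\qquad
\operatorname{lift}(X \to Y) = \frac{\operatorname{conf}(X \to Y)}{\operatorname{supp}(Y)}
\]
```

Note that lift divides confidence by the baseline frequency of Y, so a rule with high confidence can still have a lift close to 1 when Y is common on its own.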
What Are Association Rules?
Association rules take the form of X → Y, meaning that if X occurs in a transaction, Y is likely to occur as well. Example:
- {Milk, Bread} → {Butter}
- If a customer buys milk and bread, they are likely to buy butter as well.
Apriori Algorithm
The Apriori algorithm is a popular method for ARM that works as follows:
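- Identify frequent itemsets that meet the minimum support threshold, growing candidates one item at a time.
- Prune any candidate that contains an infrequent subset, since it cannot be frequent itself; this is what keeps the search efficient.
- Generate association rules from the frequent itemsets and keep those that satisfy the confidence and lift thresholds.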
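The frequent-itemset stage can be sketched in plain Python as below. This is a minimal illustration, not the code used for this analysis: the function name `frequent_itemsets`, the toy grocery baskets, and the 0.6 threshold are all assumptions chosen to keep the example small.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset(items): support}."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: single items that meet the threshold.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Toy baskets (illustrative only).
baskets = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Cheese", "Butter"},
]
result = frequent_itemsets(baskets, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 3))
```

With this threshold, {Milk, Bread} appears in only one of three baskets and is dropped, so the three-item candidate {Milk, Bread, Butter} is pruned without its support ever being counted.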
Apriori Algorithm
Data Preparation
Association Rule Mining requires unlabeled transaction data, meaning datasets where each row represents a collection of items purchased together or characteristics occurring together. Unlike supervised learning, ARM does not rely on predefined labels.
Sample Data Structure (transaction format):
| Transaction ID | Item 1 | Item 2 | Item 3 |
|---|---|---|---|
| 1 | Milk | Bread | Butter |
| 2 | Bread | Butter | |
| 3 | Milk | Cheese | Butter |
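As a worked example, the three transactions in the table above are enough to compute support, confidence, and lift by hand. The helper names below are illustrative, not taken from any library.

```python
transactions = [
    {"Milk", "Bread", "Butter"},   # transaction 1
    {"Bread", "Butter"},           # transaction 2
    {"Milk", "Cheese", "Butter"},  # transaction 3
]

def support(itemset):
    """Share of transactions containing every item in itemset."""
    hits = sum(set(itemset) <= t for t in transactions)
    return hits / len(transactions)

def confidence(x, y):
    """Share of transactions containing x that also contain y."""
    return support(set(x) | set(y)) / support(x)

def lift(x, y):
    """Confidence of x -> y relative to the baseline frequency of y."""
    return confidence(x, y) / support(y)

x, y = {"Milk"}, {"Butter"}
print("support:", support(x | y))
print("confidence:", confidence(x, y))
print("lift:", lift(x, y))
```

Because Butter appears in every transaction, the rule {Milk} → {Butter} has a confidence of 1.0 but a lift of only 1.0: knowing that Milk was bought tells us nothing new about Butter.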
Data Used in This Analysis:
The dataset consists of categorical variables describing football players: position, outfitter brand, dominant foot, and club name.
Snapshot of the dataset:
Dataset used for ARM
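To be mined, each player row has to become a transaction of items. A minimal sketch of that conversion follows, assuming items are named `Column_value` as they appear in the results tables. The three player records are hypothetical, and skipping missing values is an assumption about the preprocessing, not something the source confirms.

```python
# Hypothetical player records; column names mirror the item prefixes
# (Position_, Foot_, Outfitter_) seen in the results tables.
players = [
    {"Position": "Goalkeeper", "Foot": "hand", "Outfitter": None},
    {"Position": "Left-Back", "Foot": "left", "Outfitter": "Nike"},
    {"Position": "Centre-Forward", "Foot": "right", "Outfitter": "adidas"},
]

def to_transaction(record):
    """Turn one row into a set of 'Column_value' items, skipping missing values."""
    return {f"{col}_{val}" for col, val in record.items() if val is not None}

transactions = [to_transaction(p) for p in players]
for t in transactions:
    print(sorted(t))
```

Prefixing each value with its column name keeps items from different columns distinct, which is why the rules read as `Position_Left-Back` → `Foot_left` rather than `Left-Back` → `left`.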
Results
After running the Apriori algorithm, we extracted key association rules based on support, confidence, and lift.
Top 15 Rules by Lift
| Antecedents | Consequents | Support | Confidence | Lift |
|---|---|---|---|---|
| Position_Goalkeeper | Foot_hand | 0.102 | 1.000 | 9.768 |
| Foot_hand | Position_Goalkeeper | 0.102 | 1.000 | 9.768 |
| Position_Left-Back | Foot_left | 0.070 | 0.965 | 4.026 |
| Position_Right-Back | Foot_right | 0.072 | 0.978 | 1.540 |
| Position_Defensive Midfield | Foot_right | 0.069 | 0.842 | 1.327 |
| Outfitter_Nike, Position_Centre-Forward | Foot_right | 0.013 | 0.834 | 1.314 |
| Position_Centre-Forward | Foot_right | 0.116 | 0.831 | 1.308 |
| Position_Centre-Forward, Outfitter_adidas | Foot_right | 0.011 | 0.829 | 1.306 |
| Position_Centre-Back, Club_name_Without Club | Foot_right | 0.012 | 0.814 | 1.282 |
| Club_name_Without Club, Position_Centre-Forward | Foot_right | 0.011 | 0.806 | 1.269 |
| Outfitter_adidas, Position_Central Midfield | Foot_right | 0.012 | 0.798 | 1.257 |
| Position_Central Midfield | Foot_right | 0.091 | 0.785 | 1.236 |
| Position_Left Winger | Foot_right | 0.060 | 0.777 | 1.224 |
| Position_Centre-Back | Foot_right | 0.122 | 0.715 | 1.127 |
| Position_Centre-Back, Outfitter_Nike | Foot_right | 0.011 | 0.680 | 1.071 |
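One way to read this table is to recover the baseline frequency of each consequent from the relationship lift = confidence / support(consequent). The sketch below does this for four rows copied from the table; the variable names are illustrative.

```python
# (antecedent, consequent, confidence, lift) copied from the table above
rules = [
    ("Position_Goalkeeper", "Foot_hand", 1.000, 9.768),
    ("Position_Left-Back", "Foot_left", 0.965, 4.026),
    ("Position_Right-Back", "Foot_right", 0.978, 1.540),
    ("Position_Centre-Back", "Foot_right", 0.715, 1.127),
]

# lift = confidence / support(consequent)
# => support(consequent) = confidence / lift
implied = {}
for antecedent, consequent, conf, lift in rules:
    implied.setdefault(consequent, []).append(round(conf / lift, 3))

for consequent, values in implied.items():
    print(consequent, values)
```

The implied baselines are roughly 0.63 for Foot_right, 0.24 for Foot_left, and 0.10 for Foot_hand. This is why right-backs reach a lift of only 1.54 despite 97.8% confidence, while left-backs reach 4.03 at 96.5%: most players are right-footed to begin with.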
Top 15 Rules by Support
| Antecedents | Consequents | Support | Confidence | Lift |
|---|---|---|---|---|
| Position_Centre-Back | Foot_right | 0.122 | 0.715 | 1.127 |
| Position_Centre-Forward | Foot_right | 0.116 | 0.831 | 1.308 |
| Position_Goalkeeper | Foot_hand | 0.102 | 1.000 | 9.768 |
| Foot_hand | Position_Goalkeeper | 0.102 | 1.000 | 9.768 |
| Position_Central Midfield | Foot_right | 0.091 | 0.785 | 1.236 |
| Position_Right-Back | Foot_right | 0.072 | 0.978 | 1.540 |
| Position_Left-Back | Foot_left | 0.070 | 0.965 | 4.026 |
| Position_Defensive Midfield | Foot_right | 0.069 | 0.842 | 1.327 |
| Outfitter_Nike | Foot_right | 0.067 | 0.659 | 1.038 |
| Position_Left Winger | Foot_right | 0.060 | 0.777 | 1.224 |
| Outfitter_adidas | Foot_right | 0.059 | 0.642 | 1.011 |
| Club_name_Without Club | Foot_right | 0.057 | 0.656 | 1.033 |
| Position_Attacking Midfield | Foot_right | 0.047 | 0.668 | 1.052 |
| Position_Right Winger | Foot_right | 0.041 | 0.560 | 0.882 |
| Outfitter_Puma | Foot_right | 0.020 | 0.639 | 1.006 |
Top 15 Rules by Confidence
| Antecedents | Consequents | Support | Confidence | Lift |
|---|---|---|---|---|
| Position_Goalkeeper | Foot_hand | 0.102 | 1.000 | 9.768 |
| Foot_hand | Position_Goalkeeper | 0.102 | 1.000 | 9.768 |
| Position_Right-Back | Foot_right | 0.072 | 0.978 | 1.540 |
| Position_Left-Back | Foot_left | 0.070 | 0.965 | 4.026 |
| Position_Defensive Midfield | Foot_right | 0.069 | 0.842 | 1.327 |
| Outfitter_Nike, Position_Centre-Forward | Foot_right | 0.013 | 0.834 | 1.314 |
| Position_Centre-Forward | Foot_right | 0.116 | 0.831 | 1.308 |
| Position_Centre-Forward, Outfitter_adidas | Foot_right | 0.011 | 0.829 | 1.306 |
| Position_Centre-Back, Club_name_Without Club | Foot_right | 0.012 | 0.814 | 1.282 |
| Club_name_Without Club, Position_Centre-Forward | Foot_right | 0.011 | 0.806 | 1.269 |
| Outfitter_adidas, Position_Central Midfield | Foot_right | 0.012 | 0.798 | 1.257 |
| Position_Central Midfield | Foot_right | 0.091 | 0.785 | 1.236 |
| Position_Left Winger | Foot_right | 0.060 | 0.777 | 1.224 |
| Position_Centre-Back | Foot_right | 0.122 | 0.715 | 1.127 |
| Position_Centre-Back, Outfitter_Nike | Foot_right | 0.011 | 0.680 | 1.071 |
Thresholds Used:
- Minimum Support: 0.01
- Minimum Confidence: 0.5
- Minimum Lift: 1.2
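A minimal sketch of how three such thresholds combine into a single filter, applied to a few rows copied from the results tables. The `passes` helper is hypothetical, and the source does not show at which stage of the pipeline each threshold was applied, so treat the combination below as an assumption.

```python
MIN_SUPPORT, MIN_CONFIDENCE, MIN_LIFT = 0.01, 0.5, 1.2

# (rule, support, confidence, lift) copied from the results tables
rows = [
    ("Position_Left-Back -> Foot_left", 0.070, 0.965, 4.026),
    ("Position_Left Winger -> Foot_right", 0.060, 0.777, 1.224),
    ("Outfitter_Puma -> Foot_right", 0.020, 0.639, 1.006),
]

def passes(support, confidence, lift):
    """True only if a rule clears all three thresholds (hypothetical helper)."""
    return (
        support >= MIN_SUPPORT
        and confidence >= MIN_CONFIDENCE
        and lift >= MIN_LIFT
    )

kept = [name for name, s, c, l in rows if passes(s, c, l)]
for name in kept:
    print(name)
```

Under this combined filter the first two rules are kept, while the Puma rule clears the support and confidence thresholds but not the lift threshold.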
Visualization: Association Rule Network
The network graph below visualizes the relationships found among the top 15 rules:
Apriori Visualization
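The source does not say how the graph was drawn. As one possible approach, the sketch below turns a handful of rules from the tables into a directed, lift-weighted edge list that a plotting library could render. Drawing one edge per antecedent item is an assumption made for simplicity.

```python
from collections import defaultdict

# (antecedent items, consequent, lift) for a few rules from the tables above
rules = [
    (("Position_Goalkeeper",), "Foot_hand", 9.768),
    (("Position_Left-Back",), "Foot_left", 4.026),
    (("Position_Right-Back",), "Foot_right", 1.540),
    (("Outfitter_Nike", "Position_Centre-Forward"), "Foot_right", 1.314),
]

# Directed graph: one edge per antecedent item -> consequent, weighted by lift.
graph = defaultdict(dict)
for antecedents, consequent, lift in rules:
    for item in antecedents:
        graph[item][consequent] = lift

# In-degree shows which items act as hubs in the drawing.
in_degree = defaultdict(int)
for targets in graph.values():
    for node in targets:
        in_degree[node] += 1

for node, degree in sorted(in_degree.items(), key=lambda kv: -kv[1]):
    print(node, degree)
```

In a graph built this way, Foot_right becomes the hub because most rules in the tables share it as their consequent.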
Conclusions
From the Association Rule Mining analysis, we found several interesting insights:
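- Strong Position-Foot Associations
  - Every goalkeeper is recorded as “Foot_hand”, and every “Foot_hand” player is a goalkeeper (100% confidence in both directions, lift 9.77), so this rule appears to reflect how the dataset encodes goalkeepers rather than a footedness preference.
  - Left-backs are predominantly left-footed (96.5% confidence, lift 4.03).
  - Right-backs are predominantly right-footed (97.8% confidence, lift 1.54).
- Outfitter Preferences and Player Roles
  - Nike-sponsored Centre-Forwards tend to be right-footed (83.4% confidence).
  - Adidas-sponsored Centre-Forwards follow a similar trend (82.9% confidence).
  - Both figures are close to the 83.1% for Centre-Forwards overall, so the outfitter adds little beyond what position already explains.
- Clubs and Footedness
  - Among players without a club, Centre-Backs (81.4% confidence) and Centre-Forwards (80.6% confidence) are mostly right-footed.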
These insights can help in player scouting, team strategy development, and sponsorship decisions.
This tab walks through the Association Rule Mining process, from methodology to results.
The code for implementing ARM can be found here
