Apriori

What is Apriori?

Apriori is an algorithm for frequent itemset mining and association rule learning. It proceeds by identifying the frequent individual items in the dataset and extending them to larger and larger itemsets as long as those itemsets appear frequently enough in the dataset.
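
The level-wise growth described above can be sketched in plain Python. This is a minimal illustration and not a library implementation: the function name `frequent_itemsets` and the sample watchlists are hypothetical, and support is assumed to be the fraction of transactions that contain an itemset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Grow itemsets one item at a time, keeping only the frequent ones."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: sup for s, sup in current.items() if sup >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-item subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
    return frequent


# Hypothetical watchlists, one per user.
watchlists = [["M1", "M2"], ["M1", "M3"], ["M1", "M2", "M3"], ["M2"]]
for itemset, sup in frequent_itemsets(watchlists, min_support=0.5).items():
    print(sorted(itemset), sup)
```

The prune step is what keeps the search small: a candidate is counted only if all of its smaller subsets already passed the support threshold.
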
The Apriori algorithm relies on three measures:

  • Support
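    Support measures how often an item appears in the data, much like an estimated probability in Bayesian statistics.
    Let’s assume that we are building a movie recommender, where each user's watchlist is one transaction.
$ support(M) = \frac{\text{number of user watchlists containing } M}{\text{total number of user watchlists}} $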
  • Confidence
    The confidence of the rule $M_{1} \rightarrow M_{2}$ is the number of users who have watched both $M_{1}$ and $M_{2}$ divided by the number of users who have watched $M_{1}$.
$ confidence(M_{1} \rightarrow M_{2}) = \frac{\text{number of user watchlists containing } M_{1} \text{ and } M_{2}}{\text{number of user watchlists containing } M_{1}} $
  • Lift
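    Lift compares the confidence of the rule with how often $M_{2}$ is watched overall. A lift above 1 means that users who watched $M_{1}$ are more likely than average to watch $M_{2}$.
$ lift(M_{1} \rightarrow M_{2}) = \frac{confidence(M_{1} \rightarrow M_{2})}{support(M_{2})} $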
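
To make the three formulas concrete, here is a small sketch that computes them directly from a list of watchlists. The helper names and the sample data are made up for illustration; only the formulas come from the definitions above.

```python
def support(watchlists, *movies):
    """Fraction of watchlists that contain all of the given movies."""
    wanted = set(movies)
    return sum(wanted <= set(w) for w in watchlists) / len(watchlists)


def confidence(watchlists, m1, m2):
    """Share of the watchlists containing m1 that also contain m2."""
    return support(watchlists, m1, m2) / support(watchlists, m1)


def lift(watchlists, m1, m2):
    """Confidence of m1 -> m2 relative to the overall support of m2."""
    return confidence(watchlists, m1, m2) / support(watchlists, m2)


# Hypothetical data: four users and their watchlists.
watchlists = [
    ["M1", "M2"],
    ["M1", "M3"],
    ["M1", "M2", "M3"],
    ["M2"],
]

print(support(watchlists, "M1"))
print(confidence(watchlists, "M1", "M2"))
print(lift(watchlists, "M1", "M2"))
```

In this toy data M1 appears in 3 of 4 watchlists, so its support is 0.75. The confidence of M1 → M2 is 0.5 / 0.75 ≈ 0.67 and its lift is about 0.89, slightly below 1, so watching M1 does not make M2 more likely here.
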



The steps of the Apriori algorithm



  • Step 1.
    Set minimum support and confidence thresholds.
  • Step 2.
    Find all itemsets (subsets of the items that occur in the transactions) whose support meets the minimum support threshold.
  • Step 3.
    Generate all the rules from these itemsets whose confidence meets the minimum confidence threshold.
  • Step 4.
    Sort the rules by decreasing lift values.
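
The four steps can be sketched end to end as follows. This is an illustration under stated assumptions, not how any particular library works: `mine_rules` is a hypothetical name, candidate itemsets are enumerated by brute force with `itertools.combinations` instead of Apriori's level-wise pruning, and a threshold is treated as met when the value is greater than or equal to it.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence, max_length=2):
    """Return (lhs, rhs, support, confidence, lift) tuples sorted by lift."""
    # Step 1 is the choice of min_support and min_confidence by the caller.
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Step 2: keep every itemset (up to max_length items) that is frequent.
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, max_length + 1):
        for combo in combinations(items, size):
            sup = support(frozenset(combo))
            if sup >= min_support:
                frequent[frozenset(combo)] = sup

    # Step 3: split each frequent itemset into lhs -> rhs and keep confident rules.
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                rhs = itemset - lhs
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rule_lift = conf / frequent[rhs]
                    rules.append((sorted(lhs), sorted(rhs), sup, conf, rule_lift))

    # Step 4: sort the rules by decreasing lift.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical watchlists, one per user.
watchlists = [["M1", "M2"], ["M1", "M3"], ["M1", "M2", "M3"], ["M2"]]
for rule in mine_rules(watchlists, min_support=0.5, min_confidence=0.6):
    print(rule)
```

Brute-force enumeration is only practical for tiny examples like this one. On real data the number of candidate itemsets grows very quickly, which is the problem Apriori's pruning addresses.
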



Example



Code



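import pandas as pd
from apyori import apriori

# `transactions` is assumed to be defined earlier as a list of transactions,
# where each transaction is a list of item names.
rules = apriori(
    transactions=transactions,
    min_support=0.003,
    min_confidence=0.2,
    min_lift=3,
    min_length=2,
    max_length=2,
)

results = list(rules)


def inspect(results):
    """Flatten each result into (lhs, rhs, support, confidence, lift)."""
    # result[1] is the support of the itemset; result[2][0] is its first rule,
    # holding the left-hand items, right-hand items, confidence, and lift.
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))


resultsinDataFrame = pd.DataFrame(
    inspect(results),
    columns=['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'],
)

# Keep the 10 rules with the highest lift.
resultsinDataFrame = resultsinDataFrame.nlargest(n=10, columns='Lift')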

Result







Implementation

This post is licensed under CC BY 4.0 by the author.