Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

Factorization Machine and Field-aware Factorization Machine for CTR prediction



Both FM and FFM are extensions for linear regression(LR). The basic formula for LR is:

\[y(w, x) = w_0 + \sum_{i=1}^{n}w_i x_i\]

where is the number of features. As its name implies, its a linear and simple model. The feature conjunction is not considered.


FM has addtional terms to represent the conjunction between features. For example, feature “country: China” and “holiday: Spring Festival” must have somehow relation bewtween themselves. A guest from China may be prone to click item related to Spring Festival, while an American may not. Other feature pair may be “country: US” and “holiday: Thanksgiving”.
With linear regression, if the model take both two click examples like “country: China; holiday: Spring Festival” and “country: US; holiday: Thanksgiving” and some other unrelated example, the weights for “China” and “US” will be the same and the weights for “Spring Festival” and “Thanksgiving” will be the same.
If we have a new example “country: China; holiday: Thanksgiving”, we will predict the same click-through rate as “country: China; holiday: Spring Festival”. Obviously, we don’t want something like this. Therefore, considering feature conjunction is neccessary.

The formula for FM is as below, with features as well

\[y(w, x) = w_0 + \sum_{i=1}^{n}w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n}<\mathbf{v}_i \centerdot \mathbf{v}_j> x_i x_j\]

Here we introduce a new parameter matrix , whose size is . is a constant, such as 2. is formulated as

\[\mathbf{V} = \left[ \begin{array}{c} \mathbf{v}_1 \\ \mathbf{v}_2 \\ … \\ \mathbf{v}_i \\… \\ \mathbf{v}_n\end{array} \right]\]

is the latent vector for feature . We are going to train these latent vectors to learn the latent effects between feature pairs.

The total number of parameters is .

One last thing to note is that FM can be simplified to be trained and used in time. So it’s a quite efficient algorithm.


Now we have an example as below, it represents that a male user clicked Nike ad on Amazon.

isClick? gender advertisement platform
1 male Nike Amazon

We would need to perform One-Hot encoding to expand the feature, so that feature gender will be expanded to “gender_male” and “gender_female”. “gender: male” will be converted to “gender_male = 1; gender_female = 0”. So it will be easier to perform numerical calculations.

Ignoring LR, the output value for this example can be calculated as:

Recall that we have latent effects between features. In this case, we are using to learn two latent effects <male,nike> and <male, amazon>. However, these two feature pairs may have different latent effects. It may be inappropriate to do it this way. Therefore, FFM is proposed.


For FFM, we divide features into fields, such as country, gender, brand… Every feature will maintain different latent vectors. For a feature pair, we get the latent vectors for each feature and for the field of the other feature. Therefore, if we have feature , and , are fields, training with pair won’t affect training with pair . The latent effect between feature pair in fields are unrelated to that in fields .

The total number of parameters will be . Meanwhile, the calculation cannot be simplified as FM. So the training will be in time. But it may be worthy.

Now, for the example mentioned in FM. With FFM, the output value can be calculated as

where means field for ad. Latent vectors for feature gender_male and are separated in order to learn different latent effects involving feature gender_male, with feature ad_nike and platform_amazon, respectively.

The formula for FFM is given as


  1. Field-aware Factorization Machines for CTR Prediction

Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2019. All rights reserved by melonskin. Powered by Jekyll.