Factorization Machine and Field-aware Factorization Machine for CTR prediction

Background

Both FM and FFM are extensions of linear regression (LR). The basic formula for LR is:

\[y(w, x) = w_0 + \sum_{i=1}^{n}w_i x_i\]

where $n$ is the number of features. As its name implies, it is a simple linear model. It does not consider conjunctions (interactions) between features.

FM

FM adds terms that represent the conjunctions between features. For example, the features “country: China” and “holiday: Spring Festival” are clearly related to each other. A visitor from China may be prone to click an item related to Spring Festival, while an American may not. Another such feature pair is “country: US” and “holiday: Thanksgiving”.
With linear regression, if the model is trained on two click examples such as “country: China; holiday: Spring Festival” and “country: US; holiday: Thanksgiving”, plus some other unrelated examples, the weights for “China” and “US” will be the same, and so will the weights for “Spring Festival” and “Thanksgiving”.
Given a new example “country: China; holiday: Thanksgiving”, the model then predicts the same click-through rate as for “country: China; holiday: Spring Festival”. This is clearly not what we want, so modeling feature conjunctions is necessary.
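The limitation can be checked with a small sketch. The weights below are made-up values, not trained ones, and `linear_score` is a hypothetical helper. Whatever weights are chosen, an additive model assigns the same effect to switching the holiday for every country, so it cannot express "Spring Festival matters more for China than for the US".

```python
# Hypothetical weights for an additive (LR-style) model; the values are made up.
w0 = -2.0
w = {
    "country=China": 0.3,
    "country=US": 0.1,
    "holiday=Spring Festival": 0.5,
    "holiday=Thanksgiving": 0.4,
}


def linear_score(country, holiday):
    """Score of an example with exactly two active one-hot features."""
    return w0 + w["country=" + country] + w["holiday=" + holiday]


# Effect of switching the holiday, measured separately for each country.
gap_china = linear_score("China", "Spring Festival") - linear_score("China", "Thanksgiving")
gap_us = linear_score("US", "Spring Festival") - linear_score("US", "Thanksgiving")

# Both gaps equal w["holiday=Spring Festival"] - w["holiday=Thanksgiving"]
# (up to floating-point rounding), independent of the country.
print(gap_china, gap_us)
```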

The formula for FM, again with $n$ features, is:

\[y(w, x) = w_0 + \sum_{i=1}^{n}w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j\]

Here we introduce a new parameter matrix $\mathbf{V}$ of size $n \times k$, where $k$ is a small constant chosen in advance, such as 2. $\mathbf{V}$ is written as

\[\mathbf{V} = \left[ \begin{array}{c} \mathbf{v}_1 \\ \mathbf{v}_2 \\ \vdots \\ \mathbf{v}_i \\ \vdots \\ \mathbf{v}_n\end{array} \right]\]

$\mathbf{v}_i$ is the latent vector for feature $i$ . We are going to train these latent vectors to learn the latent effects between feature pairs.

The total number of parameters is $1 + n + kn$ .

One last thing to note is that the pairwise term of FM can be rewritten so that the model is trained and evaluated in $O(kn)$ time instead of $O(kn^2)$. So it’s a quite efficient algorithm.
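The simplification rests on the following rearrangement of the pairwise term, where $v_{i,f}$ denotes the $f$-th component of $\mathbf{v}_i$:

\[\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n}v_{i,f}x_i\right)^2 - \sum_{i=1}^{n}v_{i,f}^2x_i^2\right]\]

Below is a minimal sketch in plain Python that compares the direct double sum with the rearranged form. The function names and the random test data are illustrative assumptions, not part of any library.

```python
import random


def fm_pairwise_naive(x, V):
    """Sum over i < j of <v_i, v_j> x_i x_j, computed directly: O(k n^2)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity via the rearranged sum: O(k n)."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = 0.0     # running sum of v_{i,f} * x_i
        s_sq = 0.0  # running sum of (v_{i,f} * x_i)^2
        for x_i, v_i in zip(x, V):
            t = v_i[f] * x_i
            s += t
            s_sq += t * t
        total += 0.5 * (s * s - s_sq)
    return total


def fm_predict(x, w0, w, V):
    """Full FM output: bias + linear terms + pairwise term."""
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, V)


# Illustrative data: n = 6 one-hot features, k = 2 latent dimensions.
random.seed(0)
n, k = 6, 2
x = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
V = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(n)]

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
print(abs(naive - fast) < 1e-9)
```

With sparse one-hot input, both loops can additionally skip features with $x_i = 0$, so the cost depends on the number of active features rather than on $n$.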

Example

Consider the example below, which records that a male user clicked a Nike ad on Amazon.

isClick?	gender	advertisement	platform
1	male	Nike	Amazon

We need to perform one-hot encoding to expand the features: the feature gender is expanded to “gender_male” and “gender_female”, so “gender: male” is converted to “gender_male = 1; gender_female = 0”. This makes numerical calculation easier.
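The expansion can be sketched as follows. `one_hot` is a hypothetical helper, and the vocabulary is an assumption for illustration: only “male”, “Nike” and “Amazon” come from the example row, while “Adidas” and “eBay” are invented alternatives. The `<field>_<value>` naming keeps the original casing, so the ad feature appears as `advertisement_Nike`.

```python
def one_hot(record, vocabulary):
    """Expand categorical fields into 0/1 features named '<field>_<value>'."""
    encoded = {}
    for field, values in vocabulary.items():
        for value in values:
            encoded[f"{field}_{value}"] = 1 if record[field] == value else 0
    return encoded


# Hypothetical vocabulary: every value each field is assumed to take.
vocabulary = {
    "gender": ["male", "female"],
    "advertisement": ["Nike", "Adidas"],
    "platform": ["Amazon", "eBay"],
}
record = {"gender": "male", "advertisement": "Nike", "platform": "Amazon"}

features = one_hot(record, vocabulary)
for name, value in features.items():
    print(name, value)
```

Exactly one feature per field is set to 1, so an example with three fields always has three active features no matter how large the vocabulary grows.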

Ignoring the LR part (the bias and the linear terms), the output value for this example can be calculated as:

$y = \mathbf{v}_{male} \centerdot \mathbf{v}_{nike} + \mathbf{v}_{male} \centerdot \mathbf{v}_{amazon} + \mathbf{v}_{nike} \centerdot \mathbf{v}_{amazon}$

Recall that the latent vectors capture latent effects between features. In this case, we are using the single vector $\mathbf{v}_{male}$ to learn two latent effects, <male, nike> and <male, amazon>. However, these two feature pairs may have different latent effects, so sharing one vector between them may be inappropriate. FFM was proposed to address this.

FFM

For FFM, we divide the features into $m$ fields, such as country, gender, brand… Every feature maintains $m$ different latent vectors, one per field. For a feature pair, each feature uses the latent vector it keeps for the field of the other feature. Therefore, given features $a_1 \in A$ , $b_1, b_2 \in B$ and $c_1 \in C$ , where $A, B, C$ are fields, training on the pair $<a_1, b_1>$ won’t affect training on the pair $<a_1, c_1>$ . The latent effects between feature pairs in fields $<A, B>$ are unrelated to those in fields $<A, C>$ .

The total number of parameters will be $1 + n + mkn$ . Unlike FM, the calculation cannot be simplified, so training takes $O(kn^2)$ time. The extra cost may still be worthwhile.

Returning to the example from the FM section and again ignoring the LR part, the output value with FFM can be calculated as

$y = \mathbf{v}_{male,f_{ad}} \centerdot \mathbf{v}_{nike,f_{gen}} + \mathbf{v}_{male,f_{plat}} \centerdot \mathbf{v}_{amazon,f_{gen}} + \mathbf{v}_{nike,f_{plat}} \centerdot \mathbf{v}_{amazon,f_{ad}}$

where $f_{ad}$ denotes the field for the ad. The latent vectors $\mathbf{v}_{male,f_{ad}}$ and $\mathbf{v}_{male,f_{plat}}$ for the feature gender_male are kept separate so that they can learn different latent effects with the features ad_nike and platform_amazon, respectively.

The formula for FFM is given as

$y(w, x) = w_0 + \sum_{i=1}^{n}w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle \mathbf{v}_{i,f_j}, \mathbf{v}_{j,f_i} \rangle x_i x_j$
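The FFM formula can be sketched in plain Python as below. The storage layout is an assumption made for illustration: `V[i][f]` holds the latent vector that feature `i` uses against field `f`, and `field_of[j]` is the field of feature `j`. The latent vectors are random rather than trained, and the example keeps only the three active features of the Nike example. The script checks that the general double loop reproduces the three-term expression given earlier.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def ffm_predict(x, field_of, w0, w, V):
    """FFM output. V[i][f] is the latent vector of feature i for field f."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        if x[i] == 0:
            continue  # inactive features contribute nothing
        for j in range(i + 1, n):
            if x[j] == 0:
                continue
            # Each feature uses the vector it keeps for the OTHER feature's field.
            y += dot(V[i][field_of[j]], V[j][field_of[i]]) * x[i] * x[j]
    return y


# Illustrative setup: m = 3 fields, k = 2, one active feature per field.
random.seed(0)
m, k = 3, 2
GEN, AD, PLAT = 0, 1, 2          # field indices
male, nike, amazon = 0, 1, 2     # feature indices
field_of = [GEN, AD, PLAT]
x = [1.0, 1.0, 1.0]
w0, w = 0.0, [0.0, 0.0, 0.0]     # LR part set to zero, as in the worked example
V = [[[random.uniform(-1, 1) for _ in range(k)] for _ in range(m)]
     for _ in range(len(x))]

y = ffm_predict(x, field_of, w0, w, V)

# The three-term expression from the worked example, written out by hand.
expected = (dot(V[male][AD], V[nike][GEN])
            + dot(V[male][PLAT], V[amazon][GEN])
            + dot(V[nike][PLAT], V[amazon][AD]))
print(abs(y - expected) < 1e-9)
```

Because the latent vector now depends on the partner's field, the per-dimension sums used to speed up FM no longer factor out, which is why the double loop over feature pairs remains.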
