Background
Both FM and FFM are extensions for linear regression(LR). The basic formula for LR is:
\[y(w, x) = w_0 + \sum_{i=1}^{n}w_i x_i\]
where is the number of features. As its name implies, its a linear and simple model. The feature conjunction is not considered.
FM
FM has addtional terms to represent the conjunction between features. For example, feature “country: China” and “holiday: Spring Festival” must have somehow relation bewtween themselves. A guest from China may be prone to click item related to Spring Festival, while an American may not. Other feature pair may be “country: US” and “holiday: Thanksgiving”.
With linear regression, if the model take both two click examples like “country: China; holiday: Spring Festival” and “country: US; holiday: Thanksgiving” and some other unrelated example, the weights for “China” and “US” will be the same and the weights for “Spring Festival” and “Thanksgiving” will be the same.
If we have a new example “country: China; holiday: Thanksgiving”, we will predict the same click-through rate as “country: China; holiday: Spring Festival”. Obviously, we don’t want something like this. Therefore, considering feature conjunction is neccessary.
The formula for FM is as below, with features as well
\[y(w, x) = w_0 + \sum_{i=1}^{n}w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n}<\mathbf{v}_i \centerdot \mathbf{v}_j> x_i x_j\]
Here we introduce a new parameter matrix , whose size is . is a constant, such as 2. is formulated as
\[\mathbf{V} = \left[ \begin{array}{c} \mathbf{v}_1 \\ \mathbf{v}_2 \\ … \\ \mathbf{v}_i \\… \\ \mathbf{v}_n\end{array} \right]\]
is the latent vector for feature . We are going to train these latent vectors to learn the latent effects between feature pairs.
The total number of parameters is .
One last thing to note is that FM can be simplified to be trained and used in time. So it’s a quite efficient algorithm.
Example
Now we have an example as below, it represents that a male user clicked Nike ad on Amazon.
isClick? | gender | advertisement | platform |
---|---|---|---|
1 | male | Nike | Amazon |
We would need to perform One-Hot encoding to expand the feature, so that feature gender will be expanded to “gender_male” and “gender_female”. “gender: male” will be converted to “gender_male = 1; gender_female = 0”. So it will be easier to perform numerical calculations.
Ignoring LR, the output value for this example can be calculated as:
Recall that we have latent effects between features. In this case, we are using to learn two latent effects <male,nike> and <male, amazon>. However, these two feature pairs may have different latent effects. It may be inappropriate to do it this way. Therefore, FFM is proposed.
FFM
For FFM, we divide features into fields, such as country, gender, brand… Every feature will maintain different latent vectors. For a feature pair, we get the latent vectors for each feature and for the field of the other feature. Therefore, if we have feature , and , are fields, training with pair won’t affect training with pair . The latent effect between feature pair in fields are unrelated to that in fields .
The total number of parameters will be . Meanwhile, the calculation cannot be simplified as FM. So the training will be in time. But it may be worthy.
Now, for the example mentioned in FM. With FFM, the output value can be calculated as
where means field for ad. Latent vectors for feature gender_male and are separated in order to learn different latent effects involving feature gender_male, with feature ad_nike and platform_amazon, respectively.
The formula for FFM is given as