
The Application of Linear Algebra to Recommender Systems:
Linear Regression and Singular Value Decomposition Approaches
SAMPLE FINAL REPORT
Professor E. G. Puckett.
MAT 167
Spring Quarter 2014
Table of Contents
Introduction and Motivation
Data Set
Methods
Results
Summary and Conclusions
Figures
Tables
Appendix: Code
Introduction and Motivation
The ubiquity of Internet access and its ever-growing role in activities ranging from
commerce to personal leisure has created an explosion in data over the last decade. Much of this
data is gathered from user-generated behavior within a web browser whether or not the user is
aware of its collection. The quantity and richness of this data has helped foster demand for a user-driven model of online experiences, particularly with regard to hedonic pursuits.
As the realm of human experience has increasingly shifted from the physical space to the
digital, businesses have attempted to exploit data on website users in order to increase revenue or
provide a better service in order to retain loyal customers. With consumers’ brief attention spans
mercilessly pulled from one corner of the Internet to the next, a business’ survival may depend upon
delivering the right material to the right people at the right time. A recommender system is one
vehicle by which this goal is accomplished. Such systems are algorithms that analyze data for
revealed user preferences and translate those preferences into targeted content. User data may come
in the form of clicks on links, time spent reading a given page, or ratings provided for a product.
Recommendation systems come in forms both subtle and explicit. Services such as Google
News analyze consumer interests by browsing habits and recommend news stories tailored to users’
tastes. Advertisers collect data on website visits in order to classify a user into pre-specified market
segments. Online shopping goliath Amazon.com records purchases and monitors previously viewed
items in order to suggest products that will catch a customer’s eye. Streaming music providers like
Pandora use song or artist seeds to create a station suited to an individual listener. And, perhaps
most famously of all, Netflix makes use of movie-watching history and user ratings in order to
recommend movies the user has not seen but is likely to enjoy.
As the first online movie rental service to excel in its niche, Netflix developed a physical and
digital delivery system and unique business model that rapidly put pressure on its store-based
competitors that retained an antiquated and unpopular regime of due dates and late fees. Netflix has
remained competitive by, among other things, maintaining a state-of-the-art recommender system
that predicts preferences with reasonable accuracy in order to retain the interest of consumers and to
keep the service fresh and relevant. Netflix famously sought to enhance this recommender system
through a public competition. Though the Netflix Prize Contest ended in 2009, the idea of movie
recommender systems has remained a teaching point and a case study in the field of machine
learning.
Recommender systems are categorized by the kind of data on which they rely. So-called
content-based filtering systems build a profile of users and products; this kind of system relies on
the relationship between a single user and the available products. The user profile indicates the
kinds of product features a user prefers (e.g. a user watches mostly science fiction movies) and the
product profile indicates to what degree the product can be placed in a certain category (e.g. a movie
may be science fiction or a romantic comedy, but rarely both). In contrast, recommendations made
by a collaborative filtering system are formed by an analysis of similarities among users and
products; this system relies principally on the relationship of users to users or products to products,
rather than users to products. It is common in movie recommendation to use a hybrid content-based and collaborative filtering system that relies on every movie receiving at least one rating from some user, and recommends movies to users based on their own preferences. This approach is particularly vital for the regression and singular value decomposition approaches employed below.
Data Set
Although the Netflix Prize data, including nearly a half-million users and 18,000 movies,
were released for the duration of the competition, they are no longer publicly available.
Nevertheless, the University of Minnesota’s GroupLens Research team has released a similar data set¹
with ratings of 1682 movies by 943 users from the MovieLens database. These ratings, like those from
Netflix, come in integer values between 1 and 5, sometimes called the number of “stars” a user gives a
movie. Thus these data can be used in an identical way to the Netflix Prize data. The MovieLens data
come with a variety of additional information, including the genre of each movie in the database. While
there are ways to make use of information about movie features, this project is focused on specific
applications of linear algebra and will make predictions based only on known user ratings. Movie genres
will be discovered ex post, rather than entered ex ante.
Due to computational limitations it was impracticable to work with the entire MovieLens
data set. Even with its relatively small size of about 1.5 million entries (orders of magnitude smaller than
that of the Netflix data), the rating matrix is simply too large to analyze in a reasonable amount of
time on a humble laptop computer. Instead I selected a much smaller subset of the data, choosing 20
movies and 200 users at random. The purpose of choosing such a small data set was to make the
results easy to interpret and present, and to keep the computations brief enough that they could be
repeated many times under different conditions to explore the effects of changing model
parameters. With enough resources, however, the methods described in this report could be
employed with every movie and user provided in the MovieLens data.
Methods
Why Matrix Factorization Works
The primary interest of the movie recommender system is to predict a user’s ratings for
movies he has not seen, and recommend those that he is likely to enjoy. At the heart of this
method is the assumption that a user’s rating for any given movie is a linear combination of a vector
describing the degree to which the movie contains certain characteristics and themes (the genre vector)
and the degree to which the user enjoys these characteristics (the preference vector). Each movie,
containing a unique combination of elements from different genres, has its own genre vector whose
elements correspond to the level of each characteristic; for example, a romantic comedy would
contain high values for humor, romance, and feel-good categories and low values for action or
thriller categories. Similarly, because of the uniqueness of personal preferences, each user has his or
her own preference vector; a hopeless romantic would have high values for the romantic comedy
characteristics just described and perhaps low values for the action and thriller categories.
In the context of movie ratings, matrix factorization methods essentially break down the
ratings matrix into component matrices representing movie genres and user preferences. Each
movie and each user is assigned a vector in feature-space (genres for the movies, preferences over
those genres for the users). When the interests of a user align with the genres of a movie, that user
is likely to enjoy the movie; geometrically, this is like a close alignment of the genre and preference
vectors in space. When user preferences do not agree with a movie’s features, the genre and preference vectors are nearly orthogonal. Therefore, the inner product of the movie’s genre vector with the user’s preference vector provides a prediction of the user’s enjoyment of the movie.
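As a toy illustration of this inner-product idea (the genre and preference values below are hypothetical, not taken from the MovieLens data), consider a four-genre space:

```python
import numpy as np

# Hypothetical four-genre space: [humor, romance, action, thriller]
genre = np.array([0.9, 0.8, 0.1, 0.1])       # a romantic comedy's genre vector
romantic = np.array([1.0, 0.9, 0.2, 0.1])    # a user who loves romantic comedies
action_fan = np.array([0.1, 0.2, 1.0, 0.9])  # a user who prefers action films

# Aligned vectors give a large inner product (high predicted enjoyment);
# nearly orthogonal vectors give a small one.
print(genre @ romantic)     # high predicted enjoyment
print(genre @ action_fan)   # low predicted enjoyment
```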
However, we do not have the right data in the first place: we lack both the genre vectors of
movies and preference vectors of users. All we have is a sparse rating matrix! Moreover, while we
¹ Available at http://www.grouplens.org/node/73.
require genre and preference vectors, these are not our primary ends in movie recommendation;
rather, we want the predicted user ratings themselves. Because the vectors of interest are unknown
to us, these genre and preference parameters are called “latent” factors and we must discover them
by means of collaborative and content-based filtering. If there is at least one rating for every movie by some user, then we can discover the genre vector that makes the best rating predictions. The converse is also true: if every user has provided at least one rating, then we have enough data to discover the preference vector that makes the best prediction.
Linear Regression Approach
Given the nature of MovieLens rating data, in which the diversity of user rating strategies
makes rating distributions highly non-normal and varied across users, the notion of multiple linear
regression for user rating predictions would seem something of a folly. In practice, however, the method is quite powerful with the aid of a carefully controlled gradient descent algorithm.
Formally, we would represent the user-specific linear model as

$$y_{ij} = \sum_{g=1}^{G} m_{jg}\, p_{ig}$$

where $y_{ij}$ is the rating given by the $i$th user for the $j$th movie. The parameters $m_{jg}$ stand for the values of movie $j$’s genre vector for genre $g$, where the (arbitrary) number of genres is $G$. The user’s preference vector is represented by the parameters $p_{ig}$ with similar notation. (The intercept is omitted for reasons that will become clear.) More generally, this linear model defines a $(J \cdot I) \times 1$ rating vector whose entries may be arranged in a $J \times I$ rating matrix $Y$ with movies as rows and users as columns, whose entries are defined by

$$Y = M P^{T}$$

where $M$ is the $J \times G$ matrix formed by stacking movie genre vectors and $P$ is the $I \times G$ matrix
formed by stacking user preference vectors. In reality, this rating matrix is sparse, meaning that
most of its entries are empty because not every user has rated every movie. Thus the challenge is to
use a form of collaborative filtering in order to impute these missing ratings as accurately as possible.
The engine of linear regression is the minimization of the sum of squared errors (SSE), and
in order to get errors we need predictions. This problem is one of concurrently predicting the genre
and preference matrices $M$ and $P$. In order to accomplish this, we define the sum of squared errors as

$$\mathrm{SSE} = \sum_{(i,j) \in \kappa} \left( y_{ij} - \mathbf{m}_{j}^{T} \mathbf{p}_{i} \right)^{2}$$

where $\kappa$ indicates the set of all pairs $(i, j)$ for which user $i$ has provided a rating for movie $j$, where a bold character indicates a vector, and where $y_{ij}$ is the actual rating provided by the user. Our
objective is to predict genres and preferences in a way that minimizes this error. Furthermore, we
are interested in obtaining small and realistic values of the genre and preference vectors.
One danger of unchecked simultaneous optimization is that the parameters may take any
values necessary to achieve the objective. This problem can lead to excessively large values of some
parameters and exceedingly small values of others, posing the problem of overfitting the model to a
specific data set where such parameters may produce good estimates but fail to generalize to other
samples. The simple solution is to penalize (or “regularize”) the elements of the genre and
preference vectors so that the individual elements do not become too large. Therefore we let $\lambda$ denote the regularization parameter and define the objective function as

$$C(M, P) = \frac{1}{2} \sum_{(i,j) \in \kappa} \left( y_{ij} - \mathbf{m}_{j}^{T} \mathbf{p}_{i} \right)^{2} + \frac{\lambda}{2} \sum_{j=1}^{J} \sum_{g=1}^{G} m_{jg}^{2} + \frac{\lambda}{2} \sum_{i=1}^{I} \sum_{g=1}^{G} p_{ig}^{2}$$

where we optimize with respect to genre and preference vectors simultaneously, controlling the sizes of $m_{jg}$ and $p_{ig}$ by introducing squared terms into the cost function so that their expansion in the positive or negative direction is checked by rapidly increasing penalties. The value of $\lambda$ need not be large, and in fact by the nature of the cost function the root-mean-squared error of the predictions will increase as $\lambda$ increases. Nevertheless, its presence in the objective is critical to a plausible
outcome.
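A minimal sketch of this objective in NumPy, assuming missing ratings are stored as NaN (the variable names `Y`, `M`, `P`, and `lam` are my own, not taken from the report’s appendix):

```python
import numpy as np

def cost(Y, M, P, lam):
    """Regularized sum of squared errors over the observed ratings.

    Y: J x I rating matrix with np.nan marking unrated pairs (the set kappa
    is the set of non-NaN entries); M: J x G genre matrix; P: I x G
    preference matrix; lam: the regularization parameter.
    """
    pred = M @ P.T              # predicted rating for every movie/user pair
    observed = ~np.isnan(Y)     # kappa: pairs with a known rating
    err = (Y - pred)[observed]
    penalty = 0.5 * lam * (np.sum(M ** 2) + np.sum(P ** 2))
    return 0.5 * np.sum(err ** 2) + penalty
```

The factor of 1/2 is a common convention that simplifies the gradient; it does not change the location of the minimum.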
With an infinite number of combinations of the parameters to test, the optimization is
computationally unwieldy without a well-defined rule for sniffing out the minimum. Rather than
map out every value of SSE in parameter space, we compute the error at a given value of our
parameters, we determine the slope of the SSE in that neighborhood, and we ride the slope
downward toward a lower error and repeat. In calculus the gradient of a function is defined as the
direction of steepest ascent for a given value of inputs and is computed as the vector of partial
derivatives of all variables evaluated at those inputs. Thus we employ a method known as gradient
descent to determine, at some point in parameter space, the appropriate change in parameters that
reduces the SSE most efficiently. This procedure employs a parameter known as the learning rate,
defined as the size of the “step” down the gradient before the next computation. A larger learning
rate will seek out the optimum more quickly, but too large a value will make too large a step and will
risk overshooting the minimum and increasing the parameters uncontrollably.
Because we predict both genres and preferences, the gradient descent is applied to both the
genre and preference vectors in each iteration of the algorithm, a process called the gradient descent
update. Using the partial derivatives of the objective function and the learning rate $\alpha$, the updates are

$$m_{jg} \leftarrow m_{jg} - \alpha \left( \sum_{i:(i,j) \in \kappa} \left( \mathbf{m}_{j}^{T} \mathbf{p}_{i} - y_{ij} \right) p_{ig} + \lambda\, m_{jg} \right)$$

$$p_{ig} \leftarrow p_{ig} - \alpha \left( \sum_{j:(i,j) \in \kappa} \left( \mathbf{m}_{j}^{T} \mathbf{p}_{i} - y_{ij} \right) m_{jg} + \lambda\, p_{ig} \right)$$

which define new values of the parameters $m_{jg}$ and $p_{ig}$. These are then tested in the objective function again and another round of gradient descent is conducted.
Thanks to the gradient descent update, we can in theory start from any point in parameter
space and, given enough iterations of the algorithm and assuming we do not fall into a local
minimum, we can work our way toward the global minimum of the objective function. In this way
the machine can “learn” or discover the optimal parameters, once we have specified an arbitrary
quantity of genres. This is also the reason that we do not need an intercept in our original linear
model: if an intercept term belongs in the model, the algorithm will adapt and learn this feature.
Therefore, we assemble the final algorithm in the following way: first, we initialize the values of the
genre and preference vectors to small random values; second, we run the gradient descent update
until the convergence of the objective function to its minimum; and finally we predict ratings with
the inner product of the genre and preference vectors for each user and movie. This process is
considered a low-rank matrix factorization procedure because a minimization is used to approximate
the actual user ratings while the predicted rating matrix is decomposed into preference and genre
matrices, so that the final prediction is given by

$$\hat{Y} = \hat{M} \hat{P}^{T}.$$
Without much surprise we note that this result is simply the estimated factorization proposed by our
linear model above. With some exploration of the model’s behavior and the right combination of
learning rate and regularization parameter, this method performs astoundingly well in terms of
minimizing the difference between prediction and actual rating. The predicted star rating will, of
course, not occur in integer values like the ratings supplied by users. This does not pose an obstacle:
in fact, Netflix’s “best guess” for ratings has always come in the form of fractions of stars, and there
would be tremendous loss of information if predictions were rounded to the nearest star.
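Putting the pieces together, the whole regression approach can be sketched as follows (an illustration under the same NaN-for-missing convention; the learning rate, regularization value, and iteration count are placeholders rather than the tuned values used in this report):

```python
import numpy as np

def factorize(Y, G, alpha=0.01, lam=0.1, iters=5000, seed=0):
    """Low-rank factorization of a sparse rating matrix by gradient descent.

    Y: J x I rating matrix with np.nan marking unrated movie/user pairs.
    Returns the genre matrix M (J x G) and preference matrix P (I x G).
    """
    rng = np.random.default_rng(seed)
    J, I = Y.shape
    M = 0.1 * rng.standard_normal((J, G))   # small random initial genres
    P = 0.1 * rng.standard_normal((I, G))   # small random initial preferences
    observed = ~np.isnan(Y)
    Y0 = np.where(observed, Y, 0.0)
    for _ in range(iters):
        # Residuals (prediction minus rating) on the known entries only
        E = np.where(observed, M @ P.T - Y0, 0.0)
        # Simultaneous regularized gradient descent updates for both factors
        M_new = M - alpha * (E @ P + lam * M)
        P_new = P - alpha * (E.T @ M + lam * P)
        M, P = M_new, P_new
    return M, P
```

Predicted ratings, including those for movies a user has never rated, are then simply `M @ P.T`.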
Singular Value Decomposition Approach
Many of the principles that make prediction effective in the case of linear regression also
apply to another commonly implemented recommendation technique, the singular value
decomposition (SVD). Like regression, the SVD also relies on discovering component matrices
whose product is the ratings matrix, and on a gradient descent update to refine those factors.
Furthermore the component matrices are the genre and preference matrices whose product is the
prediction of user ratings for a movie.
The singular value decomposition factors any real matrix $A$ into three components: a matrix $U$ of (orthogonal) eigenvectors of $AA^{T}$, a matrix $V$ of (orthogonal) eigenvectors of $A^{T}A$, and a diagonal matrix $\Sigma$ of the singular values (the square roots of the eigenvalues of $A^{T}A$), so that $A = U \Sigma V^{T}$. This SVD can serve the
purpose of dimensionality reduction because it orders the eigenvectors and singular values by the
size of the singular values; that is, by the relative importance (i.e. amount of information stored) of
each vector. Then the most important vectors can be chosen and recombined to produce an
approximation of the original matrix. This is relevant to the movie rating matrix because the act of
discovering the latent genre and preference vectors is exactly the problem of dimensionality
reduction: insofar as the assumption holds that movies are categorized into genres and users may
have similar tastes, we may use the SVD to discover the hidden features that characterize these
similar movies and users into fewer groups.
Specifically, if we let our sparse rating matrix be the matrix A just described (filling in
missing values with the mean rating by movie), we can decompose it with the SVD. What emerges
is a complete picture of all possible genres (as many as there are movies) and all user preferences (as
many as there are users). In other words, by the SVD we obtain

$$A = M \Sigma P^{T}$$

where, as for linear regression, $M$ represents the matrix of movie genres and $P$ represents the matrix of user preferences. Here, $\Sigma$ represents the diagonal matrix of singular values. Some of these genres
and preferences do not actually contain much information on the variety of the movies themselves,
but SVD has already done the job of sorting out the important genres by ordering the vectors
appropriately. Now we can choose the k most important genres by taking appropriate subsets of
the three component matrices.
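A sketch of this step with NumPy’s built-in SVD (again assuming NaN marks a missing rating, with the mean-imputation by movie described above; `k` is the chosen number of genres):

```python
import numpy as np

def svd_init(Y, k):
    """Truncated-SVD factors of the rating matrix.

    Y: J x I rating matrix with np.nan for missing entries, which are filled
    with each movie's (row's) mean rating before decomposition. Returns the
    merged genre matrix M-tilde (J x k) and preference matrix P (I x k).
    """
    A = np.array(Y, dtype=float)
    for j in range(A.shape[0]):
        A[j, np.isnan(A[j])] = np.nanmean(A[j])   # mean rating by movie
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    M_tilde = U[:, :k] * s[:k]    # genre matrix merged with singular values
    P = Vt[:k].T                  # preference matrix for the k genres
    return M_tilde, P
```

Multiplying the truncated factors back together, `M_tilde @ P.T`, gives the rank-k first approximation of the original matrix.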
With these subset matrices, we can derive a prediction for all the entries of the original
matrix A by multiplying the factors back together. This prediction will be reasonable for a first
approximation, but a gradient descent is still needed in order to refine the prediction. Specifically, after subsetting the component matrices to the $k$ most important genres, we merge the genre matrix $M_{k}$ with the matrix of singular values $\Sigma_{k}$:

$$A \approx \tilde{M} P_{k}^{T}$$

where $\tilde{M} = M_{k} \Sigma_{k}$. We then proceed with an iterative gradient descent update process. In the SVD
setting we do not explicitly pursue the objective of minimizing the sum of squared errors as we do in
linear regression, but our updates are still guided by seeking the lowest possible difference between
predicted and actual ratings.
We define the predictive error (for all movies with a rating), $e_{ij}$, as

$$e_{ij} = y_{ij} - \tilde{\mathbf{m}}_{j}^{T} \mathbf{p}_{i}$$

where, as above, we have the genre vector $\tilde{\mathbf{m}}_{j}$, the preference vector $\mathbf{p}_{i}$, and the actual rating $y_{ij}$. To find the descent trajectory, we use the partial derivative of the squared sum of errors with respect to each parameter. As before, we subtract this derivative (times a learning rate) from the original parameter in order to update the genres and preferences. The updates are given by

$$\tilde{m}_{jg} \leftarrow \tilde{m}_{jg} + \alpha \sum_{i:(i,j) \in \kappa} e_{ij}\, p_{ig}$$

$$p_{ig} \leftarrow p_{ig} + \alpha \sum_{j:(i,j) \in \kappa} e_{ij}\, \tilde{m}_{jg}$$

where $\alpha$ is again the learning rate. After the completion of the gradient descent update, we have a predicted genre matrix $\hat{\tilde{M}}$ and a predicted preference matrix $\hat{P}$, which we multiply for the final prediction,

$$\hat{Y} = \hat{\tilde{M}} \hat{P}^{T}.$$
Results
Computational Challenges
In order to accomplish the gradient descent updates required for accurate prediction, the
algorithms implemented in this study compute anywhere from tens of thousands to hundreds of
thousands of matrix multiplications with each run of the custom functions. Thus the machine time
required for a single run can quickly escalate when very accurate predictions are desired. The
computation time is highly dependent both on the size of the matrix being computed and the
number of gradient descent iterations used. Moreover, both the matrix size and number of
iterations have a bearing on the accuracy of the final prediction; a larger matrix contains more
information about users and movies, and more iterations allow for a further descent into an error
minimum.
Figures 1 and 2 report the effect of matrix size and number of iterations on computation
time for a single run of both predictive approaches. Happily, Figure 1 shows that computational
time increased linearly in the number of matrix entries for both the regression and SVD approach,
indicating a stable function that operates as efficiently as it can as the matrix grows. The SVD
approach has a slight time advantage over the regression approach, an advantage that appears to
grow with the number of entries. This result indicates that, for my chosen default matrix of 20
movies and 200 users, a regression approach with 10,000 iterations requires about 5.5 seconds, while
an equivalent SVD approach requires about 4.5 seconds.
Not surprisingly, the two functions also exhibit linear time consumption trends in the
number of iterations in the gradient descent, as demonstrated in Figure 2. Here again the SVD
method had a slight advantage that widened as the number of iterations grew. However, the
regression algorithm could achieve more accurate results with fewer iterations, giving it a
comparative advantage in the end. In fact, regression could perform better with 2,000 iterations
than SVD could with 200,000 iterations. Thus regression was a more efficient algorithm.
Number of Iterations
With the root-mean-squared error (RMSE) as the judge of performance, we can quantify
how much better the regression approach performed. Both functions were run using successively
larger numbers of iterations ranging from 10,000 to 200,000. The RMSE was computed for each
and the results were plotted in Figure 3. Although regression has a slightly longer run time for any
given number of iterations, it has a vastly superior predictive capability. In fact, over the range
tested, SVD never outperformed regression at a given number of iterations. Moreover, the
regression approach appears to have settled into a local minimum at around 50,000 iterations, with
the very low RMSE of 0.07. In contrast, even at 200,000 iterations, the RMSE of the SVD approach
was still descending, presumably toward some minimum that it could achieve if given enough time.
It should be noted that, although these two functions employ similar gradient descent methods, they
start with very different matrices. Thus their descent path toward a minimum is also quite different,
a fact that may help explain their different behavior.
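For concreteness, the RMSE used throughout these comparisons is computed over the known ratings only (a small sketch; `Y` again holds NaN for missing entries):

```python
import numpy as np

def rmse(Y, pred):
    """Root-mean-squared error over the observed (non-NaN) ratings in Y."""
    observed = ~np.isnan(Y)
    diff = (Y - pred)[observed]
    return float(np.sqrt(np.mean(diff ** 2)))
```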
Learning Rate
There is another factor that causes the disparity in Figure 3, however. Recall that the
learning rate is a control parameter that determines how quickly the gradient descent moves in the
direction of minimum error. If this parameter is too small, the steps toward the best predictions are
also too small and finding the minimum is too slow. However, if the parameter is too large, the
steps will also be too large and will overshoot the minimum, possibly causing a gradient ascent away
from the minimum instead of a gradient descent toward the minimum.
Figure 4 plots the RMSE by value of the learning rate. Both methods were highly sensitive
to the value of the learning rate and pinning down a good range for the learning rate was difficult.
Finding the best rate was especially challenging for the SVD approach. Although Figure 4 does not
reflect it, the RMSE could not be resolved with a learning rate larger than about $8 \times 10^{-5}$ in the SVD
case. The learning rates thus had to be quite small in order for SVD to return real results. This is
another reason why the SVD method required many more iterations than the regression method.
With a relatively smaller learning rate, the gradient descent required more time to locate the
minimum.
Number of Genres
One of the unique features of these recommender algorithms is the ability to decide how
many genres should fully characterize the movies in the data set. This is a fairly arbitrary decision
but it has implications for the performance of the models, as shown in Figure 5. Most notably, the
RMSE decreases in the number of genres, reaching its minimum when the number of genres equals
the number of movies. So what is the advantage of choosing fewer genres than movies? First, we
want to reduce computation time. In a matrix with thousands of movies, reducing the movies to a
few dozen genres can result in enormous time savings in practical implementations of these
algorithms. Second, we want to label movies. Rather than pay employees to research movies in
order to determine their genre designations, a business can instead allow the computer to discover
the genres after someone selects how many genres are likely to be present in the data set. Then the
movies can be labeled appropriately with genres for which they score highly. Third, and most
important for recommendation systems, we want to avoid the problem of overfitting. If we let
every movie represent its own “genre” then we are creating a system that makes good predictions
for that particular matrix, but will not generalize well to other rating matrices, or may perform a
good deal more poorly when new movies and users are added. And of course, in the case of the
SVD approach, the method has a built-in mechanism for discovering the most important genres.
After a point, there will be rapidly diminishing marginal benefits to additional genres.
Predictive Performance
When we use the right model parameters, we see a decrease in root-mean-squared error. But
the RMSE is just a number and can feel abstract. Instead, it is helpful to visualize how the right
choice of parameters leads to a better rating prediction. Figure 6 plots predicted ratings versus
actual user ratings for both “good” parameters and “bad” parameters. In both the regression and
SVD approaches, we see that bad parameters, leading to a high RMSE, create inaccurate predictions
with a wide variance. In contrast, good parameters allow predicted and actual ratings to align so that
they are nearly indistinguishable. In this case we can be confident that our predictions are accurate,
at least for movies that the user has already rated: the predictions conform exceedingly well with the
true values of known ratings. For movies without known ratings, the predictions can take on any
value in the rating scale, including non-integer numbers, and so do not indicate the exact star rating
the user would give (because the user is limited to an integer selection) but rather indicate the degree
to which a user is expected to enjoy a film. If a user were to watch one of these films and rate it, a
new implementation of the algorithms would use this information to make all the predictions more
accurate. Thus, in these algorithms, every user unknowingly collaborates to produce better
recommendations for everyone.
Genre Space
An underlying principle in recommender dimensionality reduction is that movies will tend
to “cluster” together in genre-space. That is, movies that share similar features will be described by
similar genre vectors. If we could visualize genre-space, we would see similar films grouping
together. Fortunately we can get a glimpse of genre space by running the algorithms, setting the
number of genres to two, and plotting the results on a two-way scatterplot. Then each dimension
would act as a kind of aggregate genre that describes the films. Figures 7 and 8 plot the films by
their values of aggregate genre. While it is difficult to interpret exactly what these two genres really
represent, we are more interested in the clustering effects.
Indeed, we see this clustering to some extent in Figure 7 (note that the axes merely represent
relative levels, so they can take on any value). Both The Godfather and The Godfather: Part II fall on the
far left, while both Star Trek III and Star Trek VI fall on the far right. Moreover, the comedies Life of
Brian and Raising Arizona both appear in the middle. With a few small differences, we see similar
movie groupings in Figure 8, which comes from the SVD approach. In fact, Figure 8 looks like
nothing more than a 90-degree clockwise rotation of Figure 7, with a few tweaks, suggesting that the
two algorithms are discovering the same genres.
We must be careful how we interpret these plots when we use only two genres. Did The
Lion King and Jaws group together because they are both about ferocious animals? It would be
humorous, but probably untrue. Did Twelve Monkeys and Back to the Future group together because
they are both about time travel? Possibly. We would need to compare them in higher dimensions
to know for sure. And perhaps with more genres we would see The Lion King and Babe pull away
from the more serious and adult-oriented movies. Nevertheless, the clustering effects we see here
indicate that users with similar tastes rate movies similarly, allowing the algorithm to separate movies
into different categories based on this implicit collaboration effect. This is the true secret of the
matrix factorization approach!
Recommendations
We cannot venture into recommender systems without finally making recommendations. In
order to demonstrate how the algorithms provide predicted ratings, I generated a small rating matrix
of six movies and eight users from the MovieLens data (Table 1). Some users have rated only one
movie, some have rated a few, and one has rated all of them. The results of the predictions are
presented in Tables 2 and 3. The most striking feature is how closely the predictions match the
actual ratings for both methods, though linear regression appears to have done slightly better.
Whether the user has rated only one movie or all of them, the predictions are impressively accurate.
There is, however, a difference in how the two methods predict ratings for movies the users have not
seen. For example, regression predicts user H will give Top Gun a rating of 1.000, while SVD
predicts a 3.439 for the same. In effect, one method would tell user H that Top Gun will be his least
favorite, and the other method would tell him it will be his favorite among those not yet rated. This
is a curious result, especially given the similar genre plots in Figures 7 and 8. This result implies that
the two algorithms have found different matrices that provide similarly good predictions for already-rated movies but different predictions for unrated movies. It seems that, because the two methods
begin from dramatically different component matrices, the gradient descent has identified two
different local minima.
Returning to the original 20×200 rating matrix, we can again see the difference in the two
methods in Table 4. For the same user, the two methods have mixed opinions about what the user
will enjoy. In some cases the predictions are similar, in order if not in absolute rating, while in other
cases the algorithms make rather different predictions. Again, this result may occur because the
methods have settled into different minima. In fact, neither method may be fully correct. It is
common in these kinds of machine learning problems to combine or average the results of many
different models. It may be that the best predictions come from a blend of both the regression and
SVD approaches, but that is an avenue for future work.
Summary and Conclusion
The goal of this project was to explore two different but related methods for building a
recommender system around user ratings for movies. The objective was to make accurate
predictions of already-rated movies under the assumption that the accompanying predictions for
unrated movies would also be accurate. One approach was linear regression with simultaneous
parameter estimation. It began with genre and preference matrices initialized to small random
values and then used gradient descent in order to minimize the sum of squared errors. The other
approach was a singular value decomposition of the original rating matrix and the selection of the
most important genres discovered by the decomposition. The predictions that resulted from the
SVD were then updated with another form of gradient descent.
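The truncation step described above can be illustrated on a toy example; the matrix Y and the choice k = 2 here are illustrative stand-ins, not the project's filled-in 20×200 rating matrix.

```r
# Toy illustration of the SVD truncation step: keep only the k most
# important "genres" (singular values) and reconstruct the ratings.
# Y and k are illustrative; the project applied this to its rating matrix.
Y <- matrix(c(5, 4, 1,
              4, 5, 2,
              1, 2, 5), nrow = 3, byrow = TRUE)
s <- svd(Y)
k <- 2
G <- s$u[, 1:k] %*% diag(s$d[1:k])  # initial genre matrix (G.s in the code)
P <- s$v[, 1:k]                     # initial preference matrix
Y.hat <- G %*% t(P)                 # best rank-k approximation of Y
```

By the Eckart-Young theorem, Y.hat is the closest rank-k matrix to Y in the Frobenius norm, with error equal to the discarded singular values; gradient descent then refines G and P against the observed entries only.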
The two methods performed well but differed in some of their recommendations. This
result indicates that the algorithms settled at different local minima. It is possible that one is
“more correct” than the other, or that they may both be two pieces of the same puzzle whose
solution is more complete with a combination of the methods. Indeed, the winners of the Netflix
Prize Contest ultimately used an ensemble of 107 different methods to make the winning
predictions. Machine learning tools such as my algorithms are rarely complete on their own.
Moreover, a larger computer with more processing power would be able to deal with a larger rating
matrix and take advantage of the additional information afforded by including more users and
movies. Nevertheless, this experiment has laid the foundations for a reasonably reliable
recommender system that employs the collaboration of individual users for the benefit of all.
Figure 1. Effect of matrix size on computation time.
Figure 2. Effect of number of iterations on computation.
Figure 3. RMSE for different number of iterations of gradient descent.
Figure 4. RMSE for different learning rates.

Figure 5. RMSE for different number of genres.
Figure 6. Predicted and actual ratings for bad and good models.
Figure 7. Movies in genre-space, regression approach.
Figure 8. Movies in genre-space, SVD approach.
Table 1. Sample data set with actual user ratings.
Movie \ User A B C D E F G H
Phenomenon 3 3 4
Sneakers 3 2
Titanic 5 4 4 4 5
Top Gun 2 5
Cool Hand Luke 4 3 4
Return of the Jedi 4 5 5 5 2 3
Table 2. Prediction results from regression approach (already-rated movies in bold).
Movie \ User A B C D E F G H
Phenomenon 2.675 3.571 2.995 3.305 2.999 3.999 2.063 2.841
Sneakers 2.502 3.320 2.624 3.118 2.996 3.426 1.998 2.719
Titanic 3.127 4.996 3.838 3.944 4.000 3.997 3.982 5.000
Top Gun 2.545 2.150 2.421 3.008 2.001 4.995 1.000 1.000
Cool Hand Luke 3.184 4.465 3.401 3.997 4.000 4.092 3.024 3.974
Return of the Jedi 3.997 4.481 3.265 4.996 4.996 5.000 1.993 3.008
Table 3. Prediction results from SVD approach (already-rated movies in bold).
Movie \ User A B C D E F G H
Phenomenon 3.364 3.672 3.000 3.503 2.885 3.943 2.937 3.417
Sneakers 2.532 2.829 2.293 2.729 2.909 2.371 1.887 2.606
Titanic 4.380 4.999 3.932 4.287 4.112 4.022 4.050 5.000
Top Gun 3.506 3.714 3.078 3.570 2.023 5.000 3.441 3.439
Cool Hand Luke 3.699 4.155 3.338 3.867 3.955 3.486 2.986 3.943
Return of the Jedi 3.999 4.140 3.626 4.999 5.000 5.000 2.028 3.013
Table 4. User preferences and ordered predictions with rating in parentheses.
User Preference
(Actual Rating)
Regression Prediction
(Predicted Rating)
SVD Prediction
(Predicted Rating)
User 1
The Godfather (5) Leaving Las Vegas (5.000) Star Trek VI (4.730)
The Godfather II (5) Lone Star (4.016) Primal Fear (4.618)
True Lies (5) Babe (3.934) Babe (4.486)
Goodfellas (4) Lion King (3.857) Die Hard (4.076)
Back to the Future (4) Primal Fear (2.915) Raising Arizona (3.757)
Silence of the Lambs (4) Star Trek VI (2.293) Lone Star (3.559)
Jaws (4) Die Hard II (2.245) Star Trek III (3.038)
First Wives Club (3) Raising Arizona (2.164) Leaving Las Vegas (1.787)
Twelve Monkeys (3) Life of Brian (1.932) Lion King (1.000)
Dr. Strangelove (2) Star Trek III (1.609) Life of Brian (1.000)
User 2
The Godfather (5) Babe (5.000) Leaving Las Vegas (5.000)
The Godfather II (5) Lion King (4.504) Raising Arizona (5.000)
Jaws (5) Lone Star (4.273) Back to the Future (5.000)
Goodfellas (4) Dr. Strangelove (4.067) Lone Star (4.482)
True Lies (4) Back to the Future (3.689) Primal Fear (3.890)
Silence of the Lambs (4) Leaving Las Vegas (3.577) Babe (3.083)
Die Hard II (3) Life of Brian (3.462) Lion King (3.071)
Twelve Monkeys (3) First Wives Club (2.303) Life of Brian (2.647)
Star Trek III (2) Raising Arizona (1.969) First Wives Club (2.640)
Star Trek VI (2) Primal Fear (1.737) Dr. Strangelove (2.574)
User 3
Dr. Strangelove (5) Life of Brian (5.000) Lion King (5.000)
Twelve Monkeys (5) Goodfellas (4.939) Life of Brian (5.000)
The Godfather (4) The Godfather II (3.755) The Godfather II (3.403)
Back to the Future (4) True Lies (3.379) Goodfellas (3.382)
Silence of the Lambs (4) Die Hard II (3.219) Lone Star (2.699)
Jaws (4) Lion King (3.190) Star Trek III (2.660)
Raising Arizona (3) Lone Star (2.947) True Lies (2.496)
Leaving Las Vegas (3) First Wives Club (2.057) Primal Fear (2.481)
Babe (2) Primal Fear (1.422) First Wives Club (2.031)
Star Trek VI (1) Star Trek III (1.000) Die Hard II (1.340)
APPENDIX: CODE
QUICK GUIDE TO R
Symbol/Term Meaning
# Comment character
<- Stores value to name on left
$ Calls sub-object from object
( ) Denotes quantity or the use of a function
[ ] Subsets vector or matrix
[[ ]] Subsets a list
{ } Denotes boundary of the body of a function or loop
= Sets levels of arguments of a function
== Logical, equal to
!= Logical, not equal to
: Creates sequence of integers, e.g. 1:20 is 1, 2, …, 20
* Element-wise multiplication
%*% Matrix multiplication
... Passes additional arguments to functions downstream
c( ) Creates vector with supplied values
lapply/sapply Loops over first argument using the following function
##############################################
#### FUNCTION TO RETRIEVE RATING MATRIX ####
##############################################
# film.list = the list of all films (mov.list)
# min.n.ratings = the minimum number of ratings a movie can have to be included
# n.films = number of films in final data set
# n.users = number of users in final data set
# rows = which variable, movies or users, should be the "observations", i.e. rows
# This function will never generate the same data set twice.
get.pops <- function(film.list, min.n.ratings, n.films, n.users, rows = "movie"){
# which movies have >= min.n.ratings
popular.i <- which(sapply(film.list, nrow) >= min.n.ratings)
# sample from them to get random subset of most popular movies
sample.pop <- sample(popular.i, n.films, replace = FALSE)
# get smaller list of movies we want
new.list <- film.list[sample.pop]
# bind them into a single data set (no longer a list)
long.frame <- do.call(rbind, new.list)
# get wide data frame by reshaping
wide.frame <- reshape(long.frame, timevar = "movie_id", idvar = "user", direction = "wide",
drop = "time")[, -1]
# subset to desired number of users
user.index <- sample(1:nrow(wide.frame), n.users, replace = FALSE)
wide.frame <- wide.frame[user.index, ]
# return the results
if(rows == "movie") return(t(wide.frame)) else
if(rows == "user") return(wide.frame)
}
#############################################
#### FUNCTION TO RETRIEVE MOVIE TITLES ####
#############################################
# This function takes the matrix of ratings and the title data set
# and matches them so I know which movies were selected.
get.titles <- function(rating.matrix, title.frame){
# get character vector of row names from rating matrix
row.names <- rownames(rating.matrix)
# pull off the movie ID number from the row names
ids <- gsub("rating.", "", row.names)
# match this ID number with the movie title from the title data set
index <- sapply(ids, function(i) which(title.frame$movie_id == i))
# return the results
title.frame$title[index]
}
############################################
#### FUNCTION TO MAKE RECOMMENDATIONS ####
############################################
# This function takes the matrix of predictions and recommends movies
# sorted by level of predicted enjoyment. It determines which movies
# the user has seen, and which the user has not. It returns, for each user,
# the movies that were seen, and those that were not in order of prediction.
recommend <- function(result, setup){
# Which movies the user has seen
seen <- lapply(1:ncol(setup$Ind), function(i) which(setup$Ind[,i]!=0))
# What the user rated those movies
rated <- lapply(1:length(seen), function(i) setup$Y[,i][seen[[i]]])
# The order of those movies by rating (i.e. by preference)
pref.order <- lapply(rated, sort, decreasing = TRUE)
# Which movies the user has not seen
unseen <- lapply(1:ncol(setup$Ind), function(i) which(setup$Ind[,i]==0))
# What the function predicted for movies not seen
predicted <- lapply(1:length(unseen), function(i) result$pred[,i][unseen[[i]]])
# The order of those movies by prediction
pred.order <- lapply(predicted, sort, decreasing = TRUE)
# Return the results in a list, one element per user
lapply(1:length(seen), function(i) list(pref.order[[i]], pred.order[[i]]))
}
###################################################
#### FUNCTIONS TO PLOT MOVIE TITLES BY GENRE ####
###################################################
# These functions perform the algorithms with 2 genres.
# Movies are then plotted in genre-space in order to
# see how the algorithms discover latent movie characteristics.
gplot.reg <- function(reg.setup, ...){
results <- regression(reg.setup, 2, ...)
dev.new()
plot(results$G[,1], results$G[,2], xlab = "Genre 1", ylab = "Genre 2",
main = "Movies by Value of Latent Genres, Regression")
text(results$G[,1], results$G[,2], rownames(results$G), cex = 0.7, pos = 1)
}
gplot.svd <- function(svd.setup, ...){
results <- svd.descent(svd.setup, 2, ...)
dev.new()
plot(results$G[,1], results$G[,2], xlab = "Genre 1", ylab = "Genre 2",
main = "Movies by Value of Latent Genres, SVD")
text(results$G[,1], results$G[,2], rownames(results$G), cex = 0.7, pos = 1)
}
############################################
#### FUNCTIONS FOR COMPUTATIONAL COST ####
############################################
# Regression computation time for matrices of different sizes
time.dim.reg <- function(film.list, max.dim = 100, ...){
# create vector of desired matrix dimensions
dims <- seq(20, max.dim, by = 10)
# make largest rating matrix
maxdim <- max(dims)
data.big <- get.pops(film.list, min.n.ratings = 216, 20, maxdim)
# make smaller rating matrices as subsets of largest, looping over dims
ratings <- lapply(dims, function(dim) data.big[, 1:dim])
# make list of setups
setups <- lapply(ratings, function(i) reg.setup(i, rownames(i)))
# run regression for each size of rating matrix, timing each
times <- sapply(setups, function(i)
system.time(regression(i, n.genres = 10, ...))[3])
# return data frame with matrix dimension and time elapsed
data.frame(dim = dims, seconds = times)
}
# Regression computation time for different numbers of iterations
time.its.reg <- function(setup, max.its = 200000, ...){
# Create vector of number of iterations
its <- seq(10000, max.its, by = 10000)
# Run regression for each number of iterations, timing each
times <- sapply(its, function(i)
system.time(regression(setup, n.genres = 10, n.iters=i))[3])
# Return results
data.frame(its = its, seconds = times)
}
# SVD computation time for matrices of different sizes
time.dim.svd <- function(film.list, max.dim = 100, ...){
# create vector of desired matrix dimensions
dims <- seq(20, max.dim, by = 10)
# make largest rating matrix
maxdim <- max(dims)
data.big <- get.pops(film.list, min.n.ratings = 216, 20, maxdim)
# make smaller rating matrices as subsets of largest, looping over dims
ratings <- lapply(dims, function(dim) data.big[, 1:dim])
# make list of setups
setups <- lapply(ratings, function(i) svd.setup(i, rownames(i)))
# run SVD descent for each size of rating matrix, timing each
times <- sapply(setups, function(i)
system.time(svd.descent(i, n.genres = 10, ...))[3])
# return data frame with matrix dimension and time elapsed
data.frame(dim = dims, seconds = times)
}
# SVD computation time for different numbers of iterations
time.its.svd <- function(setup, max.its = 200000, ...){
# Create vector of number of iterations
its <- seq(10000, max.its, by = 10000)
# Run SVD descent for each number of iterations, timing each
times <- sapply(its, function(i)
system.time(svd.descent(setup, n.genres = 10, n.iters=i))[3])
# Return results
data.frame(its = its, seconds = times)
}
####################################################
#### FUNCTIONS FOR LINEAR REGRESSION APPROACH ####
####################################################
# This function establishes indicator matrix (1 if movie has been rated, else 0)
# and special rating matrix (0 if missing value, else actual rating)
# to be used for later calculations.
reg.setup <- function(data, data.titles){
# Ensure data is the right kind of object (matrix)
if(!is.matrix(data)) data <- as.matrix(data)
# Get number of users and movies from matrix
n.movies <- nrow(data)
n.users <- ncol(data)
# Create indicator matrix, 1 if rated, 0 if not rated
Ind <- ifelse(is.na(data), 0, 1)
# Replace missing (non-rated) values in rating matrix with 0
Y <- ifelse(is.na(data), 0, data)
# Name the rows with the titles so they can be used later
rownames(Ind) <- data.titles
rownames(Y) <- data.titles
# Return a list with results
list(nm = n.movies, nu = n.users, Ind = Ind, Y = Y)
}
# This function performs the linear regression with gradient descent.
regression <- function(setup, n.genres, alpha = 0.001, lambda = 0.005, n.iters = 10000){
# Restrict number of genres to be no greater than number of movies
if(n.genres > nrow(setup$Y)) stop("Cannot have more genres than movies.")
# Initialize parameters to small random values
G <- matrix(runif(setup$nm * n.genres, -0.1, 0.1), nrow = setup$nm)
P <- matrix(runif(setup$nu * n.genres, -0.1, 0.1), nrow = setup$nu)
# Shorten some names used later
Y <- setup$Y
Ind <- setup$Ind
# Perform iterative gradient descent
for(i in 1:n.iters){
GPY <- G %*% t(P) - Y
G <- G - alpha * ((GPY * Ind) %*% P + lambda * G)
P <- P - alpha * ((t(GPY) * t(Ind)) %*% G + lambda * P)
}
# Compute final prediction, bounding predictions between 1 and 5
pred <- G %*% t(P)
pred <- ifelse(pred < 1, 1, pred)
pred <- ifelse(pred > 5, 5, pred)
# Compute RMSE
rmse <- sqrt(1/(sum(Ind))*sum(((pred - setup$Y)^2) * Ind))
# Return list with results
list(G = G, P = P, pred = pred, rmse = rmse)
}
}
######################################
#### FUNCTIONS FOR SVD APPROACH ####
######################################
# Unless otherwise annotated, these work similarly to
# the functions for linear regression.
svd.setup <- function(data, data.titles){
if(!is.matrix(data)) data <- as.matrix(data)
n.movies <- nrow(data)
n.users <- ncol(data)
Ind <- ifelse(is.na(data), 0, 1)
# compute mean rating for each movie
means <- rowMeans(data, na.rm = TRUE)
# for each movie replace missing values with mean rating for that movie
Y <- t(sapply(1:nrow(data), function(i)
ifelse(is.na(data[i,]), means[i], data[i,])))
rownames(Ind) <- data.titles
rownames(Y) <- data.titles
list(nm = n.movies, nu = n.users, Ind = Ind, Y = Y)
}
svd.descent <- function(setup, n.genres, alpha = 0.00001, n.iters=50000) {
if(n.genres > nrow(setup$Y)) stop("Cannot have more genres than movies.")
# Start with a singular value decomposition of the rating matrix
svd.mat <- svd(setup$Y)
# Choose the most important movie genres and sigmas as determined by the SVD
G <- svd.mat$u[, 1:n.genres]
P <- svd.mat$v[, 1:n.genres]
sigma <- diag(svd.mat$d[1:n.genres])
# Multiply the movie genre and sigma matrices together.
# We have now established our initial genre and preference
# matrices, G.s and P, for gradient descent.
G.s <- G %*% sigma
Ind <- setup$Ind
for(i in 1:n.iters){
error <- (G.s %*% t(P) - setup$Y) * Ind
G.s <- G.s - alpha * (error %*% P)
P <- P - alpha * (t(error) %*% G.s)
}
pred <- G.s %*% t(P)
pred <- ifelse(pred < 1, 1, pred)
pred <- ifelse(pred > 5, 5, pred)
rmse <- sqrt(1/(sum(Ind))*sum(((pred - setup$Y)^2) * Ind))
list(G = G.s, P = P, pred = pred, rmse = rmse)
}
############################
#### SET UP WORKSPACE ####
############################
# Set working directory where files will be stored
setwd("C:/Users/Adam/Desktop/biostat/mat167/project/")
# Read in movie data (creates data set from Movie Lens data)
mov <- read.table("movie_lens_cols.txt")
# Change names of variables
names(mov) <- c("user", "movie_id", "rating", "time")
# Read in movie titles (creates data set of titles from Movie Lens data)
all.titles <- read.table("titles.txt", sep="|", quote="")
all.titles <- all.titles[, 1:2] # grab only the relevant info: movie ID and title
names(all.titles) <- c("movie_id", "title")
# Make list of ratings by movie
# (in other words separate user ratings into groups by movie)
mov.list <- lapply(unique(mov$movie_id), function(id) subset(mov, movie_id==id))
# Now I use a custom function to get a rating matrix from among
# the most-rated movies. I use most-rated movies because I take
# a random subset of movies and users due to computational limitations, and
# I require at least one rating for each movie and user. A denser
# matrix is easier to work with for the purposes of this project.
ratings <- get.pops(mov.list, min.n.ratings=150, n.films=20, n.users=200)
# Now I use a custom function to get the titles of the selected movies.
titles <- get.titles(ratings, all.titles)
# Use custom functions to set up data that will be passed to
# my algorithms that perform the regression and SVD approaches.
for.reg <- reg.setup(ratings, titles)
for.svd <- svd.setup(ratings, titles)
# Perform small test example with a few users and movies
# for expository purposes
ratings.small <- get.pops(mov.list, 150, 6, 8)
titles.small <- get.titles(ratings.small, all.titles)
for.reg.small <- reg.setup(ratings.small, titles.small)
for.svd.small <- svd.setup(ratings.small, titles.small)
regression(for.reg.small, 3, n.iters = 20000)$pred
svd.descent(for.svd.small, 3, n.iters = 100000)$pred
############################
#### COMPUTATION TIME ####
############################
# Time the regression and SVD approaches on rating matrices with different
# numbers of users to determine the effect of matrix size on computational cost.
reg.times.dim <- time.dim.reg(mov.list, max.dim = 500)
svd.times.dim <- time.dim.svd(mov.list, max.dim = 500, n.iters=10000)
reg.times.its <- time.its.reg(for.reg)
svd.times.its <- time.its.svd(for.svd)
# Plot the results
plot(reg.times.dim, type = "l", xlab = "number of users",
ylab = "seconds to completion", lwd = 3,
main = "Time for 10,000 Iterations with 20 Movies", ylim = c(0, 14))
lines(svd.times.dim, lty = 2, lwd = 3)
legend("topleft", legend=c("Regression", "SVD"), lty=c(1, 2), lwd=2)
plot(reg.times.its, type = "l", xlab = "number of iterations",
ylab = "seconds to completion", lwd = 3,
main = "Time for 200 Users and 20 Movies", ylim = c(0, 120))
lines(svd.times.its, lty = 2, lwd = 3)
legend("topleft", legend=c("Regression", "SVD"), lty=c(1, 2), lwd=2)
#############################################
#### PLOT MOVIE TITLES BY LATENT GENRE ####
#############################################
# Use custom function to plot movies in genre space
gplot.reg(for.reg)
gplot.svd(for.svd)
#######################################################
#### BEHAVIOR OF RMSE BY DIFFERENT NUMBER OF GENRES ####
#######################################################
# Create vector of genres from 2 to 20
genres <- 2:20
# Fix function defaults, but loop over different number of genres
rmse.genre.reg <- sapply(genres, function(i) regression(for.reg, i)$rmse)
rmse.genre.svd <- sapply(genres, function(i) svd.descent(for.svd, i)$rmse)
plot(genres, rmse.genre.reg, xlab = "number of genres", ylab = "RMSE",
main = "RMSE by Number of Genres", type = "l", lwd = 3)
lines(genres, rmse.genre.svd, lty = 2, lwd = 3)
legend("topright", legend=c("Regression", "SVD"), lty=c(1, 2), lwd=2)
#############################################
#### BEHAVIOR OF RMSE BY LEARNING RATE ####
#############################################
# Create vector of learning rates appropriate to each method
alphas.reg <- seq(0.0005, 0.0035, by = 0.00025)
alphas.svd <- seq(2e-5, 9e-5, by = 1e-5)
# Fix function defaults, but loop over different learning rates
rmse.alpha.reg <- sapply(alphas.reg, function(i) regression(for.reg, 10, alpha = i)$rmse)
rmse.alpha.svd <- sapply(alphas.svd, function(i) svd.descent(for.svd, 10, alpha = i)$rmse)
plot(alphas.reg, rmse.alpha.reg, xlab = "Learning rate", ylab = "RMSE",
main = "RMSE by Learning Rate, Regression", type = "l", lwd = 3)
plot(alphas.svd[-8], rmse.alpha.svd[-8], xlab = "Learning rate", ylab = "RMSE",
main = "RMSE by Learning Rate, SVD", type = "l", lwd = 3, lty=2)
########################################################
#### BEHAVIOR OF RMSE BY REGULARIZATION PARAMETER ####
########################################################
# Create vector of regularization parameters
lambdas <- seq(0.1, 3, by = 0.1)
# Fix alpha = 0.001, genres = 10, iterations = 10000
rmse.lambda.reg <- sapply(lambdas, function(i) regression(for.reg, 10, lambda = i)$rmse)
plot(lambdas, rmse.lambda.reg, xlab = "Value of regularization parameter", ylab = "RMSE",
main = "RMSE by Regularization Parameter, Regression", type = "l", lwd = 3)
####################################################
#### BEHAVIOR OF RMSE BY NUMBER OF ITERATIONS ####
####################################################
# Create vector of number of iterations with the goal of plotting on same graph
its.reg <- c(seq(1000, 20000, by = 2000), seq(40000, 200000, by = 20000))
its.svd <- seq(10000, 200000, by = 10000)
# Fix function defaults, but loop over different number of iterations
rmse.its.reg <- sapply(its.reg, function(i) regression(for.reg, 10, n.iters = i)$rmse)
rmse.its.svd <- sapply(its.svd, function(i) svd.descent(for.svd, 10, n.iters = i)$rmse)
plot(its.reg, rmse.its.reg, xlab = "Number of iterations", ylab = "RMSE",
main = "RMSE by Number of Iterations", type = "l", lwd = 3,
ylim = c(0.05, max(rmse.its.svd)))
lines(its.svd, rmse.its.svd, lty=2, lwd=3)
legend("topright", legend = c("Regression", "SVD"), lty = c(1, 2), lwd = 2)
##################################################
#### DISTRIBUTION OF RMSE FOR A GIVEN MODEL ####
##################################################
# Compute RMSE multiple times to get distribution
rmse.dist.reg <- sapply(1:200, function(i) regression(for.reg, 10)$rmse)
plot(density(rmse.dist.reg), xlab = "Value of RMSE",
main = "Sample Distribution of RMSE for a Given Model, Regression", lwd = 3)
###############################################
#### PREDICTED VALUES VERSUS TRUE VALUES ####
###############################################
# REGRESSION
# Choose a bad model and a good model and plot prediction versus actual
model.bad.reg <- regression(for.reg, 2, alpha = 0.001, lambda = 0.01, n.iters = 100)
model.good.reg <- regression(for.reg, 15, alpha = 0.0025, lambda = 0.005, n.iters = 10000)
plot(ratings[2,], model.bad.reg$pred[2,], xlab = "Actual rating", ylab = "Predicted rating",
main = "Poor Model Choice, Regression (Lion King)")
plot(ratings[2,], model.good.reg$pred[2,], xlab = "Actual rating", ylab = "Predicted rating",
main = "Good Model Choice, Regression (Lion King)")
# SVD
# Choose a bad model and a good model and plot prediction versus actual
model.bad.svd <- svd.descent(for.svd, 2, alpha = 1e-6, n.iters = 10000)
model.good.svd <- svd.descent(for.svd, 15, alpha = 6e-5, n.iters = 200000)
plot(ratings[2,], model.bad.svd$pred[2,], xlab = "Actual rating", ylab = "Predicted rating",
main = "Poor Model Choice, SVD (Lion King)")
plot(ratings[2,], model.good.svd$pred[2,], xlab = "Actual rating", ylab = "Predicted rating",
main = "Good Model Choice, SVD (Lion King)")
################################
#### MAKE RECOMMENDATIONS ####
################################
# Store prediction results for regression and SVD approaches
result.reg <- regression(for.reg, 10)
result.svd <- svd.descent(for.svd, 10, n.iters = 100000)
# Store recommendations for all users
rec.reg <- recommend(result.reg, for.reg)
rec.svd <- recommend(result.svd, for.svd)
# Pick a few users and look at recommendations by each approach
rec.reg[c(29, 33, 38)]
rec.svd[c(29, 33, 38)]
