Comparing Recommendations Made by Online Systems and Friends

Rashmi Sinha and Kirsten Swearingen
SIMS, University of California, Berkeley, CA 94720
{sinha, kirstens}@sims.berkeley.edu

Abstract: The quality of recommendations and the usability of six online recommender systems (RS) were examined. Three book RS (Amazon.com, RatingZone & Sleeper) and three movie RS (Amazon.com, MovieCritic, Reel.com) were evaluated. Quality of recommendations was explored by comparing recommendations made by RS to recommendations made by the user's friends. Results showed that the user's friends consistently provided better recommendations than RS. However, users did find items recommended by online RS useful: recommended items were often 'new' and 'unexpected', while the items recommended by friends mostly served as reminders of previously identified interests. Usability evaluation of the RS showed that users did not mind providing more input to the system in order to get better recommendations. Also, users trusted a system more if it recommended items that they had previously liked.

When deciding which books to read or movies to watch, people often turn to their friends for suggestions. The logic behind this time-tested method is that one shares tastes in books, movies, music, etc., with one's friends.

Online recommender systems (RS) attempt to create a technological proxy for this social filtering process. The assumption behind many RS is that a good way to personalize recommendations for a user is to identify people with similar interests and recommend items that have interested these like-minded people (Resnick & Varian, 1997; Goldberg, Nichols, Oki & Terry, 1992). This premise forms the statistical basis of most collaborative filtering algorithms. Since the goal of most RS is to replace (or at least augment) what is essentially a social process, we decided to directly compare the two ways of receiving recommendations (friends and online RS). Do users like receiving recommendations from an online system? How do the recommendations provided by online systems differ from those provided by friends?

Friends would seem to have an advantage: they know the user well, and have intimate knowledge of his / her tastes in a number of domains. In contrast, RS have only limited, domain-specific knowledge of the user's tastes and lack the sophistication of human judgment processes.
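To make the collaborative-filtering premise above concrete, here is a minimal sketch of user-based collaborative filtering; the rating matrix, the function names, and the cosine-similarity weighting are illustrative assumptions, not the algorithm of any system studied here.

```python
# Minimal user-based collaborative filtering sketch (illustrative only;
# not the algorithm of any of the systems studied in this paper).
import numpy as np

# ratings[u][i] = rating of item i by user u; 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity over the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        s = cosine_sim(ratings[user], ratings[other])
        num += s * ratings[other, item]
        den += abs(s)
    return num / den if den else 0.0

print(predict(user=0, item=2))  # user 0's predicted interest in item 2
```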

STUDY DESIGN

We selected systems that differed in their interfaces and input requirements; in this study, we only examined systems that relied upon explicit input. The systems studied were three book RS (Amazon's Recommendation Wizard, Sleeper and RatingZone's Quick Picks) and three movie RS (Amazon's Recommendation Wizard, Reel.com's Movie Matches, and MovieCritic).

Independent Variables: (a) Source of recommendations: friend or online RS; (b) Item domain: books or movies; (c) The system itself.

Dependent Measures: Some dependent measures concerned the quality of the recommendations; other measures focused on interface issues. The dependent measures are described below.

• (a) Quality of Recommendations: To evaluate the recommendations provided by online RS and by friends, we computed three metrics (see the sketch after this list). Good Recommendations: This was a measure of a source's ability to provide recommendations that the user liked; a source whose suggestions the user rated positively scored highly on this metric. This metric can be broken down further into two categories: (i) Useful Recommendations: These are recommendations that the user is interested in and has not experienced before. This is the sum total of useful information a user gets from the system: ideas for books to read / movies to watch in the future. (ii) Trust-Generating Recommendations: These are recommendations of items that the user has had positive experiences with in the past. Such recommendations help establish trust that the user will also like "new" recommended items.
• (b) Overall Satisfaction with recommendations and with online RS: We asked users to rate their overall satisfaction with the recommendations and with each system.
• (c) Time Measures: We measured the time spent registering and receiving recommendations from the system.
• (d) Interface Issues: Other measures focused on interface factors such as navigation, layout, graphics, and user instructions.
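As a concrete reading of the three quality metrics defined above (an assumed operationalization, not the authors' own code), the sketch below computes % Good, % Useful, and % Trust-Generating from two per-item judgments: whether the user liked the recommended item and whether they had experienced it before.

```python
# Hedged sketch of the three recommendation-quality metrics, assuming each
# recommendation is judged on (liked, previously_experienced).
from dataclasses import dataclass

@dataclass
class Judgment:
    liked: bool                   # user rated the recommended item positively
    previously_experienced: bool  # user had already read / seen the item

def quality_metrics(judgments):
    """Return (% Good, % Useful, % Trust-Generating) for one recommendation set."""
    n = len(judgments)
    good = [j for j in judgments if j.liked]
    useful = [j for j in good if not j.previously_experienced]
    trust = [j for j in good if j.previously_experienced]
    pct = lambda xs: 100.0 * len(xs) / n if n else 0.0
    return pct(good), pct(useful), pct(trust)

# Example: five recommendations from one source (values are made up).
sample = [Judgment(True, False), Judgment(True, True), Judgment(False, False),
          Judgment(True, False), Judgment(False, True)]
print(quality_metrics(sample))  # -> (60.0, 40.0, 20.0)
```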

METHOD

Participants: 19 participants were recruited from the University of California, Berkeley. (Participant details: age range: 20 to 35 years; gender ratio: 6 males and 13 females; technical background: 9 worked in or were students in technology-related fields; the other 10 were studying or working in non-technical fields.) Participants were given the choice to explore books or movies. Each participant evaluated recommendations from three online RS and from three friends.

Procedure: For each of the three book / movie recommendation systems (presented in random order), participants completed the following tasks: (a) Completed the online registration process (if any) using a false e-mail address, so that no personal information was tied to them. (b) Rated an initial set of items so that the system could generate recommendations. (c) Rated the initial set of recommendations. (d) If the initial set did not provide anything that was both new and interesting, participants continued requesting recommendations until they found items they were willing to try, or they grew tired of searching. (e) Completed a satisfaction and usability questionnaire for each RS.

The second part of the experiment involved the human recommenders. Participants gave us e-mail addresses for three friends familiar enough with their tastes to be able to recommend 3 books or movies each. For each item recommended by a friend, users reviewed a plot synopsis and a cover image, and they evaluated the friends' recommendations on the same dimensions as the recommendations made by online RS.

RESULTS

Comparing Online Recommender Systems

Number of items in initial set: The first metric by which we compared the RS was the number of items that the user had to rate before receiving recommendations, together with the number of items in the initial recommendation set (see Table 1). Since each friend was asked to recommend only three items for the target user, the comparison between RS and friends on this dimension was not interesting, and we focused on the differences between the various RS. The average number of items recommended by the various systems ranged between 7 and 20.

Table 1: Comparison of Recommender Systems: Number of Ratings Required and Recommendations Given. (* Note: The totals are less than 100% in cases where individuals checked the "no opinion" option.)

Input and Output for the Recommender Systems: We examined the interface of each RS from a number of perspectives (see Table 1). First, we examined how people felt about the number of ratings a system required before making recommendations. Only at Amazon (books) did a majority of users feel that the number of ratings required by the system was "just right." For the other two book sites, opinions diverged in an interesting way: at Sleeper and RatingZone, about as many users felt that the system asked for too much information as felt that it asked for not enough. Opinions about the number of recommendations returned were also mixed for the book systems, with the majority at RatingZone and nearly half at Amazon claiming there were too few results. On the other hand, a majority of users at all the movie sites felt that the number of results was just right.

Time Measures: Next, we compared the RS on the time taken to register at the website and to receive recommendations (see Table 2). Reel was the only website that did not ask people to register, while MovieCritic took the longest both to register and to receive recommendations. The two systems that took the least time to register and get recommendations (Reel and RatingZone) were the only systems not named as favorites in post-test interviews.

Interface Factors: Users were asked to indicate whether the system's navigation, layout, color, graphics, or user instructions had a positive / negative impact on their experience (see Table 3 and Figure 2). Sleeper performed best overall on these interface factors. Our analysis shows no correlation between graphics, color, etc. and perceived ease of use / satisfaction.

Overall Usefulness of System

We also asked users to rate the overall usefulness and ease of use of each RS. Table 4 (below) shows the correlations between the rated usefulness and ease of use of a system and the other metrics we computed. The table shows that the overall usefulness of a system correlated highly with % Good and % Useful Recommendations. It also correlated with % Previously Experienced and % Trust-Generating Recommendations.

Ease of use correlates with aspects of the interface such as user instructions and navigation. Ease of use does not correlate with the number of ratings required to receive recommendations. This is interesting because it indicates that people do not mind spending a few minutes indicating their choices in order to receive quality recommendations.

The Description of Item ratings indicate whether users felt the system provided enough information for them to decide whether they were interested in an item. This metric correlates highly with both the overall usefulness of the system and its ease of use.

Table 4: Correlations of rated usefulness and ease of use with the other metrics (% Good Recommendations, Description of Item, etc.).
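For illustration, correlations of the kind reported in Table 4 can be computed as in the sketch below; the per-system scores are invented, and pearsonr is simply one standard way to compute a Pearson correlation, not necessarily the exact procedure the authors used.

```python
# Illustrative Pearson correlations between per-system usefulness ratings
# and other metrics (the numbers are invented, not the study's data).
from scipy.stats import pearsonr

usefulness = [4.1, 3.2, 3.8, 2.9, 4.4, 3.5]   # one rating per system
pct_good   = [62, 40, 55, 35, 70, 48]          # % Good Recommendations
pct_trust  = [25, 10, 18, 8, 30, 15]           # % Trust-Generating

for name, metric in [("% Good Recs", pct_good), ("% Trust-Generating", pct_trust)]:
    r, p = pearsonr(usefulness, metric)
    print(f"usefulness vs {name}: r = {r:.2f}, p = {p:.3f}")
```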

Figure 3: Comparing Human Recommenders to RS

Comparing Recommendations made by Online RS and by Friends

The bulk of our analysis focused on comparing the quality of the recommendations made by friends and by RS on three metrics (Good, Useful, and Trust-Generating Recommendations).

Good and Useful Recommendations: Next we examined the differences in the quality of the recommendations provided by friends and by RS. As Figure 3 shows (the arrows on the friend bars indicate the RS averages), for Good recommendations friends performed at significantly higher levels than RS (Friends = 85.44, RS = 45.99, t = 5.17). The same pattern was repeated for Useful recommendations (Friends = 67, RS = 32.57, t = 5.26, p < .001). During a post-test interview we also asked users to indicate which gave them the best overall set of recommendations: one of the 3 online RS or their friends. Despite the friends' strong performance, 11 of the 19 users said they preferred an online RS: Amazon-Books (3), Amazon-Movies (3), Sleeper (3) and MovieCritic (2). This finding does not support our hypothesis that users would prefer recommendations made by friends over those made by online RS. We propose possible explanations in the Discussion section.
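A friends-versus-RS comparison of this kind can be reproduced with a paired t-test over per-participant percentages, as in the sketch below; the data are invented, and the pairing assumption (each participant contributes one friends score and one RS score) is ours, not a detail confirmed by the paper.

```python
# Sketch of a friends-vs-RS comparison: a paired t-test over per-participant
# percentages of Good recommendations (hypothetical data, not the study's).
from scipy.stats import ttest_rel

pct_good_friends = [90, 80, 85, 95, 70, 88, 83, 92, 78, 86]  # hypothetical
pct_good_rs      = [50, 45, 40, 60, 35, 55, 42, 48, 44, 41]  # hypothetical

t, p = ttest_rel(pct_good_friends, pct_good_rs)
print(f"t = {t:.2f}, p = {p:.4f}")
```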

Previously Experienced Recommendations: On average, the percentage of items previously experienced was higher for movies than for books (Movies = 37.4, Books = 17.79, t = 3.89, p < .001). This suggests that it is easier for both RS and friends to tap into movies previously experienced by a user. This could be indicative of greater accuracy in movie predictions, or of a smaller universe of items for movies than for books. Of the items previously experienced, a larger percentage of books fell into the Trust-Generating category (Movies = 54.70, Books = 89.88, t = 3.88, p < .001).

Trust-Generating Recommendations: Recommended items that had been previously liked by users play a unique role in establishing the user's trust in a recommender. Among the RS, Amazon had the highest percentage of trust-generating recommendations. (It should be noted that the way we asked friends to make their recommendations put them at a disadvantage on this metric.) In post-test interviews, 7 users cited the RS' ability to suggest items they had not heard of as a key advantage over recommendations offered by friends.

DISCUSSION AND DESIGN RECOMMENDATIONS

The quantitative results of our experiment indicate that users prefer recommendations made by their friends to those made by online RS. At the same time, users expressed a high level of overall satisfaction with online RS in their qualitative responses to the post-test questionnaire. We therefore drew on both the quantitative and the qualitative results to derive the following design recommendations for RS.

• 1. Users don't mind rating more items initially to receive quality recommendations. Our results indicate that ease of use did not correlate with the number of ratings a system required, suggesting that users are willing to spend a few minutes rating items if doing so leads to better recommendations.
• 2. Allow users to provide initial ratings on a continuous rather than binary choice scale. Several users wanted to express finer gradations of interest level, rather than being forced into making ratings on a binary choice or a 4-5 item scale.
• 3. Provide enough information about the recommended item for the user to make a decision, and make this information readily available. The presence of longer descriptions of individual items correlated positively with both the usefulness and ease of use of RS. This indicates that users like to have detailed information when deciding whether a recommendation is useful. This finding is reinforced by the difference between the two versions of RatingZone we encountered about midway through the study: one did not provide enough information, and user evaluations were almost wholly negative as a result; in the other, the information was offered but users had trouble finding it, due to poor navigation design.
• 4. Provide easy ways to generate new recommendation sets. RatingZone's Quick Picks initially generated a very short list of items but did not offer the means to see more recommendations; users found themselves at a dead end in the system. For this reason, 3 of the 10 users found no useful recommendations at RatingZone.
• 5. Interface matters, mostly when it gets in the way. In designing the interface, navigation and layout seem to matter most; graphics and color showed no correlation with perceived ease of use or satisfaction, and a system's perceived usefulness was driven mainly by the percentage of Good and Useful recommendations.

LIMITATIONS OF PRESENT STUDY / FUTURE PLANS

Our study had some limitations. First, because participants used each system only once, we deprived systems such as MovieCritic of a major source of strength: the opportunity to learn user preferences by accumulating information from different sources over time. Second, users knew which recommendations came from systems and which from their friends; in the post-test interviews, several users acknowledged that they simply had more faith in the quality of items recommended by friends. In a follow-up study we plan to mask the source of the recommendations: users will be asked to rate their level of interest in an item as before, but they will not find out whether the item was recommended by a friend or by an online RS. We would also like to conduct such a study in real life as well as in the lab.

REFERENCES

• David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry. "Using Collaborative Filtering to Weave an Information Tapestry." Communications of the ACM, December 1992, 35(12).
• Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. "Eigentaste: A Constant-Time Collaborative Filtering Algorithm." Information Retrieval, accepted January 2001.
• P. Resnick and H. R. Varian. "Recommender Systems." Communications of the ACM, 1997, 40(3), 56-58.
• J. Ben Schafer, Joseph Konstan, and John Riedl. "Recommender Systems in E-Commerce." ACM Conference on Electronic Commerce, 1999. http://www.cs.umn.edu/Research/GroupLens/ec-99.pdf