Computers.DataMining History


November 06, 2011, at 08:41 PM by 94.227.148.33 -
Changed line 19 from:

http://anthony.liekens.net/images/datamining-dogbert.png

to:

http://anthony.liekens.net/images/datamining-dogbert.png

Added line 165:
Added line 173:
Added line 179:
November 05, 2011, at 05:13 PM by 94.227.148.33 -
Changed lines 2-3 from:

Data mining musical profiles Anthony Liekens, March, 28-April, 2 2007

to:

Data mining musical profiles April, 2 2007

July 18, 2007, at 12:53 AM by 84.195.45.87 -
Changed lines 6-7 from:

See also:

to:

See also

July 18, 2007, at 12:52 AM by 84.195.45.87 -
Added lines 6-9:

See also:

Changed lines 95-96 from:

This tag cloud indeed gives a good indication of my personal listening profile. I have also written a more detailed discussion of single user tag clouds, and finding recommendations based on tags. There's also more examples of musical tag clouds, representing weekly statistics of several friends.

to:

I have written a few scripts so you can generate your own tag cloud, amongst many other things based on tag clouds. I have also written a more detailed discussion of single user tag clouds, and finding recommendations based on tags. There's also more examples of musical tag clouds, representing weekly statistics of several friends.

April 02, 2007, at 10:32 AM by 131.155.65.61 -
Changed lines 143-144 from:

Picturing 2840 points in a 735-dimensional space is quite troublesome, as we're only used to visualising data in at most 3 dimensions. We need a way to flatten the 735 dimensions down to 2 or 3 in order to view them. Principal components analysis (PCA) is such a technique, and it has some extras that come in handy here. It allows us to flatten the high-dimensional space down to its principal components, which gives a new set of dimensions, where the first dimension is the one that retains most of the initial data's variance. This trick thus allows us to map the vectors to a lower-dimensional space, and if significant differences among sub populations exist in the initial data, we are likely to also observe them in this lower-dimensional representation. Nice!

to:

Picturing 2840 points in a 735-dimensional space is quite troublesome, as we're only used to visualising data in at most 3 dimensions. We need a way to flatten the 735 dimensions down to 2 or 3 in order to view them. Principal components analysis (PCA) is such a technique, and it has some extras that come in handy here. It allows us to flatten the high-dimensional space down to its principal components, which gives a new set of dimensions, where the first dimension is the one that retains most of the initial data's variance. This trick thus allows us to map the vectors to a lower-dimensional space, and if significant differences among sub populations exist in the initial data, we are likely to also observe them in this lower-dimensional representation. Nice!
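
The projection described in this paragraph can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code used for the article: the function names, the toy three-tag data and the use of power iteration with deflation (instead of a full eigendecomposition from a numerical library) are all choices made here for a dependency-free example.

```python
import math
import random


def centre_columns(rows):
    """Subtract the per-column mean so every dimension has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centred):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(row[i] * row[j] for row in centred) / (n - 1)
             for j in range(d)] for i in range(d)]


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in matrix]  # random start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the matrix
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return v, eigenvalue


def pca(rows, n_components=2):
    """Return (components, variances, projected rows) for the top components."""
    centred = centre_columns(rows)
    cov = covariance_matrix(centred)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        v, lam = leading_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the component just found, so the next pass
        # converges to the direction with the next largest variance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(a * b for a, b in zip(row, c)) for c in components]
                 for row in centred]
    return components, variances, projected


# Hypothetical toy data: 5 users described by proportions of 3 tags.
users = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.9, 0.0],
         [0.2, 0.8, 0.0], [0.5, 0.4, 0.1]]
components, variances, projected = pca(users, n_components=2)
```

Each row of `projected` holds the two coordinates that would be plotted for one user, and `variances` lists how much variance each principal component retains, largest first. For data of the size discussed here (2840 vectors of 735 dimensions), a numerical library would be a more practical choice than this pure-Python sketch.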

Changed lines 164-165 from:

K-Means clustering is a straightforward technique that tries to find a classification of the vectors, putting them in clusters of users that are similar in their musical preferences. Users end up in the same cluster when they are all closest to that cluster's centre point (with respect to Euclidean distance). When the number of clusters is set to 5, we get a clear separation of sub populations in Last.fm. Below is a depiction of the clusters, where each colour denotes a cluster. It is clear that the clustering algorithm found "indie", "rock" and "metal" to be three significant sub populations of Last.fm users.

to:

The k-means clustering algorithm is a straightforward technique that attempts to find a classification of the vectors, putting them in clusters of users that are similar in their musical preferences. Users end up in the same cluster when they are all closest to that cluster's centre point (with respect to Euclidean distance). When the number of clusters is set to 5, we get a clear separation of sub populations in Last.fm. Below is a depiction of the clusters, where each colour denotes a cluster. It is clear that the clustering algorithm found "indie", "rock" and "metal" to be three significant sub populations of Last.fm users.
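
As a rough sketch of the clustering step described above, the plain k-means loop (often called Lloyd's algorithm) looks as follows in Python. The function names, the random initialisation from sample points and the toy two-tag data are assumptions made for this illustration; the article does not say which implementation or initialisation was used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Return (assignment, centres) after running plain k-means."""
    rng = random.Random(seed)
    # Assumption: start from k distinct points drawn from the sample.
    centres = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: every point joins its closest centre.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centres


# Hypothetical toy data: proportions of two tags for six users.
users = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
         [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
assignment, centres = k_means(users, k=2)
```

`assignment[i]` is the cluster index of user `i`, which is what the colours in the cluster plots encode. The result of k-means depends on the starting centres, so a fixed `seed` is used here to keep the sketch reproducible.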

April 02, 2007, at 01:15 AM by 84.194.127.106 -
Changed lines 19-20 from:

These sorts of information shows the (economic) value of large online communities. If you are a music label, festival organiser, or otherwise at work in the corporate music business, this information provides insights in the global market, and shows how to act to attend to a perfect target audience for your commercial activities. This is not necessarily a bad thing for the consumer of these products. Last.fm's service is completely free and legally co-operates with labels on a unique basis. The consumers receive a lot of free services in return, learn to enjoy new musical genres and artists, which they are likely to pay for, and can develop a social network based on their musical preferences. Consumers and producers both gain. (It's probably very obvious that I'm a big fan of Last.fm!)

to:

This sort of information shows the (economic) value of large online communities. If you are a music label, festival organiser, or otherwise at work in the corporate music business, this information provides insights in the global market, and shows how to act to attend to a perfect target audience for your commercial activities. This is not necessarily a bad thing for the consumer of these products. Last.fm's service is completely free and legally co-operates with labels on a unique basis. The consumers receive a lot of free services in return, learn to enjoy new musical genres and artists, which they are likely to pay for, and can develop a social network based on their musical preferences. Consumers and producers both gain. (It's probably very obvious that I'm a big fan of Last.fm!)

Changed lines 272-277 from:

This article will probably never be finished, and will evolve over time as I find out more stuff about these sub populations and their musical tag clouds. If you want to comment, share an idea, or know more about all this, e-mail me at anthony<at>liekens<dot>net. This article is an attempt to mix academic and informal writing styles, I hope it's not too confusing.

to:

This article will probably never be finished, and will evolve over time as I find out more stuff about these sub populations and their musical tag clouds. If you want to comment, share an idea, or know more about all this, e-mail me at

  • take my first name
  • add an "at" character
  • take my last name
  • add a "dot" character
  • add "net"
April 02, 2007, at 01:10 AM by 84.194.127.106 -
Changed line 272 from:

This article will probably never be finished, and will evolve over time as I find out more stuff about these sub populations and their musical tag clouds. If you want to comment, share an idea, or know more about all this, e-mail me at anthony<at>liekens<dot>net.

to:

This article will probably never be finished, and will evolve over time as I find out more stuff about these sub populations and their musical tag clouds. If you want to comment, share an idea, or know more about all this, e-mail me at anthony<at>liekens<dot>net. This article is an attempt to mix academic and informal writing styles, I hope it's not too confusing.

April 02, 2007, at 01:08 AM by 84.194.127.106 -
Changed lines 12-13 from:

Here, we have taken a sample of 2840 Last.fm users, their 28302 recorded artists and these artists' commonly used tags. We show how one can determine important groups of musical genres that can classify Last.fm's user base into separate groups, adopting a range of elementary data mining algorithms, such as principal components analysis and K-means clustering.

to:

Here, we have taken a sample of 2840 Last.fm users, their 28302 recorded artists and these artists' commonly used tags. The sample was taken in the last week of March 2007. We show how one can determine important groups of musical genres that can classify Last.fm's user base into separate groups, adopting a range of elementary data mining algorithms, such as principal components analysis and K-means clustering.

April 02, 2007, at 01:06 AM by 84.194.127.106 -
Changed lines 268-272 from:
  • In contrast with the "electronic" cluster, the "hip-hop" cluster is the most clearly defined cluster, with "rap" and "rnb," its close neighbours (and "hip hop," a different spelling). Fans of "hip-hop" do not usually mix other genres with theirs. They are a clearly defined group of fans that can serve as an easy target for marketing. If you're organising a "hip-hop" festival, don't hire a "punk rock" band or your festival will end in a drive-by shooting bonanza.
to:
  • In contrast with the "electronic" cluster, the "hip-hop" cluster is the most clearly defined cluster, with "rap" and "rnb," its close neighbours (and "hip hop," a different spelling). Fans of "hip-hop" do not usually mix other genres with theirs. They are a clearly defined group of fans that can serve as an easy target for marketing. If you're organising a "hip-hop" festival, don't hire a "punk rock" band or your festival will end in a drive-by shooting bonanza.

More?

This article will probably never be finished, and will evolve over time as I find out more stuff about these sub populations and their musical tag clouds. If you want to comment, share an idea, or know more about all this, e-mail me at anthony<at>liekens<dot>net.

April 02, 2007, at 12:59 AM by 84.194.127.106 -
Changed lines 4-5 from:

Abstract. Here's a preliminary data mining analysis of musical social networking service Last.fm. An automated classification into clusters or sub populations with related musical genres reveals the structure of musical preferences among the users in a relatively large sample population.

to:

Abstract. Here's a preliminary data mining analysis of musical social networking service Last.fm. An automated classification into clusters or sub populations with related musical genres reveals the structure of musical preferences among the users in a relatively large sample population. Musical tag clouds are adopted to characterise users and populations, which adds a highly descriptive value and aids with the interpretation of the results.

April 02, 2007, at 12:50 AM by 84.194.127.106 -
Changed lines 4-5 from:

Abstract. Here's a preliminary data mining analysis of musical social network service Last.fm. An automated classification into clusters or sub populations with related musical genres reveals the structure of musical preferences among the users in a relatively large sample population.

to:

Abstract. Here's a preliminary data mining analysis of musical social networking service Last.fm. An automated classification into clusters or sub populations with related musical genres reveals the structure of musical preferences among the users in a relatively large sample population.

April 02, 2007, at 12:49 AM by 84.194.127.106 -
Changed lines 4-5 from:

Abstract. Provided with data from musical social networking service Last.fm, here's a preliminary data mining analysis of a sample population. A classification into clusters or sub populations with related musical genres is provided.

to:

Abstract. Here's a preliminary data mining analysis of musical social network service Last.fm. An automated classification into clusters or sub populations with related musical genres reveals the structure of musical preferences among the users in a relatively large sample population.

April 02, 2007, at 12:46 AM by 84.194.127.106 -
Changed lines 4-5 from:

Abstract. Provided with data from musical social networking service Last.fm, here's a preliminary data mining analysis of a sample population and a classification into clusters or sub populations with related musical genres.

to:

Abstract. Provided with data from musical social networking service Last.fm, here's a preliminary data mining analysis of a sample population. A classification into clusters or sub populations with related musical genres is provided.

April 02, 2007, at 12:43 AM by 84.194.127.106 -
Changed line 267 from:
  • The "electronic" cluster packs a lot of different genres, and is the most versatile cluster. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed; this cluster remained intact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. A re-clustering of this "electronic"/"pop"/"punk" sub population can be found here. The 6 main sub clusters of this big cluster are "pop", "japanese", "ambient", "electronic", "industrial" and "punk." The same is also true, sub populations of the 5 main clusters each define a more specific target audience with significantly different musical needs.
to:
  • The "electronic" cluster packs a lot of different genres, and is the most versatile cluster. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed; this cluster remained intact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. A re-clustering of this "electronic"/"pop"/"punk" sub population can be found here. The 6 main sub clusters of this big cluster are "pop", "japanese", "ambient", "electronic", "industrial" and "punk." Sub populations of the other main clusters also define more specific target audiences with significantly different musical needs.
April 02, 2007, at 12:42 AM by 84.194.127.106 -
Changed lines 2-6 from:

Data mining musical profiles Anthony Liekens, March, 28-April, 1 2007

This article is still under construction, and will probably be finished gradually over the week (first of April, 2007). Please come back later if you want to read the final version.

to:

Data mining musical profiles Anthony Liekens, March, 28-April, 2 2007

April 02, 2007, at 12:38 AM by 84.194.127.106 -
Changed lines 269-274 from:
  • I've never heard of "indie" music. I thought it had a lot in common with "rock," and would have ended up in the same cluster, but this is expectation was false. The target audience for independent music is significantly separated from other, e.g., "rock". This can also be observed by the mutual exclusivity of "rock" and "indie" in their tag clouds. Moreover, "indie" represents over 20% of the sample population.
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. A re-clustering of this "electronic"/"pop"/"punk" sub population can be found here. The 6 main sub clusters of this big cluster are "pop", "japanese", "ambient", "electronic", "industrial" and "punk." In more cases, sub populations define a more specific target audience with significantly different musical needs.
  • In contrast with the "electronic" cluster, the "hip-hop" cluster is the most clearly defined cluster, with "rap" and "rnb," its close neighbours (and "hip hop," a different spelling). Fans of "hip-hop" do not usually mix other genres with theirs. They are a clearly defined group of fans that can serve as an easy target for marketing. If you're organising a "hip-hop" festival, don't hire a "punk rock" band or your festival will end in a drive-by shooting bonanza.

Conclusions

to:
  • I've never heard of "indie" music. I thought it had a lot in common with "rock," and would have ended up in the same cluster, but this expectation was false. The target audience for independent music (over 20% of the sample population) is significantly separated from others, e.g., "rock". This can also be observed by the relative mutual exclusivity of "rock" and "indie" in their tag clouds.
  • The "electronic" cluster packs a lot of different genres, and is the most versatile cluster. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed; this cluster remained intact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. A re-clustering of this "electronic"/"pop"/"punk" sub population can be found here. The 6 main sub clusters of this big cluster are "pop", "japanese", "ambient", "electronic", "industrial" and "punk." The same is also true, sub populations of the 5 main clusters each define a more specific target audience with significantly different musical needs.
  • In contrast with the "electronic" cluster, the "hip-hop" cluster is the most clearly defined cluster, with "rap" and "rnb," its close neighbours (and "hip hop," a different spelling). Fans of "hip-hop" do not usually mix other genres with theirs. They are a clearly defined group of fans that can serve as an easy target for marketing. If you're organising a "hip-hop" festival, don't hire a "punk rock" band or your festival will end in a drive-by shooting bonanza.
April 02, 2007, at 12:25 AM by 84.194.127.106 -
Changed line 270 from:
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. Sub populations of this "electronic"/"pop"/"punk" cluster is here. The 6 main sub clusters of this big cluster are "pop", "japanese", "ambient", "electronic", "industrial" and "punk." In more cases, sub populations define a more specific target audience with significantly different musical needs.
to:
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. A re-clustering of this "electronic"/"pop"/"punk" sub population can be found here. The 6 main sub clusters of this big cluster are "pop", "japanese", "ambient", "electronic", "industrial" and "punk." In more cases, sub populations define a more specific target audience with significantly different musical needs.
April 02, 2007, at 12:16 AM by 84.194.127.106 -
Changed line 270 from:
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. Sub populations of this "electronic"/"pop"/"punk" cluster is here.
to:
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. Sub populations of this "electronic"/"pop"/"punk" cluster is here. The 6 main sub clusters of this big cluster are "pop", "japanese", "ambient", "electronic", "industrial" and "punk." In more cases, sub populations define a more specific target audience with significantly different musical needs.
Changed lines 273-274 from:

Enough said.

to:

Conclusions

April 02, 2007, at 12:11 AM by 84.194.127.106 -
Changed line 270 from:
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. It would be interesting to study the sub-populations of the "electronic" cluster.
to:
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. Sub populations of this "electronic"/"pop"/"punk" cluster is here.
April 01, 2007, at 11:43 PM by 84.194.127.106 -
Deleted lines 176-182:

We find a classification of 5 clusters of music fans in our sample population:

  • Electronic/pop (828 users)
  • Rock (792 users)
  • Indie (589 users)
  • Metal (479 users)
  • Hip-hip (152 users)
Changed lines 260-261 from:

The clustering clearly separates different musical styles, with little or no overlap with respect to the musical genres represented in these tag clouds.

to:

The clustering clearly separates different musical styles, with little or no overlap with respect to the musical genres represented in these tag clouds:

  • Electronic/pop (828 users)
  • Rock (792 users)
  • Indie (589 users)
  • Metal (479 users)
  • Hip-hop (152 users)
April 01, 2007, at 11:39 PM by 84.194.127.106 -
Added lines 177-183:

We find a classification of 5 clusters of music fans in our sample population:

  • Electronic/pop (828 users)
  • Rock (792 users)
  • Indie (589 users)
  • Metal (479 users)
  • Hip-hip (152 users)
Changed line 271 from:
  • I've never heard of "indie" music. I thought it had a lot in common with "rock," and would have ended up in the same cluster, but this is expectation was false. The target audience for independent music is significantly separated from other, e.g., "rock". This can also be observed by the mutual exclusivity of "rock" and "indie" in their tag clouds. Moreover, "indie" is the second biggest cluster after the "electronic"/"pop" cluster.
to:
  • I've never heard of "indie" music. I thought it had a lot in common with "rock," and would have ended up in the same cluster, but this is expectation was false. The target audience for independent music is significantly separated from other, e.g., "rock". This can also be observed by the mutual exclusivity of "rock" and "indie" in their tag clouds. Moreover, "indie" represents over 20% of the sample population.
April 01, 2007, at 11:03 PM by 84.194.127.106 -
Changed lines 7-8 from:

Abstract. Provided with data from a musical social networking service, here's a preliminary data mining analysis of a sample population and a classification into clusters or sub populations with related musical genres. We adopt user tag clouds as a descriptive means to analyse and represent our findings.

to:

Abstract. Provided with data from musical social networking service Last.fm, here's a preliminary data mining analysis of a sample population and a classification into clusters or sub populations with related musical genres.

April 01, 2007, at 10:59 PM by 84.194.127.106 -
Changed lines 7-8 from:

Abstract. Provided with data from a musical social networking service, here's a preliminary data mining analysis and a classification of users in a sample population into clusters of sub population with related musical genres.

to:

Abstract. Provided with data from a musical social networking service, here's a preliminary data mining analysis of a sample population and a classification into clusters or sub populations with related musical genres. We adopt user tag clouds as a descriptive means to analyse and represent our findings.

April 01, 2007, at 10:57 PM by 84.194.127.106 -
Changed lines 7-8 from:

Abstract. Provided with data from a musical social networking service, we apply elementary data mining analysis procedures and create a classification of users into clusters of related musical genres.

to:

Abstract. Provided with data from a musical social networking service, here's a preliminary data mining analysis and a classification of users in a sample population into clusters of sub population with related musical genres.

April 01, 2007, at 10:46 PM by 84.194.127.106 -
Changed lines 13-16 from:

Music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divides the sample population in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base. This approach is similar, but different from the methodology adopted by Last.fm itself. In the latter approach, data mining is based on shared artists and tracks. By adopting tags that describe a user's musical preferred genres, a more descriptive, and dimensionally less complex (and thus mathematically simpler) description of the population and its main structure can be given.

Here, we study the structure of musical preferences in this social network. Based on the information in Last.fm's database, we can describe a user's profile by common tags or labels of its artists, with proportions of e.g., "rock," "jazz" or "hip-hop" as his preferred music genres. Here, we have taken a sample of 2840 Last.fm users, their 28302 recorded artists and these artists' commonly used tags. We show how one can determine important groups of musical genres that can classify Last.fm's user base into separate groups, adopting a range of elementary data mining algorithms, such as principal components analysis and K-means clustering.

to:

Here, we study the structure of musical preferences in a sample population of this social network. Based on the information in Last.fm's database, we can describe a user's profile by the common tags or labels of their artists, with proportions of, e.g., "rock," "jazz" or "hip-hop" as their preferred musical genres. This approach is similar to, but different from, the methodology adopted by Last.fm itself, where data mining is based on shared artists and tracks. By adopting tags that describe a user's preferred musical genres, a more descriptive and dimensionally less complex (and thus mathematically simpler) description of the population and its main structure can be given.

Here, we have taken a sample of 2840 Last.fm users, their 28302 recorded artists and these artists' commonly used tags. We show how one can determine important groups of musical genres that can classify Last.fm's user base into separate groups, adopting a range of elementary data mining algorithms, such as principal components analysis and K-means clustering.
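
To make the idea of a tag-based user profile concrete, here is a minimal Python sketch. The weighting scheme is an assumption made for illustration (each artist's play count is split evenly over that artist's tags, and the result is normalised to proportions); the article does not specify the exact weighting, and the names `tag_profile`, `play_counts` and `artist_tags` are hypothetical.

```python
from collections import defaultdict


def tag_profile(play_counts, artist_tags):
    """Turn a user's artist play counts into proportions per tag.

    play_counts: dict mapping artist name -> number of plays by the user
    artist_tags: dict mapping artist name -> list of tags for that artist
    """
    weights = defaultdict(float)
    for artist, plays in play_counts.items():
        tags = artist_tags.get(artist, [])
        for tag in tags:
            # Assumption: an artist's plays are shared equally by its tags.
            weights[tag] += plays / len(tags)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {tag: weight / total for tag, weight in weights.items()}


# Hypothetical example: one user, two artists.
profile = tag_profile(
    {"Artist A": 30, "Artist B": 10},
    {"Artist A": ["rock"], "Artist B": ["rock", "jazz"]},
)
# profile == {"rock": 0.875, "jazz": 0.125}
```

Placing the proportions of every user in a fixed tag order gives one vector per user, which is the representation that the principal components analysis and k-means clustering operate on.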

April 01, 2007, at 10:43 PM by 84.194.127.106 -
Changed lines 7-8 from:

Abstract. Provided with data from a musical social networking service, we apply elementary data mining analysis procedures and create a classification of users into clusters of musical genres in the sample population.

to:

Abstract. Provided with data from a musical social networking service, we apply elementary data mining analysis procedures and create a classification of users into clusters of related musical genres.

April 01, 2007, at 10:41 PM by 84.194.127.106 -
Changed lines 7-8 from:

We use data from a musical social networking service, Last.fm, to study a sample population. Tag clouds are adopted as a clear depiction of a user's musical preferences. We provide an elementary data mining analysis of this data and create a classification of users into clusters of musical genres in the sample population.

to:

Abstract. Provided with data from a musical social networking service, we apply elementary data mining analysis procedures and create a classification of users into clusters of musical genres in the sample population.

April 01, 2007, at 10:39 PM by 84.194.127.106 -
Added lines 7-8:

We use data from a musical social networking service, Last.fm, to study a sample population. Tag clouds are adopted as a clear depiction of a user's musical preferences. We provide an elementary data mining analysis of this data and create a classification of users into clusters of musical genres in the sample population.

April 01, 2007, at 10:34 PM by 84.194.127.106 -
Added lines 15-17:

http://anthony.liekens.net/images/datamining-dogbert.png Don't be fooled by what's hidden in the data, and remember that 70% of all statistics is made up (that's a joke)

Deleted lines 21-23:

http://anthony.liekens.net/images/datamining-dogbert.png Don't be fooled by what's hidden in the data

April 01, 2007, at 10:33 PM by 84.194.127.106 -
Changed lines 2-3 from:

Data mining musical profiles Anthony Liekens, March, 30 2007

to:

Data mining musical profiles Anthony Liekens, March, 28-April, 1 2007

Changed lines 5-6 from:

This article is still under construction, and will probably be finished gradually over the week (first of April, 2007). Please come back later if you want to read the finished version.

to:

This article is still under construction, and will probably be finished gradually over the week (first of April, 2007). Please come back later if you want to read the final version.

April 01, 2007, at 10:31 PM by 84.194.127.106 -
Deleted lines 6-7:

http://anthony.liekens.net/images/datamining-dogbert.png

Added lines 19-21:

http://anthony.liekens.net/images/datamining-dogbert.png Don't be fooled by what's hidden in the data

Added lines 73-74:

10 largest components in the tag vector describing user aliekens

Added lines 138-139:

The top tags in the sample population of 2840 users

Added lines 150-151:

Sample population plotted by its first two principal components

Changed lines 161-162 from:
to:

Sample population plotted by its second and fourth principal components

Changed lines 168-169 from:
to:

Clustering of data into 5 clusters (principal components 1 and 2)

Changed lines 173-174 from:
to:

Clustering of data into 5 clusters (principal components 3 and 4)

April 01, 2007, at 10:26 PM by 84.194.127.106 -
Deleted lines 6-13:

Abstract

With the advent of Web 2.0 and related online services, every computer-savvy internet user has gained access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

In this article, I describe how music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divides the sample population in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base. This approach is similar, but different from the methodology adopted by Last.fm itself. In the latter approach, data mining is based on shared artists and tracks. By adopting tags that describe a user's musical preferred genres, a more descriptive, and dimensionally less complex (and thus mathematically simpler) description of the population and its main structure can be given.

As a result, we show that the population of Last.fm users can be separated into 5 clearly distinct populations of music lovers, one for each of the genres "indie", "rock", "metal", "hip-hop" and "electronic". We discuss some of the initial observations based on this preliminary data mining effort.

Added lines 13-14:

Music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divides the sample population in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base. This approach is similar, but different from the methodology adopted by Last.fm itself. In the latter approach, data mining is based on shared artists and tracks. By adopting tags that describe a user's musical preferred genres, a more descriptive, and dimensionally less complex (and thus mathematically simpler) description of the population and its main structure can be given.

Added lines 17-18:

As a result, we show that the population of Last.fm users can be separated into 5 clearly distinct populations of music lovers, one for each of the genres "indie", "rock", "metal", "hip-hop" and "electronic". We discuss some of the initial observations based on this preliminary data mining effort.

April 01, 2007, at 10:23 PM by 84.194.127.106 -
Changed lines 139-140 from:

The sample of users was gathered by a random walk on friends and neighbours lists in Last.fm's database, seeded by a set of random users. "Alternative" "rock" and "indie" are clearly the most common genres in the Last.fm sampled audience. It shows that this sample population (and probably all of Last.fm's user base) is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however an important target audience of media producers; a young, influential and evolving population of dedicated music lovers.

to:

The sample of users was gathered by a random walk on friends and neighbours lists in Last.fm's database, seeded by a set of random users. "Alternative" "rock" and "indie" are clearly the most common genres in the Last.fm sampled audience. It shows that this sample population (and probably all of Last.fm's user base) is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population, via this specific medium, is considered in this study of Last.fm's social music service. Personally, my Last.fm statistics are gathered while I'm at work on the computer, which requires a specific choice of genres. I listen to radio and CDs while in the car, which offers different genres, for a different mood. The demographic represented by Last.fm's statistics is however an important target marketing audience; a young, influential and evolving population of dedicated music lovers.

April 01, 2007, at 10:17 PM by 84.194.127.106 -
Changed lines 5-6 from:

This article is still under construction, and will probably be finished gradually over the weekend. Please come back on Monday.

to:

This article is still under construction, and will probably be finished gradually over the week (first of April, 2007). Please come back later if you want to read the finished version.

April 01, 2007, at 07:57 PM by 84.194.127.106 -
Changed lines 15-18 from:

Just as a warning the following Dilbert cartoon, data mining and statistics can discover interesting properties, but also find nonsense if you're not watching closely:

http://anthony.liekens.net/images/datamining-dogbert.png

to:

http://anthony.liekens.net/images/datamining-dogbert.png

April 01, 2007, at 07:56 PM by 84.194.127.106 -
Changed line 259 from:
  • The "electronic" cluster packs together with a lot of different genres. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. It would be interesting to study the sub-populations of the "electronic" cluster.
to:
  • The "electronic" cluster packs together with a lot of different genres. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. It would be interesting to study the sub-populations of the "electronic" cluster.
April 01, 2007, at 07:50 PM by 84.194.127.106 -
Changed lines 149-150 from:

http://anthony.liekens.net/images/pca12.png

to:

Changed lines 159-160 from:

http://anthony.liekens.net/images/pca34.png

to:

Changed lines 165-166 from:

http://anthony.liekens.net/images/cluster12.png

to:

Changed lines 169-170 from:

http://anthony.liekens.net/images/cluster34.png

to:

April 01, 2007, at 07:46 PM by 84.194.127.106 -
Changed lines 101-102 from:

alternative 

to:

alternative 

Changed lines 139-140 from:

to:

Changed line 173 from:

alternative 

to:

alternative 

Changed lines 252-253 from:

to:

April 01, 2007, at 07:44 PM by 84.194.127.106 -
Changed line 80 from:

to:

Added lines 86-88:

psy trance  rap metal  rock 

Changed lines 91-95 from:

rap metal  rock  psy trance 

to:

The top 10 tags in my personal profile as a tag cloud

March 31, 2007, at 10:14 PM by 84.194.127.106 -
Changed lines 27-33 from:

Tag clouds and tag vectors

In my previous instalment on analysing Last.fm data, I explored how one could study the tag cloud of a user's musical preferences to discover new bands. These tag clouds or tag vectors will now serve as the basis of our data mining analysis.

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight or popularity 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of that tag's weight in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1.

The following are the 10 most important factors of my personal tag vector, which provide a clear description of my preferred music styles:

to:

Musical tag vectors and clouds

In my previous instalment on analysing Last.fm data, I explored how one could study the musical tag cloud of a user's musical preferences to discover new bands. These musical tag clouds or musical tag vectors will now serve as the basis of our data mining analysis.

The musical tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight or popularity 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of that tag's weight in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1.

The following are the 10 most important factors of my personal musical tag vector, which provide a clear description of my preferred music styles:

Changed lines 78-79 from:

A tag cloud is a descriptive illustration of these tags, where the font size is scaled linearly by the tag's weight in the tag vector. The following is my personal tag cloud, with my top 10 tags.

to:

A tag cloud is a descriptive illustration of these tags, where the font size is scaled linearly by the tag's weight in the tag vector. The following is my personal musical tag cloud, with my top 10 tags.
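
The linear font scaling mentioned in this revision could look roughly like the sketch below. The function name `cloud_font_sizes`, the `max_size` parameter and the choice to give the heaviest tag the largest size are assumptions made for illustration; the article does not say which sizes were actually used.

```python
def cloud_font_sizes(weights, max_size=32.0):
    """Map tag weights to font sizes, proportional to each tag's weight.

    weights: {tag: weight in the tag vector}
    max_size: font size given to the heaviest tag (an assumed value)
    """
    if not weights:
        return {}
    top = max(weights.values())
    if top <= 0:
        return {tag: 0.0 for tag in weights}
    # Tags are listed alphabetically, as in the clouds shown in the article.
    return {tag: max_size * w / top for tag, w in sorted(weights.items())}


# Made-up weights, not taken from a real profile.
sizes = cloud_font_sizes({"electronic": 0.8, "ambient": 0.4, "rock": 0.2})
```

With these example weights, "ambient" gets half the font size of "electronic", and "rock" a quarter.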

March 31, 2007, at 10:12 PM by 84.194.127.106 -
Changed lines 25-26 from:

These sorts of information shows the (economic) value of large online communities. If you are a music label, festival organiser, or otherwise at work in the corporate music business, this information provides insights in the global market, and shows how to act to attend to a perfect target audience for your commercial activities. This is not necessarily a bad thing for the consumer of these products. Last.fm's service is completely free and legally co-operates with labels on a unique basis. The consumers receive a lot of free services in return, learn to enjoy new musical genres and artists, which they are likely to pay for, and can develop a social network based on her musical preferences. Consumers and producers both gain. (It's probably very obvious that I'm a big fan of Last.fm!)

to:

These sorts of information shows the (economic) value of large online communities. If you are a music label, festival organiser, or otherwise at work in the corporate music business, this information provides insights in the global market, and shows how to act to attend to a perfect target audience for your commercial activities. This is not necessarily a bad thing for the consumer of these products. Last.fm's service is completely free and legally co-operates with labels on a unique basis. The consumers receive a lot of free services in return, learn to enjoy new musical genres and artists, which they are likely to pay for, and can develop a social network based on their musical preferences. Consumers and producers both gain. (It's probably very obvious that I'm a big fan of Last.fm!)

Changed lines 31-32 from:

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of tags in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1.

to:

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight or popularity 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of that tag's weight in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1.
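
As a minimal sketch of the construction described in this revision: each artist's tag counts are rescaled so the top tag has weight 1, weighted by the number of tracks the user played, summed per tag, and the result is normalised to length 1. The function name `tag_vector` and the shape of the two input dictionaries are assumptions for illustration; they are not Last.fm's API or the author's actual scripts.

```python
import math
from collections import defaultdict


def tag_vector(top_artists, artist_tags):
    """Build a unit-length tag vector for one user.

    top_artists: {artist: tracks played by the user}   (assumed input shape)
    artist_tags: {artist: {tag: raw tag count}}        (assumed input shape)
    """
    vector = defaultdict(float)
    for artist, plays in top_artists.items():
        tags = artist_tags.get(artist, {})
        if not tags:
            continue
        top_count = max(tags.values())
        if top_count <= 0:
            continue
        for tag, count in tags.items():
            # Rescale so the artist's top tag has weight 1,
            # then weight by how often the user played this artist.
            vector[tag] += plays * (count / top_count)
    length = math.sqrt(sum(w * w for w in vector.values()))
    if length == 0:
        return {}
    # Calibrate the vector so that its Euclidean length equals 1.
    return {tag: w / length for tag, w in vector.items()}


# Made-up example data, not taken from Last.fm.
plays = {"Artist A": 30, "Artist B": 10}
tags = {
    "Artist A": {"electronic": 100, "ambient": 50},
    "Artist B": {"rock": 80, "electronic": 20},
}
profile = tag_vector(plays, tags)
```

In this toy example "electronic" ends up as the largest component, because it is the top tag of the most-played artist and also appears for the second artist.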

March 31, 2007, at 10:05 PM by 84.194.127.106 -
Changed lines 9-10 from:

With the advent of Web 2.0 and related online services, every computer-savvy internet user has gained access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service.

to:

With the advent of Web 2.0 and related online services, every computer-savvy internet user has gained access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

March 31, 2007, at 03:08 PM by 87.66.69.205 -
Changed lines 2-3 from:

Data mining 2.0 Anthony Liekens, March, 30 2007

to:

Data mining musical profiles Anthony Liekens, March, 30 2007

March 31, 2007, at 12:43 AM by 84.194.127.106 -
Changed lines 164-165 from:

The two other clusters are not separated clearly when depicted with respect to its first two principal components, but the third and fourth principal components show that the remaining clusters separate users that listen to "hip-hop", and those that don't. It is clear that I belong to the "electronic" camp. "Electronica" and "electronic" are confusing labels, and are used interspersed to tag the same music. "Electronic" is the adjective used for music that is produced electronically, where "electronica" is the actual genre.

to:

The two other clusters are not separated clearly when depicted with respect to two principal components, but the third and fourth components show that the remaining clusters separate users that listen to "hip-hop", and those that don't. It is clear that I belong to the "electronic" camp. "Electronica" and "electronic" are confusing labels, and are used interspersed to tag the same music. "Electronic" is the adjective used for music that is produced electronically, where "electronica" is the actual genre.

Changed lines 256-257 from:
  • The "electronic" cluster packs together with a lot of different genres. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans.
  • In contrast with the "electronic" cluster, the "hip-hop" cluster is the most clearly defined cluster, together with "rap" and "rnb," its close neighbours. Fans of "hip-hop" do not usually mix in other genres with theirs. They are a clearly defined group of fans that can serve as an easy target for marketing.
to:
  • The "electronic" cluster packs together with a lot of different genres. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans. If you're into "japanese industrial electronic punk rock female vocalists," we have a bunch of friends for you. It would be interesting to study the sub-populations of the "electronic" cluster.
  • In contrast with the "electronic" cluster, the "hip-hop" cluster is the most clearly defined cluster, with "rap" and "rnb," its close neighbours (and "hip hop," a different spelling). Fans of "hip-hop" do not usually mix other genres with theirs. They are a clearly defined group of fans that can serve as an easy target for marketing. If you're organising a "hip-hop" festival, don't hire a "punk rock" band or your festival will end in a drive-by shooting bonanza.

Enough said.

March 31, 2007, at 12:31 AM by 84.194.127.106 -
Changed lines 9-12 from:

With the advent of Web 2.0 and related online services, every computer-savvy internet user has gained access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

In this article, I describe how music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divides the sample population in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base. This approach is similar, but different than the methodology adopted by Last.fm. In the latter, data mining works on a basis of shared artists and tracks. By adopting tags that describe a user's musical preferred genres, a more descriptive, and dimensionally less complex (and thus mathematically simpler) description of the population can be given.

to:

With the advent of Web 2.0 and related online services, every computer-savvy internet user has gained access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service.

In this article, I describe how music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divides the sample population in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base. This approach is similar, but different from the methodology adopted by Last.fm itself. In the latter approach, data mining is based on shared artists and tracks. By adopting tags that describe a user's musical preferred genres, a more descriptive, and dimensionally less complex (and thus mathematically simpler) description of the population and its main structure can be given.

Changed lines 15-16 from:

Just as a warning -- data mining and statistics can discover interesting properties, but also find nonsense if you're not watching closely -- the following Dilbert cartoon:

to:

Just as a warning the following Dilbert cartoon, data mining and statistics can discover interesting properties, but also find nonsense if you're not watching closely:

Changed lines 93-98 from:

This tag cloud indeed gives a good indication of my personal listening profile. I have also written a more detailed discussion of single user tag clouds, and finding recommendations based on tags is. There's also more examples of musical tag clouds, representing weekly statistics of several friends.

In the rest of this article, we only use the 10 first components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags among their tag vectors, although the weights in most tags can be discarded as marginal.

With our random sample consisting of 2840 users, 735 tags are used by all users if their tags are limited to the 10 most important. All 2840 users in the random sample are now considered as 735 dimensional vectors, where only their 10 most important tag vector components are nonzero.

to:

This tag cloud indeed gives a good indication of my personal listening profile. I have also written a more detailed discussion of single user tag clouds, and finding recommendations based on tags. There's also more examples of musical tag clouds, representing weekly statistics of several friends.

In the rest of this article, we only use the 10 largest components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags among their tag vectors, although the weights in most tags can be discarded as marginal. With our random sample consisting of 2840 users, 735 tags are used by all users if their tags are limited to the 10 most important. All 2840 users in the random sample are now considered as 735 dimensional vectors, where only their 10 most important tag vector components are nonzero.
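
The truncation to the largest components, and the resulting users-by-tags matrix, can be sketched as follows. The helper names `keep_largest` and `to_matrix` are hypothetical, and the sketch assumes the truncated vectors are not renormalised afterwards, which the article does not specify.

```python
def keep_largest(vector, n=10):
    """Keep only the n largest components of a tag vector."""
    top = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return dict(top)


def to_matrix(user_vectors):
    """Turn {user: {tag: weight}} into a sorted tag list and dense rows.

    Every tag used by at least one user becomes a dimension;
    tags a user does not have are filled with 0.
    """
    tags = sorted({tag for vec in user_vectors.values() for tag in vec})
    rows = {
        user: [vec.get(tag, 0.0) for tag in tags]
        for user, vec in user_vectors.items()
    }
    return tags, rows


# Made-up users; n=2 instead of 10 to keep the example small.
users = {
    "u1": {"rock": 0.7, "indie": 0.6, "pop": 0.1},
    "u2": {"metal": 0.9, "rock": 0.4, "pop": 0.2},
}
truncated = {user: keep_largest(vec, n=2) for user, vec in users.items()}
all_tags, matrix = to_matrix(truncated)
```

Here "pop" drops out of both profiles, so the common pool has three dimensions instead of four. In the article's sample, the same step yields 735 dimensions for 2840 users.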

Changed lines 138-139 from:

"Alternative" "rock" and "indie" are clearly the most common genres in the Last.fm audience. It shows that this sample population (and probably all of Last.fm's user base) is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however an important target audience of media producers; a young, influential and evolving population of dedicated music lovers.

to:

The sample of users was gathered by a random walk on friends and neighbours lists in Last.fm's database, seeded by a set of random users. "Alternative" "rock" and "indie" are clearly the most common genres in the Last.fm sampled audience. It shows that this sample population (and probably all of Last.fm's user base) is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however an important target audience of media producers; a young, influential and evolving population of dedicated music lovers.

Changed lines 142-145 from:

Picturing 2840 points in a 735-dimensional space is quite troublesome, as we're only used to at most 3 dimensional data. We need a way to flatten the 725-dimensions down to 2 or 3 dimensions in order to view it. Principal components analysis (PCA) is such a technique, and it's got some extra's that come in hand here. It allows us to flatten the highly dimensional space down to it's principal components, which gives a new set of dimensions, where the first dimension is the one that remains most of the initial data's variance. This trick thus allows us to map the vectors to a lower dimensional space, and if significant differences among sub populations exist in the initial data, it's likely to also observe them in this lower dimensional representation. Nice!

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represent the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how their unit vectors are mapped onto the first two components. My personal profile is also highlighted and labelled "aliekens".

to:

Picturing 2840 points in a 735-dimensional space is quite troublesome, as we're only used to visualising at most 3 dimensional data. We need a way to flatten the 735 dimensions down to 2 or 3 dimensions in order to view it. Principal components analysis (PCA) is such a technique, and it's got some extra's that come in hand here. It allows us to flatten the highly dimensional space down to it's principal components, which gives a new set of dimensions, where the first dimension is the one that remains most of the initial data's variance. This trick thus allows us to map the vectors to a lower dimensional space, and if significant differences among sub populations exist in the initial data, it's likely to also observe them in this lower dimensional representation. Nice!

The following picture depicts the data, flattened onto a 2-dimensional picture, which explains a quarter of the initial data's variance. The X-axis represent the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how their unit vectors are mapped onto the first two components. My personal profile is also highlighted and labelled "aliekens".
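
The projection described above can be sketched with NumPy, computing the principal components from a singular value decomposition of the mean-centred data. This is an illustration on random toy data, not the author's actual code or data; the function name `pca` is hypothetical, and mapping the tag unit vectors without subtracting the mean is an assumption about how the tag arrows in the plot were obtained.

```python
import numpy as np


def pca(data, n_components=2):
    """Return the mean, the first principal axes and their variance shares."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    centred = data - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular, vt = np.linalg.svd(centred, full_matrices=False)
    total = np.sum(singular ** 2)
    if total > 0:
        explained = singular ** 2 / total
    else:
        explained = np.zeros_like(singular)
    return mean, vt[:n_components], explained[:n_components]


# Toy stand-in for the users-by-tags matrix (200 users, 6 tags).
rng = np.random.default_rng(0)
data = rng.random((200, 6))

mean, components, explained = pca(data, n_components=2)

# Coordinates of every user on the first two principal components.
points = (data - mean) @ components.T

# Directions of the tag unit vectors in the same 2-dimensional plot.
tag_arrows = np.eye(data.shape[1]) @ components.T
```

`explained.sum()` gives the fraction of the variance captured by the two plotted components; for the article's sample this is reported as about a quarter.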

Changed lines 150-153 from:

It seems that there is a spectrum of musical genres in the population of Last.fm users. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". Users seem distributed over this spectrum. As a data point is located closer to the centre of the data, the user is less biased towards the genre, and mixes in others. When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations. Note that the tag vector for the "pop" genre is at the centre of our plot, denoting that "pop" is associated with a lot of different genres.

The "seen live" tag is very popular among Last.fm users, labelling artists that they have seen live at a concert or festival. This tag doesn't show a musical genre, but is related to rock/indie listeners. This is probably due to the fact that these music types are generally known to perform live more often than other styles, but this claim is unsupported.

to:

It seems that there is a spectrum of musical genres in the population of Last.fm users. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". Users seem distributed over this spectrum. As a data point is located closer to the centre of the data, the user is less biased towards the genre, and mixes in others. When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations. Note that the tag vector for the "pop" genre is at the centre of our plot, denoting that "pop" averages over the above genres.

The "seen live" tag is very popular among Last.fm users, labelling artists that they have seen live at a concert or festival. This tag doesn't show a musical genre, but is related to "rock"/"indie" listeners. This is probably due to the fact that these music types are generally known to perform live more often than other styles, but this claim is unsupported.

Changed lines 164-165 from:

The two other clusters are not separated clearly when depicted with respect to its first two principal components, but the third and fourth principal components show that the (red and green) clusters separate users that listen to "hip-hop", and those that don't. It is clear that I belong to the "electronic" camp.

to:

The two other clusters are not separated clearly when depicted with respect to its first two principal components, but the third and fourth principal components show that the remaining clusters separate users that listen to "hip-hop", and those that don't. It is clear that I belong to the "electronic" camp. "Electronica" and "electronic" are confusing labels, and are used interspersed to tag the same music. "Electronic" is the adjective used for music that is produced electronically, where "electronica" is the actual genre.

March 31, 2007, at 12:13 AM by 84.194.127.106 -
Changed lines 99-100 from:

The following is a tag cloud representing the top 50 tags in the common pool of genres.

to:

The following is a tag cloud representing the top tags in the common pool of genres in our sample population.

March 31, 2007, at 12:11 AM by 84.194.127.106 -
Changed lines 93-94 from:

This tag cloud indeed gives a good indication of my personal listening profile. In the rest of this article, we only use the 10 first components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags among their tag vectors, although the weights in most tags can be discarded as marginal.

to:

This tag cloud indeed gives a good indication of my personal listening profile. I have also written a more detailed discussion of single user tag clouds, and finding recommendations based on tags is. There's also more examples of musical tag clouds, representing weekly statistics of several friends.

In the rest of this article, we only use the 10 first components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags among their tag vectors, although the weights in most tags can be discarded as marginal.

March 31, 2007, at 12:03 AM by 84.194.127.106 -
Added lines 15-16:

Just as a warning -- data mining and statistics can discover interesting properties, but also find nonsense if you're not watching closely -- the following Dilbert cartoon:

March 31, 2007, at 12:01 AM by 84.194.127.106 -
Deleted line 173:

hard rock 

Deleted lines 185-186:

electronic  folk 

Deleted line 198:

punk rock 

Changed lines 206-207 from:

alternative 

to:

80s  alternative 

Added line 209:

anime 

Added lines 211-212:

classic rock  classical 

Added lines 214-215:

ebm  electro 

Added line 218:

experimental 

Added line 220:

gothic 

Added lines 222-223:

hip-hop  house 

Added lines 230-232:

jpop  metal  new wave 

Added lines 234-235:

progressive rock  psytrance 

Added line 237:

rnb 

Added line 239:

russian 

Added lines 241-243:

soul  soundtrack  techno 

Added line 246:

visual kei 

Changed lines 249-250 from:

The cluster clearly separates different musical styles, with little or no overlap with respect to the musical genres represented in these tag clouds.

to:

The clustering clearly separates different musical styles, with little or no overlap with respect to the musical genres represented in these tag clouds.

March 30, 2007, at 11:59 PM by 84.194.127.106 -
Deleted lines 133-280:

alternative  alternative rock  ambient  black metal  chillout  classic rock  death metal  electronic  electronica  emo  experimental  female vocalists  folk  gothic metal  hardcore  heavy metal  hip-hop  idm  indie  indie rock  industrial  japanese  jazz  melodic death metal  metal  metalcore  pop  power metal  progressive metal  progressive rock  punk  punk rock  rap  rock  seen live  singer-songwriter  soundtrack 
alternative  alternative rock  ambient  black metal  chillout  classic rock  death metal  electronic  electronica  emo  experimental  female vocalists  folk  gothic metal  hardcore  heavy metal  hip-hop  idm  indie  indie rock  industrial  japanese  jazz  melodic death metal  metal  metalcore  pop  power metal  progressive metal  progressive rock  punk  punk rock  rap  rock  seen live  singer-songwriter  soundtrack 
alternative  alternative rock  ambient  black metal  chillout  classic rock  death metal  electronic  electronica  emo  experimental  female vocalists  folk  gothic metal  hardcore  heavy metal  hip-hop  idm  indie  indie rock  industrial  japanese  jazz  melodic death metal  metal  metalcore  pop  power metal  progressive metal  progressive rock  punk  punk rock  rap  rock  seen live  singer-songwriter  soundtrack 
alternative  alternative rock  ambient  black metal  chillout  classic rock  death metal  electronic  electronica  emo  experimental  female vocalists  folk  gothic metal  hardcore  heavy metal  hip-hop  idm  indie  indie rock  industrial  japanese  jazz  melodic death metal  metal  metalcore  pop  power metal  progressive metal  progressive rock  punk  punk rock  rap  rock  seen live  singer-songwriter  soundtrack 

March 30, 2007, at 11:53 PM by 84.194.127.106 -
Changed lines 97-146 from:

80s  alternative  alternative rock  ambient  black metal  chillout  classic rock  dance  death metal  doom metal  electronic  electronica  emo  experimental  female vocalists  folk  german  gothic metal  hard rock  hardcore  heavy metal  hip hop  hip-hop  idm  indie  indie rock  industrial  j-pop  japanese  jazz  melodic death metal  metal  metalcore  pop  power metal  progressive metal  progressive rock  punk  punk rock  rap  rnb  rock  russian  screamo  seen live  singer-songwriter  soundtrack  symphonic metal  trance  trip-hop 

to:

alternative  alternative rock  ambient  black metal  chillout  classic rock  death metal  electronic  electronica  emo  experimental  female vocalists  folk  gothic metal  hardcore  heavy metal  hip-hop  idm  indie  indie rock  industrial  japanese  jazz  melodic death metal  metal  metalcore  pop  power metal  progressive metal  progressive rock  punk  punk rock  rap  rock  seen live  singer-songwriter  soundtrack 

Changed lines 284-285 from:

"Alternative" "rock" and "indie" are clearly the most common genres in the Last.fm audience. It is clearly shows that this sample population (and probably all of Last.fm's user base) is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however the target audience of media producers; a young, influential and evolving population of dedicated music lovers.

to:

"Alternative", "rock" and "indie" are clearly the most common genres in the Last.fm audience. This shows that the sample population (and probably all of Last.fm's user base) is not a good representation of the whole population of music fans. It is the demographic with internet access: people who listen to their music on a computer and have signed up to send all their listening statistics to a public server, open for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is, however, an important target audience of media producers: a young, influential and evolving population of dedicated music lovers.

March 30, 2007, at 11:47 PM by 84.194.127.106 -
Changed lines 95-96 from:

It is obviously true that this sample population is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however the target audience of media producers; a young, influential and evolving population of dedicated music lovers.

to:

The following is a tag cloud representing the top 50 tags in the common pool of genres.

80s  alternative  alternative rock  ambient  black metal  chillout  classic rock  dance  death metal  doom metal  electronic  electronica  emo  experimental  female vocalists  folk  german  gothic metal  hard rock  hardcore  heavy metal  hip hop  hip-hop  idm  indie  indie rock  industrial  j-pop  japanese  jazz  melodic death metal  metal  metalcore  pop  power metal  progressive metal  progressive rock  punk  punk rock  rap  rnb  rock  russian  screamo  seen live  singer-songwriter  soundtrack  symphonic metal  trance  trip-hop 

"Alternative" "rock" and "indie" are clearly the most common genres in the Last.fm audience. It is clearly shows that this sample population (and probably all of Last.fm's user base) is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however the target audience of media producers; a young, influential and evolving population of dedicated music lovers.

Changed lines 250-251 from:
  • The "electronic" cluster packs together with a lot of different genres. Attempts to separate this cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans.
to:
  • The "electronic" cluster packs together a lot of different genres. Attempts to separate this "electronic" cluster into smaller ones by choosing a higher number of clusters failed; the cluster remained intact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans and "black metal" fans.
  • In contrast with the "electronic" cluster, the "hip-hop" cluster is the most clearly defined cluster, together with "rap" and "rnb," its close neighbours. Fans of "hip-hop" do not usually mix in other genres with theirs. They are a clearly defined group of fans that can serve as an easy target for marketing.
March 30, 2007, at 11:29 PM by 84.194.127.106 -
Changed line 195 from:
  • I've never heard of "indie" music. I thought it had a lot in common with "rock," and would have ended up in the same cluster, but this is expectation was false. Independent music has a clearly separated target audience. Moreover, it is the second biggest cluster after the "electronic"/"pop" cluster.
to:
  • I had never heard of "indie" music. I thought it had a lot in common with "rock" and would have ended up in the same cluster, but this expectation was false. The target audience for independent music is clearly separated from that of other genres, e.g., "rock". This can also be observed in the mutual exclusivity of "rock" and "indie" in their tag clouds. Moreover, "indie" is the second biggest cluster after the "electronic"/"pop" cluster.
March 30, 2007, at 11:27 PM by 84.194.127.106 -
Changed lines 111-112 from:

The electronic genres "hip-hop" and "electronic" are not clearly separated in this 2-dimensional representation of our data. We can show the data mapped onto its 3rd and 4th principal components, as follows, which shows the separation of "hip-hop" and "electronic" music styles very clearly.

to:

The genres "hip-hop" and "electronic" are not clearly separated in this 2-dimensional representation of our data. We can show the data mapped onto its 3rd and 4th principal components, as follows, which shows the separation of "hip-hop" and "electronic" music styles very clearly.

Changed lines 117-120 from:

The following is still under construction

K-Means clustering is a straightforward technique that tries to find a classification of the vectors, putting them in clusters of users that are similar in their musical preferences. When the number of clusters is set to 5, we get a clear separation of sub populations in Last.fm. Below is a depiction of the clusters, where each colour denotes a cluster. It is clear that the clustering algorithm found "indie", "rock" and "metal" to be three significant sub populations of Last.fm users.

to:

K-means clustering is a straightforward technique that tries to find a classification of the vectors, putting them in clusters of users that are similar in their musical preferences. Users end up in the same cluster when they are all closest to that cluster's centre point (with respect to Euclidean distance). When the number of clusters is set to 5, we get a clear separation of sub populations in Last.fm. Below is a depiction of the clusters, where each colour denotes a cluster. It is clear that the clustering algorithm found "indie", "rock" and "metal" to be three significant sub populations of Last.fm users.
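
The assignment rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the article: the function names, the random choice of initial centre points and the toy 2-dimensional data are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm).

    Assumption: the initial centres are k randomly chosen data points;
    the article does not say how its clustering was initialised.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assign every point to the closest centre.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centres[c]))
            for p in points
        ]
        # Stop as soon as the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Move every centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical toy data: two well separated groups of "users".
toy_points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
toy_labels, toy_centres = k_means(toy_points, k=2)
```

For the real data, each point would be a user's tag vector and k would be set to 5, as in the text.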

Changed lines 121-122 from:

The two other clusters are not separated clearly when depicted with respect to its first two principal components, but the third and fourth principal components show that the (red and green) clusters separate users that listen to "hip-hop", and those that don't.

to:

The two other clusters are not separated clearly when depicted with respect to the first two principal components, but the third and fourth principal components show that the (red and green) clusters separate users that listen to "hip-hop" from those that don't. It is clear that I belong to the "electronic" camp.

Added lines 125-126:

For each of these clusters, we can build the common tag cloud, describing the average of musical genres that are enjoyed by the users in the clusters.

Changed lines 191-196 from:

Conclusion and Discussion

to:

The cluster clearly separates different musical styles, with little or no overlap with respect to the musical genres represented in these tag clouds.

The following is a list of random observations

  • A lot of big musical styles ("classical", "jazz", "blues", "folk", "reggae", to name a few) are not clearly represented in these tag clouds. Either there is no significant population of listeners of these genres at Last.fm, or fans of these genres mix them with different styles, causing these genres to fade from the classification.
  • I've never heard of "indie" music. I thought it had a lot in common with "rock," and would have ended up in the same cluster, but this is expectation was false. Independent music has a clearly separated target audience. Moreover, it is the second biggest cluster after the "electronic"/"pop" cluster.
  • The "electronic" cluster packs together with a lot of different genres. Attempts to separate this cluster into smaller ones by choosing a higher number of clusters failed, this cluster remained in tact. Instead, the first cluster to split up is the "metal" cluster, with significant sub populations of general "metal" fans, and "black metal" fans.
March 30, 2007, at 11:08 PM by 84.194.127.106 -
Added lines 4-6:

This article is still under construction, and will probably be finished gradually over the weekend. Please come back on Monday.

March 30, 2007, at 11:02 PM by 84.194.127.106 -
Changed lines 92-99 from:

It is obviously true that this sample population is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however the target audience of young and evolving population of dedicated music lovers.

Principal component analysis

Picturing 2840 points in a 735-dimensional space is quite troublesome, as we're only used to at most 3 dimensional data. We need a way to flatten the 725-dimensions down to 2 or 3 dimensions in order to view it. Principal component analysis (PCA) is such a technique, and it's got some extra's that come in hand here. It allows us to flatten the highly dimensional space down to it's principal components, which gives a new set of dimensions, where the first dimension is the one that remains most of the initial data's variance. This trick thus allows us to map the vectors to a small number, and if significant differences among populations exist in the initial data, we're most probably likely to also see them in this lower dimensional representation. Nice!

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represent the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how their unit vectors are mapped onto the first two components. My personal profile is also highlighted and labelled "aliekens.

to:

It is obviously true that this sample population is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however the target audience of media producers; a young, influential and evolving population of dedicated music lovers.

Principal components analysis

Picturing 2840 points in a 735-dimensional space is quite troublesome, as we're only used to data of at most 3 dimensions. We need a way to flatten the 735 dimensions down to 2 or 3 dimensions in order to view the data. Principal components analysis (PCA) is such a technique, and it has some extras that come in handy here. It allows us to flatten the high-dimensional space down to its principal components, which gives a new set of dimensions, where the first dimension is the one that retains most of the initial data's variance. This trick thus allows us to map the vectors to a lower-dimensional space, and if significant differences among sub populations exist in the initial data, we are likely to observe them in this lower-dimensional representation as well. Nice!
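
As a rough illustration of this flattening, the sketch below finds the first principal components of a small data set with power iteration and deflation, using only the Python standard library. The article does not say which tool or algorithm was actually used, so the method, the function names and the toy data are assumptions made for the example.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=500, seed=0):
    """Dominant eigenvector and eigenvalue of cov, by power iteration."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


def pca(rows, k=2):
    """Return the first k principal components and the projected rows."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, projected


# Hypothetical toy data: 3-dimensional points that mostly vary along (1, 1, 0).
toy_rows = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.1], [3.0, 3.0, -0.1],
            [4.0, 4.0, 0.05], [5.0, 5.0, -0.05]]
toy_components, toy_projected = pca(toy_rows, k=2)
```

The first column of `toy_projected` then carries nearly all of the variance of the toy data, which is the property that makes a 2-dimensional plot of high-dimensional profiles meaningful.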

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represents the most important principal component, the Y-axis the second most important component. For the 10 most important tags in the whole population, we show how their unit vectors are mapped onto the first two components. My personal profile is also highlighted and labelled "aliekens".

Changed lines 102-105 from:

The first component separates users that listen to "indie" and "rock" from those that listen to "electronic", "hip-hop" and "metal". The second component separates "indie" from "rock" and it differentiates between "metal" on one hand and "hip-hop" and "electronic" on the other.

It seems that there is a spectrum of musical genres in the population of Last.fm users, and many users have a distinctive choice in a direction of this spectrum. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations. As a data point is closer to the centre of this plot, the more the user adopts a mix of these styles. Note that the tag vector for the "pop" genre is at the centre of our plot, denoting that "pop" is indeed associated with a lot of different genres.

to:

The first component separates users that listen to "indie" and "rock" (on the left) from those that listen to "electronic", "hip-hop" and "metal" (on the right). The second component separates "indie" (top) from "rock" (bottom) and it differentiates between "metal" (bottom) on one hand and "hip-hop" and "electronic" (top) on the other. There is no clear distinction between "electronic" and "hip-hop", so it is unclear how the large "blob" at the top right is divided between "hip-hop" and "electronic" fans.

It seems that there is a spectrum of musical genres in the population of Last.fm users. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". Users seem distributed over this spectrum. The closer a data point is located to the centre of the data, the less the user is biased towards a single genre and the more they mix in others. When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations. Note that the tag vector for the "pop" genre is at the centre of our plot, denoting that "pop" is associated with a lot of different genres.

March 30, 2007, at 10:45 PM by 84.194.127.106 -
Changed lines 8-11 from:

In this article, I describe how music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divides the sample populations in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base. This approach is similar, but different than the methodology adopted by Last.fm. In the latter, data mining works on a basis of shared artists and tracks. By adopting tags that describe a user's musical preferred genres, a more descriptive, and dimensionally less complex (and thus mathematically simpler) description of the population can be given.

As a result, we show that the population of Last.fm users can be separated into 5 clearly distinct populations of music lovers, one for each of the genres "indie", "rock", "metal", "hip-hop" and "electronic". We discuss some of the of the initial observations based on this preliminary data mining effort.

to:

In this article, I describe how music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divide the sample population into sub populations, and study possible clusterings of the data, which allow insights into the structure of Last.fm's user base. This approach is similar to, but different from, the methodology adopted by Last.fm. In the latter, data mining works on the basis of shared artists and tracks. By adopting tags that describe a user's preferred musical genres, a more descriptive and dimensionally less complex (and thus mathematically simpler) description of the population can be given.

As a result, we show that the population of Last.fm users can be separated into 5 clearly distinct populations of music lovers, one for each of the genres "indie", "rock", "metal", "hip-hop" and "electronic". We discuss some of the initial observations based on this preliminary data mining effort.

Changed lines 16-17 from:

Last.fm allows its users to register their musical listening habits; Every time a user plays a music track, this track's information is sent to Last.fm's servers. Based on the information that was gathered over a longer period, the Last.fm service generates a list of users (neighbours) who listen to similar artists and tracks. On the basis of this information, Last.fm than recommends artists and songs to its users. This approach works surprisingly well and provides very valuable information for users that want to expand their musical horizon. If two users listen to a big common pool of artists, it is indeed highly probable that they will also enjoy each other's mutual exclusive artists.

to:

Last.fm allows its users to register their musical listening habits: every time a user plays a music track, this track's information is sent to Last.fm's servers. Based on the information that was gathered over a longer period, the Last.fm service generates a list of users (neighbours) who listen to similar artists and tracks. On the basis of this information, Last.fm then recommends artists and songs to its users. This approach works surprisingly well and provides very valuable information for users that want to expand their musical horizon. If two users listen to a big common pool of artists, it is indeed highly probable that they will also enjoy each other's mutually exclusive artists.

Changed lines 20-21 from:

These sorts of information shows the (economic) value of large online communities. If you are a music label, festival organiser, or otherwise at work in the corporate music business, this information provides insights in the global market, and shows how to act to attend to a perfect target audience for your commercial activities. This is not necessarily a bad thing for the consumer of these products. Last.fm's service is completely free, co-operates with labels on a unique basis

to:

This sort of information shows the (economic) value of large online communities. If you are a music label, festival organiser, or otherwise at work in the corporate music business, this information provides insights into the global market, and shows how to reach the perfect target audience for your commercial activities. This is not necessarily a bad thing for the consumer of these products. Last.fm's service is completely free and legally co-operates with labels on a unique basis. The consumers receive a lot of free services in return, learn to enjoy new musical genres and artists, which they are likely to pay for, and can develop a social network based on their musical preferences. Consumers and producers both gain. (It's probably very obvious that I'm a big fan of Last.fm!)

Changed lines 24-27 from:

In my previous instalment on analysing Last.fm, I explored how one could study the tag cloud of a user's musical preferences to discover new bands. These tag clouds or tag vectors will now serve as the basis of our data mining analysis.

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of tags in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1. A tag cloud is a descriptive illustration of these tags, where the font size is scaled linearly by the tag's weight in the tag vector.

to:

In my previous instalment on analysing Last.fm data, I explored how one could study the tag cloud of a user's musical preferences to discover new bands. These tag clouds or tag vectors will now serve as the basis of our data mining analysis.

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of tags in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1.
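
The construction above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary layout of the input and the example artists and counts are all hypothetical, and nothing here reflects the actual Last.fm data format or the author's scripts.

```python
import math


def tag_vector(top_artists, artist_tags):
    """Build a unit-length tag vector for one user.

    top_artists: {artist: number of tracks played by the user}
    artist_tags: {artist: {tag: raw tag count for that artist}}
    Both layouts are assumptions made for this sketch.
    """
    vector = {}
    for artist, plays in top_artists.items():
        tags = artist_tags.get(artist, {})
        if not tags:
            continue
        top_count = max(tags.values())
        if top_count <= 0:
            continue
        for tag, count in tags.items():
            # Scale linearly so that the artist's top tag has weight 1,
            # then weight by the number of tracks played.
            vector[tag] = vector.get(tag, 0.0) + plays * (count / top_count)
    # Normalise so that the vector has length 1.
    length = math.sqrt(sum(w * w for w in vector.values()))
    if length == 0:
        return {}
    return {tag: w / length for tag, w in vector.items()}


# Hypothetical example with two artists instead of a top 50.
example_artists = {"Artist A": 30, "Artist B": 10}
example_tags = {
    "Artist A": {"electronic": 100, "ambient": 50},
    "Artist B": {"rock": 80, "electronic": 40},
}
example_vector = tag_vector(example_artists, example_tags)
```

In this toy example "electronic" receives the largest weight, because it is the top tag of the most played artist and also a secondary tag of the other one.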

Changed lines 73-74 from:

In the rest of this article, we only use the 10 first components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags among their tag vectors, although the weights in most tags can be discarded as marginal.

to:

A tag cloud is a descriptive illustration of these tags, where the font size is scaled linearly by the tag's weight in the tag vector. The following is my personal tag cloud, with my top 10 tags.

alternative  ambient  dance  electronic  electronica  techno  trance  rap metal  rock  psy trance 

This tag cloud indeed gives a good indication of my personal listening profile. In the rest of this article, we only use the 10 first components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags among their tag vectors, although the weights in most tags can be discarded as marginal.

March 30, 2007, at 09:30 PM by 84.194.127.106 -
Changed lines 8-9 from:

In this article, I describe how users can be represented mathematically based on their musical preferences. With these user profiles', we determine the principal components that divides the sample populations in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base.

to:

In this article, I describe how music fans can be represented mathematically based on their musical preferences. With a big sample of such user profiles, we determine the principal components that divides the sample populations in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base. This approach is similar, but different than the methodology adopted by Last.fm. In the latter, data mining works on a basis of shared artists and tracks. By adopting tags that describe a user's musical preferred genres, a more descriptive, and dimensionally less complex (and thus mathematically simpler) description of the population can be given.

March 30, 2007, at 09:22 PM by 84.194.127.106 -
Added lines 77-78:

It is obviously true that this sample population is not a good representation of the whole population of music fans. It's the demographic with internet access, people who listen to their music on a computer, and have subscribed to gather all their statistics to a public server, openly for everyone to see. Only the music played by this small population is considered in this study of Last.fm's social music service. This demographic is however the target audience of young and evolving population of dedicated music lovers.

March 30, 2007, at 04:50 PM by 131.155.65.61 -
Changed lines 81-82 from:

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represent the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how their unit vectors are mapped onto the first two components.

to:

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represent the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how their unit vectors are mapped onto the first two components. My personal profile is also highlighted and labelled "aliekens.

Changed lines 87-88 from:

It seems that there is a spectrum of musical preferences that is enjoyed by Last.fm users, and many users have a distinctive choice in a direction of this spectrum. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations. As a data point is closer to the centre of this plot, the more the user adopts a mix of these styles. Note that the tag vector for the "pop" genre is at the centre of our plot, denoting that "pop" is indeed associated with a lot of different genres.

to:

It seems that there is a spectrum of musical genres in the population of Last.fm users, and many users have a distinctive choice in a direction of this spectrum. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations. As a data point is closer to the centre of this plot, the more the user adopts a mix of these styles. Note that the tag vector for the "pop" genre is at the centre of our plot, denoting that "pop" is indeed associated with a lot of different genres.

March 30, 2007, at 03:38 PM by 131.155.65.61 -
Changed lines 6-7 from:

With Web 2.0, every computer-savvy internet user gains access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

to:

With the advent of Web 2.0 and related online services, every computer-savvy internet user has gained access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

March 30, 2007, at 03:37 PM by 131.155.65.61 -
Changed lines 2-7 from:

Data mining 2.0 March, 28 2007

With the advent of Web 2.0, everyone has gained access to extremely valuable, but open and free information, describing our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

In this article, I describe how users can be represented as vectors in a highly dimensional space representing their musical preferences. The dimensions for this space are defined by the tags that describe their musical preferences. These vectors can also be shown as "musical tag clouds" describing the user's preferences. With these tag vectors, we determine the principal components that divides the sample populations in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base.

to:

Data mining 2.0 Anthony Liekens, March, 30 2007

Abstract

With Web 2.0, every computer-savvy internet user gains access to extremely valuable, but open and free information, describing the habits and preferences of our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

In this article, I describe how users can be represented mathematically based on their musical preferences. With these user profiles, we determine the principal components that divide the sample population into sub populations, and study possible clusterings of the data, which allow insights into the structure of Last.fm's user base.

March 30, 2007, at 03:35 PM by 131.155.65.61 -
Changed line 105 from:

alternative 

to:

alternative 

Changed line 123 from:

alternative 

to:

alternative 

Changed line 131 from:

alternative 

to:

alternative 

Changed line 142 from:

hip hop 

to:

hip hop 

Changed line 147 from:

alternative 

to:

alternative 

Changed lines 167-168 from:

to:

March 30, 2007, at 03:29 PM by 131.155.65.61 -
Changed lines 33-36 from:

= electronica

0.472839
trance

to:

electronica

Changed line 35 from:

= 0.306708

to:

= 0.472839

Changed line 37 from:

techno

to:

trance

Changed line 39 from:

= 0.281446

to:

= 0.306708

Changed line 41 from:

ambient

to:

techno

Changed line 43 from:

= 0.257368

to:

= 0.281446

Changed line 45 from:

dance

to:

ambient

Changed line 47 from:

= 0.179112

to:

= 0.257368

Changed line 49 from:

alternative

to:

dance

Changed line 51 from:

= 0.132631

to:

= 0.179112

Changed line 53 from:

rap metal

to:

alternative

Changed line 55 from:

= 0.0949674

to:

= 0.132631

Changed line 57 from:

rock

to:

rap metal

Changed line 59 from:

= 0.0919889

to:

= 0.0949674

Changed line 61 from:

psytrance

to:

rock

Added lines 63-66:

= 0.0919889

psytrance

March 30, 2007, at 03:28 PM by 131.155.65.61 -
Changed line 31 from:

0.649133

to:

= 0.649133

Changed line 33 from:

electronica

to:

= electronica

Changed line 52 from:

= alternative

to:

alternative

Changed line 60 from:

= rock

to:

rock

Changed line 64 from:

= psytrance

to:

psytrance

March 30, 2007, at 03:28 PM by 131.155.65.61 -
Changed lines 8-9 from:

As a result, we show that the population of Last.fm users can be separated into 5 clearly distinct populations of music lovers, one for each of the genres "indie", "rock", "metal", "hip-hop" and "electronic". We discuss some of the observation made in this initial and preliminary data mining effort.

to:

As a result, we show that the population of Last.fm users can be separated into 5 clearly distinct populations of music lovers, one for each of the genres "indie", "rock", "metal", "hip-hop" and "electronic". We discuss some of the initial observations based on this preliminary data mining effort.

Changed lines 27-28 from:

to:

Changed line 32 from:

to:

Changed line 35 from:

to:

Changed lines 38-39 from:

0.306708

to:

= 0.306708

Changed lines 42-43 from:

0.281446

to:

= 0.281446

Changed lines 46-47 from:

0.257368

to:

= 0.257368

Changed lines 50-52 from:

0.179112

alternative

to:

= 0.179112

= alternative

Changed lines 54-55 from:

0.132631

to:

= 0.132631

Changed lines 58-60 from:

0.0949674

rock

to:

= 0.0949674

= rock

Changed lines 62-64 from:

0.0919889

psytrance

to:

= 0.0919889

= psytrance

Changed line 66 from:

0.0912921

to:

= 0.0912921

March 30, 2007, at 03:11 PM by 131.155.65.61 -
Added lines 8-9:

As a result, we show that the population of Last.fm users can be separated into 5 clearly distinct populations of music lovers, one for each of the genres "indie", "rock", "metal", "hip-hop" and "electronic". We discuss some of the observation made in this initial and preliminary data mining effort.

March 30, 2007, at 02:57 PM by 131.155.65.61 -
Added lines 8-9:

http://anthony.liekens.net/images/datamining-dogbert.png

March 30, 2007, at 02:55 PM by 131.155.65.61 -
Changed lines 100-104 from:

hip hop  hip-hop  rap  rnb  rock 

to:

alternative  black metal  death metal  doom metal  folk metal  gothic metal  hard rock  heavy metal  industrial  melodic death metal  metal  metalcore  power metal  progressive metal  rock  seen live  symphonic metal  thrash metal 

Changed lines 119-174 from:

ambient  chillout  dance  electronic  electronica  female vocalists  hardcore  hip-hop  idm  indie  industrial  j-pop  japanese  jazz  pop  punk  rock  seen live  trance 

alternative  black metal  death metal  doom metal  folk metal  gothic metal  hard rock  heavy metal  industrial  melodic death metal  metal  metalcore  power metal  progressive metal  rock  seen live  symphonic metal  thrash metal 
alternative  electronic  folk  indie  indie rock  rock  seen live  singer-songwriter 
alternative  alternative rock  classic rock  emo  indie  metal  pop  punk  punk rock  rock  seen live 

to:

electronic  folk  indie  indie rock  rock  seen live  singer-songwriter 

alternative  alternative rock  classic rock  emo  indie  metal  pop  punk  punk rock  rock  seen live 
hip hop  hip-hop  rap  rnb  rock 
alternative  ambient  chillout  dance  electronic  electronica  female vocalists  hardcore  idm  indie  industrial  j-pop  japanese  jazz  pop  punk  rock  seen live  trance  trip-hop 

March 30, 2007, at 01:16 AM by 84.194.127.106 -
Added lines 90-91:

The following is still under construction

March 30, 2007, at 01:11 AM by 84.194.127.106 -
Added lines 90-91:

K-Means clustering is a straightforward technique that tries to find a classification of the vectors, putting them in clusters of users that are similar in their musical preferences. When the number of clusters is set to 5, we get a clear separation of sub populations in Last.fm. Below is a depiction of the clusters, where each colour denotes a cluster. It is clear that the clustering algorithm found "indie", "rock" and "metal" to be three significant sub populations of Last.fm users.
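
The clustering step described above can be sketched as follows. This is a minimal sketch of the standard Lloyd iteration for K-means, written with the standard library only; it is not the code used for the article. The function names, the random initialisation, the fixed iteration count and the toy points are assumptions made for illustration. The article uses 5 clusters on the users' tag vectors, while the toy example below uses 2 clusters on six 2-dimensional points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=50, seed=0):
    """Assign every point to one of k clusters (Lloyd's algorithm).

    Returns (assignment, centroids), where assignment[i] is the index of
    the cluster of points[i]. Initial centroids are k randomly chosen
    points (an assumption; other initialisations are possible).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Step 1: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 2: move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy data: two well separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = k_means(points, k=2)
print(assignment)
```

K-means only finds a local optimum, so on real data it is common to run it from several starting points and keep the best result.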

Added lines 94-95:

The two other clusters are not clearly separated when the data is depicted with respect to its first two principal components, but the third and fourth principal components show that the (red and green) clusters separate users that listen to "hip-hop" from those that don't.

March 30, 2007, at 01:02 AM by 84.194.127.106 -
Added lines 145-155:

alternative  alternative rock  classic rock  emo  indie  metal  pop  punk  punk rock  rock  seen live 

March 30, 2007, at 12:58 AM by 84.194.127.106 -
Changed lines 94-160 from:

alternative  black metal  death metal  doom metal  gothic metal  hardcore  heavy metal  industrial  j-pop  japanese  melodic death metal  metal  metalcore  power metal  progressive metal  punk  rock  seen live  symphonic metal  thrash metal 
alternative  alternative rock  classic rock  emo  indie  metal  pop  punk  punk rock  rock  seen live 
alternative  folk  indie  indie rock  rock  seen live  singer-songwriter 
female vocalists  hip hop  hip-hop  pop  rap  rnb  rock  soul 
alternative  ambient  chillout  dance  electro  electronic  electronica  female vocalists  hip-hop  house  idm  indie  jazz  pop  rock  seen live  techno  trance  trip-hop 

to:

hip hop  hip-hop  rap  rnb  rock 
alternative  ambient  chillout  dance  electronic  electronica  female vocalists  hardcore  hip-hop  idm  indie  industrial  j-pop  japanese  jazz  pop  punk  rock  seen live  trance 
alternative  black metal  death metal  doom metal  folk metal  gothic metal  hard rock  heavy metal  industrial  melodic death metal  metal  metalcore  power metal  progressive metal  rock  seen live  symphonic metal  thrash metal 
alternative  electronic  folk  indie  indie rock  rock  seen live  singer-songwriter 

March 30, 2007, at 12:51 AM by 84.194.127.106 -
Changed lines 16-19 from:

Methods

Tag clouds and tag vectors

to:

Tag clouds and tag vectors

Changed lines 70-71 from:

Principal component analysis

to:

Principal component analysis

Changed lines 88-91 from:

K-Means clustering

Results

to:

K-Means clustering

March 30, 2007, at 12:50 AM by 84.194.127.106 -
Added lines 12-15:

Here, we study the structure of musical preferences in this social network. Based on the information in Last.fm's database, we can describe a user's profile by common tags or labels of its artists, with proportions of e.g., "rock," "jazz" or "hip-hop" as his preferred music genres. Here, we have taken a sample of 2840 Last.fm users, their 28302 recorded artists and these artists' commonly used tags. We show how one can determine important groups of musical genres that can classify Last.fm's user base into separate groups, adopting a range of elementary data mining algorithms, such as principal components analysis and K-means clustering.

This sort of information shows the (economic) value of large online communities. If you are a music label, festival organiser, or otherwise at work in the corporate music business, this information provides insights into the global market, and shows how to reach the right target audience for your commercial activities. This is not necessarily a bad thing for the consumer of these products. Last.fm's service is completely free, co-operates with labels on a unique basis

March 29, 2007, at 01:35 PM by 84.194.127.106 -
Changed lines 4-5 from:

With the advent of Web 2.0, everyone gains access to extremely valuable information of our global brain. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

to:

With the advent of Web 2.0, everyone has gained access to extremely valuable, but open and free information, describing our global population. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

Changed lines 64-65 from:

In the rest of this article, we only use the 10 first components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags in their tag vectors, although the weights in most tags can be discarded as marginal.

to:

In the rest of this article, we only use the 10 largest components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags among their tag vectors, although the weights of most tags can be discarded as marginal.

March 29, 2007, at 01:27 PM by 84.194.127.106 -
Changed lines 80-81 from:

Two musical genres are not clearly separated in this 2-dimensional representation of our data. We can show the data mapped onto its 3rd and 4th principal components, as follows, which shows the separation of "hip-hop" and "electronic" music very clearly.

to:

The "seen live" tag is very popular among Last.fm users, labelling artists that they have seen live at a concert or festival. This tag doesn't denote a musical genre, but is related to rock/indie listeners. This is probably because artists in these genres perform live more often than artists in other styles, but this claim is unsupported.

The electronic genres "hip-hop" and "electronic" are not clearly separated in this 2-dimensional representation of our data. We can show the data mapped onto its 3rd and 4th principal components, as follows, which shows the separation of "hip-hop" and "electronic" music styles very clearly.

March 29, 2007, at 01:20 PM by 84.194.127.106 -
Changed lines 78-79 from:

It seems that there is a spectrum of musical preferences that is enjoyed by Last.fm users, and many users have a distinctive choice in a direction of this spectrum. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations.

to:

It seems that there is a spectrum of musical preferences that is enjoyed by Last.fm users, and many users have a distinctive choice in a direction of this spectrum. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations. As a data point is closer to the centre of this plot, the more the user adopts a mix of these styles. Note that the tag vector for the "pop" genre is at the centre of our plot, denoting that "pop" is indeed associated with a lot of different genres.

March 29, 2007, at 01:15 PM by 84.194.127.106 -
Changed lines 72-73 from:

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represent the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how it's unit vector is mapped onto the 2 dimensional figure.

to:

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represents the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how their unit vectors are mapped onto the first two components.

Changed lines 76-77 from:

The first component separates users that listen to "indie" and "rock" from those that listen to "electronic", "hip-hop" and "rock". The second component separates "indie" from "rock" and it differentiates between "metal" on one hand and "hip-hop" and "electronic" on the other.

to:

The first component separates users that listen to "indie" and "rock" from those that listen to "electronic", "hip-hop" and "metal". The second component separates "indie" from "rock" and it differentiates between "metal" on one hand and "hip-hop" and "electronic" on the other.

March 29, 2007, at 01:12 PM by 84.194.127.106 -
Added lines 80-81:

Two musical genres are not clearly separated in this 2-dimensional representation of our data. We can show the data mapped onto its 3rd and 4th principal components, as follows, which shows the separation of "hip-hop" and "electronic" music very clearly.

March 29, 2007, at 01:08 PM by 84.194.127.106 -
Changed lines 72-73 from:

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure.

to:

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure. The X-axis represent the most important principal component, the Y-axis shows the second most important component. For the 10 most important tags in the whole population, we show how it's unit vector is mapped onto the 2 dimensional figure.

Added lines 76-79:

The first component separates users that listen to "indie" and "rock" from those that listen to "electronic", "hip-hop" and "rock". The second component separates "indie" from "rock" and it differentiates between "metal" on one hand and "hip-hop" and "electronic" on the other.

It seems that there is a spectrum of musical preferences that is enjoyed by Last.fm users, and many users have a distinctive choice in a direction of this spectrum. The spectrum goes from "indie" over "alternative" to "rock" and "metal", and then onto "hip-hop" and "electronic" music with a sparse gap back to "indie". When we cluster the data later on, we will automatically determine this separation of musical preferences and sub populations.

March 29, 2007, at 12:56 PM by 84.194.127.106 -
Added lines 70-73:

Picturing 2840 points in a 735-dimensional space is quite troublesome, as we're only used to data of at most 3 dimensions. We need a way to flatten the 735 dimensions down to 2 or 3 dimensions in order to view the data. Principal component analysis (PCA) is such a technique, and it has some extras that come in handy here. It allows us to flatten the highly dimensional space down to its principal components, which gives a new set of dimensions, where the first dimension is the one that retains most of the initial data's variance. This trick thus allows us to map the vectors to a small number of dimensions, and if significant differences among populations exist in the initial data, we're likely to also see them in this lower dimensional representation. Nice!

The following picture depicts the data, flattened onto a 2-dimensional picture, which still shows some of the initial data's structure.
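
The flattening described above can be sketched as follows. This is a minimal sketch that approximates the first two principal components by power iteration with deflation, using the standard library only. It swaps in power iteration for a full eigendecomposition and is not the code used for the article; all function names, the iteration count, the seed and the toy data are assumptions made for illustration.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def project(centered, direction):
    """Coordinate of every row along a unit-length direction."""
    return [sum(x * w for x, w in zip(row, direction)) for row in centered]


def first_component(centered, iterations=100, seed=0):
    """Approximate the direction of largest variance by power iteration.

    Each step applies X^T X to the current direction, which is
    proportional to applying the covariance matrix, then renormalises.
    """
    rng = random.Random(seed)
    d = len(centered[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        scores = project(centered, v)
        v = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0.0:
            raise ValueError("data has no variance left")
        v = [w / norm for w in v]
    return v


def deflate(centered, direction):
    """Remove the part of every row that lies along `direction`."""
    scores = project(centered, direction)
    return [[x - s * w for x, w in zip(row, direction)]
            for row, s in zip(centered, scores)]


# Toy data: five points lying close to the line y = x.
rows = [[-2.0, -2.1], [-1.0, -0.9], [0.0, 0.0], [1.0, 1.1], [2.0, 1.9]]
centered = center(rows)
pc1 = first_component(centered)
pc2 = first_component(deflate(centered, pc1))
# 2-dimensional picture: one (x, y) pair per original row.
picture = list(zip(project(centered, pc1), project(centered, pc2)))
print(picture)
```

The sign of each component is arbitrary, so a plot made this way may appear mirrored compared to one made with another PCA implementation.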

March 29, 2007, at 12:51 PM by 84.194.127.106 -
Deleted lines 69-74:

http://anthony.liekens.net/images/pca.png

K-Means clustering

Results

Added lines 74-81:

K-Means clustering

Results

http://anthony.liekens.net/images/cluster12.png

http://anthony.liekens.net/images/cluster34.png

March 29, 2007, at 12:48 PM by 84.194.127.106 -
Changed lines 70-71 from:

http://anthony.liekens.net/images/pca12.png

to:

http://anthony.liekens.net/images/pca.png

March 29, 2007, at 12:48 PM by 84.194.127.106 -
Added lines 70-71:

http://anthony.liekens.net/images/pca12.png

March 29, 2007, at 12:45 PM by 84.194.127.106 -
Changed lines 18-19 from:

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of tags in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1.

to:

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of tags in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1. A tag cloud is a descriptive illustration of these tags, where the font size is scaled linearly by the tag's weight in the tag vector.
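
The construction above can be sketched as follows. This is a minimal sketch, not the scripts used for the article: the function names, the input layout (play counts and raw tag counts per artist), the 32 point maximum font size and the sample data are all assumptions made for illustration.

```python
import math


def tag_vector(play_counts, artist_tags):
    """Build a unit-length tag vector for one user.

    play_counts: {artist: tracks played by the user}   (hypothetical layout)
    artist_tags: {artist: {tag: raw tag count}}        (hypothetical layout)
    """
    vector = {}
    for artist, plays in play_counts.items():
        tags = artist_tags.get(artist, {})
        top = max(tags.values(), default=0)
        if top <= 0:
            continue  # artist without usable tags
        for tag, count in tags.items():
            # Tag weight is scaled so the artist's top tag has weight 1,
            # then weighted by the number of tracks played.
            vector[tag] = vector.get(tag, 0.0) + plays * (count / top)
    length = math.sqrt(sum(w * w for w in vector.values()))
    if length == 0.0:
        return {}
    return {tag: w / length for tag, w in vector.items()}


def font_sizes(vector, max_pt=32.0):
    """Font size per tag, proportional to the tag's weight."""
    top = max(vector.values())
    return {tag: max_pt * w / top for tag, w in vector.items()}


# Made-up sample data for two artists.
play_counts = {"Artist A": 30, "Artist B": 10}
artist_tags = {
    "Artist A": {"electronic": 100, "ambient": 50},
    "Artist B": {"rock": 80, "electronic": 20},
}
vector = tag_vector(play_counts, artist_tags)
sizes = font_sizes(vector)
for tag in sorted(vector, key=vector.get, reverse=True):
    print(tag, round(vector[tag], 4), round(sizes[tag], 1))
```

In the article the sum runs over a user's top 50 artists; the sketch simply uses whatever artists are passed in.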

Added lines 64-67:

In the rest of this article, we only use the 10 first components of the tag vector, to simplify the computations and limit the number of dimensions required to describe populations of users. In a small test, 15 Last.fm users typically shared over 600 tags in their tag vectors, although the weights in most tags can be discarded as marginal.

With our random sample consisting of 2840 users, 735 distinct tags remain in use across all users when each user's tags are limited to the 10 most important. All 2840 users in the random sample are now considered as 735-dimensional vectors, where only their 10 most important tag vector components are nonzero.
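
The truncation and the shared vector space described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the toy users are hypothetical, and the truncated vectors are not renormalised, since the text does not say whether that was done.

```python
def truncate(vector, keep=10):
    """Keep only the `keep` largest components of a tag vector."""
    top = sorted(vector.items(), key=lambda item: item[1], reverse=True)[:keep]
    return dict(top)


def to_matrix(user_vectors):
    """Place all users in one space with a dimension per distinct tag.

    Returns (tags, rows): the sorted list of tags and one dense row per
    user, with zeros for tags the user does not have.
    """
    tags = sorted({tag for vector in user_vectors for tag in vector})
    index = {tag: i for i, tag in enumerate(tags)}
    rows = []
    for vector in user_vectors:
        row = [0.0] * len(tags)
        for tag, weight in vector.items():
            row[index[tag]] = weight
        rows.append(row)
    return tags, rows


# Toy example with two users, keeping 2 tags each instead of 10.
users = [
    {"rock": 0.8, "indie": 0.5, "pop": 0.3},
    {"electronic": 0.9, "rock": 0.4, "dance": 0.2},
]
tags, rows = to_matrix([truncate(u, keep=2) for u in users])
print(tags)
print(rows)
```

With the article's sample, the same procedure would give 2840 rows and 735 columns, with at most 10 nonzero entries per row.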

March 29, 2007, at 12:28 PM by 84.194.127.106 -
Changed lines 18-20 from:

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Most artists on Last.fm are tagged by the whole community. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The tag vector is calibrated such that its length equals 1.

The following are the 10 most important factors of my personal tag vector:

to:

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Last.fm users can tag artists with keywords, called tags. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The value at each dimension (identified by a tag) is the sum of tags in the top artists, weighted by the number of tracks played and the weight of the tag for the artist. The tag vector is calibrated such that its length equals 1.

The following are the 10 most important factors of my personal tag vector, which provide a clear description of my preferred music styles:

March 29, 2007, at 12:25 PM by 84.194.127.106 -
Changed lines 18-19 from:

The following describes how the tag cloud, or tag vector is constructed. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Most artists on Last.fm are tagged by the whole community. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The tag vector is calibrated such that its length equals 1.

to:

The tag cloud or tag vector describing a user's musical preferences is constructed as follows. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Most artists on Last.fm are tagged by the whole community. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The tag vector is calibrated such that its length equals 1.

March 29, 2007, at 12:24 PM by 84.194.127.106 -
Added lines 18-19:

The following describes how the tag cloud, or tag vector is constructed. For a user, consider his top 50 artists, and the number of tracks of that artist played by this user. Most artists on Last.fm are tagged by the whole community. For each artist in the user's top, consider this artist's tags and the counts of occurrences of that tag for the artist, linearly calibrated such that the top tag has weight 1. The tag vector is now the weighted sum of tag occurrences in this aggregate. The tag vector is calibrated such that its length equals 1.

March 29, 2007, at 12:17 PM by 84.194.127.106 -
Changed lines 18-19 from:

The following is my personal tag vector:

to:

The following are the 10 most important factors of my personal tag vector:

Changed lines 60-61 from:

to:

March 29, 2007, at 12:16 PM by 84.194.127.106 -
Added lines 18-61:

The following is my personal tag vector:

electronic 0.649133
electronica 0.472839
trance 0.306708
techno 0.281446
ambient 0.257368
dance 0.179112
alternative 0.132631
rap metal 0.0949674
rock 0.0919889
psytrance 0.0912921
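
As a quick check on the numbers above: if the weights are components of a tag vector of length 1, the sum of their squares shows how much of the full vector these ten tags carry. The short sketch below only does that arithmetic on the listed values; the variable names are assumptions.

```python
import math

# The ten listed components of the personal tag vector.
top10 = {
    "electronic": 0.649133,
    "electronica": 0.472839,
    "trance": 0.306708,
    "techno": 0.281446,
    "ambient": 0.257368,
    "dance": 0.179112,
    "alternative": 0.132631,
    "rap metal": 0.0949674,
    "rock": 0.0919889,
    "psytrance": 0.0912921,
}

# Squared length of the truncated vector; the full vector has length 1.
squared_length = sum(w * w for w in top10.values())
print(round(squared_length, 3), round(math.sqrt(squared_length), 3))
```

The sum of squares comes out at roughly 0.96, so under that assumption the remaining tags together account for only a small part of the vector.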

March 28, 2007, at 02:53 PM by 131.155.64.143 -
Changed lines 16-17 from:

In my previous instalment? on analysing Last.fm, I explored how one could study the tag cloud of a user's musical preferences to discover new bands. These tag clouds or tag vectors will now serve as the basis of our data mining analysis.

to:

In my previous instalment on analysing Last.fm, I explored how one could study the tag cloud of a user's musical preferences to discover new bands. These tag clouds or tag vectors will now serve as the basis of our data mining analysis.

March 28, 2007, at 02:53 PM by 131.155.64.143 -
Added lines 16-17:

In my previous instalment? on analysing Last.fm, I explored how one could study the tag cloud of a user's musical preferences to discover new bands. These tag clouds or tag vectors will now serve as the basis of our data mining analysis.

March 28, 2007, at 02:50 PM by 131.155.64.143 -
Changed lines 10-11 from:

Last.fm allows its users to register their musical listening habits; Every time a user plays a music track, this track's information is sent to Last.fm's servers. Based on the information that was gathered over a longer period, the Last.fm service generates a list of users (neighbours) who listen to similar artists and tracks. On the basis of this information, Last.fm than recommends artists and songs to its users. This approach seems to be working and offers valuable information if a user wants to expand her musical horizon. If two users listen to a big common pool of artists, it is indeed highly probable that they will also enjoy each other's mutual exclusive artists.

to:

Last.fm allows its users to register their musical listening habits: every time a user plays a music track, this track's information is sent to Last.fm's servers. Based on the information gathered over a longer period, the Last.fm service generates a list of users (neighbours) who listen to similar artists and tracks. On the basis of this information, Last.fm then recommends artists and songs to its users. This approach works surprisingly well and provides very valuable information for users who want to expand their musical horizons. If two users listen to a big common pool of artists, it is indeed highly probable that each will also enjoy the artists that only the other listens to.

March 28, 2007, at 12:01 PM by 131.155.64.143 -
Changed lines 14-15 from:

Tag vectors

to:

Tag clouds and tag vectors

March 28, 2007, at 12:01 PM by 131.155.64.143 -
Added lines 14-19:

Tag vectors

Principal component analysis

K-Means clustering

March 28, 2007, at 11:59 AM by 131.155.64.143 -
Added lines 16-19:

http://anthony.liekens.net/images/pca12.png

http://anthony.liekens.net/images/pca34.png

March 28, 2007, at 11:45 AM by 131.155.64.143 -
Changed lines 54-61 from:

female vocalists  hip hop  hip-hop  pop  rap  rnb  rock  soul 

to:

female vocalists  hip hop  hip-hop  pop  rap  rnb  rock  soul 

March 28, 2007, at 11:44 AM by 131.155.64.143 -
Added lines 16-82:

alternative  black metal  death metal  doom metal  gothic metal  hardcore  heavy metal  industrial  j-pop  japanese  melodic death metal  metal  metalcore  power metal  progressive metal  punk  rock  seen live  symphonic metal  thrash metal 
alternative  alternative rock  classic rock  emo  indie  metal  pop  punk  punk rock  rock  seen live 
alternative  folk  indie  indie rock  rock  seen live  singer-songwriter 
female vocalists  hip hop  hip-hop  pop  rap  rnb  rock  soul 
alternative  ambient  chillout  dance  electro  electronic  electronica  female vocalists  hip-hop  house  idm  indie  jazz  pop  rock  seen live  techno  trance  trip-hop 

March 28, 2007, at 11:43 AM by 131.155.64.143 -
Changed lines 6-7 from:

In this article, I describe how users can be represented as vectors in a highly dimensional space representing their musical preferences. The dimensions for this space are defined by the tags that describe their musical preferences. These vectors can also be shown as "musical tag clouds" describing the user's preferences. With these tag vectors, we determine the principal components that divides the sample populations in sub populations, and study possible clusterings of the data in order to gain access to some of the statistics of Last.fm's user base.

to:

In this article, I describe how users can be represented as vectors in a highly dimensional space representing their musical preferences. The dimensions for this space are defined by the tags that describe their musical preferences. These vectors can also be shown as "musical tag clouds" describing the user's preferences. With these tag vectors, we determine the principal components that divides the sample populations in sub populations, and study possible clusterings of the data which allows insights in the structure of Last.fm's user base.

March 28, 2007, at 11:42 AM by 131.155.64.143 -
Added lines 6-7:

In this article, I describe how users can be represented as vectors in a highly dimensional space representing their musical preferences. The dimensions for this space are defined by the tags that describe their musical preferences. These vectors can also be shown as "musical tag clouds" describing the user's preferences. With these tag vectors, we determine the principal components that divides the sample populations in sub populations, and study possible clusterings of the data in order to gain access to some of the statistics of Last.fm's user base.

March 28, 2007, at 10:35 AM by 131.155.64.143 -
Changed lines 8-9 from:

Subscribers of Last.fm can register their musical listening habits using the service. Every time a user plays a music track, this track's information is sent to Last.fm's servers. Based on the information that was gathered over a longer period, the Last.fm service generates a list of users (neighbours) who listen to similar artists and tracks. On the basis of this information, Last.fm than recommends artists and songs to its users. This approach seems to be working and offers valuable information if a user wants to expand her musical horizon. If two users listen to a big common pool of artists, it is indeed highly probable that they will also enjoy each other's mutual exclusive artists.

to:

Last.fm allows its users to register their musical listening habits; Every time a user plays a music track, this track's information is sent to Last.fm's servers. Based on the information that was gathered over a longer period, the Last.fm service generates a list of users (neighbours) who listen to similar artists and tracks. On the basis of this information, Last.fm than recommends artists and songs to its users. This approach seems to be working and offers valuable information if a user wants to expand her musical horizon. If two users listen to a big common pool of artists, it is indeed highly probable that they will also enjoy each other's mutual exclusive artists.

March 28, 2007, at 10:33 AM by 131.155.64.143 -
Added lines 8-9:

Subscribers of Last.fm can register their musical listening habits using the service. Every time a user plays a music track, this track's information is sent to Last.fm's servers. Based on the information that was gathered over a longer period, the Last.fm service generates a list of users (neighbours) who listen to similar artists and tracks. On the basis of this information, Last.fm than recommends artists and songs to its users. This approach seems to be working and offers valuable information if a user wants to expand her musical horizon. If two users listen to a big common pool of artists, it is indeed highly probable that they will also enjoy each other's mutual exclusive artists.

March 28, 2007, at 10:27 AM by 131.155.64.143 -
Changed line 12 from:

Conclusion and Discussions

to:

Conclusion and Discussion

March 28, 2007, at 10:27 AM by 131.155.64.143 -
Changed lines 4-5 from:

With the advent of Web 2.0, everyone gains access to extremely valuable information of our global brain. Here, I show how very basic data mining algorithms can help understand the structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

to:

With the advent of Web 2.0, everyone gains access to extremely valuable information of our global brain. Here, I show how elementary data mining algorithms can aid in understanding the underlying structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

Introduction

Methods

Results

Conclusion and Discussions

March 28, 2007, at 10:26 AM by 131.155.64.143 -
Changed lines 1-2 from:

Data mining 2.0

to:

Data mining 2.0 March, 28 2007

March 28, 2007, at 10:24 AM by 131.155.64.143 -
Changed lines 3-4 from:

With the advent of Web 2.0, social networks and the open availability of huge datasets describing the global mind, everyone gains access to extremely valuable information. Here, I show how very basic data mining algorithms can help understand the structure of musical preferences in a sample population of Last.fm users.

to:

With the advent of Web 2.0, everyone gains access to extremely valuable information of our global brain. Here, I show how very basic data mining algorithms can help understand the structure of musical preferences in a sample population of users of a musical social networking service, Last.fm.

March 28, 2007, at 10:19 AM by 131.155.64.143 -
Added lines 1-4:

Data mining 2.0

With the advent of Web 2.0, social networks and the open availability of huge datasets describing the global mind, everyone gains access to extremely valuable information. Here, I show how very basic data mining algorithms can help understand the structure of musical preferences in a sample population of Last.fm users.

