Explanation of corrwith

DataFrame.corrwith computes the pairwise correlation between the rows or columns of a DataFrame and the rows or columns of another Series or DataFrame.

Parameters:

other: DataFrame or Series. Object with which to compute correlations.

axis: {0 or 'index', 1 or 'columns'}, default 0. 0 or 'index' to compute column-wise, 1 or 'columns' for row-wise.

method: {'pearson', 'kendall', 'spearman'} or callable, default 'pearson'.

axis=0 or axis='index' means that column-to-column correlations are computed. axis=1 or axis='columns' means that row-to-row correlations are computed.

method is the method used to calculate the correlation; the Pearson correlation coefficient is used here.
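As a minimal sketch of how axis and method interact, the toy example below uses hypothetical data (the names left, other and row_ref are not from the article). It assumes a Series passed as other is aligned on the row labels for axis=0 and on the column labels for axis=1, and that a callable method receives two one-dimensional arrays and returns a float.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only.
left = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1]},
                    index=["r1", "r2", "r3", "r4"])
other = pd.Series([10, 20, 30, 40], index=["r1", "r2", "r3", "r4"])

# axis=0: every column of `left` is correlated with `other`,
# aligned on the row labels; the result is indexed by column name.
by_column = left.corrwith(other, axis=0, method="pearson")

# axis=1: every row of `left` is correlated with a Series
# aligned on the column labels; the result is indexed by row label.
row_ref = pd.Series({"a": 1, "b": 2})
by_row = left.corrwith(row_ref, axis=1)

# method may also be a callable taking two 1-D arrays.
by_callable = left.corrwith(
    other, axis=0, method=lambda x, y: np.corrcoef(x, y)[0, 1])

print(by_column)
print(by_row)
print(by_callable)
```

Column a rises with other and column b falls, so the two column-wise correlations have opposite signs.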

Here is an example based on audience ratings of movies:

user_movie_ratings

Each row represents the ratings of one viewer for all movies, and each column represents the ratings of all viewers for one movie.

We then compute two things separately: the correlation between the first viewer and the other viewers, and the correlation between the first movie and the other movies.

The code is as follows:

import pandas as pd
import numpy as np


# Rows are users, columns are movies
data = np.array([[5, 5, 3, 3, 4], [3, 4, 5, 5, 4],
                 [3, 4, 3, 4, 5], [5, 5, 3, 4, 4]])
df = pd.DataFrame(data, columns=['The Shawshank Redemption',
                                 'Forrest Gump', 'Avengers: Endgame',
                                 'Iron Man', 'Titanic'],
                  index=['user1', 'user2', 'user3', 'user4'])
# Compute correlation between user1 and other users
user_to_compare = df.iloc[0]
similarity_with_other_users = df.corrwith(user_to_compare, axis=1,
                                          method='pearson')
similarity_with_other_users = similarity_with_other_users.sort_values(
    ascending=False)
# Compute correlation between 'The Shawshank Redemption' and other movies
movie_to_compare = df['The Shawshank Redemption']
similarity_with_other_movies = df.corrwith(movie_to_compare, axis=0)
similarity_with_other_movies = similarity_with_other_movies.sort_values(
    ascending=False)

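The Pearson correlation coefficient is used here:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where n is the dimension of the sample, x_i and y_i denote the values of the two samples in dimension i, and \bar{x} and \bar{y} denote the sample means.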

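Take user1 and user4 as an example and calculate the correlation coefficient between them. The mean rating of user1 is 4 and the mean rating of user4 is 4.2, so:

$$r = \frac{(1)(0.8) + (1)(0.8) + (-1)(-1.2) + (-1)(-0.2) + (0)(-0.2)}{\sqrt{4}\,\sqrt{2.8}} = \frac{3}{\sqrt{11.2}} \approx 0.896$$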

This result agrees with that calculated by the corrwith function.
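The hand calculation for user1 and user4 can be checked with a short sketch. The helper pearson below is a hypothetical name written for this illustration, not a pandas function; it spells out the Pearson coefficient term by term and compares the result with corrwith.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson correlation written out term by term (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


user1 = [5, 5, 3, 3, 4]  # mean 4.0
user4 = [5, 5, 3, 4, 4]  # mean 4.2
r_manual = pearson(user1, user4)  # equals 3 / sqrt(4 * 2.8)

# Same two rows through corrwith, row-wise (axis=1)
ratings = pd.DataFrame([user1, user4], index=["user1", "user4"])
r_pandas = ratings.corrwith(ratings.loc["user1"], axis=1)["user4"]

print(round(r_manual, 4), round(r_pandas, 4))
```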

similarity_with_other_users

user_correlation

similarity_with_other_movies

movie_correlation

As the results show, user1 and user4 have the highest correlation (about 0.896), which suggests that their ratings rise and fall together across the movies and that their tastes are probably similar. The Shawshank Redemption and Forrest Gump have a correlation of 1, which means that the ratings of the latter vary in exact linear step with those of the former.
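One caveat when reading these numbers: a Pearson correlation of 1 says that two rating vectors move together linearly, not that they are identical. A small sketch using the two movie columns from the example data illustrates this.

```python
import pandas as pd

users = ["user1", "user2", "user3", "user4"]
shawshank = pd.Series([5, 3, 3, 5], index=users)
forrest_gump = pd.Series([5, 4, 4, 5], index=users)

r = shawshank.corr(forrest_gump)            # Pearson by default
identical = shawshank.equals(forrest_gump)  # element-wise equality check

# The ratings differ for user2 and user3, yet they are perfectly correlated,
# because forrest_gump == 0.5 * shawshank + 2.5 for every user.
print(r, identical)
```

For recommendation-style comparisons this is usually what is wanted, since it ignores the fact that some viewers rate everything higher or lower than others.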
