Detailed explanation of NumPy random data distribution and Seaborn visualization

Random data distribution

What is data distribution?

A data distribution describes all the possible values in a dataset and how often each value occurs, usually expressed as probabilities. In other words, it describes how likely the data is to take each value.

In statistics and data science, the data distribution is a fundamental basis for analyzing data.
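As a minimal sketch of this idea, the relative frequency of each value in a sample can be computed with np.unique. The sample below is made-up illustrative data.

```python
import numpy as np

# A small, made-up sample of observations (illustrative data only)
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Count how often each distinct value appears
values, counts = np.unique(data, return_counts=True)

# Divide by the sample size to express each frequency as a probability
probabilities = counts / data.size

for v, p in zip(values, probabilities):
    print(f"P(X = {v}) = {p:.1f}")
```

The probabilities add up to 1, which is what makes them a distribution.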

Random distribution in NumPy

NumPy's random module (numpy.random) provides many functions for generating random numbers that follow different distributions.

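Generate random numbers from a discrete distribution

choice(a, size=None, replace=True, p=None): Randomly selects elements from the array a according to the probabilities in p.
a: The source array containing all possible values.
size: The shape of the output array.
replace: Whether the same element may be selected more than once (default True).
p: The probability of each value in a. The probabilities must sum to 1.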

Example: Generate 100 random numbers, where 3 has a probability of occurrence of 0.2, 5 has a probability of occurrence of 0.4, 7 has a probability of occurrence of 0.3, and 9 has a probability of occurrence of 0.1:

import numpy as np

# Draw 100 values from [3, 5, 7, 9] with the given probabilities
x = np.random.choice([3, 5, 7, 9], p=[0.2, 0.4, 0.3, 0.1], size=100)
print(x)
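To see that choice really honours p, the following sketch draws a much larger sample and compares the observed frequencies with the requested probabilities. It uses NumPy's newer Generator API (np.random.default_rng, available since NumPy 1.17) with an arbitrary seed so the run is repeatable. It also shows that probabilities which do not sum to 1 are rejected.

```python
import numpy as np

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

values = [3, 5, 7, 9]
p = [0.2, 0.4, 0.3, 0.1]

# Draw a large sample so the observed frequencies settle near p
sample = rng.choice(values, p=p, size=100_000)

# Observed relative frequency of each value
observed = np.array([(sample == v).mean() for v in values])
print(observed)

# p must sum to 1; otherwise choice raises a ValueError
raised = False
try:
    rng.choice(values, p=[0.5, 0.5, 0.5, 0.5])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```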

Generate random numbers from a continuous distribution

NumPy provides a variety of functions for generating random numbers from continuous distributions, such as the normal, uniform, and exponential distributions.

randn(d0, d1, ...): Generates random numbers from the standard normal distribution (mean 0, standard deviation 1). The arguments give the output shape, for example randn(10) or randn(2, 3).
rand(d0, d1, ...): Generates random numbers uniformly distributed over [0, 1).
beta(a, b, size): Generates random numbers from the Beta distribution.
gamma(shape, scale, size): Generates random numbers from the Gamma distribution.
poisson(lam, size): Generates random integers from the Poisson distribution (note that this is a discrete distribution).

Example: Generate 10 random numbers that follow the standard normal distribution:

import numpy as np

# 10 samples from the standard normal distribution
x = np.random.randn(10)
print(x)
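The other functions in the list above can be checked the same way. This sketch uses an arbitrary seed and arbitrary parameter values. It shows that randn and rand take the shape as separate arguments, and it compares the sample means with the known theoretical means: a / (a + b) for Beta, shape * scale for Gamma, and lam for Poisson.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for a repeatable run
n = 200_000        # large sample so sample means sit close to theoretical means

# randn/rand take the output dimensions as separate arguments
normal_2x3 = np.random.randn(2, 3)
uniform_2x3 = np.random.rand(2, 3)
print(normal_2x3.shape, uniform_2x3.shape)  # (2, 3) (2, 3)

# Samples from the other distributions listed above
beta = np.random.beta(a=2.0, b=5.0, size=n)            # mean a / (a + b)
gamma = np.random.gamma(shape=2.0, scale=3.0, size=n)  # mean shape * scale
poisson = np.random.poisson(lam=4.0, size=n)           # mean lam, integer values

print(beta.mean(), 2.0 / (2.0 + 5.0))
print(gamma.mean(), 2.0 * 3.0)
print(poisson.mean(), 4.0)
```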

Random permutation

Shuffle the array

shuffle(arr): Randomly shuffles the array arr in place, modifying the original array.

Example: Randomly shuffle the array [1, 2, 3, 4, 5]:

import numpy as np
from numpy.random import shuffle

arr = np.array([1, 2, 3, 4, 5])

# shuffle() reorders arr in place and returns None
shuffle(arr)
print(arr)
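Two details of shuffle are easy to miss, and this small sketch (arbitrary seed) makes them visible. The function returns None rather than a new array. On a multi-dimensional array it only reorders the first axis, so each row keeps its contents.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, for a repeatable demonstration

arr = np.array([1, 2, 3, 4, 5])
result = np.random.shuffle(arr)
print(result)  # None: shuffle works in place
print(arr)     # same elements, new order

# On a 2-D array, only the rows (first axis) are reordered;
# the values inside each row keep their order
grid = np.arange(9).reshape(3, 3)
np.random.shuffle(grid)
print(grid)
```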

Generate a random permutation of an array

permutation(arr): Returns a new array containing the elements of arr in random order. The original array is not modified.

Example: Generate a random permutation of the array [1, 2, 3, 4, 5]:

import numpy as np
from numpy.random import permutation

arr = np.array([1, 2, 3, 4, 5])

# permutation() returns a shuffled copy; arr itself is unchanged
x = permutation(arr)
print(x)
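This sketch (arbitrary seed; the labels array is a hypothetical example) shows three more things about permutation. The original array is left untouched. Passing an integer n permutes np.arange(n). A permutation of indices can reorder two aligned arrays in the same way.

```python
import numpy as np

np.random.seed(2)  # arbitrary seed

arr = np.array([1, 2, 3, 4, 5])
x = np.random.permutation(arr)
print(arr)  # unchanged: [1 2 3 4 5]
print(x)    # a shuffled copy

# Passing an integer n permutes np.arange(n)
indices = np.random.permutation(5)
print(indices)

# Hypothetical labels aligned with arr; one index permutation keeps pairs together
labels = np.array(["a", "b", "c", "d", "e"])
idx = np.random.permutation(len(arr))
print(arr[idx], labels[idx])
```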

Practice

1. Use the choice method to generate 200 random numbers, where 1 appears with probability 0.1, 2 with probability 0.2, and 3 with probability 0.7.
2. Generate 10 random numbers that follow an exponential distribution.
3. Randomly shuffle the array [10, 20, 30, 40, 50].
4. Generate a random permutation of the array [6, 7, 8, 9, 10].

Solution

import numpy as np
from numpy.random import choice, exponential, permutation, shuffle

# 1. Generate random numbers using the choice method
random_numbers = choice([1, 2, 3], p=[0.1, 0.2, 0.7], size=200)
print(random_numbers)

# 2. Generate random numbers that follow an exponential distribution
exponential_randoms = exponential(scale=1, size=10)
print(exponential_randoms)

# 3. Randomly shuffle the array in place
arr = np.array([10, 20, 30, 40, 50])
shuffle(arr)
print(arr)

# 4. Generate a random permutation of the array
random_permutation = permutation([6, 7, 8, 9, 10])
print(random_permutation)
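NumPy 1.17 introduced a Generator API, created with np.random.default_rng(), which NumPy's documentation recommends for new code. As a sketch, the same four exercises can be written with it. The seed 2024 is arbitrary. Re-creating the generator with the same seed reproduces the same draws.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed

# 1. Weighted sampling
random_numbers = rng.choice([1, 2, 3], p=[0.1, 0.2, 0.7], size=200)

# 2. Exponential distribution with scale 1 (mean 1)
exponential_randoms = rng.exponential(scale=1, size=10)

# 3. In-place shuffle (returns None, like np.random.shuffle)
arr = np.array([10, 20, 30, 40, 50])
rng.shuffle(arr)

# 4. Shuffled copy; the input list is left untouched
random_permutation = rng.permutation([6, 7, 8, 9, 10])

print(random_numbers[:10], exponential_randoms, arr, random_permutation, sep="\n")

# A fresh generator with the same seed repeats the first draw exactly
rng_again = np.random.default_rng(2024)
same = bool((rng_again.choice([1, 2, 3], p=[0.1, 0.2, 0.7], size=200) == random_numbers).all())
print(same)  # True
```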

Visualize distribution using Seaborn

Introduction

Seaborn is a Python data visualization library, built on Matplotlib, for creating statistical charts. It provides high-level plotting functions that make it easy to create attractive, informative statistical graphics.

Install Seaborn

If you already have Python and pip installed, you can install Seaborn using the following command:

pip install seaborn

If you are using a Jupyter Notebook, you can install Seaborn using the following command:

!pip install seaborn

Draw a distribution plot

A distribution plot is a graph that visualizes the distribution of data. It shows how often values occur in the dataset.
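A histogram-style distribution plot is built from bin counts, and np.histogram computes those counts directly. This sketch uses an arbitrary seed and 10 bins. It prints how many values fall into each bin, which is the height of each bar. It also checks that a density-scaled histogram has total area 1.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed
data = np.random.randn(1000)

# Split the data range into 10 equal-width bins and count the values in each;
# these counts are the bar heights of a histogram
counts, edges = np.histogram(data, bins=10)

# Bins are half-open [left, right), except the last one, which also includes its right edge
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:6.2f}, {right:6.2f}): {c}")

print(counts.sum())  # 1000: every value lands in exactly one bin

# With density=True the bar heights are scaled so the total area is 1
density, _ = np.histogram(data, bins=edges, density=True)
area = (density * np.diff(edges)).sum()
print(area)
```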

In Seaborn, you can draw distribution plots with the sns.distplot() function (deprecated since Seaborn 0.11 in favor of histplot(), kdeplot(), and displot()). This function accepts the following parameters:

a: The data to plot: a one-dimensional array, list, or pandas Series.
hist: If True (default), draw a histogram.
kde: If True (default), draw a Gaussian kernel density estimate (KDE) curve.
bins: The number of histogram bins. If omitted, Seaborn picks a value using a reference rule.
norm_hist: If True, the histogram height shows a density rather than a count (this is implied when a KDE curve is drawn).

Example: Drawing a normal distribution

The following example demonstrates how to plot a normal distribution using Seaborn:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Generate random data
data = np.random.randn(1000)

# Draw a distribution plot (histogram plus KDE curve)
sns.distplot(data)
plt.show()

This code generates 1000 random numbers that follow the standard normal distribution and plots their distribution with Seaborn.

Example: Draw a custom distribution

The following example demonstrates how to draw a custom distribution:

import matplotlib.pyplot as plt
import seaborn as sns

# Custom data containing repeated values
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9]

# Draw only the density curve (hist=False hides the histogram)
sns.distplot(data, hist=False)
plt.show()

This code plots the distribution of a custom list of values that contains duplicates, drawing only the KDE density curve and no histogram.

Practice

1. Generate 500 random numbers that follow a uniform distribution and plot their distribution.
2. Generate 1000 random numbers that follow an exponential distribution and plot their distribution.
3. Draw a distribution plot of the following data:

data = [23, 37, 43, 29, 31, 32, 36, 27, 31, 33, 34, 25, 27, 28, 42, 38, 27, 27, 33, 31, 26, 29, 31, 35, 33, 30, 30, 32, 36, 28, 31, 33, 38, 29, 31, 31, 34, 36, 26, 25, 26, 34, 37, 28, 36, 31, 29, 31, 27, 28, 32, 37, 30, 33, 33, 27, 31, 32, 32, 36, 25, 32, 35, 37, 37, 30, 31, 34, 33, 29, 32, 31, 36, 26, 29, 31, 37, 28, 28, 37, 31, 32, 36, 33, 27, 31, 32, 33, 32, 32, 30, 27, 36, 38, 35, 26, 32, 37, 31, 30, 33, 30, 27,
