Random data distribution
What is data distribution?
A data distribution describes how often each possible value appears in a data set, expressed as a probability. In other words, it tells you how likely the data is to take each value.
In statistics and data science, data distribution is an important basis for analyzing data.
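As a rough illustration of the definition above, the following sketch turns raw counts into relative frequencies. The data set is invented for the example, and np.unique is only one of several ways to count values.

```python
import numpy as np

# A small hypothetical data set, invented for illustration.
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])

# Count how often each distinct value occurs.
values, counts = np.unique(data, return_counts=True)

# Convert the counts into relative frequencies (empirical probabilities).
probabilities = counts / counts.sum()

for value, prob in zip(values, probabilities):
    print(f"value {value}: probability {prob:.1f}")
```

The relative frequencies always sum to 1, which is the same constraint that a probability array must satisfy.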
Random distribution in NumPy
NumPy's random module provides multiple methods for generating random numbers that follow different distributions.
Generate discrete distribution random numbers
choice(a, p, size): Randomly select elements from the array a according to the probabilities in p.
a: Source array containing all possible values.
p: Probability of each value; the probabilities must sum to 1.
size: Shape of the output array.
Example: Generate 100 random numbers, where 3 occurs with probability 0.2, 5 with probability 0.4, 7 with probability 0.3, and 9 with probability 0.1:
import numpy as np

# Draw 100 values from [3, 5, 7, 9] with the given probabilities
x = np.random.choice([3, 5, 7, 9], p=[0.2, 0.4, 0.3, 0.1], size=100)
print(x)
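One way to check that the probabilities are respected is to draw a much larger sample and compare the observed frequencies with p. This is a minimal sketch: the sample size and the seed are arbitrary choices made here so that the run is repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

values = [3, 5, 7, 9]
p = [0.2, 0.4, 0.3, 0.1]

# A large sample makes the observed frequencies settle near p.
sample = np.random.choice(values, p=p, size=100_000)

# Fraction of the sample equal to each candidate value.
observed = np.array([(sample == v).mean() for v in values])

for v, expected, seen in zip(values, p, observed):
    print(f"value {v}: expected {expected:.2f}, observed {seen:.3f}")
```

With only 100 draws, as in the example above, the observed frequencies can deviate noticeably from p; the agreement improves as the sample grows.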
Generate continuous distribution random numbers
NumPy provides a variety of methods to generate random numbers that obey different continuous distributions, such as normal distribution, uniform distribution, exponential distribution, etc.
randn(d0, d1, ...): Generate random numbers that follow the standard normal distribution; the arguments are the output dimensions.
rand(d0, d1, ...): Generate random numbers that follow the uniform distribution on [0, 1); the arguments are the output dimensions.
beta(a, b, size): Generate random numbers that follow the Beta distribution.
gamma(shape, scale, size): Generate random numbers that follow the Gamma distribution.
poisson(lam, size): Generate random integers that follow the Poisson distribution (strictly speaking a discrete distribution).
Example: Generate 10 random numbers that follow the standard normal distribution:
import numpy as np

# Draw 10 samples from the standard normal distribution
x = np.random.randn(10)
print(x)
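To see how the listed generators differ, the sketch below draws a large sample from each and compares the sample mean with the theoretical mean. The parameter values (2 and 5 for Beta, 2.0 and 3.0 for Gamma, 4.0 for Poisson), the sample size, and the seed are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
n = 100_000

normal = np.random.randn(n)                 # theoretical mean 0
uniform = np.random.rand(n)                 # uniform on [0, 1), mean 0.5
beta = np.random.beta(2, 5, size=n)         # mean a / (a + b) = 2/7
gamma = np.random.gamma(2.0, 3.0, size=n)   # mean shape * scale = 6
poisson = np.random.poisson(4.0, size=n)    # mean lam = 4, integer-valued

for name, sample in [("normal", normal), ("uniform", uniform),
                     ("beta", beta), ("gamma", gamma), ("poisson", poisson)]:
    print(f"{name:8s} mean = {sample.mean():.3f}")
```

Note that randn and rand take the output dimensions as positional arguments, while beta, gamma, and poisson accept a size keyword.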
Random arrangement
Shuffle the array
shuffle(arr): Randomly shuffle the array arr in place, modifying the original array.
Example: Randomly shuffle the array [1, 2, 3, 4, 5]:
import numpy as np
from numpy.random import shuffle

arr = np.array([1, 2, 3, 4, 5])
shuffle(arr)  # shuffles in place and returns None
print(arr)
Generate a random permutation of an array
permutation(arr): Return a randomly permuted copy of the array arr without modifying the original array.
Example: Generate a random permutation of the array [1, 2, 3, 4, 5]:
import numpy as np
from numpy.random import permutation

arr = np.array([1, 2, 3, 4, 5])
x = permutation(arr)  # arr itself is left unchanged
print(x)
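The practical difference between the two functions is whether the input is modified. The following sketch makes that visible; the seed and the variable names are chosen here for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

arr = np.array([1, 2, 3, 4, 5])
original = arr.copy()

# permutation returns a new array and leaves its input alone.
permuted = np.random.permutation(arr)
unchanged_after_permutation = np.array_equal(arr, original)
print("permuted copy:", permuted)
print("input unchanged:", unchanged_after_permutation)

# shuffle reorders the input itself and returns None.
result = np.random.shuffle(arr)
print("shuffle return value:", result)
print("input after shuffle:", arr)
```

Because shuffle returns None, writing arr = shuffle(arr) discards the array; call it as a statement instead.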
Practice
- Use the choice method to generate 200 random numbers, where the probability of 1 appearing is 0.1, the probability of 2 appearing is 0.2, and the probability of 3 appearing is 0.7.
- Generate 10 random numbers that follow an exponential distribution.
- Randomly shuffle the array [10, 20, 30, 40, 50].
- Generate a random permutation of the array [6, 7, 8, 9, 10].
Solution
import numpy as np
from numpy.random import choice, exponential, permutation, shuffle

# 1. Generate random numbers using the choice method
random_numbers = choice([1, 2, 3], p=[0.1, 0.2, 0.7], size=200)
print(random_numbers)

# 2. Generate random numbers that follow an exponential distribution
exponential_randoms = exponential(scale=1, size=10)
print(exponential_randoms)

# 3. Randomly shuffle the array (in place)
arr = np.array([10, 20, 30, 40, 50])
shuffle(arr)
print(arr)

# 4. Generate a random permutation of the array
random_permutation = permutation([6, 7, 8, 9, 10])
print(random_permutation)
Visualize distribution using Seaborn
Introduction
Seaborn is a Python data visualization library based on Matplotlib for creating statistical charts. It provides a range of advanced drawing functions that can easily create beautiful and informative statistical graphs.
Install Seaborn
If you already have Python and pip installed, you can install Seaborn using the following command:
pip install seaborn
If you are using a Jupyter Notebook, you can install Seaborn using the following command:
!pip install seaborn
Draw a distribution plot
A distribution plot is a graph that visualizes the distribution of data. It shows how frequently each value, or range of values, occurs in the data set.
In Seaborn, you can use the distplot() function to draw distribution plots. Note that distplot() is deprecated since Seaborn 0.11 in favor of histplot() and displot(). This function accepts the following parameters:
a: The data to be plotted. Can be an array, a list, or a Pandas Series.
hist: If True (default), draw the histogram; if False, the histogram is omitted.
kde: If True (default), draw a kernel density estimate (KDE) curve; if False, the curve is omitted.
bins: The number of bins used to build the histogram.
norm_hist: If True, the histogram height shows a density rather than a count.
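To make the histogram-related options more concrete, the sketch below computes bin counts and a normalized density directly with np.histogram, which is roughly the calculation a histogram plot is built on. The data and the bin count are invented for illustration; this is not a description of Seaborn's internals.

```python
import numpy as np

# Hypothetical data, invented for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Split the range of the data into 5 equal-width bins and count values per bin.
counts, edges = np.histogram(data, bins=5)
print("bin edges:", edges)
print("counts:   ", counts)

# With density=True the bar heights are scaled so the total bar area is 1.
density, _ = np.histogram(data, bins=5, density=True)
area = (density * np.diff(edges)).sum()
print("density:  ", density)
print("total area:", area)
```

Changing the number of bins changes how coarse the picture is, while normalization changes only the scale of the vertical axis.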
Example: Drawing a normal distribution
The following example demonstrates how to plot a normal distribution using Seaborn:
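import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Generate random data
data = np.random.randn(1000)

# Draw a distribution plot (distplot is deprecated since Seaborn 0.11)
sns.distplot(data)
plt.show()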
This code will generate 1000 random numbers that obey the standard normal distribution and plot their distribution using Seaborn.
Example: Draw a custom distribution
The following example demonstrates how to draw a custom distribution:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Generate custom data
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9]

# Draw a distribution plot with the histogram and the density curve turned off
sns.distplot(data, hist=False, kde=False)
plt.show()
This code defines a custom list of data containing duplicate values and passes it to Seaborn with both the histogram and the density curve turned off.
Practice
- Generate 500 random numbers that follow a uniform distribution and plot their distribution.
- Generate 1000 random numbers that follow an exponential distribution and plot their distribution.
- Draw a distribution plot from the following data:
data = [23, 37, 43, 29, 31, 32, 36, 27, 31, 33, 34, 25, 27, 28, 42, 38, 27, 27, 33, 31, 26, 29, 31, 35, 33, 30, 30, 32, 36, 28, 31, 33, 38, 29, 31, 31, 34, 36, 26, 25, 26, 34, 37, 28, 36, 31, 29, 31, 27, 28, 32, 37, 30, 33, 33, 27, 31, 32, 32, 36, 25, 32, 35, 37, 37, 30, 31, 34, 33, 29, 32, 31, 36, 26, 29, 31, 37, 28, 28, 37, 31, 32, 36, 33, 27, 31, 32, 33, 32, 32, 30, 27, 36, 38, 35, 26, 32, 37, 31, 30, 33, 30, 27,