SoFunction
Updated on 2024-10-29

Commonly used NumPy functions in Python

1. txt file

(1) Identity (unit) matrix

An identity matrix is a square matrix whose elements on the main diagonal are all 1 and whose remaining elements are all 0.

NumPy can create such a two-dimensional array with the eye function, which only needs one argument: the number of 1s on the diagonal, i.e. the size of the matrix.

For example, create a 3 x 3 array:

import numpy as np
I2 = np.eye(3)
print(I2)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

(2) Use the savetxt function to store the data in a file. We need to specify the file name and the array to be saved.

np.savetxt('', I2)  # create a file to hold I2's data
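As a minimal sketch of the save step, the array can be written out and read back to confirm that nothing is lost. The file name eye.txt and the temporary directory are assumptions made for this example only; they are not from the original article.

```python
import os
import tempfile

import numpy as np

I2 = np.eye(3)

# "eye.txt" is a hypothetical file name chosen for this sketch.
path = os.path.join(tempfile.mkdtemp(), "eye.txt")
np.savetxt(path, I2)

# loadtxt parses the whitespace-separated text back into a float array.
restored = np.loadtxt(path)
print(np.array_equal(I2, restored))
```

By default savetxt writes one row per line with values separated by spaces, which is the layout loadtxt expects when no delimiter is given.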

2. CSV files

CSV (comma-separated values) is a common file format. Database dump files are usually in CSV format, with each field in the file corresponding to a column of the database table, and spreadsheet software such as Microsoft Excel can handle CSV files.

Note: the loadtxt function in NumPy makes it easy to read CSV files; it automatically splits the fields and loads the data into NumPy arrays.

The data content of the file: [image: contents of the CSV data file]

c, v = np.loadtxt('', delimiter=',', usecols=(6,7), unpack=True)
# usecols takes a tuple; (6, 7) selects fields 7 and 8 (column indices are zero-based)
# unpack=True splits the data by column, so the closing price and volume arrays
# are assigned to the variables c and v, respectively
print(c)
[336.1  339.32 345.03 344.32 343.44 346.5  351.88 355.2  358.16 354.54
 356.85 359.18 359.9  363.13 358.3  350.56 338.61 342.62 342.88 348.16
 353.21 349.31 352.12 359.56 360.   355.36 355.76 352.47 346.67 351.99]
print(v)
[21144800. 13473000. 15236800.  9242600. 14064100. 11494200. 17322100.
 13608500. 17240800. 33162400. 13127500. 11086200. 10149000. 17184100.
 18949000. 29144500. 31162200. 23994700. 17853500. 13572000. 14395400.
 16290300. 21521000. 17885200. 16188000. 19504300. 12718000. 16192700.
 18138800. 16824200.]
print(type(c))
print(type(v))
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
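The effect of usecols and unpack can be tried without any file on disk. The sketch below reads from an in-memory string instead. The rows are made-up values in an assumed layout (symbol, date, empty field, open, high, low, close, volume), not the article's data.

```python
import io

import numpy as np

# Synthetic rows, assumed layout: symbol,date,(empty),open,high,low,close,volume
csv_text = (
    "AAA,2020-01-02,,10.0,10.5,9.8,10.2,1000\n"
    "AAA,2020-01-03,,10.2,10.9,10.1,10.8,1500\n"
    "AAA,2020-01-06,,10.8,11.0,10.4,10.5,1200\n"
)

# Only columns 6 and 7 are parsed, so the text columns are never converted.
close, volume = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", usecols=(6, 7), unpack=True
)
print(close)
print(volume)
```

Without unpack=True the call would return a single array of shape (3, 2), with one row per line of the file.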

3. Volume-weighted average price: the average() function

VWAP Overview: VWAP (Volume-Weighted Average Price) is a very important economic quantity that represents the "average" price of a financial asset.

The higher the volume at a given price, the more weight that price carries.

VWAP is the weighted average calculated using volume as weights and is commonly used in algorithmic trading.

vwap = np.average(c, weights=v)
print('Volume-weighted average price vwap =', vwap)
Volume-weighted average price vwap = 350.5895493532009
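To make the weighting explicit, the same quantity can be computed by hand as sum(price * volume) / sum(volume). The sketch below uses small made-up arrays rather than the article's data.

```python
import numpy as np

# Made-up prices and volumes for illustration only.
price = np.array([10.0, 11.0, 12.0])
volume = np.array([100.0, 300.0, 600.0])

vwap_builtin = np.average(price, weights=volume)
vwap_manual = np.sum(price * volume) / np.sum(volume)

print(vwap_builtin, vwap_manual)
print(np.mean(price))  # the unweighted mean, for comparison
```

Because most of the volume traded at the highest price, the weighted average (11.5) lies above the plain mean (11.0).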

4. Arithmetic mean: the mean() function

The mean function in NumPy calculates the arithmetic mean of the elements of an array.

print('The arithmetic mean of the elements of c is: {}'.format(np.mean(c)))
The arithmetic mean of the elements of c is: 351.0376666666667

5. Time-weighted average price

TWAP Overview:

In economics, TWAP (Time-Weighted Average Price) is another indicator of "average" price. Now that we have calculated the VWAP, let's calculate the TWAP as well. TWAP here is just a variant of the same idea: recent prices matter more, so we give them higher weight. The simplest way is to use the arange function to create a sequence of natural numbers increasing from 0, with as many numbers as there are closing prices. Of course, this is not necessarily the correct way to calculate TWAP.

t = np.arange(len(c))
print('Time-weighted average price twap =', np.average(c, weights=t))
Time-weighted average price twap = 352.4283218390804
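One consequence of starting the weights at 0 is that the oldest price is ignored entirely. The sketch below, on made-up prices, compares that with weights starting at 1. The second variant is an illustration added here, not something the article uses.

```python
import numpy as np

# Made-up prices, oldest first.
price = np.array([10.0, 11.0, 12.0, 13.0])

t = np.arange(len(price))                 # weights 0, 1, 2, 3
twap = np.average(price, weights=t)       # the oldest price gets weight 0

t_from_one = np.arange(1, len(price) + 1)  # weights 1, 2, 3, 4 (illustrative variant)
twap_from_one = np.average(price, weights=t_from_one)

print(twap, twap_from_one)
```

Both results sit above the plain mean of 11.5 because later prices carry more weight.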

6. Maximum and minimum values

h, l = np.loadtxt('', delimiter=',', usecols=(4,5), unpack=True)
print('h data: \n{}'.format(h))
print('-'*10)
print('l data: \n{}'.format(l))
h data: 
[344.4  340.04 345.65 345.25 344.24 346.7  353.25 355.52 359.   360.
 357.8  359.48 359.97 364.9  360.27 359.5  345.4  344.64 345.15 348.43
 355.05 355.72 354.35 359.79 360.29 361.67 357.4  354.76 349.77 352.32]
----------
l data: 
[333.53 334.3  340.98 343.55 338.55 343.51 347.64 352.15 354.87 348.
 353.54 356.71 357.55 360.5  356.52 349.52 337.72 338.61 338.37 344.8
 351.12 347.68 348.4  355.92 357.75 351.31 352.25 350.6  344.9  345.  ]
print('The maximum value of h is: {}'.format(np.max(h)))
print('The minimum value of l is: {}'.format(np.min(l)))
The maximum value of h is: 364.9
The minimum value of l is: 333.53
# NumPy has a ptp function that calculates the range of values of an array.
# It returns the difference between the maximum and minimum elements of the array;
# in other words, the return value is equal to max(array) - min(array).
print('Maximum minus minimum of h: \n{}'.format(np.ptp(h)))
print('Maximum minus minimum of l: \n{}'.format(np.ptp(l)))
Maximum minus minimum of h: 
24.859999999999957
Maximum minus minimum of l: 
26.970000000000027
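The relationship between ptp, max and min can be checked directly. A minimal sketch on made-up values:

```python
import numpy as np

# Made-up values for illustration only.
values = np.array([3.0, 7.5, 1.5, 6.0])

spread = np.ptp(values)                       # "peak to peak"
by_hand = np.max(values) - np.min(values)

print(spread, by_hand)
```

The function form np.ptp(values) is used here because the array method values.ptp() was removed in NumPy 2.0.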

7. Statistical analysis

Median: we can use some thresholds to remove outliers, but there's actually a better way, and that's the median.

The values of each variable are arranged in order of magnitude to form a series, and the number in the middle of the series is the median.

For example, if we have 5 values, 1, 2, 3, 4, and 5, then the median is the number 3 in the middle.
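With an even number of values there is no single middle element, and the median is the mean of the two middle values. The short sketch below shows both cases on made-up inputs.

```python
import numpy as np

odd = np.array([5, 1, 4, 2, 3])   # sorted: 1, 2, 3, 4, 5
even = np.array([4, 1, 3, 2])     # sorted: 1, 2, 3, 4

median_odd = np.median(odd)       # the middle value
median_even = np.median(even)     # the mean of the two middle values, 2 and 3

print(median_odd, median_even)
```

np.median does not require the input to be sorted beforehand.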

m = np.loadtxt('', delimiter=',', usecols=(6,), unpack=True)
print('The median of m is: {}'.format(np.median(m)))
The median of m is: 352.055
# Find the median by hand after sorting the array
sorted_m = np.sort(m)
print('m sorted: \n{}'.format(sorted_m))
N = len(m)
print('The median of m is: {}'.format((sorted_m[N//2]+sorted_m[(N-1)//2])/2))
m sorted: 
[336.1  338.61 339.32 342.62 342.88 343.44 344.32 345.03 346.5  346.67
 348.16 349.31 350.56 351.88 351.99 352.12 352.47 353.21 354.54 355.2
 355.36 355.76 356.85 358.16 358.3  359.18 359.56 359.9  360.   363.13]
The median of m is: 352.055
Variance:
Variance is the sum of the squared deviations of each data point from the arithmetic mean of all the data, divided by the number of data points.
print('variance =', np.var(m))
variance = 50.126517888888884
# The same value computed by hand: the mean of the squared deviations from the mean
var_hand = np.mean((m - m.mean())**2)
print('var =', var_hand)
var = 50.126517888888884

Note: The sample variance and the population variance are calculated differently. The population variance divides the sum of the squared deviations by the number of data points, while the sample variance divides it by the number of sample data points minus one, where that quantity (i.e., n-1) is called the degrees of freedom. The reason for this difference is to ensure that the sample variance is an unbiased estimator.
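In NumPy the divisor is controlled by the ddof ("delta degrees of freedom") argument of var: the sum of squared deviations is divided by n - ddof. A minimal sketch on made-up data:

```python
import numpy as np

# Made-up data with mean 5; the squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_var = np.var(data)        # default ddof=0: divide by n
sample_var = np.var(data, ddof=1)    # divide by n - 1

print(population_var, sample_var)
```

The default ddof=0 gives the population variance, which is what the calculation above uses.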

8. Stock returns

In the academic literature, analyses of closing prices are often based on stock returns and log returns.

The simple return is the rate of change between two neighboring prices, while the log return is the difference between neighboring values after taking the logarithm of every price.
As we learned in high school, the logarithm of "a" minus the logarithm of "b" is the logarithm of "a divided by b". Therefore, the log return can also be used to measure the rate of price change.

Note that since the rate of return is a ratio, e.g., we divide dollars by dollars (which can also be other currency units), it is dimensionless.

In short, investors are most interested in the variance or standard deviation of the rate of return, as this represents the amount of investment risk.

(1) First, let's compute the simple return. The diff function in NumPy returns an array of the differences between neighboring array elements. This is somewhat analogous to differentiation in calculus. To calculate the return, we also need to divide each difference by the previous day's price. Note, however, that the array returned by diff has one element fewer than the array of closing prices.

returns = np.diff(arr) / arr[:-1]

Note that we did not use the last value in the array of closing prices as a divisor. Next, use the std function to calculate the standard deviation:

print("Standard deviation =", np.std(returns))
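The length bookkeeping is easier to see on a tiny example. The prices below are made up so that the returns come out as round numbers.

```python
import numpy as np

# Made-up closing prices, oldest first.
price = np.array([100.0, 110.0, 99.0, 108.9])

changes = np.diff(price)          # price[i+1] - price[i], one element shorter
returns = changes / price[:-1]    # divide each change by the earlier price

print(changes)
print(returns)
print(np.std(returns))
```

price[:-1] drops the last price so that both arrays have length len(price) - 1 and line up element by element.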

(2) The logarithmic yield is even a bit simpler to calculate. We first use the log function to get the logarithm of each closing price, and then just use the diff function on the result.

logreturns = np.diff(np.log(c))

In general, we should check the input array to make sure it does not contain zeros or negative numbers. Otherwise, NumPy will emit a warning and the result will contain -inf or nan values. However, in our example, the stock price is always positive, so the check can be omitted.
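A small sketch of the log-return calculation, including the positivity check mentioned above. The prices are made up, and raising ValueError is just one possible way to handle bad input.

```python
import numpy as np

# Made-up closing prices, oldest first.
price = np.array([100.0, 110.0, 99.0, 108.9])

# Guard against zeros and negative prices before taking logarithms.
if np.any(price <= 0):
    raise ValueError("prices must be strictly positive")

logreturns = np.diff(np.log(price))

# log(a) - log(b) == log(a / b), so this is the same quantity:
ratio_form = np.log(price[1:] / price[:-1])

# Log returns relate to simple returns r through log(1 + r).
simple = np.diff(price) / price[:-1]

print(logreturns)
print(np.allclose(logreturns, ratio_form), np.allclose(logreturns, np.log1p(simple)))
```

For small price changes the log return and the simple return are close to each other, which is why either can be used to measure the rate of change.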

(3) We are likely to be very interested in which trading days have positive returns.

After completing the previous steps, we can do this simply by using the where function, which returns the indexes of all the array elements that satisfy the condition based on the specified condition.

Enter the following code:

posretindices = np.where(returns > 0)
print("Indices with positive returns", posretindices)
This outputs the indices of all the positive elements in the array:
Indices with positive returns (array([ 0, 1, 4, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23, 25, 28]),)
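Note that the result printed above is a tuple containing one index array, which is what where returns when it is called with only a condition. A minimal sketch on made-up returns:

```python
import numpy as np

# Made-up returns for illustration only.
returns = np.array([0.1, -0.1, 0.1, 0.0, 0.02])

result = np.where(returns > 0)   # a tuple with one array per dimension
indices = result[0]              # the index array for this 1-D input

print(result)
print(returns[indices])          # the positive returns themselves
```

A return of exactly 0 does not satisfy the condition, so index 3 is not included.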

(4) In investment science, volatility is a measure of price movement. Historical volatility can be calculated from historical price data. To calculate historical volatility (e.g., annual or monthly volatility), the logarithmic rate of return is used. Annual volatility is equal to the standard deviation of the logarithmic return divided by its mean, divided by the square root of the reciprocal of the number of trading days, which is usually taken as 252 days. It is calculated using the std, mean and sqrt functions.

The code is shown below:

annual_volatility = np.std(logreturns)/np.mean(logreturns)
annual_volatility = annual_volatility / np.sqrt(1./252.)

(5) Note the division inside the sqrt function. In Python 2, dividing two integers performs integer division, so we must use floating-point numbers to get the correct result; Python 3 changed this so that / always performs true division. Similar to the method used to calculate the annual volatility, the monthly volatility is calculated as follows:

print('Monthly volatility:', annual_volatility * np.sqrt(1./12.))

The complete code:

c = np.loadtxt('', delimiter=',', usecols=(6,), unpack=True)

returns = np.diff(c)/c[:-1]
print('Standard deviation of returns: {}'.format(np.std(returns)))
logreturns = np.diff(np.log(c))
posretindices = np.where(returns>0)
print('Positions where returns is positive: \n{}'.format(posretindices))
annual_volatility = np.std(logreturns)/np.mean(logreturns)
annual_volatility = annual_volatility/np.sqrt(1/252)
print('Annual volatility: {}'.format(annual_volatility))
print('Monthly volatility: {}'.format(annual_volatility*np.sqrt(1/12)))
Standard deviation of returns: 0.012922134436826306
Positions where returns is positive: 
(array([ 0,  1,  4,  5,  6,  7,  9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23,
       25, 28], dtype=int64),)
Annual volatility: 129.27478991115132
Monthly volatility: 37.318417377317765
This article references 《Python Data Analysis Fundamentals Tutorial: NumPy Study Guide》.
