SoFunction
Updated on 2024-12-19

python implementation of find median in unordered array method

I. Description of the problem

1Find the median of an unordered array. (If the array is even, the median is the sum of the middle two numbers divided by 2. If the array is odd, the median is the middle most position. Requirements: no sorting, as low a time complexity as possible.

2Example:

lists = [3, 2, 1, 4] , median = (2+3)/2 = 2.5
lists = [3, 1, 2] , median 2

3. Algorithmic thinking:

Using the idea of quick sort (but not all of them are used): pick an element at random, take the element as key, divide the array into two parts, if the left side of the array length is just (n-1)/2, then the key is the median, if the left side of the array length < (n-1)/2, then the median point on the right, and vice versa, the median is on the left side. Then go to the corresponding side and continue to find the median

The average time complexity is O(n)

II. Procedures

class Solution(object):
 
 def findmedian(self, lists):
  if not lists or len(lists) == 0:
   return []
 
  n = len(lists)
  if n % 2 == 0:
   a = (lists, n/2, 0, n-1)
   b = (lists, n/2-1, 0, n-1)
   mid = (lists[a]+lists[b])/ (2 * 1.0)
   return mid
  else:
   mid = (lists, n/2, 0, n-1)
   return lists[mid]
 
 def partition(self, lists, k, start, end):
 
  key = lists[start]
  left, right = start, end
 
  while left < right:
   while left < right and lists[right] > key:
    right = right - 1
   lists[left] = lists[right]
 
   while left < right and lists[left] < key:
    left = left + 1
   lists[right] = lists[left]
 
  lists[left] = key
 
  if left == k:
   return left
  elif left > k:
   return (lists, k, start, left-1)
  else:
   return (lists, k, left+1, end)
 
 
if __name__ == "__main__":
 
 sol = Solution()
 lists = [2, 5, 4, 9, 3, 6, 8, 7, 1]
 # lists = [1, 2]
 
 
 data = (lists)
 print("upper quartile = %s" % data)

Knowledge supplement: python streaming implementation of a field sorting

I. hadoop streaming by default

1, in the default case of hadoop streaming, it is \t as the separator, standard input, each line of the first \t before the content as the key, the first \t after the content as the value. note that, if there is not a \t character, then the whole line as the key.

2, some parameters of streaming are as follows:

-D : Set the separator between key and value in map output.
-D : Set the position of the map program separator, the part before this position is used as key and the part after is used as value.
-D : Set the splitter inside the key in the output of the map.
-D : Specify the number of columns used for bucketing keys after the keys have been cut according to the delimiter when bucketing (used with -partitioner).
-D : Set the separator between key and value in reduce output.
-D : Set the position of the reduce program separator.

Second, python streaming implements the sorting of a field.

1, Enter data: cat (tab key in the center)

11 2
11 3
11 4 1
11 1

11 12 22

2, streaming program is as follows:

vim

#!/bin/bash

export CURRENT=/home//hadoop_streaming/sort

/usr/local/hadoop-2.6.3/bin/hadoop jar /usr/local/hadoop-2.6.3/share/hadoop/tools/lib/hadoop-streaming-2.6. \
-D ='\t' \
-D =3 \
-D = \
-D =-k3,3nr \  # Arranged in reverse order in the third column and can be selected according to the desired paragraph.
-input "/user/test/inputdata/datas3/" \
-output "/user/test/streaming/sorted_20180711" \
-mapper "python " \
-reducer "python " \
-file "$CURRENT/" \
-file "$CURRENT/"

(2)

# -*- coding: utf-8 -*-
import sys
 
for line in :
 line = ()
 print('{0}'.format(line))

(3)

# -*- coding: utf-8 -*-
import sys
 
for line in :
 line = ()
 print("{0}".format(line))

Run command:

bash

Run results:

hdfs dfs -cat /user/test/streaming/sorted_20180711/part-00000

11 12 22
11 3
11 2
11 4 1
11 1

Above this python implementation in the unordered array to find the median method is all I have shared with you, I hope to be able to give you a reference, and I hope you support me more.