I. Description of the problem
1、Find the median of an unordered array. (If the array has an even number of elements, the median is the sum of the two middle numbers divided by 2; if it has an odd number, the median is the element in the middle position.) Requirements: no sorting, and the lowest time complexity possible.
2、Example:
lists = [3, 2, 1, 4] , median = (2+3)/2 = 2.5
lists = [3, 1, 2] , median = 2
3、Algorithmic idea:
Use the partition step of quicksort (without doing the full sort): pick an element as the pivot (key) and partition the array into two parts around it. If the left part contains exactly (n-1)/2 elements, the key is the median. If the left part contains fewer than (n-1)/2 elements, the median lies in the right part; otherwise it lies in the left part. Then recurse into the corresponding part and continue searching there. This is the quickselect algorithm.
The average time complexity is O(n).
II. Procedures
class Solution(object):
    def findmedian(self, lists):
        if not lists:
            return []
        n = len(lists)
        if n % 2 == 0:
            # Even length: the median is the mean of the two middle elements.
            a = self.partition(lists, n // 2, 0, n - 1)
            b = self.partition(lists, n // 2 - 1, 0, n - 1)
            return (lists[a] + lists[b]) / 2.0
        else:
            # Odd length: the median is the single middle element.
            mid = self.partition(lists, n // 2, 0, n - 1)
            return lists[mid]

    def partition(self, lists, k, start, end):
        # Quickselect: place the pivot in its final sorted position, then
        # recurse into whichever side contains index k.
        key = lists[start]
        left, right = start, end
        while left < right:
            # ">=" / "<=" (rather than ">" / "<") prevents an infinite loop
            # when the array contains duplicates of the pivot.
            while left < right and lists[right] >= key:
                right -= 1
            lists[left] = lists[right]
            while left < right and lists[left] <= key:
                left += 1
            lists[right] = lists[left]
        lists[left] = key
        if left == k:
            return left
        elif left > k:
            return self.partition(lists, k, start, left - 1)
        else:
            return self.partition(lists, k, left + 1, end)


if __name__ == "__main__":
    sol = Solution()
    lists = [2, 5, 4, 9, 3, 6, 8, 7, 1]
    # lists = [1, 2]
    print("median = %s" % sol.findmedian(lists))
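As a quick sanity check (not part of the original post), the quickselect-based median can be compared against the standard library's `statistics.median` on random arrays. The `quickselect` and `median` helpers below are my own standalone restatement of the same partitioning idea:

```python
import random
import statistics

def quickselect(lists, k):
    """Return the k-th smallest element (0-indexed) by partitioning in place."""
    start, end = 0, len(lists) - 1
    while True:
        key = lists[start]
        left, right = start, end
        while left < right:
            while left < right and lists[right] >= key:
                right -= 1
            lists[left] = lists[right]
            while left < right and lists[left] <= key:
                left += 1
            lists[right] = lists[left]
        lists[left] = key  # pivot is now in its final sorted position
        if left == k:
            return lists[left]
        elif left > k:
            end = left - 1   # k-th element is in the left part
        else:
            start = left + 1  # k-th element is in the right part

def median(lists):
    n = len(lists)
    if n % 2 == 0:
        # Copy before each selection so the two calls do not interfere.
        return (quickselect(lists[:], n // 2) + quickselect(lists[:], n // 2 - 1)) / 2.0
    return quickselect(lists[:], n // 2)

# Randomized cross-check against the standard library.
for _ in range(100):
    data = [random.randint(0, 50) for _ in range(random.randint(1, 30))]
    assert median(data) == statistics.median(data)
print("all checks passed")
```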
Knowledge supplement: sorting on one field with python streaming
I. hadoop streaming defaults
1、By default, hadoop streaming uses \t as the separator: for each line read from standard input, the content before the first \t is taken as the key, and the content after it as the value. Note that if a line contains no \t character, the entire line is taken as the key.
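For illustration (this helper is not from the original post), the default key/value split can be mimicked in plain Python: everything before the first tab is the key, everything after it is the value, and a line with no tab becomes the key with an empty value:

```python
def default_split(line):
    """Mimic hadoop streaming's default key/value split on the first tab."""
    if '\t' in line:
        key, value = line.split('\t', 1)  # split only on the FIRST tab
    else:
        key, value = line, ''  # no tab: the whole line is the key
    return key, value

print(default_split('user1\t42\textra'))  # key 'user1', value '42\textra'
print(default_split('no_tab_here'))       # key 'no_tab_here', empty value
```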
2、Some of the streaming parameters are as follows (the parameter names were lost from the original; the standard hadoop streaming configuration names are restored below):
-D stream.map.output.field.separator : sets the separator between key and value in the map output.
-D stream.num.map.output.key.fields : sets the position of the separator in the map output; the part before this position is used as the key and the part after it as the value.
-D map.output.key.field.separator : sets the separator used inside the key of the map output.
-D num.key.fields.for.partition : the number of key columns (after the key has been split on the separator) used when partitioning keys into buckets (used together with -partitioner).
-D stream.reduce.output.field.separator : sets the separator between key and value in the reduce output.
-D stream.num.reduce.output.key.fields : sets the position of the separator in the reduce output.
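A small sketch (the function and parameter names are my own) of how a setting like `stream.num.map.output.key.fields=3` with a tab separator carves a map output line into key and value:

```python
def split_key_value(line, sep='\t', num_key_fields=3):
    """Take the first num_key_fields fields as the key and the rest as the
    value, mirroring stream.num.map.output.key.fields (a sketch, not the
    actual hadoop implementation)."""
    fields = line.split(sep)
    key = sep.join(fields[:num_key_fields])
    value = sep.join(fields[num_key_fields:])
    return key, value

print(split_key_value('11\t4\t1\tpayload'))  # key '11\t4\t1', value 'payload'
```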
II. Sorting on one field with python streaming
1、Input data (viewed with cat; the columns are separated by tabs):
11 2
11 3
11 4 1
11 111 12 22
2、The streaming driver script is as follows (several file and parameter names were lost from the original; mapper.py, reducer.py, and the -D configuration names below are reconstructions):

#!/bin/bash
export CURRENT=/home//hadoop_streaming/sort

/usr/local/hadoop-2.6.3/bin/hadoop jar \
    /usr/local/hadoop-2.6.3/share/hadoop/tools/lib/hadoop-streaming-2.6.3.jar \
    -D stream.map.output.field.separator='\t' \
    -D stream.num.map.output.key.fields=3 \
    -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
    -D mapred.text.key.comparator.options=-k3,3nr \
    -input "/user/test/inputdata/datas3/" \
    -output "/user/test/streaming/sorted_20180711" \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -file "$CURRENT/mapper.py" \
    -file "$CURRENT/reducer.py"
# -k3,3nr sorts on the third column, numerically, in reverse order;
# change the field range to sort on a different column.
(2) The mapper, a simple pass-through (the file name was lost from the original; mapper.py is a placeholder):

# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    line = line.strip()
    print('{0}'.format(line))
(3) The reducer, an identical pass-through (reducer.py, likewise a placeholder name):

# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    line = line.strip()
    print("{0}".format(line))
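Since both scripts are pass-throughs, the ordering comes entirely from the framework's key comparator. The effect of `-k3,3nr` (reverse numeric sort on the third field) can be previewed locally with a short Python sketch; the sample lines here are my own, and lines without a third field are pushed to the end:

```python
lines = ['11\t2', '11\t3', '11\t4\t1', '11\t1\t12\t22']

def third_field(line):
    """Numeric value of the third tab-separated field; lines missing a
    third field sort last when sorting in reverse."""
    fields = line.split('\t')
    return float(fields[2]) if len(fields) > 2 else float('-inf')

# Reverse numeric sort on the third column, analogous to -k3,3nr.
for line in sorted(lines, key=third_field, reverse=True):
    print(line)
```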
Run command:
bash run.sh    # the script name was lost from the original; run.sh is a placeholder
Run results:
hdfs dfs -cat /user/test/streaming/sorted_20180711/part-00000
11 12 22
11 3
11 2
11 4 1
11 1
That concludes this Python implementation of finding the median of an unordered array. I hope it gives you a useful reference, and thank you for your support.