295. Find Median from Data Stream

https://leetcode.com/problems/find-median-from-data-stream/#/description

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

1
2
3
4
Examples: 
[2,3,4] , the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) - Add a integer number from the data stream to the data structure.
  • double findMedian() - Return the median of all elements so far.
1
2
3
4
5
6
7
For example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2

https://discuss.leetcode.com/topic/27521/short-simple-java-c-python-o-log-n-o-1

Short simple Java/C++/Python, O(log n) + O(1)

I keep two heaps (or priority queues):

  • Max-heap small has the smaller half of the numbers.
  • Min-heap large has the larger half of the numbers.

This gives me direct access to the one or two middle values (they’re the tops of the heaps), so getting the median takes O(1) time. And adding a number takes O(log n) time.

Supporting both min- and max-heap is more or less cumbersome, depending on the language, so I simply negate the numbers in the heap in which I want the reverse of the default order. To prevent this from causing a bug with -231 (which negated is itself, when using 32-bit ints), I use integer types larger than 32 bits.

Using larger integer types also prevents an overflow error when taking the mean of the two middle numbers. I think almost all solutions posted previously have that bug.

Update: These are pretty short already, but by now I wrote even shorter ones.

Java

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class MedianFinder {

private Queue<Long> small = new PriorityQueue(),
large = new PriorityQueue();

public void addNum(int num) {
large.add((long) num);
small.add(-large.poll());
if (large.size() < small.size())
large.add(-small.poll());
}

public double findMedian() {
return large.size() > small.size()
? large.peek()
: (large.peek() - small.peek()) / 2.0;
}
};

Props to larrywang2014’s solution for making me aware that I can use Queue in the declaration instead of PriorityQueue (that’s all I got from him, though (just saying because I just saw he changed his previously longer addNum and it’s now equivalent to mine)).

C++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class MedianFinder {
priority_queue<long> small, large;
public:

void addNum(int num) {
small.push(num);
large.push(-small.top());
small.pop();
if (small.size() < large.size()) {
small.push(-large.top());
large.pop();
}
}

double findMedian() {
return small.size() > large.size()
? small.top()
: (small.top() - large.top()) / 2.0;
}
};

Big thanks to jianchao.li.fighter for telling me that C++’s priority_queue is a max-queue (see comments below).

Python

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
from heapq import *

class MedianFinder:

def __init__(self):
self.heaps = [], []

def addNum(self, num):
small, large = self.heaps
heappush(small, -heappushpop(large, num))
if len(large) < len(small):
heappush(large, -heappop(small))

def findMedian(self):
small, large = self.heaps
if len(large) > len(small):
return float(large[0])
return (large[0] - small[0]) / 2.0

https://discuss.leetcode.com/topic/27522/java-python-two-heap-solution-o-log-n-add-o-1-find

Java/Python two heap solution, O(log n) add, O(1) find

The invariant of the algorithm is two heaps, small and large, each represent half of the current list. The length of smaller half is kept to be n / 2 at all time and the length of the larger half is either n / 2 or n / 2 + 1 depend on n’s parity.

This way we only need to peek the two heaps’ top number to calculate median.

Any time before we add a new number, there are two scenarios, (total n numbers, k = n / 2):

1
2
(1) length of (small, large) == (k, k)
(2) length of (small, large) == (k, k + 1)

After adding the number, total (n + 1) numbers, they will become:

1
2
(1) length of (small, large) == (k, k + 1)
(2) length of (small, large) == (k + 1, k + 1)

Here we take the first scenario for example, we know the large will gain one more item and small will remain the same size, but we cannot just push the item into large. What we should do is we push the new number into small and pop the maximum item from small then push it into large (all the pop and push here are heappop and heappush). By doing this kind of operations for the two scenarios we can keep our invariant.

Therefore to add a number, we have 3 O(log n) heap operations. Luckily the heapq provided us a function “heappushpop” which saves some time by combine two into one. The document says:

Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().

Alltogether, the add operation is O(logn), The findMedian operation is O(1).

Note that the heapq in python is a min heap, thus we need to invert the values in the smaller half to mimic a “max heap”.

A further observation is that the two scenarios take turns when adding numbers, thus it is possible to combine the two into one. For this please see stefan’s post

Java

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
private PriorityQueue<Integer> small = new PriorityQueue<>(Collections.reverseOrder());
private PriorityQueue<Integer> large = new PriorityQueue<>();
private boolean even = true;

public double findMedian() {
if (even)
return (small.peek() + large.peek()) / 2.0;
else
return small.peek();
}

public void addNum(int num) {
if (even) {
large.offer(num);
small.offer(large.poll());
} else {
small.offer(num);
large.offer(small.poll());
}
even = !even;
}

Python

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
from heapq import *


class MedianFinder:
def __init__(self):
self.small = [] # the smaller half of the list, max heap (invert min-heap)
self.large = [] # the larger half of the list, min heap

def addNum(self, num):
if len(self.small) == len(self.large):
heappush(self.large, -heappushpop(self.small, -num))
else:
heappush(self.small, -heappushpop(self.large, num))

def findMedian(self):
if len(self.small) == len(self.large):
return float(self.large[0] - self.small[0]) / 2.0
else:
return float(self.large[0])

# 18 / 18 test cases passed.
# Status: Accepted
# Runtime: 388 ms

https://discuss.leetcode.com/topic/27541/very-short-o-log-n-o-1

Very Short, O(log n) + O(1)

Same idea as before, but really exploiting the symmetry of the two heaps by switching them whenever a number is added. Still O(log n) for adding and O(1) for median. Partially inspired by peisi’s updated solution.

Update: Added a new Java version (the first one).

Java

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class MedianFinder {

Queue<Integer> q = new PriorityQueue(), z = q, t,
Q = new PriorityQueue(Collections.reverseOrder());

public void addNum(int num) {
(t=Q).add(num);
(Q=q).add((q=t).poll());
}

public double findMedian() {
return (Q.peek() + z.peek()) / 2.;
}
};

Or:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class MedianFinder {

Queue[] q = {new PriorityQueue(), new PriorityQueue(Collections.reverseOrder())};
int i = 0;

public void addNum(int num) {
q[i].add(num);
q[i^=1].add(q[i^1].poll());
}

public double findMedian() {
return ((int)(q[1].peek()) + (int)(q[i].peek())) / 2.0;
}
};

Python

1
2
3
4
5
6
7
8
9
10
11
12
13
14
from heapq import *

class MedianFinder:

def __init__(self):
self.heaps = None, [], []
self.i = 1

def addNum(self, num):
heappush(self.heaps[-self.i], -heappushpop(self.heaps[self.i], num * self.i))
self.i *= -1

def findMedian(self):
return (self.heaps[self.i][0] * self.i - self.heaps[-1][0]) / 2.0

Or:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
from heapq import *

class MedianFinder:

def __init__(self):
self.data = 1, [], []

def addNum(self, num):
sign, h1, h2 = self.data
heappush(h2, -heappushpop(h1, num * sign))
self.data = -sign, h2, h1

def findMedian(self):
sign, h1, h2 = d = self.data
return (h1[0] * sign - d[-sign][0]) / 2.0

https://discuss.leetcode.com/topic/27613/my-c-priority_queue-based-solution-140-ms

My C++ priority_queue based solution (140 ms)

The idea is to use two heaps (one max heap, one mn heap) to save the input data. firstQ is a max_heap to save the first half of the data with smaller values, and secQ is a min_heap to save the second half of the data with bigger values. Everytime when inserting a new value, we first compare if it is smaller than the top of firstQ (the largest value of the first half), if so, insert into firstQ. Otherwise, it belongs to the second half. After inserting, we have to balance the first half and the second half to make sure either they have the same length or the length difference is only 1.
The median will be the mean of two top elements (when they have the same length) or the top element of the queue with a larger length.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
class MedianFinder {
private:
priority_queue<int> firstQ; // max_heap for the first half
priority_queue<int, std::vector<int>, std::greater<int> > secQ; // min_heap for the second half
public:
// Adds a number into the data structure.
void addNum(int num) {
if(firstQ.empty() || (firstQ.top()>num)) firstQ.push(num); // if it belongs to the smaller half
else secQ.push(num);

// rebalance the two halfs to make sure the length difference is no larger than 1
if(firstQ.size() > (secQ.size()+1))
{
secQ.push(firstQ.top());
firstQ.pop();
}
else if(firstQ.size()+1<secQ.size())
{
firstQ.push(secQ.top());
secQ.pop();
}
}

// Returns the median of current data stream
double findMedian() {
if(firstQ.size() == secQ.size()) return firstQ.empty()?0:( (firstQ.top()+secQ.top())/2.0);
else return (firstQ.size() > secQ.size())? firstQ.top():secQ.top();
}
};

https://discuss.leetcode.com/topic/27598/solution-using-binary-search-tree

Solution using Binary Search Tree

As the input numbers are random, so the height of the binary search tree is O(logN)

We maintain every single node’s children’s size and it’s easy to implement because it just has add operation.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
struct BST {
struct node {
int val;
int size;
node* left, *right;
node(int v) : size(1), val(v) {};
} *Null, *root;

BST() {
Null = new node(0);
Null -> size = 0;
root = Null;
}

void add(int val, node*& R) {
if(R == Null) {
R = new node(val);
R -> left = R -> right = Null;
return;
}
if(R->val <= val) add(val, R->left);
else add(val, R->right);
R->size = R->left->size + R->right->size + 1;

}

int rank(int k) {
node* t = root;
while(true) {
int leftSize = t -> left -> size;
if(leftSize == k) return t -> val;
if(leftSize > k) {
t = t -> left;
} else {
k = k - leftSize - 1;
t = t -> right;
}
}
return -1;
}
};




class MedianFinder {
public:
BST* bst;
MedianFinder() {
bst = new BST();
}
// Adds a number into the data structure.
void addNum(int num) {
bst->add(num, bst->root);
}

// Returns the median of current data stream
double findMedian() {
int sz = bst -> root -> size;
if(sz % 2 == 0) {
return 1.0 * (bst -> rank(sz / 2) + bst -> rank(sz / 2 - 1)) / 2;
} else return bst->rank(sz / 2);

}
};
谢谢你,可爱的朋友。