https://leetcode.com/problems/find-median-from-data-stream/#/description

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

Design a data structure that supports the following two operations:

• void addNum(int num) - Add a integer number from the data stream to the data structure.
• double findMedian() - Return the median of all elements so far.

https://discuss.leetcode.com/topic/27521/short-simple-java-c-python-o-log-n-o-1

Short simple Java/C++/Python, O(log n) + O(1)

I keep two heaps (or priority queues):

• Max-heap small has the smaller half of the numbers.
• Min-heap large has the larger half of the numbers.

This gives me direct access to the one or two middle values (they’re the tops of the heaps), so getting the median takes O(1) time. And adding a number takes O(log n) time.

Supporting both min- and max-heap is more or less cumbersome, depending on the language, so I simply negate the numbers in the heap in which I want the reverse of the default order. To prevent this from causing a bug with -231 (which negated is itself, when using 32-bit ints), I use integer types larger than 32 bits.

Using larger integer types also prevents an overflow error when taking the mean of the two middle numbers. I think almost all solutions posted previously have that bug.

Update: These are pretty short already, but by now I wrote even shorter ones.

Java

Props to larrywang2014’s solution for making me aware that I can use Queue in the declaration instead of PriorityQueue (that’s all I got from him, though (just saying because I just saw he changed his previously longer addNum and it’s now equivalent to mine)).

C++

Big thanks to jianchao.li.fighter for telling me that C++’s priority_queue is a max-queue (see comments below).

Python

Java/Python two heap solution, O(log n) add, O(1) find

The invariant of the algorithm is two heaps, small and large, each represent half of the current list. The length of smaller half is kept to be n / 2 at all time and the length of the larger half is either n / 2 or n / 2 + 1 depend on n’s parity.

This way we only need to peek the two heaps’ top number to calculate median.

Any time before we add a new number, there are two scenarios, (total n numbers, k = n / 2):

After adding the number, total (n + 1) numbers, they will become:

Here we take the first scenario for example, we know the large will gain one more item and small will remain the same size, but we cannot just push the item into large. What we should do is we push the new number into small and pop the maximum item from small then push it into large (all the pop and push here are heappop and heappush). By doing this kind of operations for the two scenarios we can keep our invariant.

Therefore to add a number, we have 3 O(log n) heap operations. Luckily the heapq provided us a function “heappushpop” which saves some time by combine two into one. The document says:

Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().

Alltogether, the add operation is O(logn), The findMedian operation is O(1).

Note that the heapq in python is a min heap, thus we need to invert the values in the smaller half to mimic a “max heap”.

A further observation is that the two scenarios take turns when adding numbers, thus it is possible to combine the two into one. For this please see stefan’s post

Java

Python

https://discuss.leetcode.com/topic/27541/very-short-o-log-n-o-1

Very Short, O(log n) + O(1)

Same idea as before, but really exploiting the symmetry of the two heaps by switching them whenever a number is added. Still O(log n) for adding and O(1) for median. Partially inspired by peisi’s updated solution.

Update: Added a new Java version (the first one).

Java

Or:

Python

Or:

https://discuss.leetcode.com/topic/27613/my-c-priority_queue-based-solution-140-ms

My C++ priority_queue based solution (140 ms)

The idea is to use two heaps (one max heap, one mn heap) to save the input data. firstQ is a max_heap to save the first half of the data with smaller values, and secQ is a min_heap to save the second half of the data with bigger values. Everytime when inserting a new value, we first compare if it is smaller than the top of firstQ (the largest value of the first half), if so, insert into firstQ. Otherwise, it belongs to the second half. After inserting, we have to balance the first half and the second half to make sure either they have the same length or the length difference is only 1.
The median will be the mean of two top elements (when they have the same length) or the top element of the queue with a larger length.

https://discuss.leetcode.com/topic/27598/solution-using-binary-search-tree

Solution using Binary Search Tree

As the input numbers are random, so the height of the binary search tree is O(logN)

We maintain every single node’s children’s size and it’s easy to implement because it just has add operation.