The Maximum Subarray Problem
This problem is a classic. Or so I’m told. The maximum subarray problem is involves finding the maximum sum out of all contiguous subarrays. It’s one of those problems that seems straightforward, and then boom - complicated edge cases seemingly appear from the heavens to say “Not so fast!". But that’s the essence of algorithm problems, right? They assess a developer’s ability to see beyond the scope of some given test cases and construct a robust solution.
The Problem
Given an integer array nums
, find the contiguous subarray (containing at least one number) which has the largest sum and return its sum.
Example:
- Input:
[-2, 1,-3, 4,-1, 2, 1,-5, 4]
- Output:
6
- Explanation:
[ 4,-1, 2, 1]
has the largest sum of 6.
A First Solution
So this looks like a great problem to use prefix sums, right? You can iterate over the array and keep track of the minimum and maximum prefix sums you’ve encountered, and then max_prefix - min_prefix
will give you the largest sum! Just look at the prefix sum array for our example below:
Prefix Sum Array:[-2,-1,-4, 0,-1, 1, 2,-3, 2]
Our answer is simply 2-(-4)=6
. This works because the difference in two prefix sums gives you the sum of all elements between where max_prefix
and min_prefix
are located. Straightforward. time to code it up..
“Not so fast!”
You probably already found the fault in my reasoning, but I was not so lucky to see the issue until I had written code. There are some cases that I hadn’t considered:
- What happens if the list is only one element? Then
max_prefix
andmin_prefix
are the same, so you get a return of 0. This isn’t correct! - Suppose you have an array
[-2, 1,-3, 4,-1, 2, 1,-5,-9]
. Now themin_prefix
occurs after themax_prefix
! Obviously the answer is the same as in the given example, so this is incorrect.
This is the motive behind algorithm questions - can you handle the ambiguity? Programming a solution to a problem only works when the problem is well-defined, and it’s usually up to the developer to define what the scope is. If our given array was guaranteed to have at least two elements along with min_prefix
occuring before max_prefix
, then this solution would be ideal! Unfortunately, this is not the case for the problem on Leetcode: My original solution failed for the second edge case listed.
The Solution to our “Solution”
But alas, there is hope! The simple solution is two use a conditional to filter the first edge case, and use the current prefix_sum
instead of max_prefix
. Because the current prefix_sum
is always ahead of min_prefix
we never have to deal with the second edge case. At each point in the iteration we take the max of two values: the current result
and prefix_sum - min_prefix
. If the current prefix sum is lower than the minimum prefix sum encountered, we update min_prefix
. Now we can implement a solution using prefix sums! Before we do so, it helps to go ahead and write down all the contraints for the problem, just so we don’t have to deal with any more unexpected edge cases:
- input array
nums
is always at least one element long
Wow, not many constraints! This is the reason problem is unecesarily tricky; It is very easy to underestimate the complexity of open-ended problems. Anyway, here’s the implementation using prefix sums:
def maxSubArray(self, nums):
"""
Finds maximum sum of all contiguous subarrays
using prefix sums!
input:
`nums`: array of integers
returns:
`result`: integer
"""
# edge case, list is one element long
if len(nums) == 1:
return nums[0]
# Initialize variables
min_prefix = 0
prefix_sum = 0
result = -float(inf)
# compute prefix sum, find sum of subarray, update min_prefix
for num in nums:
prefix_sum += num
result = max(result, prefix_sum - min_prefix)
min_prefix = min(min_prefix, prefix_sum)
return result
A Second Solution
Now entering Kadane’s algorithm. I’m don’t have a formal background in Computer Science, so I didn’t have the benefit of already being exposed to Kadane’s algorithm upon first encountering “The Maximum Subarray Problem”. The way Kadane’s algorithm works is actually very similar to the prefix sum method, and while it performs about the same, the code looks much cleaner. It works by keeping track of sums as you iterate through the array (go figure).
Consider iterating through the list manually with a working subarray, trying to find the max subarray. Kadane’s algorithm says you have two options at each step:
- Ditch your current working subarray and start a new working subarray array with the next element.
- Keep your current working subarray and add append the next element to it.
Phrased like this, all we have left is to figure out how to decide between these choices. Because the problem is concerned with the maximum sum of all subarrays, all we need to do is check to see if the next element will make our sum larger than it already is. If our current sum is larger, we stick with option 1, and if our would-be sum is larger, we go with option 2. The real beauty of this solution is that you can implement it in such a way that the edge cases are taken care of without any additional logic! Such an implementation might look like:
def maxSubArray(self, nums):
"""
Finds maximum sum of all contiguous subarrays
using Kadane's Algorithm
input:
`nums`: array of integers
returns:
`max(nums)`: integer
"""
# start at second element
for i in range(1,len(nums)):
# increase the local sum if next number increases it
if nums[i-1]+nums[i] > nums[i]:
nums[i] = nums[i-1]+nums[i]
# otherwise, start new local sum
else:
nums[i] = nums[i]
# Return the max of all local max subarray sums
return max(nums)
This is purposefully lengthy to draw a comparison to the options above. Making it short and sweet, we get a very eloquent solution:
def maxSubArray(self, nums):
"""
Finds maximum sum of all contiguous subarrays
using Kadane's Algorithm
input:
`nums`: array of integers
returns:
`max(nums)`: integer
"""
for i in range(1,len(nums)):
nums[i] = max(nums[i-1]+nums[i], nums[i])
return max(nums)
Just like any algorithm problem, there are trade-offs to the solutions. Because of what I can only assume is overhead in accessing Python lists by indices, Kadane’s algorithm is actually slower than the solution using prefix sums. This totally shocked me, but the difference is only 20ms by Leetcode’s measurement, so there probably isn’t much difference. The implementations of Kadane’s modifies the nums
array, but since Python is weird, new items are created on the heap and the pointers are modified to point to the new elements. In a statically typed language, Kadane’s would beat out Prefix Sums in terms of memory, but they still rank the same in runtime complexity (that is O(n)
).