Talk:Subset sum problem

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Low-importance on the project's importance scale.

Old thread

The Subset Sum problem is a good one for addressing the NP-complete class of problems. There are three reasons for this:

- It is an exact, and not an optimal problem

- It has a very simple formal definition and problem statement

- It explicitly uses the constraints of numerical addition as part of the problem.

To understand the true nature of the problem you have to understand that what makes the problem difficult is that the number of binary place values it takes to state the problem needs to be equal to the number of decision variables in the problem. Take N as the number of decision variables, and P as the precision of the problem (stated as the number of binary place values that it takes to state the problem)

P is equivalent to the number of constraints on the problem. With P < N, the problem is underdetermined, and it is easy to find a solution that will fit. With P > N, the problem is overdetermined, and the search space is so small, that it is again easy to find an N. The problem is only difficult when P = N. Addressing any other state of the problem than P = N is effectively changing the subject.

Also inherent in the Subset Sum problem is a concise exposition of the NP-complete conundrum. That is the conundrum of infeasible problems. Let's say you had a magical way of solving the problem in polynomial time. If you can deliver a feasible solution, then you can say "it doesn't matter how I did it, here is the solution and here is the proof that it is right (it adds up)". However, what if someone, without your knowledge, gives you an infeasible problem? Your magical solution algorithm cannot deliver a feasible solution. It will either fail, go on forever, or deliver a solution for which there is no proof of correctness and hence no one will believe you.

This means that if you actually wanted to solve real subset sum problems, you would have to come up with a polynomially sized algorithm for proving infeasibility for problems where P = N. Anything short of that is an evasion.

Most physical problems in life can be solved quite nicely with a +/- 1% error. Being asked to solve an N = 100 subset sum problem with a precision of +/- 1/(2^100) seems silly and irrelevant. However, what it does do is to state in numerical precision terms the logical complexity of the problem. The advantage of this is that it allows you to use the language of numerical analysis to address what is typically seen as a problem of logical analysis. -- PeterG

That is actually quite a useful explanation and would probably be better placed in the article than sitting here on the talk page. -- Derek Ross | Talk 04:39, 22 Jun 2004 (UTC)

You misunderstand the subset sum problem. First, it gives YES/NO, not solutions; it is possible for a provably correct algorithm to not find an actual solution. Second, in order for it to give NO, of course it must be able to handle infeasible sets. Writings about magical solutions to a related, but different, problem do not belong here; especially if they are written as if they apply to this problem. 12.207.151.144 13:59, 9 Apr 2005 (UTC)

I've done something about that. Charles Matthews 19:07, 27 Jun 2004 (UTC)

Thanks, Charles. Good Job! I didn't feel confident enough about the subject to do it myself. -- Derek Ross | Talk 15:32, 2004 Jun 28 (UTC)

Questions about recent addition

Can you guys be a little more kind please?

>N, the number of decision variables

  What is a 'decision variable'?

>P, the precision of the problem (stated as the number of binary place values that it takes to state the problem).

  What are 'binary place values'? I don't get it really.
  Can you please explain it with examples?  —Preceding unsigned comment added by 61.101.222.27 (talk) 11:21, 11 April 2010 (UTC)

I too would like to know a little bit more about "stated as the number of binary place values that it takes to state the problem". What is meant by stating the problem? Anjummajeed (talk) 20:43, 30 August 2010 (UTC)

I am quite familiar with complexity theory. The text on the talk page contains several ideas I have never seen before. I have several questions about it.

Why is it important that "It explicitly uses the constraints of numerical addition as part of the problem"?
I agree about the problem being underdetermined/overdetermined when P<N or P>N but I do not see why it means that finding a solution is easy. If I write it as a system of P equations in the most natural way, the equations are not linear and the system does not look easy to solve.
What is the meaning of "The advantage of this is that it allows you to use the language of numerical analysis to address what is typically seen as a problem of logical analysis."?

I would prefer to see the addition removed from the article (it can certainly stay on talk page) if those concerns are not addressed. If the ideas come from a published source (textbook/paper), a reference would be useful. If they are not published, they should not be in the article. Andris 16:01, Jun 28, 2004 (UTC)

Well, no to the last part, anyway. There is plenty of room on WP for discussion round a problem. Charles Matthews 16:40, 28 Jun 2004 (UTC)

Let me clarify. The main problem is with the "underdetermined/overdetermined" part. Presently, that claim (that the subset sum problem is hard only if P=N) looks incorrect to me. I'm fine with discussion but I would rather not have mathematically incorrect claims.

We can wait for clarification and I can look up more literature but, if we do not find evidence that the claim is correct, I still think it should not be there. The rest of the discussion can stay. Andris 17:11, Jun 28, 2004 (UTC)

Yes - if it turns out that 'P < N' is really saying 'when P is much smaller than N', for example, that should be explained and modified, with a clearer explanation. Charles Matthews 18:52, 28 Jun 2004 (UTC)

"equivalence" of two problems

Let P(s) be the problem of deciding if there is a subset summing to s. The article says that the general case P(s) is "equivalent" to the special case P(0). What does this mean? In complexity theory there are many notions of equivalence. Of course we know they are both NP-complete so there is some polynomial-time transformation between them, but I guess something simpler is meant. Certainly P(0) is a special case of P(s), but how does one use P(0) to solve the general P(s)? --Zero 10:59, 3 Aug 2004 (UTC)

Good point. I have read the article before but never thought about this place. The simplest thing would be to add -s as one more number. Then, a subset summing to s plus -s gives 0. The problem is that there might be a subset of the original instance that sums to 0 and an algorithm for P(0) might return that. There might be some way of tweaking this but it is not immediately clear. Andris 11:23, Aug 3, 2004 (UTC)

This works, although I am not sure whether this is what was meant. Find a number t>0 such that everything in the original instance of P(s) is more than -t. Add t to every number in the instance of P(s), making them all positive. Create n instances of P(0), one by adding an extra number -s-t, another by adding an extra -s-2t, etc., the last one by adding an extra -s-n*t. If the original P(s) instance had a set of k numbers summing up to s, then adding -s-k*t creates a solution to P(0).

The disadvantage is we have to create a separate instance for each -s-i*t. Putting two of them into one instance might result in a solution to P(0) which involves both of them and corresponds to no solution of P(s). I don't yet see how to bypass that without using multiple instances of P(0). Andris 11:32, Aug 3, 2004 (UTC)
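For readers following the thread, here is a minimal Python sketch of the multi-instance reduction described above. It is an illustration only, not something taken from the article or a source. The function names and the brute-force stand-in for a P(0) solver are hypothetical. The sketch also assumes a larger shift than the comment asks for, t > Σ|x_i| + |s|, so that the k-th instance can only match subsets of exactly k original numbers; with a smaller t, an instance could match a subset of a different size.

```python
from itertools import combinations

def has_zero_subset(nums):
    """Stand-in oracle for P(0): is there a nonempty subset summing to 0?
    (Brute force; any P(0) solver could be substituted here.)"""
    return any(
        sum(combo) == 0
        for r in range(1, len(nums) + 1)
        for combo in combinations(nums, r)
    )

def solve_via_p0(xs, s):
    """Decide P(s) for a nonempty subset of xs using n calls to the P(0) oracle."""
    n = len(xs)
    # Assumption of this sketch: t exceeds sum(|x|) + |s|, which is stronger
    # than "everything is more than -t" but rules out size mismatches.
    t = sum(abs(x) for x in xs) + abs(s) + 1
    shifted = [x + t for x in xs]  # every shifted number is positive
    # Instance k adds the extra number -s - k*t; it has a zero-sum subset
    # exactly when some k original numbers sum to s.
    return any(has_zero_subset(shifted + [-s - k * t]) for k in range(1, n + 1))
```

For example, `solve_via_p0([1, 5], 6)` builds two P(0) instances and the second one (k = 2) has a zero-sum subset, while `solve_via_p0([1, 5], 3)` finds none.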

Does this work?

Let $B \subseteq \mathbb{C}^n$ be a; disc $D \subseteq \mathbb{C} \setminus \{0\}$ with center $0$.

--musatov

Accuracy problems with "General discussion" section

The general discussion section states that subset sum is not an optimization problem, but a few lines later it says that there is an approximation algorithm for subset sum. By definition, an approximation algorithm is a solution to an optimization problem, so these two statements seem to be contradictory.

Shlurbee 17:32, 23 December 2005 (UTC)

I've made the change to 'approximate' algorithm in the section title. There should also be some qualification made where the piped link to approximation algorithm occurs, such as 'although SSP is not strictly ...'. Charles Matthews 08:03, 24 December 2005 (UTC)

In my opinion, the General Discussion section adds nothing to the article and should be removed. Mostly it consists of "refutation" of an uninteresting strawman. --Zero 08:24, 24 December 2005 (UTC)

The trouble with cutting out all 'thinking aloud' sections is that it tends to make the coverage as a whole less accessible to non-experts. So please don't remove it, if it can be improved instead. Charles Matthews 08:37, 24 December 2005 (UTC)

I added "Although the subset sum problem is a decision problem," and also the phrase "approximation version of" in the last paragraph to make the difference between the standard and approximate versions clearer. I think that and the previous changes resolve the dispute. Shlurbee 14:45, 6 January 2006 (UTC)

Dynamic programming solution is incorrect?

It is clearly stated that Q(1,s) is true. For any natural number n (excluding 0) we have: Q(n, 0) implies Q(n+1, 0), therefore for any natural n, Q(n, 0) is true. This corresponds to the situation when we take an empty subset. IMO the algorithm should be changed to the following:

The problem can be solved as follows using dynamic programming. Suppose the sequence is

x₁, ..., x_n

and we wish to find a subset which sums to 0. Let N be the sum of the negative values and P the sum of the positive values. Define the function

Q(i,s)

to be 0 if there is no subset of x₁, ..., x_i which sums to s; 1 if only the empty subset sums to s (i.e. when s is zero); or 2 if there is a nonempty such subset.

(Thus, the value we really want is whether Q(n,0) is 2.)

Clearly

Q(i,s) = 0

if s<N or s>P. Create an array to hold the values Q(i,s) for 1≤i≤n and N≤s≤P. The array can now be filled in using a simple recursion.

Initialize all Q(1, s) to 0. Let Q(1, 0) be 1. Let Q(1, x₁) be 2. For i>1, if Q(i-1,s-x_i) is nonzero let Q(i,s) be 2 otherwise let it be value of Q(i-1,s).

Moreover, I think it should be clearly stated that we are interested only in nonempty subsets.

Correct me if I'm wrong or correct the article if I'm right :) -- mina86, 217.96.228.130 11:09, 3 April 2006 (UTC)

Yeah, surely $2^{N}$ subsets means including the empty subset? That subset doesn't sum to 0 because there's nothing in it. Therefore, it should be $2^{N}-1$ subsets. CloudNine 07:37, 29 April 2006 (UTC)

You are wrong. By convention,

$\sum_{x\in \emptyset} x$

is 0. --Mellum 12:43, 29 April 2006 (UTC)

(nullary sum. :)

On the page itself, the quote reads: The running time is of order $O(2^{N}N)$, since there are $2^{N}$ subsets and, to check each subset, we need to sum at most N elements.

This is specifically talking about a computer program (or Turing machine if you like) running an algorithm to solve the problem. The notation is introduced for the specific purpose of calculating the running time. It seems perfectly obvious that the program will never perform any calculations on the empty set, so the existence of the empty set does not have any effect on the running time. Therefore, there are only $2^{N}-1$ subsets being used to calculate the running time, and the running time is $O((2^{N}-1)N)$. 86.151.205.224 19:38, 8 September 2007 (UTC)

awkward definition

Almost everywhere that I have seen subset-sum defined has defined the numbers in the set to be either over the natural numbers or the positive integers. This is certainly so for Garey/Johnson and CLRS, the stated sources. Shouldn't our definition follow these canonical sources? Fremerl 19:22, 13 January 2007 (UTC)

"any" vs. "some"

I have rephrased the intro here to express the problem as finding some non-empty subset for which the sum equals exactly zero. For a layman, at least, the use of any is ambiguous, and suggests that the sum of any subset equals zero. Others may feel that any is more precise, and want to revert this change. That's OK, but perhaps you could find some choice of words that's not ambiguous for the lay reader. Rupert Clayton 12:11, 6 February 2007 (UTC)

Generalizations

The statement in the generalizations section, "[the subset sum problem] can actually be defined using any group", is not exactly accurate. For example Z_2 (integers modulo 2) under addition is a group, but finding the answer to the subset sum problem for a set of integers in Z_2 alone is trivial - the power set of Z_2 contains only four sets. Thus, I believe n must be a parameter of the problem, rather than any particular fixed number (which "it can actually be defined using any group" seems to imply is allowed). I will rephrase the offending line. -- Ben-Arba 18:50, 11 February 2007 (UTC)

Exponential time algorithm not clear

I'm not convinced by the description of the exponential algorithm by Horowitz and Sahni as described in this article. My main point of confusion is how you decide how to split N elements into N/2 sets. Would you get any improvement if you split on negative vs positive integers to solve the sum = 0 special case? Can someone cite a more in-depth discussion of this algorithm?

Approximation algorithm

Something is wrong, here. First, the conventional definition of an approximation algorithm is for optimization problems (specifically, problems where the answer is a number), not decision problems (where the answer is just "yes" or "no"). For an optimization problem, an approximation algorithm is one that is guaranteed to produce an answer "close to" the right answer, for some appropriate definition of "close to". But what is an approximation algorithm for a decision problem such as subset sum?

The definition given doesn't work because it's not symmetric about the target s -- if there's a subset whose sum is a little bit less than s, the algorithm might say "yes". However, if there are no subsets whose sum is between (1 − c)s and s, the algorithm must say "no", even if there is a subset whose sum is s + ε for ε as small as you want.

Is there a refereed, published source for this algorithm or is it original research?

Dricherby (talk) 16:38, 3 July 2008 (UTC)

Horowitz and Sahni

So a sorted list can be solved in linear time? Isn't this algorithm only searching for subsets of two elements? I don't get this section and can't find any web references. MadCow257 (talk) 03:44, 3 March 2009 (UTC)

I wrote in the paragraph about the Horowitz and Sahni algorithm "...no algorithm that solves all instances of the subset sum problem that uses a different strategy than that of Horowitz and Sahni's algorithm has ever been found to run faster than the order of 2^N time in the worst-case scenario." but it got deleted twice. The reason for the deletion doesn't make sense. First, the deleter said that dynamic programming beats this running-time for some instances. But my statement said "worst-case scenario", which includes all instances: When the set in the subset sum problem has integers of size on the order of N bits, dynamic programming runs on the order of 2^N*N time, so the deleter's comment isn't relevant. — Preceding unsigned comment added by Logicker (talk • contribs) 18:55, 18 March 2011 (UTC)

As well as all the other reasons (it's redundant with the previous sentence, it's overly fawning and unobjective, and it assumes implicitly that N is the only variable worth using in the analysis), here's another for removing it: it's blatantly wrong. In particular the simpler algorithm that is like the Horowitz-Sahni one but uses a comparison sort instead of the more clever merge idea is O(2^(N/2)*N), which is better than 2^N, and there are lots of other ways of taking some similar algorithm and slowing it down, making a different algorithm that is better than 2^N. —David Eppstein (talk) 19:33, 18 March 2011 (UTC)

It's not redundant with the previous sentence because the previous sentence says there is no algorithm with a better worst-case running time, where the new sentence says that there is no algorithm with a different strategy than that of Horowitz and Sahni that beats the O(2^N) brute force search strategy. It's not blatantly wrong. The choice of which type of sorting method to use is irrelevant, as the strategy of sorting subset-sums is still the same. Also, show me an algorithm that solves all instances of subset sum in the literature that doesn't use the strategy of sorting subset-sums and runs in o(2^N) time. If you can, then the new sentence is wrong. Logicker (talk) 22:09, 18 March 2011 (UTC)

A strategy is not the same thing as an algorithm. And here's a different algorithm: loop for 1.5^N steps doing nothing, then run Horowitz/Sahni. Here's a third algorithm: split the problem into thirds, find all sums of subsets within each third, merge two of the thirds, and finally compare the merged result with the remaining third, time something like 2^(2N/3). Finally, what we say in a Wikipedia article needs to be supported by sources in the literature — we should not be adding our own editorializations. If a source in the literature makes a point about nothing else coming close to Horowitz/Sahni then we can quote that, but it's not the kind of thing we should be making up ourselves. —David Eppstein (talk) 22:56, 18 March 2011 (UTC)

What about Binary Decision Diagrams (later BDDs)? They can be used to solve the #P-complete problem of calculating power indices in voting games in O*(2^(n/2)) (google Bolus 2011). Calculating power indices is like counting all the possible solutions of several subset sum problems. In fact BDDs can be used for any threshold function in such time. #P-complete problems are at least as difficult as NP-complete ones. A polynomial time reduction from subset sum to calculating power indices clearly yields an O*(2^(N/2)) algorithm with a different strategy - by means of Binary Decision Diagrams. Example of such a reduction from a subset sum instance to a power indices instance: take subset sum + 1 as the voting game quota, take the players' weights as the numbers in subset sum, add a player with weight 1, calculate the absolute Banzhaf power index of the added player. The calculated value multiplied by 2^n is equal to the number of subsets summing to the original subset sum in the original problem. This will work only for positive numbers in subset sum but surely can be adapted for negative also. Btw, all reasoning here is a bit of overkill and Binary Decision Diagrams should be directly applicable to the subset sum problem. Bartosz Meglicki (talk) 00:31, 13 December 2011 (UTC)

And what about it? You want to mention it in the article? If you can cite a reliable source with what you just said - how this algorithm can be applied to the subset sum problem and solve it in O(2^(n/2)) time, then by all means go ahead and add this information to the article. Otherwise, I don't think this should be added: O(2^(n/2)) is better than O(2^(n/2)*n) of Horowitz-Sahni, bold claims require bold evidence. We can't cite your words on the talk page here in the article as this would be original research. And "google Bolus 2011" - I don't see anything there. -- X7q (talk) 20:40, 23 December 2011 (UTC)

You mistake the intent of my comment. I was referring to the incorrect sentence by Logicker: "...no algorithm that solves all instances of the subset sum problem that uses a different strategy than that of Horowitz and Sahni's algorithm has ever been found to run faster than the order of 2^N time in the worst-case scenario.". It has been found, it is not new, it is the Binary Decision Diagrams algorithm. Also I never said it was O(2^(n/2)) but O*(2^(n/2)). '*' in the notation means 'suppressing polynomial factors' and was used for underlining the 2^(n/2) factor (which is plainly faster than 2^n in the cited sentence). My mistake for referring to the Bolus algorithm, it only made the comment difficult to understand. But that algorithm (and the mentioned translation to a subset sum problem algorithm) has interesting properties: expected O(n^(2^n/2)) complexity along with pseudopolynomial complexity (like dynamic programming for subset sum) and solves #subset-sum (counts the number of all valid solutions as opposed to a 'yes'/'no' decision algorithm). The paper was http://www.sciencedirect.com/science/article/pii/S0377221710006181 or the freely available preprint http://www.informatik.uni-kiel.de/~stb/files/SimpleGamesBDDs_2010.pdf. Bartosz Meglicki (talk) 09:46, 27 December 2011 (UTC)

Trouble reading dynamic programming example

I could not understand the dynamic programming example because it uses weird notation. I apologize for being naive, but what does the := operator do? Wouldn't it be easier to write that example using if-else notation? —Preceding unsigned comment added by 97.123.190.39 (talk) 16:06, 11 August 2009 (UTC)

2^{n/2} vs 1.414^n

Some anonymous editor has repeatedly attempted to replace the time bounds of the form O(2^(n/2)) with O(1.414ⁿ), claiming that it looks nicer.

It is not true that 2^(n/2) is O(1.414ⁿ). You can't mix overestimates (big O notation) and underestimates (rounding down √2 to the nearest smaller decimal) in this way. It would be mathematically correct to write that 2^(n/2) is O(1.415ⁿ). It would also be mathematically correct to write that 2^(n/2) is O(179ⁿ). Neither one of these overestimates is as precise as the original formula.
The broader context is a section in which a somewhat subtle algorithm is being used to achieve a time bound of O(2^(n/2)), when a simpler algorithm would achieve O(n*2^(n/2)). The subtler algorithm is faster by a factor of n. But both 2^(n/2) and n*2^(n/2) are O(1.415ⁿ), so replacing the exact formula by an approximate decimal loses all of the distinction between these two algorithms.

For these reasons I have reverted the change and intend to continue reverting it if it is made again. —David Eppstein (talk) 05:56, 17 January 2011 (UTC)
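The point about mixing overestimates and underestimates can be checked numerically. The short Python sketch below is an illustration only; the helper name `growth_ratio` is hypothetical. It computes 2^(n/2) / baseⁿ through logarithms: for base 1.414 the ratio grows without bound, so no constant can make 2^(n/2) ≤ c·1.414ⁿ hold for all n, while for base 1.415 the ratio shrinks towards zero.

```python
import math

SQRT2 = math.sqrt(2)

def growth_ratio(base, n):
    """Return 2^(n/2) / base^n, computed via logarithms to avoid overflow."""
    return math.exp(n * (math.log(SQRT2) - math.log(base)))

# The first ratio keeps increasing with n; the second keeps decreasing.
for n in (10**3, 10**4, 10**5):
    print(n, growth_ratio(1.414, n), growth_ratio(1.415, n))
```

Because √2 ≈ 1.41421 is only slightly larger than 1.414, the growth is slow, but it is still exponential in n.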

Verification of content for community review

In computer science, the subset sum problem is an important problem in complexity theory and cryptography. The problem is this: given a set of integers, does the sum of some non-empty subset equal exactly zero? For example, given the set { −7, −3, −2, 5, 8}, the answer is yes because the subset { −3, −2, 5} sums to zero. The problem is NP-complete.

An equivalent problem is this: given a set of integers and an integer s, does any non-empty subset sum to s? Subset sum can also be thought of as a special case of the knapsack problem. One interesting special case of subset sum is the partition problem, in which s is half of the sum of all elements in the set.

Problem Solution

Win a million dollars, solve one of seven Millennium Problems. Seven great currently unsolved math problems. Favorite Millennium problem question ‘P vs. NP’. Challenge short, proof problem “hard” (i.e. NP) true hard or “easy” (i.e. P). Problem express ‘P = NP’ or ‘P != NP’. For detail describe ‘P vs. NP’, see Wikipedia article.

Hear NP problems. Famous Sales Travel problem, sales find short path connect cities. Sudoku puzzle Japan. Tetris minesweeper NP problem. Example class ‘NP-complete’, “easy” solve class result easy solve all problem. Converse, proof hard, all hard.

Simple NP-complete problem describe ‘subset sum problem’: Give set integer (both positive and negative), subset sum zero (omit empty). Simple example:

Set integer (-8, -5, 2, 3, 14) sum zero.

Answer: (-5, 2, 3). Easy? 5 number set! Difficult quick rise start add number set. If set 100 number, computer year solve. Set 1000 number > energy universe solution, best algorithm. Best algorithm solution subset sum problem O(N*2^(N/2)) operate (Big-O). Give set 1000 number solution require 1000*2^500 operate number hard.

Catch best algorithm solve subset sum problem O(N*2^(N/2)). Proof: million dollar solution solve NP-complete problem, circuit optimize. Proof NP-complete problem hard. Cryptograph algorithm base NP-complete problem. Solve algorithm worth< cryptography.

Describe solution subset sum problem.

Simple algorithm solve subset sum problem. Integer problem set (<2^N), record possible solution array, zero solution. Create array, compute possible sum include exclude integer. Trick number operate grow between large positive sum and large negative sum.

Describe discrete convolute. Integer, create graph 1 both zero (x-axis) integer. Value zero (y-axis). Two possible integer; add, or do not include integer add zero. Result convolute integer graph. Result graph show number solution sum give value.

Example, 3 integer: (2, 3, 5). Convolve value, get solution:

Graph example express array start zero. Simple, positive integer. Start array [1], solution zero (sum NULL set).

- Add 2: [1] * [101] = [101]
- Add 3: [101] * [1001] = [101101]
- Add 5: [101101] * [100001] = [10110201101]

Result set sum: (0, 2, 3, 5, 7, 8, 10), two get 5. (compute MATLAB’s convolution operator).
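The convolution steps in the example above can be reproduced with a few lines of code. The post mentions MATLAB's convolution operator; the sketch below uses plain Python lists instead, and the function names `convolve` and `subset_sum_counts` are hypothetical, chosen only for this illustration. It assumes positive integers, as in the example, and uses the direct quadratic convolution rather than any Fourier-based speed-up.

```python
def convolve(a, b):
    """Direct discrete convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def subset_sum_counts(values):
    """counts[k] = number of subsets of `values` (positive ints) summing to k.

    Starts from [1] (the empty set sums to 0) and, for each value v,
    convolves with an array that has a 1 at index 0 (skip v) and at index v (take v).
    """
    counts = [1]
    for v in values:
        indicator = [1] + [0] * (v - 1) + [1]
        counts = convolve(counts, indicator)
    return counts

counts = subset_sum_counts([2, 3, 5])
# Indices with a nonzero count are the reachable sums.
reachable = [k for k, c in enumerate(counts) if c > 0]
```

Here `counts` is `[1, 0, 1, 1, 0, 2, 0, 1, 1, 0, 1]`, matching the last line of the example, and the entry 2 at index 5 reflects the two ways to reach 5 (5 alone, or 2 + 3).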

Trick: quick compute convolute Fourier analyze. State different, possible solution subset sum problem wave. Convolute two function same product frequency present.

Solve problem involve Fourier transform.

Perform optical Fourier transform shine light small hole measure light hit wall. Angle diffract depend wavelength light. End result sinc function (sin(x)/x), Fourier transform square function, present slit.

Present integer beam light frequency correspond integer value, possible modulate multiple beam light, fourier transform (i.e. shine diffract grate) combine beam determine possible solution. Physics solves math problem.

Quantum computer perform Fourier transform, quantum computer quick solve subset-sum problem. Learn more about quantum computers, stop. Base physics solve math problem, compute device capable quick solve NP-complete problem details.

General discussion

The subset sum problem is a good introduction to the NP-complete class of problems. There are two reasons for this:

- It is a decision and not an optimization problem
- It has a very simple formal definition and problem statement.

Although the subset sum problem is a decision problem, the cases when an approximate solution is sufficient have also been studied, in the field of approximation algorithms; one algorithm for the approximate version of the subset sum problem is given below.

A solution that has a ±1% precision is good enough for many physical problems. However, the number of place values in the problem is essentially equivalent to the number of simultaneous constraints that need to be solved. A numerical precision of 1% is approximately 0.0000001₂, or 7 binary places (any numerical error after that is less than 1⁄128 of the first digit). However, if there are 100 binary place values in the problem, solving just 7 of them amounts to solving only 7% of the constraints. Moreover, given that the volume of the solution space in this case would be 2¹⁰⁰, and you have only covered a volume of 2⁷, then there is still a solution space of 2^{99.999999999999999999999999999854} (= 1267650600228229401496703205376 - 128) left uncovered. In this way, a solution with a 1% numerical precision has covered essentially none of the real problem. The only way that a solution to the subset sum problem can be used as a solution to other NP problems is to solve all of the problem (and all of the constraints) exactly.

In cryptography, it is actually important to solve real subset sum problems exactly. The subset sum problem comes up when a codebreaker attempts, given a message and ciphertext, to deduce the secret key. A key that is not equal to but within ±1% of the real key is essentially useless for the codebreaker due to the avalanche effect, which causes very similar keys to produce very different results.

The complexity of subset sum

The complexity (difficulty of solution) of subset sum can be viewed as depending on two parameters, N, the number of decision variables, and P, the precision of the problem (stated as the number of binary place values that it takes to state the problem). (Note: here the letters N and P mean something different than what they mean in the NP class of problems.)

The complexity of the best known algorithms is exponential in the smaller of the two parameters N and P. Thus, the problem is most difficult if N and P are of the same order. It only becomes easy if either N or P becomes very small.

If N (the number of variables) is small, then an exhaustive search for the solution is practical. If P (the number of place values) is a small fixed number, then there are dynamic programming algorithms that can solve it exactly.

What is happening is that the problem becomes seemingly non-exponential when it is practical to count the entire solution space. There are two ways to count the solution space in the subset sum problem. One is to count the number of ways the variables can be combined. There are 2^N possible ways to combine the variables. However, with N = 10, there are only 1024 possible combinations to check. These can be counted easily with a branching search. The other way is to count all possible numerical values that the combinations can take. There are 2^P possible numerical sums. However, with P = 5 there are only 32 possible numerical values that the combinations can take. These can be counted easily with a dynamic programming algorithm. When N = P and both are large, then there is no aspect of the solution space that can be counted easily.

Efficient algorithms for both small N and small P cases are given below.

Exponential time algorithm

There are several ways to solve subset sum in time exponential in N. The most naïve algorithm would be to cycle through all subsets of N numbers and, for every one of them, check if the subset sums to the right number. The running time is of order O(2^N*N), since there are 2^N subsets and, to check each subset, we need to sum at most N elements.
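As an illustration of the naïve approach, here is a minimal Python sketch. The function name `subset_sum_bruteforce` is hypothetical, and the sketch skips the empty subset, following the "non-empty subset" wording used elsewhere on this page.

```python
from itertools import combinations

def subset_sum_bruteforce(nums, target):
    """Try every nonempty subset; return one that sums to target, or None.

    There are 2^N - 1 nonempty subsets and each takes up to N additions,
    which gives the O(2^N * N) bound quoted above.
    """
    for size in range(1, len(nums) + 1):
        for combo in combinations(nums, size):
            if sum(combo) == target:
                return combo
    return None

witness = subset_sum_bruteforce([-7, -3, -2, 5, 8], 0)
```

For the example set used on this page, `witness` is a subset whose elements add up to zero.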

A better exponential time algorithm is known, which runs in time O(2^(N/2)*N). The algorithm splits the N elements arbitrarily into two sets of N/2 each. For each of these two sets, it calculates the sums of all 2^(N/2) possible subsets of its elements and stores them in an array of length 2^(N/2). It then sorts each of these two arrays, which can be done in time O(2^(N/2)*N). When the arrays are sorted, the algorithm can check if an element of the first array and an element of the second array sum up to s in time O(2^(N/2)). To do that, the algorithm passes through the first array in decreasing order (starting at the largest element) and the second array in increasing order (starting at the smallest element). Whenever the sum of the current element in the first array and the current element in the second array is more than s, the algorithm moves to the next element in the first array. If it is less than s, the algorithm moves to the next element in the second array. If two elements with sum s are found, it stops. No better algorithm has been found since Horowitz and Sahni first published this algorithm in 1974.[1]
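The split, sort and two-pointer scan described above can be sketched in Python as follows. This is a hedged illustration rather than the published algorithm: all function names are hypothetical, and it uses an ordinary sort, so it corresponds to the version with the extra factor of N. To keep the empty subset from counting as a solution, the sketch runs the scan twice, each time forcing one half to contribute at least one element; that detail is an assumption of this sketch and is not part of the description above.

```python
def all_subset_sums(nums, include_empty):
    """Sums of all subsets of nums; index 0 is always the empty subset."""
    sums = [0]
    for x in nums:
        sums += [t + x for t in sums]
    return sums if include_empty else sums[1:]

def pair_with_sum(first, second, s):
    """Two-pointer scan: first in decreasing order, second in increasing order."""
    first = sorted(first, reverse=True)
    second = sorted(second)
    i = j = 0
    while i < len(first) and j < len(second):
        total = first[i] + second[j]
        if total == s:
            return True
        if total > s:
            i += 1   # too large: move to a smaller element of the first array
        else:
            j += 1   # too small: move to a larger element of the second array
    return False

def subset_sum_mitm(nums, s):
    """Is there a nonempty subset of nums summing to s?"""
    half = len(nums) // 2
    left, right = nums[:half], nums[half:]
    return (
        pair_with_sum(all_subset_sums(left, False), all_subset_sums(right, True), s)
        or pair_with_sum(all_subset_sums(left, True), all_subset_sums(right, False), s)
    )
```

Each half produces at most 2^(N/2) sums, so the sorting step dominates the running time, and the scan itself is linear in the array lengths.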

Pseudo-polynomial time dynamic programming solution

The problem can be solved as follows using dynamic programming. Suppose the sequence is

x₁, ..., x_n

and we wish to determine if there is a nonempty subset which sums to 0. Let N be the sum of the negative values and P the sum of the positive values. Define the boolean valued function Q(i,s) to be the value (true or false) of

"there is a nonempty subset of x₁, ..., x_i which sums to s".

Thus, the solution to the problem is the value of Q(n,0).

Clearly, Q(i,s) = false if s < N or s > P, so these values do not need to be stored or computed. Create an array to hold the values Q(i,s) for 1 ≤ i ≤ n and N ≤ s ≤ P.

The array can now be filled in using a simple recursion. Initially, for N ≤ s ≤ P, set

Q(1,s) := (x₁ = s).

Then, for i = 2, …, n, set

Q(i,s) := Q(i-1,s) or (x_i = s) or Q(i-1,s-x_i)

for N ≤ s ≤ P.

For each assignment, the values of Q on the right side are already known, either because they were stored in the table for the previous value of i or because Q(i-1,s-x_i) = false if s-x_i < N or s-x_i > P. Therefore, the total number of arithmetic operations is O(n(P − N)). For example, if all the values are O(n^k) for some k, then the time required is O(n^{k+2}).

This algorithm is easily modified to return the subset with sum 0 if there is one.

This solution does not count as polynomial time in complexity theory because P-N is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the value of N and P, which are exponential in their numbers of bits.
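The recurrence for Q(i,s) can be sketched in Python as below. The function name `has_zero_subset` and the variables `lo` and `hi` (standing for the sums called N and P in the text) are illustrative choices, and the sketch keeps only the rows for i-1 and i instead of the full table.

```python
def has_zero_subset(xs):
    """Decide whether a nonempty subset of the integers xs sums to 0."""
    if not xs:
        return False
    lo = sum(x for x in xs if x < 0)  # N in the text: sum of the negative values
    hi = sum(x for x in xs if x > 0)  # P in the text: sum of the positive values
    width = hi - lo + 1
    # prev[s - lo] holds Q(i-1, s); initially Q(1, s) := (x_1 == s).
    prev = [False] * width
    prev[xs[0] - lo] = True
    for x in xs[1:]:
        cur = [False] * width
        for s in range(lo, hi + 1):
            r = s - x
            # Q(i, s) := Q(i-1, s) or (x_i == s) or Q(i-1, s - x_i),
            # where Q(i-1, r) is false whenever r falls outside [lo, hi].
            cur[s - lo] = (
                prev[s - lo]
                or x == s
                or (lo <= r <= hi and prev[r - lo])
            )
        prev = cur
    return prev[0 - lo]  # Q(n, 0)
```

The two nested loops perform about n(P − N) table updates, matching the operation count given above.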

A more general problem asks for a subset summing to a specified value (not necessarily 0). It can be solved by a simple modification of the algorithm above. For the case that each x_i is positive and bounded by the same constant, Pisinger found a linear time algorithm.^[2]

Polynomial time approximate algorithm

An approximate version of the subset sum would be: given a set of N numbers x₁, x₂, ..., x_N and a number s, output

yes, if there is a subset that sums up to s;
no, if there is no subset summing up to a number between (1-c)s and s for some small c>0;
any answer, if there is a subset summing up to a number between (1-c)s and s but no subset summing up to s.

If all numbers are non-negative, the approximate subset sum is solvable in time polynomial in N and 1/c.

The solution for approximate subset sum also provides a solution for the original subset sum problem in the case where the numbers are small (again, for nonnegative numbers). If any sum of the numbers can be specified with at most P bits, then solving the problem approximately with c = 2^{-P} is equivalent to solving it exactly. Then, the polynomial time algorithm for approximate subset sum becomes an exact algorithm with running time polynomial in N and 2^P (i.e., exponential in P).

The algorithm for the approximate subset sum problem is as follows:

initialize a list S to contain one element 0.
for each i from 1 to N do
    let T be a list consisting of x_i+y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do //trim the list by eliminating numbers close one to another
        if y<(1-c/N)z, set y=z and add z to S
if S contains a number between (1-c)s and s, output yes, otherwise no

The algorithm is polynomial time because the lists S, T and U always remain of size polynomial in N and 1/c and, as long as they are of polynomial size, all operations on them can be done in polynomial time. The size of the lists is kept polynomial by the trimming step, in which we only include a number z into S if the previous y is at most

(1 − c/N)z.

This step ensures that each element in S is smaller than the next one by at least a factor of (1 − c/N) and any list with that property is of at most polynomial size.

The algorithm is correct because each step replaces a sum by a value that is at least (1 − c/N) times as large, so after N steps every achievable sum t is represented in S by a value between (1 − c/N)^N t and t. By Bernoulli's inequality,

(1 − c/N)^N ≥ 1 − c.
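The trimming procedure can be sketched in Python as follows. The function name `approx_subset_sum` is hypothetical, and the sketch assumes nonnegative integer inputs, 0 < c < 1, and ordinary floating-point arithmetic for the factor (1 − c/N); exact rational arithmetic could be substituted if rounding is a concern.

```python
def approx_subset_sum(numbers, s, c):
    """Approximate subset sum test for nonnegative numbers and 0 < c < 1.

    Returns True if the trimmed list contains a value between (1 - c) * s
    and s, and False otherwise.
    """
    n = len(numbers)
    S = [0]
    for x in numbers:
        # U is the sorted union of S and T = {x + y for y in S}.
        U = sorted(set(S) | {x + y for y in S})
        y = U[0]
        S = [y]
        for z in U[1:]:
            # Trim: keep z only if the last kept value y is not within
            # a factor (1 - c/n) of it.
            if y < (1 - c / n) * z:
                y = z
                S.append(z)
    return any((1 - c) * s <= v <= s for v in S)
```

With small integers and a small c, no values are close enough to be trimmed, so the procedure behaves like an exact test; trimming only takes effect once neighbouring sums differ by less than a factor of (1 − c/N).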

References

^ Ellis Horowitz and Sartaj Sahni (1974). "Computing Partitions with Applications to the Knapsack Problem". JACM, Volume 21, Issue 2, 277-292, April 1974
^ Pisinger D (1999). "Linear Time Algorithms for Knapsack Problems with Bounded Weights". Journal of Algorithms, Volume 33, Number 1, October 1999, pp. 1-14

[http://www.bandacity.com/gyan/algorithm.html C implementation of Subset Sum problem]

Unsatisfactory Complexity section

Currently the complexity section uses terms like "decision variable" and "precision" of the problem without explaining what these terms are or how they are concretely represented in the subset sum problem. For those who don't already know the mapping, this makes the section fairly useless. --Mark viking (talk) 18:59, 4 April 2013 (UTC)

NO ETH based lower bound mentioned

The standard reduction from 3-SAT to subset sum implies that there is no 2^{o(n)} algorithm for the subset sum problem unless ETH fails. ETH is the exponential time hypothesis, which can also be found on Wikipedia.

Does anyone know a source where this is mentioned?

I think it should be mentioned, as it shows that we cannot hope for much better algorithms.

An introduction to ETH can be found in the new parametrized complexity book:

http://www.barnesandnoble.com/w/parameterized-algorithms-marek-cygan/1122052557 — Preceding unsigned comment added by Tillmannberlin (talk • contribs) 11:34, 17 November 2015 (UTC)

Misunderstandable formulation

The introduction contains the formulation:

"The problem is known to be NP-hard. Moreover, some restricted variants of it are NP-complete too"

I interpreted the "too" in the sense of "The problem is NP-hard. Some restricted variants are not only NP-hard, but even NP-complete."

Can we change the first sentence into "The problem is known to be NP-complete" to avoid this ambiguity? — Preceding unsigned comment added by 77.57.114.138 (talk) 12:20, 30 September 2023 (UTC)

Looks ok to me. Many NP-hard problems have a complication that the natural formulation is not actually in NP, because it is an optimization problem rather than a decision problem, but that doesn't appear to be an issue here. —David Eppstein (talk) 16:45, 30 September 2023 (UTC)
