1. Attachments are working again! Check out this thread for more details and to report any other bugs.

A Simple Math Puzzle

Discussion in 'Fred's House of Pancakes' started by airportkid, May 24, 2007.

  1. eagle33199

    eagle33199 Platinum Member

    Joined:
    Mar 2, 2006
    5,122
    268
    0
    Location:
    Minnesota
    Vehicle:
    2015 Prius v wagon
    Model:
    Two
    The problem with that, Boulder, is that it won't just take a few milliseconds... Sure, with your sample of 5 it'll probably take less than a second - There are only 31 possible combinations to calculate. When you expand that to 100 options, you're facing a lot more possible combinations - while i don't want to calculate the whole thing, you can figure out the total number of combinations with the formula:
    for(i=0 ;i<=100; i++); sum = sum+ 100Ci;

    In this case, simply looking at 100C50 (which by definition has the largest numbers of combinations) gives you 10^29... even with a processor running at 3 GHz, you only get 3 billion clock cycles per second, and it probably takes a good 100+ clock cycles to go through a loop and calculate one combination - but for simplicities sake lets call it 30 clock cycles... I'm sure you can see where this is going: 10^29/10^8 = 10^21 seconds to calculate the entire list of possible combinations of 50 from the original set of 100.

    Even if you limit yourself to the likely range of combinations of 30 or so (because the amount is 1/3 the total), you're still dealing with 100C30 = 3*10^25.

    So brute force for this is definitely not a real viable alternative. The key to solving a problem like this is reducing the search space. There are plenty of strategies you can use to do this, based on what you know about the numbers. For example, ignoring all combinations using more than 50 numbers will cut the search space in half, saving you weeks of time. Developing an elegant solution (something similar to mine) will potentially cut the search space by a factor of 10 or more, in addition to other reductions. And using daniel's method reduces the search space to 1 - by far the most optimum :)
     
  2. airportkid

    airportkid Will Fly For Food

    Joined:
    Sep 2, 2005
    2,191
    538
    0
    Location:
    San Francisco Bay Area CA
    Vehicle:
    2005 Prius
    <div class='quotetop'>QUOTE(eagle33199 @ May 28 2007, 05:25 PM) [snapback]451209[/snapback]</div>
    Stand by - I just got back from a blissful Memorial Day holiday & haven't yet picked up my salt mine axe. I'll keep you apprised --- :)

    MB
     
  3. boulder_bum

    boulder_bum Senior Member

    Joined:
    Mar 11, 2007
    1,371
    38
    0
    Location:
    Castle Rock, CO
    Vehicle:
    2007 Prius
    <div class='quotetop'>QUOTE(eagle33199 @ May 29 2007, 07:17 AM) [snapback]451415[/snapback]</div>
    I figured it was better to spend a few minutes crafting a solution that got the job done slowly than spending a long time crafting a solution that could get the job done instantaneously, but you're right as it turns out.

    It's no surprise that the brute force algorithm can't scale with the exponential nature the problem, but I saw that airportkid was considering using Excel/VBA to come up with the solution and figured we weren't dealing with a large number of invoices.

    I naively assumed you wouldn't start running into performance problems until you ran into the thousands of invoices, but I tried randomly populating larger samples and the performance is indeed cripled very early (in the tens, not thousands of records).

    Funny thing, that exponential growth! Interesting stuff.
     
  4. airportkid

    airportkid Will Fly For Food

    Joined:
    Sep 2, 2005
    2,191
    538
    0
    Location:
    San Francisco Bay Area CA
    Vehicle:
    2005 Prius
    Well, I started out looking at Eagle's least significant digit buckets approach, but got jammed against the fact that the number of subset members isn't known - and hit the same exponential growth of candidates that stymied me at the beginning. So I Googled "subset sum" (should have done that right off but wasn't sure how to phrase the query) and found out that this is (as I suspected) a very old problem with no known polynomial solutions, and as yet no exposition that I could find that dealt with the problem in anything other than theoretical mathematical terms (translation: nothing that lent itself to a VB workup).

    Sorry, Eagle - no beer yet.

    Funny how the universe works - some of the simplest problems turn out to have no solution, or solutions beyond practical computation, while on the other hand some of the most complex manifestations of nature are at their root no more complicated than 1+1=2. Almost makes one want to believe in a deity that delights in playing practical jokes.

    Well, I'll keep banging on Google - maybe something will turn up that I can use.

    Many thanks for everyone's help!

    Mark
     
  5. eagle33199

    eagle33199 Platinum Member

    Joined:
    Mar 2, 2006
    5,122
    268
    0
    Location:
    Minnesota
    Vehicle:
    2015 Prius v wagon
    Model:
    Two
    Yeah, i knew it would still have a pretty big growth curve, but it would be a smaller curve than brute force. I'll keep thinking about it, i have a few half formed ideas floating around (i love working on problems like this :))
     
  6. daniel

    daniel Cat Lovers Against the Bomb

    Joined:
    Feb 25, 2004
    14,487
    1,518
    0
    Location:
    Spokane, WA
    Vehicle:
    2004 Prius
    I still say shred everything and run off to Mexico with the million bucks.
     
  7. eagle33199

    eagle33199 Platinum Member

    Joined:
    Mar 2, 2006
    5,122
    268
    0
    Location:
    Minnesota
    Vehicle:
    2015 Prius v wagon
    Model:
    Two
    What i'm currently pondering can be explained best through an example:

    10 numbers in the set:
    -6, -3, -2, 1, 4, 5, 7, 10, 12, 15

    goal: 30 (chosen because an obvious solution is (1, 7, 10, 12))

    Pick numbers at random until you get "near" the solution. I'm going to go with every other number to ensure my example is unbiased.

    subset: -6, -2, 4, 7, 12
    Total: 15

    Now, the basic idea is to figure out which numbers you can use to replace current numbers to get you closer to your goal. So we create a "new goal" value, being the difference between the old goal and the total:

    new goal: 15

    Next, create a grid to show replacement value - put the subset on top, and the remaining numbers down the side and fill in with the difference. Add 0 to each set to represent adding or subtracting a number:

    0 -6 -2 4 7 12
    0 6 2 -4 -7 -12
    -3 -3 3 -1 -7 -10 -15
    1 1 7 3 -3 -6 -11
    5 5 11 7 1 -2 -7
    10 10 16 12 6 3 -2
    15 15 21 17 11 8 3

    In searching this space, you first look for anything that matches the goal - in this case you have one cell, indicating that, if you add 15 to your subset you hit your goal! Solution: (-6, -2, 4, 7, 12, 15)
    Another fairly obvious solution found from this particular grid is (-6, -2, 4, 5, 7, 10, 12)

    But, lets pretend there wasn't anything quite so obvious.

    The general idea here is that you can pick any value so long as it doesn't share a row or column with another selected value. Your goal isn't to get exactly to your new goal with this selection - just to get close. Eventually, through enough iterations, you'll find your new goal in the grid and have a solution.

    So lets say our algorithm first selects the cell with a value of 16 (-6, 10). It then searches the remaining 4 rows and columns for anything that gets it closer to the solution - in this case finding a cell with a value of -1: (-2, -3)

    This gives us yet another solution: ( -3, 4, 7, 10, 12) note that none of these were the "obvious" solution i found found myself...

    Another solution could be found by selecting cells with values 21 and -6, or 8 and 7 (or the other 7)... Just one iteration through this method on my (admittedly) quickly constructed example gave us a ton of solutions that weren't too obvious from the initial set.

    Anyways, if you don't land on a solution here, you end up picking your "best guess" to get you closer to the solution, make your substitutions and you have a new subset. calculate a new goal and a new grid and repeat.


    This solution isn't guaranteed to give you a solution. Instead, it'll give you a "local minimum", so to speak - something very close in value. I would guess you could limit it to 3-5 iterations before canning a run - then just pick a new random subset to begin with.


    If you're curious, the inspiration for this algorithm came from a neural network that uses a genetic algorithm... It's very similar in that both will find a local minimum without necissarily finding the solution you're looking for.

    this method is, in one way better than other methods - it doesn't grow exponentially. however, it can be worse in that it will take an indeterminate amount of time to find the correct answer, based solely on luck.