Generate Lexicographic Series Efficiently In Python
Solution 1:
A very efficient algorithm, adapted from Jörg Arndt's book "Matters Computational" (Chapter 7.2, "Co-lexicographic order for compositions into exactly k parts"):
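n = 4  # number of parts
k = 3  # sum of the parts
x = [0] * n
x[0] = k
while True:
    print(x)
    v = x[-1]
    if k == v:  # the whole sum has reached the last cell: finished
        break
    x[-1] = 0
    j = -2
    while x[j] == 0:  # find the rightmost non-zero cell before the last one
        j -= 1
    x[j] -= 1
    x[j+1] = 1 + v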
[3, 0, 0, 0]
[2, 1, 0, 0]
[2, 0, 1, 0]
[2, 0, 0, 1]
[1, 2, 0, 0]
[1, 1, 1, 0]
[1, 1, 0, 1]
[1, 0, 2, 0]
[1, 0, 1, 1]
[1, 0, 0, 2]
[0, 3, 0, 0]
[0, 2, 1, 0]
[0, 2, 0, 1]
[0, 1, 2, 0]
[0, 1, 1, 1]
[0, 1, 0, 2]
[0, 0, 3, 0]
[0, 0, 2, 1]
[0, 0, 1, 2]
[0, 0, 0, 3]
Number of compositions and time in seconds for plain Python (perhaps numpy arrays are faster) for n=100 and k = 2, 3, 4, 5 (2.8 GHz Cel-1840):
2 5050 0.040000200271606445
3 171700 0.9900014400482178
4 4421275 20.02204465866089
5 91962520 372.03577995300293
I expect about 2 hours for the 100/6 generation.
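The counts in these timings can be cross-checked with the standard stars-and-bars formula: the number of ways to write k as an ordered sum of n non-negative integers is C(n+k-1, k). A minimal sketch, assuming Python 3.8+ for `math.comb`; the helper name `count_compositions` is hypothetical:

```python
from math import comb

def count_compositions(n, k):
    """Number of compositions of k into exactly n non-negative parts."""
    # Stars and bars: choose positions for k stars among n + k - 1 slots.
    return comb(n + k - 1, k)

# n = 100 cells, k = 2..6, matching the measurements in this answer
for k in range(2, 7):
    print(k, count_compositions(100, k))
```

For k = 6 this gives 1609344100, roughly 17.5 times the k = 5 count, which is consistent with the estimate of about 2 hours for plain Python.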
The same with numpy arrays (x = np.zeros((n,), dtype=int)) gives worse results, but perhaps because I don't know how to use them properly:
2 5050 0.07999992370605469
3 171700 2.390003204345703
4 4421275 54.74532389640808
Native code (this is Delphi; C/C++ compilers might optimize better) generates 100/6 in 21 seconds:
3 171700 0.012
4 4421275 0.125
5 91962520 1.544
6 1609344100 20.748
Cannot go to sleep until all the measurements are done :)
MSVS VC++: 18 seconds! (O2 optimization)
5 91962520 1.466
6 1609344100 18.283
So about 100 million variants per second. A lot of time is wasted checking empty cells (because the fill ratio is small). The speed described by Arndt is reached at higher k/n ratios and is about 300-500 million variants per second:
n=25, k=15: 25140840660 variants in 60.981 seconds, about 400 million per second
Solution 2:
My recommendations:
- Rewrite it as a generator utilizing yield, rather than a loop that concatenates a global variable on each iteration.
- Keep a running sum instead of calculating the sum of some subset of the array representation of the number.
- Operate on a single instance of your working number representation instead of splicing a copy of it to a temporary variable on each iteration.
Note no particular order is implied.
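A minimal sketch of these recommendations, applied to the algorithm from Solution 1 rather than to the code from the question (which is not shown here). The function name `compositions` is hypothetical. It works on a single list that is mutated in place and yields a tuple snapshot on each step; this particular algorithm never needs to compute a sum, so there is no running sum to keep.

```python
def compositions(n, k):
    """Yield every way to write k as an ordered sum of n non-negative integers."""
    if n < 1:
        return
    x = [0] * n           # single working list, mutated in place
    x[0] = k
    while True:
        yield tuple(x)    # snapshot, so callers never see later mutations
        v = x[-1]
        if v == k:        # the whole sum has reached the last cell: done
            return
        x[-1] = 0
        j = n - 2
        while x[j] == 0:  # rightmost non-zero cell before the last one
            j -= 1
        x[j] -= 1
        x[j + 1] = v + 1

for c in compositions(4, 3):
    print(c)
```

Because it is a generator, the caller can stop early or count results without holding them all in memory, for example `sum(1 for _ in compositions(4, 3))`.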
Solution 3:
I have a better solution using itertools as follows,
from itertools import product

n = 4  # number of elements
s = 3  # sum of elements
r = range(s + 1)  # candidate values for each element: 0..s
# Brute force: try every n-tuple and keep those with the right sum
result = [p for p in product(r, repeat=n) if sum(p) == s]
print(len(result))
print(result)
I am saying this is better because it took 0.1 secs on my system, while your code with numpy took 0.2 secs.
But for n=100 and s=6, this code has to go through every combination, and I think it would take days to compute the results.
Solution 4:
I found a solution using itertools as well (Source: https://bugs.python.org/msg144273). Code follows:
import itertools
import operator
def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)

int_part = lambda n, k: (tuple(map(c.count, range(k))) for c in combinations_with_replacement(range(k), n))
for item in int_part(3,4): print(item)
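The function above is a pure-Python version of a tool that the standard library also provides as `itertools.combinations_with_replacement`. A minimal sketch of the same idea using the built-in directly, assuming (as in the code above) that `n` is the sum and `k` is the number of parts; `int_part` is written here as a generator function instead of a lambda:

```python
from itertools import combinations_with_replacement

def int_part(n, k):
    """Yield the ways to write n as an ordered sum of k non-negative integers."""
    # Each multiset of n cell indices says which cell receives each unit;
    # counting how often every index occurs gives one composition.
    for c in combinations_with_replacement(range(k), n):
        yield tuple(c.count(i) for i in range(k))

for item in int_part(3, 4):
    print(item)
```

Note that the argument order is the opposite of Solution 1, where `n` is the number of cells and `k` is the sum.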