I’m looking for an algorithm in order to solve following problem:
Let’s say I’m organizing a course with 300 attendees and 6 workshops divided over 3 Timeframes.
Every attendee has to register on a website, and choose which 3 workshops he would like to attend, together with 2 reserve selections.
Workshops are randomly divided over the timeframes, mostly the same workshop occurs in multiple timeframes. It doesn’t matter in which timeframe the attendee follows the workshop.
The algorithm needs to generate the ideal spread of attendees over the different timeframes so they all have gotten there favorite workshops as much as possible…
Which technology can I use to generate this spread? Can I do it with ActionScript or PHP? Is there anyone with a good example?
Thanks a lot for your help!
In principle, for optimization problems like this you have the choice between exact solving methods like Linear Programming and Constraint Programming, and approximate methods like heuristics or any flavour of Local Search (which includes methods like Genetic Algorithms or Simulated Annealing).
For the problem size that you mentioned, I would definitely use an exact method, since only these guarantee that you find the global optimum. With approximate methods, you can only be sure that you found the global optimum if the cost measure has a value of zero (e.g., no constraint violations).
Version 1: Integer Programming
Your problem can be seen as a variant of Bin Packing. For this type of problem, Mixed Integer Programming (a variant of Linear Programming) is the best choice in my opinion. You will need a MIP solver, since you don’t want to program one yourself. The probably best free one can be found in the COINOR library: CLP/CBC. It’s okayish for small MIP problems, but may run into troubles with larger ones. For pure LP problems, it’s quite good, but for your particular problem, you need integral decision variables, hence MIP. For industrialsized MIP problems, you need a commercial solver. Take your pick of CPLEX, Xpress, or Gurobi. They are all excellent.
The problem can be modelled thus:

for each combination of attendee and workshop, you introduce a binary decision variable. The variable will be one if the attendee visits the workshop. In your example, there will be 1800 such variables.

for each attendee, the sum of the decision variables will be the number of visited workshops. In your example, this is three.

for each attendee, the sum of overlapping workshops is at most 1.

a unit cost is induced if an attendee must visit a reserve choice

decision variables for courses that an attendee has not selected are set to zero
The objective is then to minimize the overall cost.
Here’s a quickly written example code in ECLiPSe (a variant of Prolog):
: lib(eplex).
: lib(random).
: lib(listut).
: local struct(attendee(choices, reserve, workshops)).
generate_attendees(NA, NW, NC, NR, Atts) :
seed(1), % always generate the same set
( for(I, 1, NW), foreach(I, WD) do true ),
(
for(_I, 1, NA),
foreach(Att, Atts),
param(NC, NR, WD)
do
Att = attendee{},
generate_choices(Att, NC, NR, WD)
).
generate_choices(Att, NC, NR, WD) :
(
for(_I, 1, NC),
fromto(WD, DI, DO, RD),
foreach(C, Choices)
do
select_course(DI, C, DO)
),
(
for(_I, 1, NR),
fromto(RD, DI, DO, _),
foreach(R, Reserve)
do
select_course(DI, R, DO)
),
Att = attendee{choices:Choices, reserve:Reserve}.
select_course(L, E, RL) :
length(L, LL),
random(R),
N is R mod LL,
nth0(N, L, E, RL).
solve_with_mip(Atts, NA, NW, NC, Excl) :
decision_vars(NA, NW, Decs),
workshop_visits(Decs, NA, NW, NC),
workshop_choices(Atts, Decs, NA, NW, Cost),
workshop_exclusions(Decs, NA, Excl),
% set up solver and solve
eplex:eplex_solver_setup(min(Cost), Cost, [], []),
eplex:eplex_solve(Result),
printf("Found solution with cost: %w%n", [Result]),
% retrieve solution
eplex:eplex_get(vars, Vars),
eplex:eplex_get(typed_solution, Vals),
Vars = Vals,
retrieve_solution(Atts, Decs, NA, NW).
% make array of decision variables
decision_vars(NA, NW, Decs):
dim(Decs, [NA,NW]),
(
multifor(Idx, 1, [NA,NW]),
foreach(D, Ds),
param(Decs)
do
subscript(Decs, Idx, D),
eplex:(D $>= 0),
eplex:(D $=< 1)
),
eplex:integers(Ds).
% each attendee visits NC workshops
workshop_visits(Decs, NA, NW, NC) :
(
for(AIdx, 1, NA),
param(Decs, NW, NC)
do
(
for(WIdx, 1, NW),
fromto(0, R, D+R, Sum),
param(AIdx, Decs)
do
subscript(Decs, [AIdx, WIdx], D)
),
eplex:(Sum $= NC)
).
% each attendee must not visit a workshop not in
% her list of choices or reserve choices
% choosing a reserve workshop induces a unit cost
workshop_choices(Atts, Decs, NA, NW, Cost):
(
for(AIdx, 1, NA),
foreach(Att, Atts),
fromto(0, CI, CO, Cost),
param(Decs, NW)
do
Att = attendee{choices:Cs, reserve:Rs},
(
for(WIdx, 1, NW),
fromto(CI, ACI, ACO, CO),
param(Decs, AIdx, Cs, Rs)
do
(
memberchk(WIdx, Cs)
>
% choices do not induce cost
ACO = ACI
;
memberchk(WIdx, Rs)
>
% reserves induce unit cost
% (if decision variable is 1)
subscript(Decs, [AIdx, WIdx], D),
ACO = ACI + D
;
% other workshops must not be chosen
subscript(Decs, [AIdx, WIdx], D),
eplex:(D $= 0),
ACO = ACI
)
)
).
% some workshops overlap, so exclude each other
workshop_exclusions(Decs, NA, Excl) :
(
foreach(W1W2, Excl),
param(Decs, NA)
do
(
for(AIdx, 1, NA),
param(Decs, W1, W2)
do
subscript(Decs, [AIdx, W1], D1),
subscript(Decs, [AIdx, W2], D2),
eplex:(D1+D2 $=< 1)
)
).
% retrieve workshopnumbers for attendees
retrieve_solution(Atts, Decs, NA, NW) :
(
for(AIdx, 1, NA),
foreach(Att, Atts),
param(Decs, NW)
do
(
for(WIdx, 1, NW),
fromto([], WI, WO, Ws),
param(Decs, AIdx)
do
subscript(Decs, [AIdx, WIdx], D),
( D == 1 > WO = [WIdxWI] ; WO = WI )
),
Att = attendee{workshops:Ws}
).
test(Atts) :
NA = 300,
NW = 6,
NC = 3,
NR = 2,
% list of exclusions
Excl = [12, 13, 34, 56],
generate_attendees(NA, NW, NC, NR, Atts),
cputime(T1),
solve_with_mip(Atts, NA, NW, NC, Excl),
cputime(T2),
D1 is T2T1,
printf("Found solution with MIP in %w seconds.%n", [D1]).
I have generated attendees and their choices randomly. The exclusion list is hardcoded. Here is the output generated for a run:
? test(Atts).
Found solution with cost: 219.0
Found solution with MIP in 0.119999999999997 seconds.
Atts = [attendee([2, 3, 4], [5, 6], [6, 3, 2]), attendee(...), ...]
Yes (0.12s cpu)
I.e., in the solution, 219 times an attendee was placed in a reserve choice. Note that this is purely due to the overlap exclusion constraints, since there are no capacity constraints on the workshop sizes in the model. The selected workshops for the first attendee are 2, 3, and 6. The first choice of [2,3,4] was infeasible, since workshops 3 and 4 overlap. (I have edited out the remaining attendees from the answer)
For this test, I have used the free CLP/CBC solver from the COINOR library, which is included in the ECLiPSe distribution. COINOR also offers API libraries for C++ and Python.
Version 2: Constraint Logic Programming
Here is a second version, this time using Constraint Logic Programming. Constraint Programming is another exact solution method. Here, I use a different model:

for each attendee, construct a list of three variables. These variables denote the assigned workshops, and hence have the workshop numbers as domain. All three variables must have different values.

in order to break symmetries, I enforce that the variables must be increasing in their order.

the unwanted workshops are removed from the domains.

binding the variables to reserve choices induces unit costs

choosing a workshop for one of the variables removes any overlapping workshop from the domain of the other variables.
The key for successful Constraint Programming is selecting a clever labeling strategy, where the variables are bound to values. Since in this example, there are no capacity constraints on the workshops, one can simply choose preferred workshops until the domains contain only reserve workshops (due to the exclusion constraints). However, the value ordering is crucial here: one must start with the workshops with the fewest overlap.
If this is done, then no optimization will be necessary: the first solution will be optimal (thanks to the lack of capacity constraints). Otherwise, one will find a solution that is close to optimal, but then will have to traverse a huge search tree to find a better solution.
Here’s the code, again in ECLiPSe:
: lib(ic).
: lib(random).
: lib(lists).
: lib(listut).
: local struct(attendee(choices, reserve, workshops)).
solve_with_clp(Atts, NA, NW, NC, Excl) :
decision_vars_clp(NA, NW, NC, Decs),
workshop_choices_clp(Atts, Decs, NA, NW, NC, CostSum),
workshop_exclusions_clp(Decs, NA, NW, Excl, BestOrder),
% solve
Cost #= eval(CostSum),
% order for labeling worskhops
% start with workshop with fewest exclusions
% should be computed!
label(Decs, Atts, BestOrder),
printf("Found solution with cost: %w%n", [Cost]),
% retrieve solution
retrieve_solution_clp(Atts, Decs, NA).
% search solution
label(Decs, Atts, BestOrder) :
(
foreacharg(A, Decs),
foreach(Att, Atts),
param(BestOrder)
do
Att = attendee{choices:Cs, reserve:Rs},
label_att(A, Cs, Rs, BestOrder)
).
label_att(A, Cs, Rs, BestOrder) :
(
foreacharg(C, A),
param(Cs, Rs, BestOrder)
do
(
member(V, BestOrder),
memberchk(V, Cs)
;
member(V, BestOrder),
memberchk(V, Rs)
),
C #= V
).
% make array of decision variables
% for each attendee, one variable for every visited course is created
decision_vars_clp(NA, NW, NC, Decs):
dim(Decs, [NA,NC]),
(
multifor(Idx, 1, [NA,NC]),
foreach(D, Ds),
param(Decs)
do
subscript(Decs, Idx, D)
),
Ds #:: 1..NW,
% for each attendee, each variable must have a different value
(
for(AIdx, 1, NA),
param(Decs, NC)
do
(
for(CIdx, 1, NC),
foreach(C, Cs),
param(Decs, AIdx)
do
subscript(Decs, [AIdx,CIdx], C)
),
alldifferent(Cs),
% break symmetry by requiring ascending order
Cs = [HT],
(
foreach(C, T),
fromto(H, L, C, _)
do
C #> L
)
).
% each attendee must not visit a workshop not in
% her list of choices or reserve choices
% choosing a reserve workshop induces a unit cost
workshop_choices_clp(Atts, Decs, NA, NW, NC, Cost):
(
for(AIdx, 1, NA),
foreach(Att, Atts),
fromto(0, CI, CO, Cost),
param(Decs, NW, NC)
do
Att = attendee{choices:Cs, reserve:Rs},
% make list of costs
functor(RCost, c, NW),
(
for(I, 1, NW),
param(Rs, RCost)
do
( memberchk(I, Rs) > arg(I, RCost, 1) ; arg(I, RCost, 0) )
),
RCost =.. [_RCL],
% remove unwanted workshops
(
for(CIdx, 1, NC),
param(Decs, AIdx, Cs, Rs, NW)
do
subscript(Decs, [AIdx, CIdx], C),
(
for(I, 1, NW),
param(Cs, Rs, C)
do
(
( memberchk(I, Cs) ; memberchk(I, Rs) )
>
true
;
C #\= I
)
)
),
% add costs for workshops
(
for(CIdx, 1, NC),
fromto(CI, ACI, ACO, CO),
param(Decs, AIdx, RCL)
do
subscript(Decs, [AIdx, CIdx], C),
element(C, RCL, CCost),
ACO = ACI+CCost
)
).
% some workshops overlap, so exclude each other
% assumption: W1 < W2
% also compute best order to label workshops:
% start with lowest number of overlaps
workshop_exclusions_clp(Decs, NA, NW, Excl, BestOrder) :
(
foreach(W1W2, Excl),
param(Decs, NA)
do
(
for(AIdx, 1, NA),
param(Decs, W1, W2)
do
subscript(Decs, [AIdx], ACs),
ACs =.. [_ACList],
(
fromto(ACList, [HT], T, [_]),
param(W1, W2)
do
(
foreach(N, T),
param(H, W1, W2)
do
( H #= W1 => N #\= W2 ),
( N #= W2 => H #\= W1 )
)
)
)
),
% compute best order for labeling workshops
(
for(W, 1, NW),
foreach(CW, CWs),
param(Excl)
do
(
foreach(W1W2, Excl),
fromto(0, I, O, C),
param(W)
do
( memberchk(W, [W1,W2]) > O is I+1 ; O = I )
)
),
sort(CWs, SCWs),
( foreach(_W, SCWs), foreach(W, BestOrder) do true ).
% retrieve workshop numbers for attendees
retrieve_solution_clp(Atts, Decs, NA) :
(
for(AIdx, 1, NA),
foreach(Att, Atts),
param(Decs)
do
subscript(Decs, [AIdx], AD),
AD =.. [_Ws],
Att = attendee{workshops:Ws}
).
test(Atts1) :
NA = 300,
NW = 6,
NC = 3,
NR = 2,
% list of exclusions
Excl = [12, 13, 34, 56],
generate_attendees(NA, NW, NC, NR, Atts1),
cputime(T1),
solve_with_clp(Atts1, NA, NW, NC, Excl),
cputime(T2),
D is T2T1,
printf("Found solution with CLP in %w seconds.%n", [D]).
Note that the problem generation predicates are the same as in the MIP solution. Here’s the output for one run:
? test(A).
Found solution with cost: 219
Found solution with CLP in 0.330000000000002 seconds.
A = [attendee([2, 3, 4], [5, 6], [2, 4, 5]), ...]
Yes (0.34s cpu, solution 1, maybe more)
As you can see, it’s somewhat slower than the MIP solution. Also, the actual solution is slightly different, although it has the same cost.
Which version should you choose? It depends on what further constraints you expect to add. If it’s capacity constraints, use MIP. If there it’s more complicated constraints like scheduling constraints, then CLP will be better. With a system like ECLiPSe you could also construct hybrid algorithms.
Answer：
Some basic directions to look in:
Genetic Algorithms are a typical way to solve this problem. While they cannot promise the BEST possible schedule. They can ensure one that is good enough.
Things to look out for. You’ll want to run this once all the reservations are made. You can’t do this on the fly giving each new person a schedule as they reserve seats. In fact, no method will allow that and reach an optimal solution to the problem.
Genetic Programming is also a common method for generic schedulers. However, this is probably overkill since you don’t need a generic solution, just one specific to your conference format.
Answer：
Assuming that this is not a toy version of a real problem (i.e. there are only 6 courses and 3 timeframes), I would go with exhaustive search. The number of total solutions is 6!/ (2!2!2!) = 90 different scheduling options. For each of those options you can calculate some kind of fitness and choose the one that fits most.
However, if this is a toy version of a real problem (100s of courses and 10s of timeframes), then the problem is difficult and a combination of greedy search and simulated annealing should be very helpful.
Another option is to regulate user behavior (by using incentives/penalties) so that they choose what is good for you 🙂
Answer：
Our final solution is found here: