Note that this program is deprecated and replaced by the Generic Allocator.
This program was created during an internship at the chair in informatics 6 of the Julius-Maximilians-Universität Würzburg by Martins Avots and Kirill Djebko with assistance of Prof. Dr. Frank Puppe.
Given a set of supplies and a set of demands an allocation assigns each object of a subset of demands an element of supplies. Each supply can only be assigned to a single demand. A set of preferences contain for some demands one or more preferred supplies. A set of constraints describe some properties which should be met by an allocation. The rating of an allocation quantifies its compliance with the given preferences and constraints. The higher the rating the more constraints and preferences are not met by an allocation. A rating of zero means that every preference and constraint is met by an allocation. An optimal allocation is an allocation in which each demand is assigned to a supply and where the rating is minimal.
A classical example for such an assignment problem would be the Sudoku puzzle in which a set of numbers needs to be placed onto a grid following certain rules. Another problem is the graph coloring problem in which each node of a graph is colored and neighboring nodes have to have different colors.
This program solves given problems with the help of multiple algorithms. Nevertheless, there are problems that can be modeled as an assignment problem but cannot be solved by the Universal Allocation Program.
The problem which is solved by the program is stored inside an Excel file. The program can be started via the following shell command:
java -jar nap.jar -s <Path to the Excel file >
For further information of the command line options following command can be used:
java -jar nap.jar -h
On some platforms the
java -jar
part can be omitted. An alternative way to start the program is to
double
click at the nap.jar file which will start a very basic web
server. In this
case the standard web browser will open a site to
which the Excel files can
be uploaded. During the calculations the
browser will show the loading page
animation specified by the
browser. When the calculations are done a
download is started. Note
that this method does not require an internet
connection as
everything is done on the local computer.
A pipeline of assignment problems can be used in order to describe more complex problems. Such a pipeline consists of multiple stages which have an absolute order. The set of supplies and the set of demands need to be explicitly stated for first assignment problem of such a pipeline. The subsequent problems only require an explicit enumeration of the supply elements. The respective set of demands is defined by the solution of the previous assignment problem of the pipeline. In the following, the stage number is the position of a problem inside the pipeline. A single assignment problem without connections to other problems is formally a pipeline with one element in this context.
The set of supplies is defined by a table. The supply header line
defines
the attributes of which each supply consists of. Each entry
has a name and
ends with
(int)
if the attribute is an integer. Otherwise the entry ends with
(ref)
. The following rows of the table contain the supply elements and
use the
same columns as the header. Each row describes a unique
element of the
supply set but their content do not have to be unique.
This table is stored
into a sheet called
Angebot_0
. The following example shows a supply definition which consists of
a Group
of 4. Note that the first and second supply have the same
content but still
are two different supplies. This example shows that
a project with the id
K1
needs 4 peoples with different primary and secondary duties.
The demand definition has the same format as the supply definition.
The only
difference is that the sheet's name is
Nachrage_<stage>
.
Where <stage> is the number of the problem's stage. Note that
most
problems probably will only contain one stage. Which means that
most users
won't bother with stages at all as they can be ignored in
such cases. The
following table shows a demand definition suited for
the previous supply
definition. In this case the sheet's name is
Nachfrage_0
because it is the first and only stage of this example. As you can
see the
people which need to be assigned to a Project do not have
perfectly fit
skills.
The preference definition is stored into a sheet named
Präferenzen_<stage>
where <stage> is the stage number of the corresponding
assignment
problem of the pipeline. The header of the preferences
sheet is defined by
the concatenation of the headers' of the supplies
and the demands of
respective stage and a weight attribute called
Gewicht(lb)
. This means that the header of the preference is the header of the
solution
for a given stage plus the
Gewicht(lb)
attribute.
The preferences table contains assignments which the user wishes to
be part
of the solution. The
Gewicht(lb)
attribute of a preference describes how important it is for the user
that
this assignment is located in the solution. It consists of a
positive
integer. The lower the number the more important the
preference is. The
preference of a demand with the highest number
defines how important it is
that this demand will get any of its
preferences. The higher this number is
the more it is likely that any
preference of this demand will be met. In
most cases the
Gewicht(lb)
attribute should be a very low number which guarantees that
constraints are
more important than preferences. The following table
shows a preferences
definition based on the previous tables. This
examples also show that nearly
all persons want to get a role as a
programmer which is not possible.
Preferences are often less
important than constraints because in the other
case the preference
would already be part of a correct solution and
therefore the
corresponding supply and demand could be removed from the
input in
order to get better results or to calculate the solution more
quickly.
The constraint definition of an assignment problem is stored into a
sheet
named
Constraints_<stage>
where <stage> is the stage number of the concerned problem. It
has the
same header as the preference table of the same stage. Each
row describes
one constraint. A constraint defines the following
abstract rule: the
biggest assignment subset which complies with IF
has also to comply with
THEN. To put it in other words: every cluster
which is created by the
condition IF also must comply with the
condition THEN.
Lets assume following solution in order to explain the functioning of constraints by example:
Every entry of a constraint's row which has the format
<value>
is part of IF. Such an entry means that every element in each
cluster, which
is formed by this constraint, has to have the Value
<value> at the
attribute of the same column. The following
constraint creates a cluster
consisting of Charlie and Alpha because
they are the only people who got the
role as a programmer in the
assumed solution. Note that this constraint does
not have any effect
as there is no entry which is part of THEN. Constraint
which selects
every assignment in the solution where the primary duty is
Programming
:
The cluster which is formed by the IF part of the previous constraint:
Every entry of a constraint which consists of
*
is part of IF. Such an entry means that the cluster which is created
by this
constraint is parted into multiple clusters. For each cluster
all elements
have the same value at the column of such an entry. The
following tables
show the clusters created by a constraint which
consists of a
*
at the
Primary Duty(ref)
attribute. Constraint which clusters the solution according to the
Primary Duty
attribute:
First cluster which is formed by the IF part of the previous constraint:
Cluster 2 which is formed by the IF part of the previous constraint:
Last cluster which is formed by the IF part of the previous constraint:
Every other entry which starts with
#
followed by a function call is part of THEN. Possible function calls
are
listed at the chapter constraint functions. These function calls
describe
which conditions has to be met by the clusters which are
created by the IF
part of the same constraint. The following table
shows the constraints which
were used in order to generate the
solution mentioned at the beginning of
this chapter:
The first, second and third constraints say that the guy whose
primary duty
is X should also have many experience in X (i.e. that
the people whose
primary duty is programming should also have many
experience in
programming). Each of these constraints has the same
value for the
Gewicht(lb)
attribute which means that each of them is equally important. The
last two
constraints demand that the people whose secondary duty is
planning or
testing also should have some experience in it. These two
constraints have a
lower value at the
Gewicht(lb)
attribute which means that these two constraints are less important
than the
first three constraints. The
Gewicht(lb)
is often set to very high values in order to ensure that the
constraints are
more important than the preferences and is limited to
the values between 0
and 922337203685477.
The THEN part of each constraint consists of constraint functions which describe the conditions that have to be meet by the clusters formed by the IF part of the same constraint. All available functions are listed below.
Demands that each cluster defined by the IF part of a constraint has exactly <Integer> elements which have the value <Value> at the same column as the appearance of this function.
Demands that each cluster defined by the IF part of a constraint has at least <Integer> elements which have the value <Value> at the same column as the appearance of this function.
Demands that each cluster defined by the IF part of a constraint has at at most <Integer> many elements which have the value <Value> at the same column as the appearance of this function.
Demands that each cluster defined by the IF part of a constraint has only elements containing a value of the enumartion <Values...> at the same column as the appearance of this function. Each element of <Values...> is separated by a comma.
Demands that each cluster defined by the IF part of a constraint has only elements which has no value at the same column as the appearance of this function that is an element of <Values...>. Each element of <Values...> is separated by a comma.
Demands
that each cluster defined by the IF part of a constraint has only
elements
which have a pairwise minimum distance of <Integer>
or greater. The
distance between two elements is defined by the
difference of the two
integer values of these two elements which are
located at the same column
as the appearance of this constraint
function. Therefore this function can
only meaningfully be used at
columns which are of the attribute
(int)
.
Demands
that each cluster defined by the IF part of a constraint has only
elements
which have a pairwise maximum distance of <Integer>
or less. The
distance between two elements is defined by the
difference of the two
integer values of these two elements which are
located at the same column
as the appearance of this constraint
function. Therefore this function can
only meaningfully be used at
columns which are of the attribute
(int)
.
Demands that each cluster defined by the IF part of a constraint has only elements which are consecutive. This means that for each element in one cluster there is exactly one element at the same cluster which is adjacent to this element. Two elements are adjacent if their pairwise distance is exactly 1. The distance between two elements is defined by the difference of the two integer values of these two elements which are located at the same column as the appearance of this constraint function.
In other words this constraint function demands that the elements of
a
cluster form an ordered queue without any gap between two elements
and
therefore this function is only sensibly usable at columns which
are of the
attribute
(int)
.
The
configuration definition is stored in a table named
"config_<stage>"
where <stage> is the stage number of the
affected assignment problem.
This table consists of two the columns
with the header
Key
and
Value
. In the next subsection the possible options are listed.
Value: an integer describing how many restarts are used during problem solving.
Default:
false
Value: the flusher is enabled if set to "true" and in the other case set to "false". See chapter Flusher for information about the flusher.
Default:
false
Value: random restarts are computed in parallel if set to "true" and in the other case set to "false".
Default:
false
Value: maximal number of plateau steps which should be walked by the hill climber before aborting the computation.
Default:
10
Value: a list of column numbers which are separated via semicolon. The column numbering is starting with 1 and after each number there have to be a semicolon. The list describes in which order the columns of the solution are going to be sorted.
Default: Empty String (no sorting).
Value: a string which describes the temperature function which is used for simulated annealing. The following values are valid:
Default: simulated annealing is disabled
ONEDIVX:
f(x) = [log(2) / log(x^1.5)] - 0.05
LOGTWO:
f(x) = [log(2) / log(x^1.5)] - 0.05
LOGTHREE:
f(x) = [log(3) / log(x^1.5)] - 0.05
LOGFOUR:
f(x) = [log(4) / log(x^2)] - 0.05
LOGFIVE:
f(x) = [log(5) / log(x^2)] - 0.05
Value:
repeats hill climbing algorithm with enabled flusher and without
restarted
temperature function as long as there is an improvement
found if set to
true
.
Default:
false
Value: the maximum number of threads which are used during Random Restart.
Default: 3
The following chapters describe the problem solving components of the program in the order they are used in the program. Some of them can be disabled.
The Initializer assigns demands to supplies. At the first stage it assigns demands to supplies based on preferences only. It iterates multiple times over all not yet assigned demands. At the first iteration it tries to assign to each demand the most preferred supply if there is one available. At the second iteration it tries to assign to each demand the second most preferred supply if there is one available and so on. The first stage ends if all demands are assigned or if there are no preferred supplies left for not yet assigned demands. At the second stage supplies are assigned to demands based on proposals which are created via constraints. If there are multiple proposed supplies for one demand then the supply is assigned to the demand which complies best to the constraints. At the third stage the initializer assigns any supplies left to each free demand randomly.
The hill climber tries to improve the current solution by swapping. There are 3 types of swaps. The first attribute, which is used most of the time in general, selects two assigned demands and swaps their respective supplies. The next attribute replaces the supply of an assigned demand with a supply which is not assigned to any demand. The last attribute replaces the demand of an assigned supply with a demand that is not assigned to any supply. The hill climber works in two phases. At the first phase at each hill climbing step swaps are proposed by the constraints. The hill climber checks which proposed swap improves the solution the most. At the end of the step the best proposed swap is executed. If no proposed swap improves the solution a plateau step is done. This step changes the current assignment of demands to supplies by one step without downgrading the current assignment. The first phase runs as long it finds swaps which improve the current assignment, as long not too many consecutively plateau steps are done or if no swap can be found which does not downgrade the solution. At the second phase the hill climber uses proposed swaps which are generated by preferences but aside from that works like in the first phase.
If the flusher is enabled it removes some assignments of the solution created by the hill climber based on proposals of the constraints. This is only done if the removal of each assignment improves the compliance of the current solution with the given preferences and constraints. After that the initializer and hill climber are used again. This is repeated as long the flusher removes assignments from the solution provided by the hill climber and as long each iteration improves the solution.
If simulated annealing is activated there is a probability that a random swap is done during an hill climbing step. A downgrade by this random swap is accepted. This probability is defined by the temperature function and gets lower as the time progresses during problem solving.
If random restart is activated the complete calculation of the solution is repeated multiple times. The solution with the best rating is returned as the result. These calculations can be executed in parallel. If one of these solutions is complying with all constraints and preferences than every other remaining calculations are aborted.
If the program is used via console the output of the solution is stored in an excel table and the path of this file is printed to the console. If the program is used via web browser a download of the excel table is started.
There are multiple examples at the folder
testData
of the provided program of this program. At the end of each entry
the
corresponding command is listed in order to execute the example
from the
console.