Good morning, so here is my problem. And sorry in advance if my english isn’t perfect.
Anyway, I have a huge amount of relation between numbers formated as so :
$relation1 = [1, 2];
$relation2 = [2, 3];
$relation3 = [4, 5];
$relation4 = [6, 7];
$relation5 = [7, 2];
each array represente a relation between the 2 numbers.
What I need to do is merge the relations to get a result such as :
$result1 = [1, 2, 3, 6, 7];
$result1 = [4, 5];
I already develloped a algorithm in PHP that do exactly that, the problem is that for a total of about more than 50000 relations, the program takes a long time to execute. When I’m done creating direct relations, which don’t require a lot of time :
$temporary_result1 = [1, 2, 3];
$temporary_result2 = [4, 5];
$temporary_result3 = [6, 7, 2];
I then need to merge group 1 and 3, but with around 20 000 groups previously created, the process takes a really long time, because i need to iterate 20 000 * 20 000 times (400 000 000) that takes around 30 minutes already. As the number of relation is expected to grow overtime, I’m worried the time to execute will grow exponentially since i need to execute the program every days.
I was just wondering if there was a existing algorithm that does this, while being more efficient, in any programming language that I could try and recode in PHP.
You can look at these relations as Composer packages. Every package has it’s own dependency.
Like for example,
$relation1 = [1, 2];
$relation2 = [2, 3];

In the above example, you can say that package
1
is dependent on package2
and package2
is dependent on package3
. 
Likewise, we have to find all connected components. The relation between 2 packages need not necessarily be reflexive, meaning package
A
is dependent on packageB
doesn’t necessarily mean packageB
is dependent on packageA
. 
But for the scope of this problem, we can make the relation reflexive to get the connected components via
depth first search
. If you want, you can adapt to a nonreflexive idea usingunion find by path compression
to merge values with singular parents, but in my answer, I would stick with DFS.
Your Input:
$relation1 = [1, 2];
$relation2 = [2, 3];
$relation3 = [4, 5];
$relation4 = [6, 7];
$relation5 = [7, 2];
For the above input, the adjacency list would look like:
1 => 2
2 => 1,3,7
3 => 2
4 => 5
5 => 4
6 => 7
7 => 6,2
Snippet:
function getConnectedComponents($relations,$total_nodes){
$result = [];
$nodes = [];
$visited = [];
for($i=1;$i<=$total_nodes;++$i){
$nodes[$i] = [];
$visited[$i] = false;
}
foreach($relations as $relation){
$nodes[$relation[0]][] = $relation[1];
$nodes[$relation[1]][] = $relation[0];
}
for($i=1;$i<=$total_nodes;++$i){
if(!$visited[$i]){
$temp = [];
dfs($nodes,$i,$visited,$temp);
$result[] = $temp;
}
}
return $result;
}
function dfs($nodes,$node,&$visited,&$temp){
if($visited[$node]) return;
$visited[$node] = true;
$temp[] = $node;
foreach($nodes[$node] as $child_node){
dfs($nodes,$child_node,$visited,$temp);
}
}
Demo: https://3v4l.org/JWAF6
In the above algorithm, we build the adjacency list and do a depth first search on each node. We mark nodes as visited along the way to make sure that we aren’t visiting something we already processed.
Answer：
You can look at this problem as a general graphs problem.
Your relations are edges, and there are E
of them.
Your integers are vertices, there are V
of them.
Now you need to find the components of the graph, this is a well known problem and there are many solutions to it.
One solution is described here, you start from a random node and do a DFS
until you find all the nodes in your component. Than moving on to another unvisited node until finding all of the nodes in the graph.
It presents a code in many languages there but not PHP
, here is a php implementation of DFS which you can use.
The time complexity of this solution is O(E + V)
if implemented correctly so it will solve the runtime problem.
Best of luck