Home » Php » algorithm – Questions on the possible fastest relation graph alghorithm in PHP

algorithm – Questions on the possible fastest relation graph alghorithm in PHP

Posted by: admin February 25, 2020 Leave a comment

Questions:

Good morning, so here is my problem. And sorry in advance if my english isn’t perfect.
Anyway, I have a huge amount of relation between numbers formated as so :

$relation1 = [1, 2];
$relation2 = [2, 3];
$relation3 = [4, 5];
$relation4 = [6, 7];
$relation5 = [7, 2];

each array represente a relation between the 2 numbers.
What I need to do is merge the relations to get a result such as :

$result1 = [1, 2, 3, 6, 7];
$result1 = [4, 5];

I already develloped a algorithm in PHP that do exactly that, the problem is that for a total of about more than 50000 relations, the program takes a long time to execute. When I’m done creating direct relations, which don’t require a lot of time :

$temporary_result1 = [1, 2, 3];
$temporary_result2 = [4, 5];
$temporary_result3 = [6, 7, 2];

I then need to merge group 1 and 3, but with around 20 000 groups previously created, the process takes a really long time, because i need to iterate 20 000 * 20 000 times (400 000 000) that takes around 30 minutes already. As the number of relation is expected to grow overtime, I’m worried the time to execute will grow exponentially since i need to execute the program every days.

I was just wondering if there was a existing algorithm that does this, while being more efficient, in any programming language that I could try and recode in PHP.

How to&Answers:

You can look at these relations as Composer packages. Every package has it’s own dependency.

Like for example,

$relation1 = [1, 2];
$relation2 = [2, 3];
  • In the above example, you can say that package 1 is dependent on package 2 and package 2 is dependent on package 3.

  • Likewise, we have to find all connected components. The relation between 2 packages need not necessarily be reflexive, meaning package A is dependent on package B doesn’t necessarily mean package B is dependent on package A.

  • But for the scope of this problem, we can make the relation reflexive to get the connected components via depth first search. If you want, you can adapt to a non-reflexive idea using union find by path compression to merge values with singular parents, but in my answer, I would stick with DFS.

Your Input:

$relation1 = [1, 2];
$relation2 = [2, 3];
$relation3 = [4, 5];
$relation4 = [6, 7];
$relation5 = [7, 2];

For the above input, the adjacency list would look like:

1 => 2
2 => 1,3,7
3 => 2
4 => 5
5 => 4
6 => 7
7 => 6,2

Snippet:

function getConnectedComponents($relations,$total_nodes){
    $result = [];
    $nodes = [];
    $visited = [];

    for($i=1;$i<=$total_nodes;++$i){
        $nodes[$i] = [];
        $visited[$i] = false;
    }

    foreach($relations as $relation){
        $nodes[$relation[0]][] = $relation[1];
        $nodes[$relation[1]][] = $relation[0];
    }

    for($i=1;$i<=$total_nodes;++$i){
        if(!$visited[$i]){
            $temp = [];
            dfs($nodes,$i,$visited,$temp);
            $result[] = $temp;
        }
    }

    return $result;
}

function dfs($nodes,$node,&$visited,&$temp){
    if($visited[$node]) return;
    $visited[$node] = true;
    $temp[] = $node;
    foreach($nodes[$node] as $child_node){
        dfs($nodes,$child_node,$visited,$temp);
    }
}

Demo: https://3v4l.org/JWAF6

In the above algorithm, we build the adjacency list and do a depth first search on each node. We mark nodes as visited along the way to make sure that we aren’t visiting something we already processed.

Answer:

You can look at this problem as a general graphs problem.
Your relations are edges, and there are E of them.
Your integers are vertices, there are V of them.

Now you need to find the components of the graph, this is a well known problem and there are many solutions to it.

One solution is described here, you start from a random node and do a DFS until you find all the nodes in your component. Than moving on to another unvisited node until finding all of the nodes in the graph.
It presents a code in many languages there but not PHP, here is a php implementation of DFS which you can use.

The time complexity of this solution is O(E + V) if implemented correctly so it will solve the runtime problem.

Best of luck