Home » Php » php – How can I merge two redundant records in a MySQL table, maintaining all PK/FK relationships?

php – How can I merge two redundant records in a MySQL table, maintaining all PK/FK relationships?

Posted by: admin July 12, 2020 Leave a comment

Questions:

Say I have a table customers with the following fields and records:

id   first_name   last_name   email                  phone
------------------------------------------------------------------------
1    Michael      Turley      [email protected]   555-123-4567
2    John         Dohe        [email protected]      
3    Jack         Smith       [email protected]    555-555-5555
4    Johnathan    Doe                                123-456-7890

There are several other tables, such as orders, rewards, receipts which have foreign keys customer_id relating to this table’s customers.id.

As you can see, in their infinite wisdom, my users have created duplicate records for John Doe, complete with inconsistent spelling and missing data. An administrator notices this, selects customers 2 and 4, and clicks “Merge”. They are then prompted to select which value is correct for each field, etc etc and my PHP determines that the merged record should look like this:

id   first_name   last_name   email                  phone
------------------------------------------------------------------------
?    John         Doe         [email protected]      123-456-7890

Let’s assume Mr. Doe has placed several orders, earned rewards, generated receipts.. but some of these have been associated with id 2, and some have been associated with id 4. The merged row needs to match all of the foreign keys in other tables that matched the original rows.

Here’s where I’m not sure what to do. My instinct is to do this:

DELETE FROM customers WHERE id = 4;

UPDATE customers
SET first_name = 'John',
    last_name  = 'Doe',
    email      = '[email protected]',
    phone      = '123-456-7890'
WHERE id = 2;

UPDATE orders, rewards, receipts
SET customer_id = 2
WHERE customer_id = 4;

I think that would work, but if later on I add another table that has a customer_id foreign key, I have to remember to go back and add that table to the second UPDATE query in my merge function, or risk loss of integrity.

There has to be a better way to do this.

How to&Answers:

As an update to my comment:

use information_schema;
select table_name from columns where column_name = 'customer_id';

Then loop through the resulting tables and update accordingly.

Personally, I would use your instinctive solution, as this one may be dangerous if there are tables containing customer_id columns that need to be exempt.

Answer:

I got here form google this is my 2 cents:

SELECT `TABLE_NAME` 
FROM `information_schema`.`KEY_COLUMN_USAGE` 
WHERE REFERENCED_TABLE_SCHEMA='DATABASE'
  AND REFERENCED_TABLE_NAME='customers'
  AND REFERENCED_COLUMN_NAME='customer_id'

add the db for insurance (you’ll never know when somebody copies the db).

Instead of looking for a column name, here we look at the foreign keys themselves

If you change the on delete restrictions to restrict nothing can be deleted before the children are deleted/migrated

Answer:

The short answer is, no there isn’t a better way (that I can think of).

It’s a trade off. If you find there are a lot of these instances, it might be worthwhile to invest some time writing a more robust algorithm for checking existing customers prior to adding a new one (i.e. checking variations on first / last names, presenting them to whoever is adding the customer, asking them 2 or 3 times if they are REALLY sure they want to add this new customer, etc.). If there are not a lot of these instances, it might not be worth investing that time.

Short of that, your approach is the only way I can think of. I would actually delete both records, and create a new one with the merged data, resulting in a new customer id rather than re-using an old one, but that’s just personal preference – functionally it’s the same as your approach. You still have to remember to go back and modify your merge function to reflect new relationships on the customer.id field.

Answer:

At a minimum, to prevent any triggers on deletions causing some cascading effect, I would FIRST do

update SomeTable set CustomerID = CorrectValue where CustomerID = WrongValue

(do that across all tables)…

THEN
Delete from Customers where CustomerID = WrongValue

As for duplicate data… Try to figure out which “Will Smith, Bill Smith, William Smith” if you are lacking certain information… Some could be completely legitimate different people.