Home » excel » excel – Merge two queries without duplicating rows in Power Query

excel – Merge two queries without duplicating rows in Power Query

Posted by: admin May 14, 2020 Leave a comment

Questions:

I’m trying to merge two queries using one column (which contains duplicate values, although the exact same amount of duplicates for the same values in both queries) as key; but whenever I expand the merged table, all rows get twice the duplicates.

I am not an expert in Power Query and I’ve tried several things, including different tipes of join (left, full, right, etc.) and the result is always the same.

The setup is like this:

Query Table 1:

|   Name  |Extension|Folder Path  |Product Name|Destination Path|
|File1.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ |
|File2.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ |
|File4.jpg|   .jpg  |c:/someroute/|  ProductB  |  c:/otherpath/ |
|File3.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ |
|File8.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ |
|File9.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ |

Query Table 2:

|productid|productSKU|Product Name|                    ImageIDs                  |
|   0001  | sku0001  |  ProductA  |          productA-1.jpg, productA-2.jpg      |
|   0002  | sku0002  |  ProductB  |                  productB-1.jpg              |
|   0003  | sku0003  |  ProductC  |productC-1.jpg, productc-2.jpg, productc-3.jpg|

Desired output (I’m building a dynamic renaming function, to rename all files with the names of ImageIDs:

|   Name  |Extension|Folder Path  |Product Name|Destination Path|    ImageID    |
|File1.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ | productA-1.jpg|
|File2.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ | productA-2.jpg|
|File4.jpg|   .jpg  |c:/someroute/|  ProductB  |  c:/otherpath/ | productB-1.jpg|
|File3.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-1.jpg|
|File8.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-2.jpg|
|File9.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-3.jpg|

What I’ve tried unsucsessfully in short is:

  • Split the ImageIDs column into rows on Query 2 -> Then Merge Queries using Product Name as Key -> Expand the ImageIDs column in the merged query.
  • Create a third query with unique Product Name values -> Merge the other two queries to the newly created -> Expand the desired columns as needed.
  • Merge the two queries first using Product Name as key -> Then expand the ImageIDs column in the merged query -> Split the ImageIDs column into rows.
  • Group the table in Query 1 by Product Name -> Merge the two queries using Product Name as key (which now has only unique values on each query) -> expand the name table in the merged query.

I’m guessing I should use some kind of index column that counts the duplication of values in Product Name but I don’t really know that, nor I understand why is it not working, but any of the options above output something like this:

|   Name  |Extension|Folder Path  |Product Name|Destination Path|    ImageID    |
|File1.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ | productA-1.jpg|
|File1.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ | productA-2.jpg|
|File2.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ | productA-1.jpg|
|File2.jpg|   .jpg  |c:/someroute/|  ProductA  |  c:/otherpath/ | productA-2.jpg|
|File4.jpg|   .jpg  |c:/someroute/|  ProductB  |  c:/otherpath/ | productB-1.jpg|
|File3.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-1.jpg|
|File3.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-2.jpg|
|File3.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-3.jpg|
|File8.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-1.jpg|
|File8.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-2.jpg|
|File8.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-3.jpg|
|File9.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-1.jpg|
|File9.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-2.jpg|
|File9.jpg|   .jpg  |c:/someroute/|  ProductC  |  c:/otherpath/ | productC-3.jpg|

As I mentioned above, I’m trying to create a renaming function, so files with random names, get the name of the product they are representing.

How to&Answers:

Index columns should sort this out assuming Table 2 has the same number of rows as Table 1 after splitting ImageIDs to new rows.

  1. Expand Table 2 to new rows.
  2. Sort the expanded Table 2 by Product Name and ImageId.
  3. Add an index column to Table 2.
  4. Sort Table 1 by Product Name and Name.
  5. Add an index column to Table 1.
  6. Merge Table 2 onto Table 1 using the index columns to match on.
  7. Expand the ImageId column from Table 2.