How to get intersection of rows from two tables by using Matlab or Excel?

Posted by: admin May 14, 2020

Questions:

I have been googling, but the results I have found so far do not fully match my need. If someone has already posted an answer to the same question, could you point me to it? Thank you very much.

I have two tables from two Excel files: one contains only numerical values and the other contains both text and numerical values. For example, file1.xlsx has only a column vector of student IDs, which are integers such as:

1  
3    
5     
11     
20 ....    

And file2.xlsx contains a worksheet like this:

 studentID   Name     score    email        
 1           Cassie   60       [email protected]     
 2           John     100      [email protected]      
 3           Marry    80       ...    
 4           Bob      70       ...     
 5           Kevin    99       ....     

I would like to find the rows of students whose studentID appears in file1.xlsx, so the final table should look like:

studentID  Name    score  email       
1          Cassie  60      [email protected]     
3          Marry   80      [email protected]    
5          Kevin   99     [email protected]    
...    

I do not have scripting languages such as SQL or Perl available. I know that MATLAB has an intersect function, but as far as I can tell it only works with numerical matrices. My file2.xlsx contains both numerical and text values, and I do not want to separate them. I also do not know how to read an Excel file into a cell array, since the table has both numerical and text parts. A solution in either MATLAB or Excel is fine with me; I just need to get the final table.

Thank you so very much,

Answer:

In Excel, you can create a new column in file2.xlsx with this formula:

=MATCH(A2,[\path\file1.xlsx]Sheet1!$A$2:$A$20,0)

where $A$2:$A$20 is the range with student IDs in file1.xlsx. The last argument, 0, means exact match. Fill this formula down to all the rows. The function returns #N/A in every row whose ID is not in file1, so you can filter those rows out using this column.

For easier filtering, you can add the ISERROR function:

=NOT(ISERROR(MATCH(A2,[\path\file1.xlsx]Sheet1!$A$2:$A$20,0)))

TRUE then corresponds to IDs that exist in file1, and FALSE to missing IDs.

If you have an additional column in file1.xlsx and want to combine it with the data in file2, you can use the INDEX function:

=INDEX([\path\file1.xlsx]Sheet1!B$2:B$20,MATCH($A2,[\path\file1.xlsx]Sheet1!$A$2:$A$20,0),1)

Note that the MATCH/INDEX combination with an exact match (0) does not require the data in the search column to be sorted, unlike VLOOKUP in its default approximate-match mode.


In MATLAB, you can use the ismember function.

Let’s say you have imported both files to MATLAB and you have cell arrays file1 and file2. Then you can do:

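[~,~,file1] = xlsread('file1.xlsx');   % raw cell array (numbers and text)
[~,~,file2] = xlsread('file2.xlsx');
% ismember on cell arrays needs text, so convert the ID columns to strings
key1 = cellfun(@num2str, file1(:,1), 'UniformOutput', false);
key2 = cellfun(@num2str, file2(:,1), 'UniformOutput', false);
[idx2, idx1] = ismember(key2, key1);   % idx2: rows of file2 whose ID is in file1
filtered2 = file2(idx2,:);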
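
As an alternative, the same filter can be sketched with tables instead of raw cell arrays. The sample data below is hypothetical and only stands in for the two workbooks; with real files the tables could come from readtable, and the variable and column names used here are assumptions.

```matlab
% Hypothetical sample data standing in for file1.xlsx and file2.xlsx.
% With real files, something like readtable('file2.xlsx') could be used.
ids = table([1; 3; 5; 11; 20], 'VariableNames', {'studentID'});
students = table([1; 2; 3; 4; 5], ...
    {'Cassie'; 'John'; 'Marry'; 'Bob'; 'Kevin'}, ...
    [60; 100; 80; 70; 99], ...
    'VariableNames', {'studentID', 'Name', 'score'});

% Logical mask: true where a student's ID appears in the ID list
keep = ismember(students.studentID, ids.studentID);

% Keep only the matching rows; text and numeric columns stay together
result = students(keep, :);
disp(result)
```

Because the comparison runs on the numeric ID column only, the text columns never need to be separated from the numbers.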

To get the rows of file1.xlsx whose IDs exist in file2.xlsx, in the same order as the rows of filtered2:

idx1(idx1==0) = [];
filtered1 = file1(idx1,:);

If you want to keep the order of rows in file1.xlsx just switch file2 and file1 in the ismember statement.
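
A minimal sketch of that switch, assuming small in-memory cell arrays in place of the imported files (the sample values and the names ids1, ids2, found and loc are hypothetical):

```matlab
% Hypothetical data: file1 lists IDs in its own, unsorted order
file1 = {5; 1; 20; 3};
file2 = {1, 'Cassie', 60; ...
         2, 'John',  100; ...
         3, 'Marry',  80; ...
         4, 'Bob',    70; ...
         5, 'Kevin',  99};

% Pull the numeric ID columns out of the cell arrays
ids1 = cell2mat(file1(:,1));
ids2 = cell2mat(file2(:,1));

% file1 goes first, so the outputs follow the row order of file1:
% found(k) is true if ids1(k) exists in ids2, loc(k) is its row in file2
[found, loc] = ismember(ids1, ids2);

% Rows of file2 for the IDs that were found, in file1 order
filtered2 = file2(loc(found), :);
disp(filtered2)
```

Here the ID 20 has no match, so it is skipped, and the remaining rows come out in the order 5, 1, 3.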

Answer:

You can try Excel's VLOOKUP (English) / SVERWEIS (German) function.
Consider the following layout of your Excel documents:

1.xlsx
 |  A|  B|  C|  D|
1|  1|   |   |   |
2|  3|   |   |   |
3|  5|   |   |   |
4| 11|   |   |   |
5| 20|   |   |   |


2.xlsx
 |  A|  B|   C|      D|
1|  1|  C|  60| [email protected]|
2|  2|  J| 100| [email protected]|
3|  3|  M|  80| [email protected]|
4|  4|  B|  70|       |
5|  5|  K|  99|       |

To populate 1.xlsx with the students’ details from 2.xlsx, assign the following formulas to cells in 1.xlsx:

B1: =VLOOKUP($A1;[<PATH_TO_2.xlsx>]<NAME_OF_SHEET_IN_2.xlsx>!$A$1:$D$5;2;FALSE)
B2: =VLOOKUP($A2;[<PATH_TO_2.xlsx>]<NAME_OF_SHEET_IN_2.xlsx>!$A$1:$D$5;2;FALSE)
...
C1: =VLOOKUP($A1;[<PATH_TO_2.xlsx>]<NAME_OF_SHEET_IN_2.xlsx>!$A$1:$D$5;3;FALSE)
C2: =VLOOKUP($A2;[<PATH_TO_2.xlsx>]<NAME_OF_SHEET_IN_2.xlsx>!$A$1:$D$5;3;FALSE)
...
D1: =VLOOKUP($A1;[<PATH_TO_2.xlsx>]<NAME_OF_SHEET_IN_2.xlsx>!$A$1:$D$5;4;FALSE)
D2: =VLOOKUP($A2;[<PATH_TO_2.xlsx>]<NAME_OF_SHEET_IN_2.xlsx>!$A$1:$D$5;4;FALSE)
...

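VLOOKUP searches for the first parameter ($A1, ...) in the first column of the range given by the second parameter and returns the value from the n-th column of that range, where n is the third parameter. The fourth, boolean parameter selects the match type: FALSE requires an exact match and returns #N/A for IDs that are missing, while TRUE allows an approximate match and expects the first column to be sorted, so FALSE is the right choice for this task.

You need to replace ‘<PATH_TO_2.xlsx>’ and ‘<NAME_OF_SHEET_IN_2.xlsx>’ with the correct values; ‘[]’ and ‘!’ must stay in place. Depending on your regional settings, the argument separator may be a comma instead of a semicolon.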

further explanation: http://www.techonthenet.com/excel/formulas/vlookup.php
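
For comparison, the same "populate the ID list with details" idea can be sketched in MATLAB with a left outer join, which behaves roughly like an exact-match VLOOKUP. This is only an illustration under assumptions: the sample data mirrors the layout above, and the table and column names (ids, students, studentID, Name, score) are hypothetical.

```matlab
% Hypothetical data mirroring 1.xlsx (IDs only) and 2.xlsx (details)
ids = table([1; 3; 5; 11; 20], 'VariableNames', {'studentID'});
students = table([1; 2; 3; 4; 5], ...
    {'C'; 'J'; 'M'; 'B'; 'K'}, ...
    [60; 100; 80; 70; 99], ...
    'VariableNames', {'studentID', 'Name', 'score'});

% Left outer join on the ID: every row of ids is kept, and IDs with no
% match in students get missing values, comparable to #N/A in Excel
filled = outerjoin(ids, students, ...
    'Keys', 'studentID', 'Type', 'left', 'MergeKeys', true);
disp(filled)
```

If only the matching rows are wanted, innerjoin(ids, students) would drop the IDs that have no details instead of keeping them with missing values.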

good luck 🙂