
python – Drop rows in dataframe that have the same value in many columns

Posted by: admin May 14, 2020

Question:

I have a table like the one below, and the column names change over time. I only want to keep the rows in which at least one WW column differs from WW12. In the table below, I want to keep rows 3 and 7 and delete the other rows:
In row 3, WW17 != WW12
In row 7, WW16 != WW12
Please help me, thanks in advance.

    Type WW12       WW13        WW14        WW15        WW16        WW17        WW18        WW19        WW20
0   AA  1.999857143 1.999857143 1.999857143 1.999857143 1.999857143 1.999857143 1.999857143 1.999857143 1.999857143
1   AA  24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000
2   AA  1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143
3   BB  1.457285714 1.457285714 1.457285714 1.457285714 1.457285714 1.863928571 1.863928571 1.863928571 1.863928571
4   BB  24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000
5   BB  24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000
6   BB  1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143 1424.593143
7   BB  1.863928571 1.863928571 1.863928571 1.863928571 2.878857143 2.878857143 2.878857143 2.878857143 2.878857143
8   BB  24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000    24000000
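To try the answers on this page, the sample table can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the table above, and the names `weeks` and `rows` are only illustrative helpers, not part of the question.

```python
import pandas as pd

# Rebuild the question's table; values copied verbatim from it.
weeks = [f"WW{n}" for n in range(12, 21)]  # WW12 ... WW20
rows = [
    ("AA", [1.999857143] * 9),
    ("AA", [24000000.0] * 9),
    ("AA", [1424.593143] * 9),
    ("BB", [1.457285714] * 5 + [1.863928571] * 4),  # changes at WW17
    ("BB", [24000000.0] * 9),
    ("BB", [24000000.0] * 9),
    ("BB", [1424.593143] * 9),
    ("BB", [1.863928571] * 4 + [2.878857143] * 5),  # changes at WW16
    ("BB", [24000000.0] * 9),
]
df = pd.DataFrame([[t, *vals] for t, vals in rows], columns=["Type", *weeks])
print(df)
```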
Answer:

Use boolean indexing:

# set column Type as the index so only the WW columns are compared
df1 = df.set_index('Type')
# compare every column with WW12 row by row (axis=0), then keep rows with at least one True
df2 = df[df1.ne(df1['WW12'], axis=0).any(axis=1).values]
# to compare with the first data column by position instead (Type is the index here):
#df2 = df[df1.ne(df1.iloc[:, 0], axis=0).any(axis=1).values]
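A self-contained version of the `set_index` approach, assuming the sample data from the question. It passes `axis` by keyword, since recent pandas versions deprecate positional axis arguments for methods like `any`. The mask is converted to a NumPy array because its index holds the duplicated `Type` labels, while `df` keeps the default integer index.

```python
import pandas as pd

# Sample data copied from the question's table.
weeks = [f"WW{n}" for n in range(12, 21)]
rows = [
    ("AA", [1.999857143] * 9), ("AA", [24000000.0] * 9), ("AA", [1424.593143] * 9),
    ("BB", [1.457285714] * 5 + [1.863928571] * 4), ("BB", [24000000.0] * 9),
    ("BB", [24000000.0] * 9), ("BB", [1424.593143] * 9),
    ("BB", [1.863928571] * 4 + [2.878857143] * 5), ("BB", [24000000.0] * 9),
]
df = pd.DataFrame([[t, *vals] for t, vals in rows], columns=["Type", *weeks])

df1 = df.set_index("Type")                        # only the WW columns remain as data
mask = df1.ne(df1["WW12"], axis=0).any(axis=1)    # True where any week differs from WW12
df2 = df[mask.to_numpy()]                         # positional boolean mask on the original frame
print(df2.index.tolist())  # [3, 7]
```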

If you want to compare with the second column by position, without setting an index:

df2 = df[df.iloc[:, 1:].ne(df.iloc[:, 1], axis=0).any(axis=1)]
print (df2)
  Type      WW12      WW13      WW14      WW15      WW16      WW17      WW18  \
3   BB  1.457286  1.457286  1.457286  1.457286  1.457286  1.863929  1.863929   
7   BB  1.863929  1.863929  1.863929  1.863929  2.878857  2.878857  2.878857   

       WW19      WW20  
3  1.863929  1.863929  
7  2.878857  2.878857  
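Because the question says the column names change over time, hard-coding `'WW12'` may break next week. One option, sketched below, selects the week columns by a name pattern with `DataFrame.filter(regex=...)` and uses whichever week comes first as the baseline. It assumes every week column is named `WW` plus a number and that the columns are already in chronological order; the function name `rows_with_changes` is hypothetical.

```python
import pandas as pd

# Sample data copied from the question's table.
weeks = [f"WW{n}" for n in range(12, 21)]
rows = [
    ("AA", [1.999857143] * 9), ("AA", [24000000.0] * 9), ("AA", [1424.593143] * 9),
    ("BB", [1.457285714] * 5 + [1.863928571] * 4), ("BB", [24000000.0] * 9),
    ("BB", [24000000.0] * 9), ("BB", [1424.593143] * 9),
    ("BB", [1.863928571] * 4 + [2.878857143] * 5), ("BB", [24000000.0] * 9),
]
df = pd.DataFrame([[t, *vals] for t, vals in rows], columns=["Type", *weeks])


def rows_with_changes(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where any week column differs from the first week column (hypothetical helper)."""
    week_cols = frame.filter(regex=r"^WW\d+$")  # all week columns, whatever their numbers
    baseline = week_cols.iloc[:, 0]             # assumes the leftmost week is the earliest
    return frame[week_cols.ne(baseline, axis=0).any(axis=1)]


print(rows_with_changes(df).index.tolist())  # [3, 7]

# Same data with every week shifted by one (WW13 ... WW21): no code change needed.
renamed = df.rename(columns={f"WW{n}": f"WW{n + 1}" for n in range(12, 21)})
print(rows_with_changes(renamed).index.tolist())  # [3, 7]
```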

Explanation:

Select the second column by position:

print (df.iloc[:, 1])
0    1.999857e+00
1    2.400000e+07
2    1.424593e+03
3    1.457286e+00
4    2.400000e+07
5    2.400000e+07
6    1.424593e+03
7    1.863929e+00
8    2.400000e+07
Name: WW12, dtype: float64

Drop the first column (Type) by position and compare the remaining columns with the second column, aligned row by row (axis=0):

print (df.iloc[:, 1:].ne(df.iloc[:, 1], axis=0))

    WW12   WW13   WW14   WW15   WW16   WW17   WW18   WW19   WW20
0  False  False  False  False  False  False  False  False  False
1  False  False  False  False  False  False  False  False  False
2  False  False  False  False  False  False  False  False  False
3  False  False  False  False  False   True   True   True   True
4  False  False  False  False  False  False  False  False  False
5  False  False  False  False  False  False  False  False  False
6  False  False  False  False  False  False  False  False  False
7  False  False  False  False   True   True   True   True   True
8  False  False  False  False  False  False  False  False  False
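The mask above uses exact equality, which works here because identical values in the table parse to identical floats. If the numbers come from calculations, tiny rounding noise could make `ne` report a difference. A possible variant, sketched below, swaps in `numpy.isclose` (default tolerances `rtol=1e-05`, `atol=1e-08`) instead of `ne`; the `1e-12` perturbation is made-up noise for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
weeks = [f"WW{n}" for n in range(12, 21)]
rows = [
    ("AA", [1.999857143] * 9), ("AA", [24000000.0] * 9), ("AA", [1424.593143] * 9),
    ("BB", [1.457285714] * 5 + [1.863928571] * 4), ("BB", [24000000.0] * 9),
    ("BB", [24000000.0] * 9), ("BB", [1424.593143] * 9),
    ("BB", [1.863928571] * 4 + [2.878857143] * 5), ("BB", [24000000.0] * 9),
]
df = pd.DataFrame([[t, *vals] for t, vals in rows], columns=["Type", *weeks])

noisy = df.copy()
noisy.loc[0, "WW20"] += 1e-12  # hypothetical floating-point noise in row 0

# Exact comparison: row 0 is now flagged because of the noise.
exact = noisy.iloc[:, 1:].ne(noisy.iloc[:, 1], axis=0).any(axis=1)

# Tolerant comparison: broadcast the first week column (shape (n, 1)) against all weeks.
values = noisy.iloc[:, 1:].to_numpy()
tolerant = (~np.isclose(values, values[:, [0]])).any(axis=1)

print(noisy[exact].index.tolist())     # [0, 3, 7]
print(noisy[tolerant].index.tolist())  # [3, 7]
```

The tolerances should be chosen to match the precision of the real data; the defaults are only a starting point.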

Use any(axis=1) to get True for every row that contains at least one True:

print (df.iloc[:, 1:].ne(df.iloc[:, 1], axis=0).any(axis=1))
0    False
1    False
2    False
3     True
4    False
5    False
6    False
7     True
8    False
dtype: bool
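A different technique gives the same rows for this data: a row has some week that differs from WW12 exactly when it holds more than one distinct value, which `DataFrame.nunique(axis=1)` counts directly. This is an alternative sketch, not part of the answer above. One caveat: `nunique` drops NaN by default (`dropna=True`), so a row such as `[1.0, NaN, 1.0]` counts as one distinct value, whereas `ne` treats NaN as different from 1.0.

```python
import pandas as pd

# Sample data copied from the question's table.
weeks = [f"WW{n}" for n in range(12, 21)]
rows = [
    ("AA", [1.999857143] * 9), ("AA", [24000000.0] * 9), ("AA", [1424.593143] * 9),
    ("BB", [1.457285714] * 5 + [1.863928571] * 4), ("BB", [24000000.0] * 9),
    ("BB", [24000000.0] * 9), ("BB", [1424.593143] * 9),
    ("BB", [1.863928571] * 4 + [2.878857143] * 5), ("BB", [24000000.0] * 9),
]
df = pd.DataFrame([[t, *vals] for t, vals in rows], columns=["Type", *weeks])

distinct = df.iloc[:, 1:].nunique(axis=1)  # number of distinct week values per row
df2 = df[distinct > 1]                     # keep rows whose values are not all equal
print(df2.index.tolist())  # [3, 7]
```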