The .csv file I upload here is an example file for me to explain my problem.
This file contain all the air quality information for all cities in China(represent in Code) in at an specific day.
For example, the column 1001A represent one city and the value in this column represent the air pollutant concentration corresponding to the
1. My problem
If I want to get the
AQI value for the city of
1014A in 20160205-00:00,
I just need to use
df = pd.read_csv("./this file") aqi = df["1014A"].iloc
The result is 42. But look the same file in LibraOffice, the result shows like this:
It seems like Pandas read the
1013A and make the mistake.
So, I want to figure out what happened in column
The pandas read this column(which has finite value inside) as the NaN value column. And it happened so many times in this file. It bother me in the aspects of followed:
Some columns which has its data are taken as NaN columns in pandas.Dataframe
The other columns also will be influenced by the Error-NaN columns indirectly.
The column location would be full of mistake if this problem hasn’t been solved.
Any advice would be appreciate!
Your csv has two commas in that position:
this gets read as NaN by pandas.
It looks like in your version of LibreOffice it’s skipped and uses the subsequent value (incorrectly).
In : s = open("china_sites_20160205.csv").readlines() In : s.split(",")[13:18] Out: ['1011A', '1012A', '1013A', '1014A', '1015A'] In : s.split(",")[13:18] Out: ['24', '29', '', '42', '39']