Dropping Non-Numeric Rows in Pandas
In Pandas, the dropna() function drops rows or columns that contain missing values (NaN). On its own it does not remove rows that contain non-numeric values such as strings. To drop those rows, first convert the non-numeric values to NaN with a combination of applymap() and pd.to_numeric(), then call dropna().
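As a quick reminder of what dropna() does by itself, here is a minimal sketch. The frame df and its column names are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only to illustrate dropna(); not the article's data.
df = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': [4.0, 5.0, 6.0]})

# Default (axis=0): drop every row that contains at least one NaN.
rows_dropped = df.dropna()

# axis=1: drop every column that contains at least one NaN.
cols_dropped = df.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
```

Note that dropna() only reacts to missing values, so a string such as 'four' would survive both calls.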
Example
Let’s consider a DataFrame called data which contains both numeric and non-numeric values:
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 'four', 5],
    'B': [6.1, 'seven', 8.2, 9.3, 10.4],
    'C': [11, 12, 13, 14, 'fifteen'],
})
By default, Pandas stores each column using the most general data type that can accommodate all of its values. In this case, because ‘A’, ‘B’, and ‘C’ each contain both numbers and strings, all three columns have the object data type.
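The column types can be checked directly through the dtypes attribute. The sketch below rebuilds the same data frame so that it runs on its own; the variable name column_types is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 'four', 5],
    'B': [6.1, 'seven', 8.2, 9.3, 10.4],
    'C': [11, 12, 13, 14, 'fifteen'],
})

# One dtype per column; mixing numbers and strings yields the object dtype.
column_types = data.dtypes
print(column_types)
```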
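If we want to drop the rows with non-numeric values, we first need to find out which elements are not numeric. The applymap() function applies a function to every element of the DataFrame. In the following example, a lambda function passes each element to pd.to_numeric() with errors='coerce', which converts any value that cannot be parsed as a number to NaN. (Since pandas 2.1, applymap() is deprecated in favor of DataFrame.map(), which is called the same way.)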
numeric_data = data.applymap(lambda x: pd.to_numeric(x, errors='coerce'))
The resulting numeric_data DataFrame will have NaN values in place of non-numeric elements:
Out[1]:
     A     B     C
0  1.0   6.1  11.0
1  2.0   NaN  12.0
2  3.0   8.2  13.0
3  NaN   9.3  14.0
4  5.0  10.4   NaN
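The same conversion can also be done one column at a time rather than one element at a time. The following is a sketch of that alternative, not the approach used above: it assumes DataFrame.apply() forwarding the errors keyword to pd.to_numeric(), which operates on a whole column in a single call and is usually faster on large frames.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 'four', 5],
    'B': [6.1, 'seven', 8.2, 9.3, 10.4],
    'C': [11, 12, 13, 14, 'fifteen'],
})

# apply() calls pd.to_numeric once per column and passes errors='coerce' through,
# so unparseable values become NaN, as in the element-wise version.
numeric_data = data.apply(pd.to_numeric, errors='coerce')
print(numeric_data)
```

Be aware that with either approach, strings that look like numbers (for example '7') are parsed as numbers rather than coerced to NaN.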
Finally, you can use the dropna() function to drop the rows with NaN values:
numeric_data = numeric_data.dropna()
The resulting numeric_data DataFrame contains only the rows in which every value is numeric:
Out[2]:
     A    B     C
0  1.0  6.1  11.0
2  3.0  8.2  13.0
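In the result above, every column has become float, including the integer columns. If the original values should be kept, one possible variation is to use the converted frame only as a boolean mask and filter the original data with it. This is a sketch under that assumption; the names mask, clean, and restored are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 'four', 5],
    'B': [6.1, 'seven', 8.2, 9.3, 10.4],
    'C': [11, 12, 13, 14, 'fifteen'],
})

# True for rows in which every value could be parsed as a number.
mask = data.apply(pd.to_numeric, errors='coerce').notna().all(axis=1)

# Filter the original frame so the surviving values are left unconverted.
clean = data[mask]

# The columns are still object dtype; infer_objects() picks tighter dtypes
# now that the strings are gone.
restored = clean.infer_objects()
print(restored)
print(restored.dtypes)
```

Like dropna(), this also discards rows that had genuinely missing values to begin with. The surviving rows keep their original index labels (0 and 2); call reset_index(drop=True) if a fresh 0-based index is preferred.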