Pandas: drop non-numeric rows

Explanation of Pandas Drop Non-Numeric Rows

In Pandas, the dropna() function drops rows or columns that contain missing values (NaN or None). It does not treat strings such as 'four' as missing. To drop rows that contain non-numeric values, you first have to turn those values into NaN. One way to do that is to use applymap() with pd.to_numeric() for the conversion and then call dropna().
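Here is a quick check of why dropna() alone is not enough. A string is not a missing value, so the minimal sketch below keeps every row. The one-column DataFrame df is a made-up example.

```python
import pandas as pd

# Hypothetical example: a string mixed into a numeric column, with no NaN anywhere.
df = pd.DataFrame({"A": [1, 2, "three"]})

# dropna() only removes NaN/None, so the "three" row is kept.
result = df.dropna()
print(result)
print(len(result))  # 3
```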

Example

Let’s consider a DataFrame called data which contains both numeric and non-numeric values:

   import pandas as pd
   
   data = pd.DataFrame({'A': [1, 2, 3, 'four', 5],
                        'B': [6.1, 'seven', 8.2, 9.3, 10.4],
                        'C': [11, 12, 13, 14, 'fifteen']})
  

Pandas infers a single dtype for each column and picks the most specific type that can hold every value in it. Columns ‘A’, ‘B’, and ‘C’ each mix numbers and strings, so no numeric dtype fits. As a result, all three columns get the object dtype.
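You can confirm this by inspecting the dtypes attribute. The sketch below rebuilds the same data DataFrame so that it runs on its own.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 'four', 5],
                     'B': [6.1, 'seven', 8.2, 9.3, 10.4],
                     'C': [11, 12, 13, 14, 'fifteen']})

# Each column mixes numbers and strings, so each is stored as object.
print(data.dtypes)
```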

To drop the rows with non-numeric values, first convert every element to a number. In the following example, applymap() applies a lambda function to each element. The lambda calls pd.to_numeric() with errors='coerce', which returns the number for numeric values and NaN for anything that cannot be parsed. In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map().

   # Convert each element to a number; values that can't be parsed become NaN.
   numeric_data = data.applymap(lambda x: pd.to_numeric(x, errors='coerce'))
  

The resulting numeric_data DataFrame has NaN in place of each non-numeric element, and all three columns are now float64:

   Out[1]:
        A     B     C
   0   1.0   6.1  11.0
   1   2.0   NaN  12.0
   2   3.0   8.2  13.0
   3   NaN   9.3  14.0
   4   5.0  10.4   NaN
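
pd.to_numeric() also accepts a whole Series, so the same conversion can be done one column at a time with DataFrame.apply(). This avoids calling a Python lambda for every cell and works whether or not your pandas version still has applymap(). It is a swapped-in alternative to the approach above, and on this data it should produce the same NaN pattern.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 'four', 5],
                     'B': [6.1, 'seven', 8.2, 9.3, 10.4],
                     'C': [11, 12, 13, 14, 'fifteen']})

# apply() passes each column (a Series) to pd.to_numeric; the extra keyword
# argument errors="coerce" is forwarded, so unparseable entries become NaN.
numeric_data = data.apply(pd.to_numeric, errors="coerce")
print(numeric_data)
```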
  

Finally, call dropna() to drop every row that contains at least one NaN:

   numeric_data = numeric_data.dropna()
  

Only the rows in which every value was numeric remain, and they keep their original index labels (0 and 2):

   Out[2]:
       A    B     C
   0  1.0  6.1  11.0
   2  3.0  8.2  13.0
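
If you need this in more than one place, you can wrap the two steps in a small helper. The function name drop_non_numeric_rows and its subset parameter are hypothetical and exist only for illustration. subset is passed straight through to dropna(), so only the listed columns are checked. Keep two caveats in mind. pd.to_numeric() parses numeric strings such as '5' as numbers, so those rows are kept. Rows that were already missing values in the original data are dropped too.

```python
import pandas as pd


def drop_non_numeric_rows(df, subset=None):
    """Hypothetical helper (not a pandas API): convert df to numbers and drop
    rows with a non-numeric value in `subset` (default: all columns)."""
    # Column-wise conversion; anything that can't be parsed becomes NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Drop rows that have NaN in any of the checked columns.
    return numeric.dropna(subset=subset)


data = pd.DataFrame({'A': [1, 2, 3, 'four', 5],
                     'B': [6.1, 'seven', 8.2, 9.3, 10.4],
                     'C': [11, 12, 13, 14, 'fifteen']})

print(drop_non_numeric_rows(data))                # rows 0 and 2
print(drop_non_numeric_rows(data, subset=["A"]))  # rows 0, 1, 2 and 4
```

The surviving rows keep their original index labels. Chain .reset_index(drop=True) if you want them renumbered from 0.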
  
