pd.to_numeric: unable to parse string

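The pd.to_numeric() function in pandas converts the values in a Series (or another one-dimensional, list-like object) to a numeric data type; it can be applied to a DataFrame one column at a time. By default, if the function encounters a value that it cannot parse as a number, it raises a ValueError whose message begins with "Unable to parse string".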

To better understand this, let’s consider the following examples:

Example 1:

Suppose we have a pandas Series with both numeric and non-numeric values:

```python
import pandas as pd

data = pd.Series(['1', '2.5', '3a', '4.2'])
print(data)
```

The output will be:

```
0      1
1    2.5
2     3a
3    4.2
dtype: object
```

If we call pd.to_numeric(data) on this Series with no extra arguments, pandas raises ValueError: Unable to parse string "3a" at position 2, because '3a' is not a valid number. To convert the Series anyway, we can pass errors='coerce':

```python
numeric_data = pd.to_numeric(data, errors='coerce')
print(numeric_data)
```

The output will be:

```
0    1.0
1    2.5
2    NaN
3    4.2
dtype: float64
```

As we can see, the strings '1', '2.5' and '4.2' were converted to numbers, but the value '3a' could not be parsed. With errors='coerce', pd.to_numeric() replaces such values with NaN (Not a Number) instead of raising an error.

Example 2:

Let’s consider another example with a DataFrame:

```python
data = pd.DataFrame({'A': ['1', '2', '3'],
                     'B': ['4', '5', '6a']})
print(data)
```

The output will be:

```
   A   B
0  1   4
1  2   5
2  3  6a
```

If we try to convert this DataFrame to numeric using pd.to_numeric():

```python
numeric_data = data.apply(pd.to_numeric, errors='coerce')
print(numeric_data)
```

The output will be:

```
   A    B
0  1  4.0
1  2  5.0
2  3  NaN
```

In this example, we used the DataFrame's apply() method to run pd.to_numeric() on each column. The errors='coerce' argument is passed through to pd.to_numeric() and replaces any value that cannot be parsed with NaN.
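The output above also hints at a side effect worth checking: a column that parses cleanly keeps an integer dtype, while a column containing NaN becomes float. The following is a minimal sketch, reusing the DataFrame from Example 2, that inspects the resulting dtypes and counts the values that could not be parsed in each column.

```python
import pandas as pd

data = pd.DataFrame({'A': ['1', '2', '3'],
                     'B': ['4', '5', '6a']})
numeric_data = data.apply(pd.to_numeric, errors='coerce')

# Column A parsed cleanly, so it holds integers; column B contains a NaN,
# which forces a floating-point dtype.
print(numeric_data.dtypes)

# Number of values per column that were coerced to NaN.
nan_counts = numeric_data.isna().sum()
print(nan_counts)
```

Checking the NaN counts after a coerced conversion is a cheap way to notice how much data was silently dropped.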

By default, pd.to_numeric() raises a ValueError if it encounters a non-numeric value. However, by setting errors='coerce', it will instead replace such values with NaN.

This allows us to handle the conversion gracefully without encountering errors that might disrupt our workflow.
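When it matters which values failed, the two behaviours can be combined. The sketch below, using the Series from Example 1, first catches the default ValueError and then uses the coerced result as a mask to list the offending entries. The variable names (message, bad_values) are illustrative only.

```python
import pandas as pd

data = pd.Series(['1', '2.5', '3a', '4.2'])

# Default behaviour (errors='raise'): the unparseable string raises ValueError.
message = None
try:
    pd.to_numeric(data)
except ValueError as exc:
    message = str(exc)
    print(message)

# Coerce instead, then select the original entries that became NaN
# (excluding any that were already missing before conversion).
converted = pd.to_numeric(data, errors='coerce')
bad_values = data[converted.isna() & data.notna()]
print(bad_values)
```

Here bad_values keeps the original index, so the problematic rows can be inspected or cleaned before converting again.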
