Pandas support

Warning: pandas support is currently experimental; don’t expect everything to work.

It is convenient to use the Pandas package when dealing with numerical data, so Pint provides PintArray. A PintArray is a Pandas Extension Array, which allows Pandas to recognise the Quantity and store it in Pandas DataFrames and Series.

Installation

Pandas support is provided by the pint-pandas package. To install it, use either:

python -m pip install pint-pandas

Or:

conda install -c conda-forge pint-pandas

Basic example

This example shows the simplest way to use pandas with pint and the underlying objects. It’s slightly fiddly as you are not reading from a file. A more typical use case is given in Reading from csv.

First, some imports (you don’t need to import pint_pandas for this to work):

[1]:

import pandas as pd
import pint

Next, we create a DataFrame with PintArrays as columns.

[2]:

df = pd.DataFrame({
    "torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
df

[2]:

	torque	angular_velocity
0	1	1
1	2	2
2	2	2
3	3	3

Operations on columns are unit-aware, so they behave as we would intuitively expect.

[3]:

df['power'] = df['torque'] * df['angular_velocity']
df

[3]:

	torque	angular_velocity	power
0	1	1	1
1	2	2	4
2	2	2	4
3	3	3	9

We can see the columns’ units in the dtypes attribute.

[4]:

df.dtypes

[4]:

torque                                       pint[foot * force_pound]
angular_velocity                         pint[revolutions_per_minute]
power               pint[foot * force_pound * revolutions_per_minute]
dtype: object

Each column can be accessed as a Pandas Series.

[5]:

df.power

[5]:

0    1
1    4
2    4
3    9
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute]

The Series contains a PintArray.

[6]:

df.power.values

[6]:

<PintArray>
[1, 4, 4, 9]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute]

The PintArray contains a Quantity.

[7]:

df.power.values.quantity

[7]:

Magnitude	[1 4 4 9]
Units	foot force_pound revolutions_per_minute

Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.

[8]:

df.power.pint.units

[8]:

foot force_pound revolutions_per_minute

[9]:

df.power.pint.to("kW").values

[9]:

<PintArray>
[0.00014198092353610379,  0.0005679236941444151,  0.0005679236941444151,
   0.001277828311824934]
Length: 4, dtype: pint[kilowatt]
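
The kilowatt values above can be cross-checked by hand. The following is a minimal sketch that deliberately does not use Pint: the conversion constants are the standard definitions of the pound-force and the foot, written out here as assumptions rather than read from Pint's registry, and all names are illustrative only.

```python
import math

# Standard definitions, typed in by hand (not taken from Pint's registry).
LBF_IN_NEWTON = 4.4482216152605        # 1 pound-force in newtons
FOOT_IN_METRE = 0.3048                 # 1 foot in metres
RPM_IN_RAD_PER_S = 2 * math.pi / 60    # 1 revolution per minute in rad/s

# Power magnitudes from the example, in lbf * ft * rpm.
power_lbf_ft_rpm = [1, 4, 4, 9]

# One lbf*ft*rpm expressed in watts, then scaled to kilowatts.
watts_per_unit = LBF_IN_NEWTON * FOOT_IN_METRE * RPM_IN_RAD_PER_S
power_kw = [p * watts_per_unit / 1000 for p in power_lbf_ft_rpm]

for value in power_kw:
    print(f"{value:.6e}")
```

A revolution is treated as 2π radians and the radian as dimensionless, which is why the product of a torque and an angular velocity can be expressed in watts at all.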

Reading from csv

Reading from files is by far the more common way to use pandas. To facilitate this, DataFrame accessors are provided to make it easy to get to PintArrays.

[10]:

import pandas as pd
import pint
import pint_pandas
import io

Here are the contents of the csv file.

[11]:

test_data = '''speed,mech power,torque,rail pressure,fuel flow rate,fluid power
rpm,kW,N m,bar,l/min,kW
1000.0,,10.0,1000.0,10.0,
1100.0,,10.0,100000000.0,10.0,
1200.0,,10.0,1000.0,10.0,
1200.0,,10.0,1000.0,10.0,'''

Let’s read that into a DataFrame. Here io.StringIO stands in for a file on disk; with a real csv file you would pass its path instead, as shown in the commented line.

[12]:

df = pd.read_csv(io.StringIO(test_data), header=[0, 1])
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1])
df

[12]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
	rpm	kW	N m	bar	l/min	kW
0	1000.0	NaN	10.0	1000.0	10.0	NaN
1	1100.0	NaN	10.0	100000000.0	10.0	NaN
2	1200.0	NaN	10.0	1000.0	10.0	NaN
3	1200.0	NaN	10.0	1000.0	10.0	NaN
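
To make the two-row header layout concrete, here is a small sketch using only the standard library's csv module. It is not how pandas or pint-pandas parse the file; it only illustrates how each unit in the second row lines up with the column name above it. The variable names are illustrative, and the data is shortened to two rows.

```python
import csv
import io

# Same layout as the example file: names, then units, then data.
test_data = '''speed,mech power,torque,rail pressure,fuel flow rate,fluid power
rpm,kW,N m,bar,l/min,kW
1000.0,,10.0,1000.0,10.0,
1100.0,,10.0,100000000.0,10.0,'''

rows = list(csv.reader(io.StringIO(test_data)))
names, units, data = rows[0], rows[1], rows[2:]

# Pair each column name with the unit found directly beneath it.
unit_by_column = dict(zip(names, units))

# Empty cells are the ones pandas displays as NaN.
first_row = [float(cell) if cell else None for cell in data[0]]

print(unit_by_column)
print(first_row)
```

The empty "mech power" and "fluid power" cells are the columns that get filled in by calculation later on.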

The columns are currently plain float64 np.ndarrays, as df.dtypes shows. The quantify method of the DataFrame’s pint accessor converts them to PintArrays, taking the units from the bottom column level.

[13]:

df.dtypes

[13]:

speed           rpm      float64
mech power      kW       float64
torque          N m      float64
rail pressure   bar      float64
fuel flow rate  l/min    float64
fluid power     kW       float64
dtype: object

[14]:

df_ = df.pint.quantify(level=-1)
df_

[14]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
0	1000.0	nan	10.0	1000.0	10.0	nan
1	1100.0	nan	10.0	100000000.0	10.0	nan
2	1200.0	nan	10.0	1000.0	10.0	nan
3	1200.0	nan	10.0	1000.0	10.0	nan

As before, operations between DataFrame columns are unit-aware.

[15]:

df_.speed * df_.torque

[15]:

0    10000.0
1    11000.0
2    12000.0
3    12000.0
dtype: pint[meter * newton * revolutions_per_minute]

[16]:

df_

[16]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
0	1000.0	nan	10.0	1000.0	10.0	nan
1	1100.0	nan	10.0	100000000.0	10.0	nan
2	1200.0	nan	10.0	1000.0	10.0	nan
3	1200.0	nan	10.0	1000.0	10.0	nan

[17]:

df_['mech power'] = df_.speed * df_.torque
df_['fluid power'] = df_['fuel flow rate'] * df_['rail pressure']
df_

[17]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
0	1000.0	10000.0	10.0	1000.0	10.0	10000.0
1	1100.0	11000.0	10.0	100000000.0	10.0	1000000000.0
2	1200.0	12000.0	10.0	1000.0	10.0	10000.0
3	1200.0	12000.0	10.0	1000.0	10.0	10000.0

The DataFrame’s pint.dequantify method then allows us to retrieve the units information as a header row once again.

[18]:

df_.pint.dequantify()

[18]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
unit	revolutions_per_minute	meter * newton * revolutions_per_minute	meter * newton	bar	liter / minute	bar * liter / minute
0	1000.0	10000.0	10.0	1000.0	10.0	1.000000e+04
1	1100.0	11000.0	10.0	100000000.0	10.0	1.000000e+09
2	1200.0	12000.0	10.0	1000.0	10.0	1.000000e+04
3	1200.0	12000.0	10.0	1000.0	10.0	1.000000e+04

This makes some powerful operations straightforward. For example, we can change the units of individual columns:

[19]:

df_['fluid power'] = df_['fluid power'].pint.to("kW")
df_['mech power'] = df_['mech power'].pint.to("kW")
df_.pint.dequantify()

[19]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
unit	revolutions_per_minute	kilowatt	meter * newton	bar	liter / minute	kilowatt
0	1000.0	1.047198	10.0	1000.0	10.0	1.666667e+01
1	1100.0	1.151917	10.0	100000000.0	10.0	1.666667e+06
2	1200.0	1.256637	10.0	1000.0	10.0	1.666667e+01
3	1200.0	1.256637	10.0	1000.0	10.0	1.666667e+01
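
As a sanity check on the first row of this table, the two power values can be reproduced with plain arithmetic. This is a sketch under stated assumptions: it does not call Pint, the constants are the standard definitions of the bar and the litre typed in by hand, and the variable names are illustrative.

```python
import math

# Standard definitions, typed in by hand (not taken from Pint's registry).
RPM_IN_RAD_PER_S = 2 * math.pi / 60      # 1 rpm in rad/s
BAR_IN_PASCAL = 1e5                      # 1 bar in Pa
LITRE_PER_MIN_IN_M3_PER_S = 1e-3 / 60    # 1 l/min in m^3/s

# First row of the example data.
speed_rpm, torque_n_m = 1000.0, 10.0
pressure_bar, flow_l_min = 1000.0, 10.0

# Mechanical power = angular speed * torque, converted from W to kW.
mech_power_kw = speed_rpm * RPM_IN_RAD_PER_S * torque_n_m / 1000

# Fluid power = pressure * volumetric flow rate, converted from W to kW.
fluid_power_kw = (
    (pressure_bar * BAR_IN_PASCAL)
    * (flow_l_min * LITRE_PER_MIN_IN_M3_PER_S)
    / 1000
)

print(round(mech_power_kw, 6))
print(round(fluid_power_kw, 5))
```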

The units are harder to read than they need be, so let’s change Pint’s default format for displaying units.

[20]:

pint_pandas.PintType.ureg.default_format = "~P"
df_.pint.dequantify()

[20]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
unit	rpm	kW	N·m	bar	l/min	kW
0	1000.0	1.047198	10.0	1000.0	10.0	1.666667e+01
1	1100.0	1.151917	10.0	100000000.0	10.0	1.666667e+06
2	1200.0	1.256637	10.0	1000.0	10.0	1.666667e+01
3	1200.0	1.256637	10.0	1000.0	10.0	1.666667e+01

We can also convert the entire table’s units, here to base units:

[21]:

df_.pint.to_base_units().pint.dequantify()

[21]:

	speed	mech power	torque	rail pressure	fuel flow rate	fluid power
unit	rad/s	kg·m²/s³	kg·m²/s²	kg/m/s²	m³/s	kg·m²/s³
0	104.719755	1047.197551	10.0	1.000000e+08	0.000167	1.666667e+04
1	115.191731	1151.917306	10.0	1.000000e+13	0.000167	1.666667e+09
2	125.663706	1256.637061	10.0	1.000000e+08	0.000167	1.666667e+04
3	125.663706	1256.637061	10.0	1.000000e+08	0.000167	1.666667e+04
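
Both power columns end up in kg·m²/s³ even though they were computed from different inputs. The sketch below shows why, by adding base-unit exponents by hand. It is an illustration only and not Pint's internal representation; the helper multiply_dimensions and the dictionaries are hypothetical names introduced for this example.

```python
def multiply_dimensions(a, b):
    """Multiply two quantities' dimensions by adding base-unit exponents."""
    result = {}
    for unit in set(a) | set(b):
        exponent = a.get(unit, 0) + b.get(unit, 0)
        if exponent != 0:  # drop units that cancel out
            result[unit] = exponent
    return result


# Base-unit exponents, written out by hand.
pressure = {"kg": 1, "m": -1, "s": -2}   # kg/m/s² (pascal)
flow = {"m": 3, "s": -1}                 # m³/s
torque = {"kg": 1, "m": 2, "s": -2}      # kg·m²/s² (newton metre)
angular_speed = {"s": -1}                # rad/s, with the radian dimensionless

fluid_power = multiply_dimensions(pressure, flow)
mech_power = multiply_dimensions(torque, angular_speed)

print(fluid_power == mech_power)
print(sorted(fluid_power.items()))
```

Pressure times flow rate and torque times angular speed reduce to the same exponents, which is the watt written in base units.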

Advanced example

This example shows alternative ways to use pint with pandas and other features.

Start with the same imports.

[22]:

import pandas as pd
import pint
import pint_pandas

We’ll use a shorthand for PintArray.

[23]:

PA_ = pint_pandas.PintArray

And set up a unit registry and quantity shorthand.

[24]:

ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

Operations between PintArrays from different unit registries will not work. To prevent this issue, we can set the unit registry that is used when creating new PintArrays.

[25]:

pint_pandas.PintType.ureg = ureg

These are the possible ways to create a PintArray.

Note that pint[unit] must be used for the Series constructor, whereas the PintArray constructor also accepts a bare unit string or a unit object.

[26]:

df = pd.DataFrame({
        "length" : pd.Series([1,2], dtype="pint[m]"),
        "width" : PA_([2,3], dtype="pint[m]"),
        "distance" : PA_([2,3], dtype="m"),
        "height" : PA_([2,3], dtype=ureg.m),
        "depth" : PA_.from_1darray_quantity(Q_([2,3],ureg.m)),
    })
df

[26]:

	length	width	distance	height	depth
0	1	2	2	2	2
1	2	3	3	3	3

[27]:

df.length.values.units

[27]:

meter
