Pandas support

Warning: pandas support is currently experimental; don’t expect everything to work.

It is convenient to use the Pandas package when dealing with numerical data, so Pint provides PintArray. A PintArray is a Pandas Extension Array, which allows Pandas to recognise the Quantity and store it in Pandas DataFrames and Series.

Installation

Pandas support is provided by pint-pandas. It is not available on PyPI yet; to install it, use

python -m pip install git+https://github.com/hgrecco/pint-pandas.git

Basic example

This example shows the simplest way to use pandas with pint, along with the underlying objects. It’s slightly fiddly because you are not reading from a file. A more typical use case is given in Reading from csv.

First, some imports (you don’t need to import pintpandas for this to work)

[1]:
import pandas as pd
import pint

Next, we create a DataFrame with PintArrays as columns.

[2]:
df = pd.DataFrame({
    "torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
df
[2]:
   torque  angular_velocity
0       1                 1
1       2                 2
2       2                 2
3       3                 3

Operations on columns are unit-aware, so they behave as we would intuitively expect.

[3]:
df['power'] = df['torque'] * df['angular_velocity']
df
[3]:
   torque  angular_velocity  power
0       1                 1      1
1       2                 2      4
2       2                 2      4
3       3                 3      9

We can see the columns’ units in the dtypes attribute

[4]:
df.dtypes
[4]:
torque                                       pint[foot * force_pound]
angular_velocity                         pint[revolutions_per_minute]
power               pint[foot * force_pound * revolutions_per_minute]
dtype: object

Each column can be accessed as a Pandas Series

[5]:
df.power
[5]:
0    1
1    4
2    4
3    9
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute]

The Series contains a PintArray

[6]:
df.power.values
[6]:
<PintArray>
[1, 4, 4, 9]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute]

The PintArray contains a Quantity

[7]:
df.power.values.quantity
[7]:
\[\begin{pmatrix}1 & 4 & 4 & 9\end{pmatrix} foot\ force\_pound\ revolutions\_per\_minute\]

Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.

[8]:
df.power.pint.units
[8]:
\[foot\ force\_pound\ revolutions\_per\_minute\]
[9]:
df.power.pint.to("kW").values
[9]:
<PintArray>
[0.00014198092353610379,  0.0005679236941444151,  0.0005679236941444151,
   0.001277828311824934]
Length: 4, dtype: pint[kilowatt]
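
The converted values can be checked by hand. The sketch below uses only the standard library and assumes the usual exact definitions 1 lbf = 4.4482216152605 N, 1 ft = 0.3048 m and 1 rpm = 2π/60 rad/s. The constant and function names are illustrative and are not part of Pint.

```python
import math

# Assumed exact definitions (written out by hand, not read from Pint's registry).
NEWTON_PER_LBF = 4.4482216152605
METER_PER_FOOT = 0.3048
RAD_PER_S_PER_RPM = 2 * math.pi / 60


def lbf_ft_rpm_to_kw(value):
    """Convert a power expressed in lbf·ft·rpm to kilowatts."""
    watts = value * NEWTON_PER_LBF * METER_PER_FOOT * RAD_PER_S_PER_RPM
    return watts / 1000


# Same magnitudes as the power column above.
powers_kw = [lbf_ft_rpm_to_kw(p) for p in [1, 4, 4, 9]]
print(powers_kw)
```

The printed values should agree with the PintArray shown above to within floating-point rounding.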

Reading from csv

Reading from files is a far more common way to use pandas. To facilitate this, DataFrame accessors are provided to make it easy to get to PintArrays.

[10]:
import pandas as pd
import pint
import pintpandas
import io

Here are the contents of the csv file.

[11]:
test_data = '''speed,mech power,torque,rail pressure,fuel flow rate,fluid power
rpm,kW,N m,bar,l/min,kW
1000.0,,10.0,1000.0,10.0,
1100.0,,10.0,100000000.0,10.0,
1200.0,,10.0,1000.0,10.0,
1200.0,,10.0,1000.0,10.0,'''

Let’s read that into a DataFrame. Here io.StringIO is used in place of reading a file from disk; a csv file path would typically be used instead, as shown in the commented-out line.

[12]:
df = pd.read_csv(io.StringIO(test_data), header=[0, 1])
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1])
df
[12]:
    speed  mech power  torque  rail pressure  fuel flow rate  fluid power
      rpm          kW     N m            bar           l/min           kW
0  1000.0         NaN    10.0         1000.0            10.0          NaN
1  1100.0         NaN    10.0    100000000.0            10.0          NaN
2  1200.0         NaN    10.0         1000.0            10.0          NaN
3  1200.0         NaN    10.0         1000.0            10.0          NaN
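
Passing header=[0, 1] makes pandas build two-level (MultiIndex) columns, with the names on the top level and the unit strings on the bottom level. A minimal sketch using a shortened, made-up sample with the same layout (the variable names are illustrative):

```python
import io

import pandas as pd

# Shortened, made-up sample with the same two-row header layout.
sample = """speed,torque
rpm,N m
1000.0,10.0
1100.0,10.0"""

sample_df = pd.read_csv(io.StringIO(sample), header=[0, 1])

# Level 0 holds the column names; level 1 (also reachable as -1) holds the units.
names = list(sample_df.columns.get_level_values(0))
units = list(sample_df.columns.get_level_values(-1))
print(names)
print(units)
```

At this stage the unit strings are only column labels; the values themselves are still plain floats.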

The columns are currently plain float64 np.ndarrays, as the dtypes show. The quantify method of the DataFrame’s pint accessor converts them to PintArrays, taking the units from the bottom column level.

[13]:
df.dtypes
[13]:
speed           rpm      float64
mech power      kW       float64
torque          N m      float64
rail pressure   bar      float64
fuel flow rate  l/min    float64
fluid power     kW       float64
dtype: object
[14]:
df_ = df.pint.quantify(level=-1)
df_
[14]:
    speed  mech power  torque  rail pressure  fuel flow rate  fluid power
0  1000.0         nan    10.0         1000.0            10.0          nan
1  1100.0         nan    10.0    100000000.0            10.0          nan
2  1200.0         nan    10.0         1000.0            10.0          nan
3  1200.0         nan    10.0         1000.0            10.0          nan

As before, operations between DataFrame columns are unit-aware

[15]:
df_.speed * df_.torque
[15]:
0    10000.0
1    11000.0
2    12000.0
3    12000.0
dtype: pint[meter * newton * revolutions_per_minute]
[16]:
df_
[16]:
    speed  mech power  torque  rail pressure  fuel flow rate  fluid power
0  1000.0         nan    10.0         1000.0            10.0          nan
1  1100.0         nan    10.0    100000000.0            10.0          nan
2  1200.0         nan    10.0         1000.0            10.0          nan
3  1200.0         nan    10.0         1000.0            10.0          nan
[17]:
df_['mech power'] = df_.speed * df_.torque
df_['fluid power'] = df_['fuel flow rate'] * df_['rail pressure']
df_
[17]:
    speed  mech power  torque  rail pressure  fuel flow rate   fluid power
0  1000.0     10000.0    10.0         1000.0            10.0       10000.0
1  1100.0     11000.0    10.0    100000000.0            10.0  1000000000.0
2  1200.0     12000.0    10.0         1000.0            10.0       10000.0
3  1200.0     12000.0    10.0         1000.0            10.0       10000.0

The DataFrame’s pint.dequantify method then allows us to retrieve the units information as a header row once again.

[18]:
df_.pint.dequantify()
[18]:
                       speed                               mech power          torque  rail pressure  fuel flow rate           fluid power
unit  revolutions_per_minute  meter * newton * revolutions_per_minute  meter * newton            bar  liter / minute  bar * liter / minute
0                     1000.0                                  10000.0            10.0         1000.0            10.0          1.000000e+04
1                     1100.0                                  11000.0            10.0    100000000.0            10.0          1.000000e+09
2                     1200.0                                  12000.0            10.0         1000.0            10.0          1.000000e+04
3                     1200.0                                  12000.0            10.0         1000.0            10.0          1.000000e+04

This makes some powerful operations straightforward. For example, we can change the units of a single column

[19]:
df_['fluid power'] = df_['fluid power'].pint.to("kW")
df_['mech power'] = df_['mech power'].pint.to("kW")
df_.pint.dequantify()
[19]:
                       speed  mech power          torque  rail pressure  fuel flow rate   fluid power
unit  revolutions_per_minute    kilowatt  meter * newton            bar  liter / minute      kilowatt
0                     1000.0    1.047198            10.0         1000.0            10.0  1.666667e+01
1                     1100.0    1.151917            10.0    100000000.0            10.0  1.666667e+06
2                     1200.0    1.256637            10.0         1000.0            10.0  1.666667e+01
3                     1200.0    1.256637            10.0         1000.0            10.0  1.666667e+01
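
The kilowatt values in the first row can be reproduced with a little arithmetic. This is a standard-library sketch that assumes 1 rpm = 2π/60 rad/s, 1 bar = 10^5 Pa and 1 l/min = 10^-3/60 m³/s; the constant names are illustrative and not part of Pint.

```python
import math

RAD_PER_S_PER_RPM = 2 * math.pi / 60  # 1 rpm expressed in rad/s
PASCAL_PER_BAR = 1e5                  # 1 bar expressed in Pa
M3_PER_S_PER_L_PER_MIN = 1e-3 / 60    # 1 l/min expressed in m³/s

# First row of the table: 1000 rpm, 10 N·m, 10 l/min, 1000 bar.
mech_power_kw = (1000.0 * RAD_PER_S_PER_RPM) * 10.0 / 1000
fluid_power_kw = (10.0 * M3_PER_S_PER_L_PER_MIN) * (1000.0 * PASCAL_PER_BAR) / 1000

print(mech_power_kw)
print(fluid_power_kw)
```

Both results should match the mech power and fluid power entries of row 0 after rounding.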

The units are harder to read than they need to be, so let’s change Pint’s default format for displaying units.

[20]:
pintpandas.PintType.ureg.default_format = "~P"
df_.pint.dequantify()
[20]:
       speed  mech power  torque  rail pressure  fuel flow rate   fluid power
unit     rpm          kW     N·m            bar           l/min            kW
0     1000.0    1.047198    10.0         1000.0            10.0  1.666667e+01
1     1100.0    1.151917    10.0    100000000.0            10.0  1.666667e+06
2     1200.0    1.256637    10.0         1000.0            10.0  1.666667e+01
3     1200.0    1.256637    10.0         1000.0            10.0  1.666667e+01

We can also convert the units of the entire table, here to base units

[21]:
df_.pint.to_base_units().pint.dequantify()
[21]:
           speed   mech power    torque  rail pressure  fuel flow rate   fluid power
unit       rad/s     kg·m²/s³  kg·m²/s²        kg/m/s²            m³/s      kg·m²/s³
0     104.719755  1047.197551      10.0   1.000000e+08        0.000167  1.666667e+04
1     115.191731  1151.917306      10.0   1.000000e+13        0.000167  1.666667e+09
2     125.663706  1256.637061      10.0   1.000000e+08        0.000167  1.666667e+04
3     125.663706  1256.637061      10.0   1.000000e+08        0.000167  1.666667e+04

Advanced example

This example shows alternative ways to use pint with pandas and other features.

Start with the same imports.

[22]:
import pandas as pd
import pint
import pintpandas

We’ll use a shorthand for PintArray

[23]:
PA_ = pintpandas.PintArray

And set up a unit registry and quantity shorthand.

[24]:
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

Operations between PintArrays from different unit registries will not work. To prevent this issue, we can set the unit registry that is used when creating new PintArrays.

[25]:
pintpandas.PintType.ureg = ureg

These are the possible ways to create a PintArray.

Note that pint[unit] must be used for the Series constructor, whereas the PintArray constructor accepts either the unit string or a unit object.

[26]:
df = pd.DataFrame({
    "length": pd.Series([1, 2], dtype="pint[m]"),
    "width": PA_([2, 3], dtype="pint[m]"),
    "distance": PA_([2, 3], dtype="m"),
    "height": PA_([2, 3], dtype=ureg.m),
    "depth": PA_.from_1darray_quantity(Q_([2, 3], ureg.m)),
})
df
[26]:
   length  width  distance  height  depth
0       1      2         2       2      2
1       2      3         3       3      3
[27]:
df.length.values.units
[27]:
\[meter\]