Pandas support
It is convenient to use the Pandas package when dealing with numerical data, so Pint provides PintArray. A PintArray is a Pandas Extension Array, which allows Pandas to recognise the Quantity and store it in Pandas DataFrames and Series.
Installation
Pandas support is provided by the pint-pandas package. To install it, use either:
python -m pip install pint-pandas
Or:
conda install -c conda-forge pint-pandas
Basic example
This example shows the simplest way to use pandas with pint, along with the underlying objects. It’s slightly fiddly because the data is not read from a file. A more typical use case is given in Reading from csv.
First, some imports (you don’t need to import pint_pandas for this to work):
[1]:
import pandas as pd
import pint
Next, we create a DataFrame with PintArrays as columns.
[2]:
df = pd.DataFrame({
"torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
"angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
df
[2]:
| | torque | angular_velocity |
---|---|---|
0 | 1 | 1 |
1 | 2 | 2 |
2 | 2 | 2 |
3 | 3 | 3 |
Operations on columns are unit-aware, so they behave as we would intuitively expect.
[3]:
df['power'] = df['torque'] * df['angular_velocity']
df
[3]:
| | torque | angular_velocity | power |
---|---|---|---|
0 | 1 | 1 | 1 |
1 | 2 | 2 | 4 |
2 | 2 | 2 | 4 |
3 | 3 | 3 | 9 |
We can see the columns’ units in the dtypes attribute
[4]:
df.dtypes
[4]:
torque pint[foot * force_pound]
angular_velocity pint[revolutions_per_minute]
power pint[foot * force_pound * revolutions_per_minute]
dtype: object
Each column can be accessed as a Pandas Series
[5]:
df.power
[5]:
0 1
1 4
2 4
3 9
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute]
The Series contains a PintArray
[6]:
df.power.values
[6]:
<PintArray>
[1, 4, 4, 9]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute]
The PintArray contains a Quantity
[7]:
df.power.values.quantity
[7]:
Magnitude | [1 4 4 9] |
---|---|
Units | foot force_pound revolutions_per_minute |
Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.
[8]:
df.power.pint.units
[8]:
[9]:
df.power.pint.to("kW").values
[9]:
<PintArray>
[0.00014198092353610379, 0.0005679236941444151, 0.0005679236941444151,
0.001277828311824934]
Length: 4, dtype: pint[kilowatt]
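As a sanity check on the conversion above, the same numbers can be reproduced by hand. The sketch below uses only the standard library and the SI definitions of the pound-force, the foot and the revolution per minute; the function name `lbf_ft_rpm_to_kw` is made up for this illustration and is not part of pint or pint-pandas.

```python
import math

# SI equivalents of the units involved.
LBF_IN_NEWTON = 4.4482216152605       # 1 pound-force in newtons
FOOT_IN_METRE = 0.3048                # 1 foot in metres
RPM_IN_RAD_PER_S = 2 * math.pi / 60   # 1 revolution per minute in rad/s


def lbf_ft_rpm_to_kw(value):
    """Convert a power expressed in lbf * ft * rpm to kilowatts."""
    watts = value * LBF_IN_NEWTON * FOOT_IN_METRE * RPM_IN_RAD_PER_S
    return watts / 1000.0


# The magnitudes of the "power" column before conversion.
powers_kw = [lbf_ft_rpm_to_kw(p) for p in [1, 4, 4, 9]]
print(powers_kw)
```

The printed values should agree with the PintArray shown above to within floating-point rounding.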
Reading from csv
Reading from files is by far the more common way to use pandas. To facilitate this, DataFrame accessors are provided to make it easy to get to PintArrays.
[10]:
import pandas as pd
import pint
import pint_pandas
import io
Here are the contents of the csv file.
[11]:
test_data = '''speed,mech power,torque,rail pressure,fuel flow rate,fluid power
rpm,kW,N m,bar,l/min,kW
1000.0,,10.0,1000.0,10.0,
1100.0,,10.0,100000000.0,10.0,
1200.0,,10.0,1000.0,10.0,
1200.0,,10.0,1000.0,10.0,'''
Let’s read that into a DataFrame. Here io.StringIO stands in for a file on disk; a csv file path would typically be used instead, as shown in the commented-out line.
[12]:
df = pd.read_csv(io.StringIO(test_data), header=[0, 1])
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1])
df
[12]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
| | rpm | kW | N m | bar | l/min | kW |
0 | 1000.0 | NaN | 10.0 | 1000.0 | 10.0 | NaN |
1 | 1100.0 | NaN | 10.0 | 100000000.0 | 10.0 | NaN |
2 | 1200.0 | NaN | 10.0 | 1000.0 | 10.0 | NaN |
3 | 1200.0 | NaN | 10.0 | 1000.0 | 10.0 | NaN |
Then use the DataFrame’s pint accessor’s quantify method to convert the columns from np.ndarrays to PintArrays, with units from the bottom column level.
[13]:
df.dtypes
[13]:
speed rpm float64
mech power kW float64
torque N m float64
rail pressure bar float64
fuel flow rate l/min float64
fluid power kW float64
dtype: object
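To make "the bottom column level" concrete, here is a small pandas-only sketch. It assumes a cut-down two-column version of the csv above (the names `csv_text` and `small_df` are illustrative) and shows that `header=[0, 1]` yields a two-level column index whose last level holds the unit strings.

```python
import io

import pandas as pd

# A cut-down version of the csv: first row is names, second row is units.
csv_text = """speed,torque
rpm,N m
1000.0,10.0
1100.0,10.0"""

# header=[0, 1] tells pandas that the first two rows are both header rows,
# so the columns become a two-level MultiIndex of (name, unit) pairs.
small_df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

names = list(small_df.columns.get_level_values(0))
units = list(small_df.columns.get_level_values(-1))  # the bottom level
print(names)
print(units)
```

The last level (index -1) is the one that carries the unit strings, which is why the next cell passes `level=-1` to quantify.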
[14]:
df_ = df.pint.quantify(level=-1)
df_
[14]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
0 | 1000.0 | nan | 10.0 | 1000.0 | 10.0 | nan |
1 | 1100.0 | nan | 10.0 | 100000000.0 | 10.0 | nan |
2 | 1200.0 | nan | 10.0 | 1000.0 | 10.0 | nan |
3 | 1200.0 | nan | 10.0 | 1000.0 | 10.0 | nan |
As before, operations between DataFrame columns are unit-aware.
[15]:
df_.speed * df_.torque
[15]:
0 10000.0
1 11000.0
2 12000.0
3 12000.0
dtype: pint[meter * newton * revolutions_per_minute]
[16]:
df_
[16]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
0 | 1000.0 | nan | 10.0 | 1000.0 | 10.0 | nan |
1 | 1100.0 | nan | 10.0 | 100000000.0 | 10.0 | nan |
2 | 1200.0 | nan | 10.0 | 1000.0 | 10.0 | nan |
3 | 1200.0 | nan | 10.0 | 1000.0 | 10.0 | nan |
[17]:
df_['mech power'] = df_.speed * df_.torque
df_['fluid power'] = df_['fuel flow rate'] * df_['rail pressure']
df_
[17]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
0 | 1000.0 | 10000.0 | 10.0 | 1000.0 | 10.0 | 10000.0 |
1 | 1100.0 | 11000.0 | 10.0 | 100000000.0 | 10.0 | 1000000000.0 |
2 | 1200.0 | 12000.0 | 10.0 | 1000.0 | 10.0 | 10000.0 |
3 | 1200.0 | 12000.0 | 10.0 | 1000.0 | 10.0 | 10000.0 |
The DataFrame’s pint.dequantify method then allows us to retrieve the units information as a header row once again.
[18]:
df_.pint.dequantify()
[18]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
unit | revolutions_per_minute | meter * newton * revolutions_per_minute | meter * newton | bar | liter / minute | bar * liter / minute |
0 | 1000.0 | 10000.0 | 10.0 | 1000.0 | 10.0 | 1.000000e+04 |
1 | 1100.0 | 11000.0 | 10.0 | 100000000.0 | 10.0 | 1.000000e+09 |
2 | 1200.0 | 12000.0 | 10.0 | 1000.0 | 10.0 | 1.000000e+04 |
3 | 1200.0 | 12000.0 | 10.0 | 1000.0 | 10.0 | 1.000000e+04 |
This enables some rather powerful operations. For example, to change the units of individual columns:
[19]:
df_['fluid power'] = df_['fluid power'].pint.to("kW")
df_['mech power'] = df_['mech power'].pint.to("kW")
df_.pint.dequantify()
[19]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
unit | revolutions_per_minute | kilowatt | meter * newton | bar | liter / minute | kilowatt |
0 | 1000.0 | 1.047198 | 10.0 | 1000.0 | 10.0 | 1.666667e+01 |
1 | 1100.0 | 1.151917 | 10.0 | 100000000.0 | 10.0 | 1.666667e+06 |
2 | 1200.0 | 1.256637 | 10.0 | 1000.0 | 10.0 | 1.666667e+01 |
3 | 1200.0 | 1.256637 | 10.0 | 1000.0 | 10.0 | 1.666667e+01 |
The units are harder to read than they need to be, so let’s change pint’s default format for displaying units.
[20]:
pint_pandas.PintType.ureg.default_format = "~P"
df_.pint.dequantify()
[20]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
unit | rpm | kW | N·m | bar | l/min | kW |
0 | 1000.0 | 1.047198 | 10.0 | 1000.0 | 10.0 | 1.666667e+01 |
1 | 1100.0 | 1.151917 | 10.0 | 100000000.0 | 10.0 | 1.666667e+06 |
2 | 1200.0 | 1.256637 | 10.0 | 1000.0 | 10.0 | 1.666667e+01 |
3 | 1200.0 | 1.256637 | 10.0 | 1000.0 | 10.0 | 1.666667e+01 |
We can also convert the entire table’s units at once, for example to base units:
[21]:
df_.pint.to_base_units().pint.dequantify()
[21]:
| | speed | mech power | torque | rail pressure | fuel flow rate | fluid power |
---|---|---|---|---|---|---|
unit | rad/s | kg·m²/s³ | kg·m²/s² | kg/m/s² | m³/s | kg·m²/s³ |
0 | 104.719755 | 1047.197551 | 10.0 | 1.000000e+08 | 0.000167 | 1.666667e+04 |
1 | 115.191731 | 1151.917306 | 10.0 | 1.000000e+13 | 0.000167 | 1.666667e+09 |
2 | 125.663706 | 1256.637061 | 10.0 | 1.000000e+08 | 0.000167 | 1.666667e+04 |
3 | 125.663706 | 1256.637061 | 10.0 | 1.000000e+08 | 0.000167 | 1.666667e+04 |
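The base-unit values in the first row of this table can be checked with plain arithmetic. The sketch below assumes the usual SI relations (1 bar = 10^5 Pa, 1 l = 10^-3 m^3, 1 revolution = 2π rad) and, as pint does, treats the radian as dimensionless; the variable names are illustrative only.

```python
import math

# Row 0 of the csv data, in the units given in its header row.
speed_rpm = 1000.0
torque_n_m = 10.0
pressure_bar = 1000.0
flow_l_per_min = 10.0

# Convert each input to SI base units.
speed_rad_s = speed_rpm * 2 * math.pi / 60   # rpm -> rad/s
pressure_pa = pressure_bar * 1e5             # bar -> Pa (kg/m/s^2)
flow_m3_s = flow_l_per_min * 1e-3 / 60       # l/min -> m^3/s

# Powers in watts (kg*m^2/s^3); the radian is treated as dimensionless.
mech_power_w = speed_rad_s * torque_n_m
fluid_power_w = pressure_pa * flow_m3_s

print(speed_rad_s, pressure_pa, flow_m3_s)
print(mech_power_w, fluid_power_w)
```

Dividing the two powers by 1000 gives the kilowatt values shown in the earlier tables.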
Advanced example
This example shows alternative ways to use pint with pandas and other features.
Start with the same imports.
[22]:
import pandas as pd
import pint
import pint_pandas
We’ll use a shorthand for PintArray.
[23]:
PA_ = pint_pandas.PintArray
And set up a unit registry and quantity shorthand.
[24]:
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity
Operations between PintArrays from different unit registries will not work. To prevent this issue, we can set the unit registry that is used when creating new PintArrays.
[25]:
pint_pandas.PintType.ureg = ureg
These are the possible ways to create a PintArray.
Note that pint[unit] must be used for the Series constructor, whereas the PintArray constructor also accepts a unit string or a unit object.
[26]:
df = pd.DataFrame({
"length" : pd.Series([1,2], dtype="pint[m]"),
"width" : PA_([2,3], dtype="pint[m]"),
"distance" : PA_([2,3], dtype="m"),
"height" : PA_([2,3], dtype=ureg.m),
"depth" : PA_.from_1darray_quantity(Q_([2,3],ureg.m)),
})
df
[26]:
| | length | width | distance | height | depth |
---|---|---|---|---|---|
0 | 1 | 2 | 2 | 2 | 2 |
1 | 2 | 3 | 3 | 3 | 3 |
[27]:
df.length.values.units
[27]: