Pandas support

Warning: pandas support is currently experimental, don’t expect everything to work.

It is convenient to use the Pandas package when dealing with numerical data, so Pint provides PintArray. A PintArray is a Pandas Extension Array, which allows Pandas to recognise the Quantity and store it in Pandas DataFrames and Series.

Installation

Pandas support is provided by the pint-pandas package. To install it use either:

python -m pip install pint-pandas

Or:

conda install -c conda-forge pint-pandas

Basic example

This example shows the simplest way to use pandas with Pint, along with the underlying objects. It’s slightly fiddly because the data is not read from a file. A more typical use case is given in Reading from csv.

First some imports

[1]:
import pandas as pd
import pint
import pint_pandas

Next, we create a DataFrame with PintArrays as columns.

[2]:
df = pd.DataFrame({
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
})
df
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/latest/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)
[2]:
   torque  angular_velocity
0     1.0               1.0
1     2.0               2.0
2     2.0               2.0
3     3.0               3.0

Operations on columns are unit-aware, so they behave as we would intuitively expect.

[3]:
df['power'] = df['torque'] * df['angular_velocity']
df
[3]:
   torque  angular_velocity  power
0     1.0               1.0    1.0
1     2.0               2.0    4.0
2     2.0               2.0    4.0
3     3.0               3.0    9.0

We can see the columns’ units in the dtypes attribute

[4]:
df.dtypes
[4]:
torque                                       pint[foot * force_pound]
angular_velocity                         pint[revolutions_per_minute]
power               pint[foot * force_pound * revolutions_per_minute]
dtype: object

Each column can be accessed as a Pandas Series

[5]:
df.power
[5]:
0    1.0
1    4.0
2    4.0
3    9.0
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute]

Which contains a PintArray

[6]:
df.power.values
[6]:
<PintArray>
[1.0, 4.0, 4.0, 9.0]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute]

The PintArray contains a Quantity

[7]:
df.power.values.quantity
[7]:
Magnitude  [1.0 4.0 4.0 9.0]
Units      foot force_pound revolutions_per_minute

Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.

[8]:
df.power.pint.units
[8]:
foot force_pound revolutions_per_minute
[9]:
df.power.pint.to("kW").values
[9]:
<PintArray>
[0.00014198092353610379,  0.0005679236941444151,  0.0005679236941444151,
   0.001277828311824934]
Length: 4, dtype: pint[kilowatt]
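
As a sanity check on the converted values, the conversion factor can be reproduced by hand. The sketch below uses only the standard library. It assumes the usual exact definitions (1 lbf = 4.4482216152605 N, 1 ft = 0.3048 m) and that one revolution counts as 2π radians, with radians treated as dimensionless. The constant and variable names are illustrative and are not part of Pint.

```python
import math

# Assumed exact definitions of the imperial units in SI terms.
NEWTON_PER_LBF = 4.4482216152605
METRE_PER_FOOT = 0.3048
# One revolution per minute expressed in radians per second.
RAD_PER_S_PER_RPM = 2 * math.pi / 60

# 1 lbf * ft * rpm expressed in watts, then in kilowatts.
watt_per_unit = NEWTON_PER_LBF * METRE_PER_FOOT * RAD_PER_S_PER_RPM
kilowatt_per_unit = watt_per_unit / 1000

# Magnitudes of the power column in lbf * ft * rpm, as in the example above.
power_magnitudes = [1.0, 4.0, 4.0, 9.0]
power_kw = [p * kilowatt_per_unit for p in power_magnitudes]
print(power_kw)
```

The printed values should agree with the PintArray above to within floating-point rounding.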

Reading from csv

Reading from files is the more common way to use pandas. To facilitate this, DataFrame accessors are provided to make it easy to get to PintArrays.

[10]:
import pandas as pd
import pint
import pint_pandas
import io

Here are the contents of the csv file.

[11]:
test_data = '''ShaftSpeedIndex,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
pump,,A,B,C,A,B,C,A,B,C
ShaftSpeed,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
FlowRate,m^3 h^-1,8.72,9.28,9.31,11.61,12.78,13.51,18.32,17.90,19.23
DifferentialPressure,kPa,162.03,144.16,136.47,286.86,241.41,204.21,533.17,526.74,440.76
ShaftPower,kW,1.32,1.23,1.18,3.09,2.78,2.50,8.59,8.51,7.61
Efficiency,dimensionless,30.60,31.16,30.70,30.72,31.83,31.81,32.52,31.67,32.05'''

Let’s read that into a DataFrame. Here io.StringIO stands in for a file on disk; the commented line shows the equivalent call with a csv file path, which is what would typically be used.

[12]:
df = pd.read_csv(io.StringIO(test_data), header=[0, 1], index_col = [0,1]).T
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1], index_col=[0, 1]).T
df
[12]:
                      ShaftSpeed  FlowRate  DifferentialPressure  ShaftPower     Efficiency
                             rpm  m^3 h^-1                   kPa          kW  dimensionless
ShaftSpeedIndex pump
1200            A         1200.0      8.72                162.03        1.32          30.60
                B         1200.0      9.28                144.16        1.23          31.16
                C         1200.0      9.31                136.47        1.18          30.70
1600            A         1600.0     11.61                286.86        3.09          30.72
                B         1600.0     12.78                241.41        2.78          31.83
                C         1600.0     13.51                204.21        2.50          31.81
2300            A         2300.0     18.32                533.17        8.59          32.52
                B         2300.0     17.90                526.74        8.51          31.67
                C         2300.0     19.23                440.76        7.61          32.05

Then use the quantify method of the DataFrame’s pint accessor to convert the columns from np.ndarrays to PintArrays, taking the units from the bottom column level. Before conversion, every column has a plain float64 dtype.

[13]:
df.dtypes
[13]:
ShaftSpeed            rpm              float64
FlowRate              m^3 h^-1         float64
DifferentialPressure  kPa              float64
ShaftPower            kW               float64
Efficiency            dimensionless    float64
dtype: object
[14]:
df_ = df.pint.quantify(level=-1)
df_
[14]:
                      ShaftSpeed  FlowRate  DifferentialPressure  ShaftPower  Efficiency
ShaftSpeedIndex pump
1200            A         1200.0      8.72                162.03        1.32        30.6
                B         1200.0      9.28                144.16        1.23       31.16
                C         1200.0      9.31                136.47        1.18        30.7
1600            A         1600.0     11.61                286.86        3.09       30.72
                B         1600.0     12.78                241.41        2.78       31.83
                C         1600.0     13.51                204.21         2.5       31.81
2300            A         2300.0     18.32                533.17        8.59       32.52
                B         2300.0      17.9                526.74        8.51       31.67
                C         2300.0     19.23                440.76        7.61       32.05

Let’s confirm the units have been parsed correctly

[15]:
df_.dtypes
[15]:
ShaftSpeed                    pint[revolutions_per_minute]
FlowRate                pint[meter ** 3 / planck_constant]
DifferentialPressure                      pint[kilopascal]
ShaftPower                                  pint[kilowatt]
Efficiency                             pint[dimensionless]
dtype: object

Here the h in m^3 h^-1 has been parsed as the Planck constant rather than as hours. Let’s change the unit to hours.

[16]:
df_['FlowRate'] = pint_pandas.PintArray(df_['FlowRate'].values.quantity.m, dtype = "pint[m^3/hr]")
df_.dtypes
[16]:
ShaftSpeed              pint[revolutions_per_minute]
FlowRate                     pint[meter ** 3 / hour]
DifferentialPressure                pint[kilopascal]
ShaftPower                            pint[kilowatt]
Efficiency                       pint[dimensionless]
dtype: object

As previously, operations between DataFrame columns are unit aware

[17]:
df_.ShaftPower / df_.ShaftSpeed
[17]:
ShaftSpeedIndex  pump
1200             A                      0.0011
                 B                    0.001025
                 C       0.0009833333333333332
1600             A       0.0019312499999999998
                 B                   0.0017375
                 C                   0.0015625
2300             A        0.003734782608695652
                 B       0.0036999999999999997
                 C       0.0033086956521739133
dtype: pint[kilowatt / revolutions_per_minute]
[18]:
df_['ShaftTorque'] = df_.ShaftPower / df_.ShaftSpeed
df_['FluidPower'] = df_['FlowRate'] * df_['DifferentialPressure']
df_
[18]:
                      ShaftSpeed  FlowRate  DifferentialPressure  ShaftPower  Efficiency            ShaftTorque          FluidPower
ShaftSpeedIndex pump
1200            A         1200.0      8.72                162.03        1.32        30.6                 0.0011  1412.9016000000001
                B         1200.0      9.28                144.16        1.23       31.16               0.001025           1337.8048
                C         1200.0      9.31                136.47        1.18        30.7  0.0009833333333333332  1270.5357000000001
1600            A         1600.0     11.61                286.86        3.09       30.72  0.0019312499999999998           3330.4446
                B         1600.0     12.78                241.41        2.78       31.83              0.0017375           3085.2198
                C         1600.0     13.51                204.21         2.5       31.81              0.0015625           2758.8771
2300            A         2300.0     18.32                533.17        8.59       32.52   0.003734782608695652           9767.6744
                B         2300.0      17.9                526.74        8.51       31.67  0.0036999999999999997   9428.645999999999
                C         2300.0     19.23                440.76        7.61       32.05  0.0033086956521739133           8475.8148

The DataFrame’s pint.dequantify method then allows us to retrieve the units information as a header row once again.

[19]:
df_.pint.dequantify()
[19]:
                                  ShaftSpeed           FlowRate  DifferentialPressure  ShaftPower     Efficiency                        ShaftTorque                      FluidPower
unit                  revolutions_per_minute  meter ** 3 / hour            kilopascal    kilowatt  dimensionless  kilowatt / revolutions_per_minute  kilopascal * meter ** 3 / hour
ShaftSpeedIndex pump
1200            A                     1200.0               8.72                162.03        1.32          30.60                           0.001100                       1412.9016
                B                     1200.0               9.28                144.16        1.23          31.16                           0.001025                       1337.8048
                C                     1200.0               9.31                136.47        1.18          30.70                           0.000983                       1270.5357
1600            A                     1600.0              11.61                286.86        3.09          30.72                           0.001931                       3330.4446
                B                     1600.0              12.78                241.41        2.78          31.83                           0.001737                       3085.2198
                C                     1600.0              13.51                204.21        2.50          31.81                           0.001563                       2758.8771
2300            A                     2300.0              18.32                533.17        8.59          32.52                           0.003735                       9767.6744
                B                     2300.0              17.90                526.74        8.51          31.67                           0.003700                       9428.6460
                C                     2300.0              19.23                440.76        7.61          32.05                           0.003309                       8475.8148

Because every column carries its units, conversions are straightforward. For example, to change the units of individual columns:

[20]:
df_['FluidPower'] = df_['FluidPower'].pint.to("kW")
df_['FlowRate'] = df_['FlowRate'].pint.to("L/s")
df_['ShaftTorque'] = df_['ShaftTorque'].pint.to("N m")
df_.pint.dequantify()
[20]:
                                  ShaftSpeed        FlowRate  DifferentialPressure  ShaftPower     Efficiency     ShaftTorque  FluidPower
unit                  revolutions_per_minute  liter / second            kilopascal    kilowatt  dimensionless  meter * newton    kilowatt
ShaftSpeedIndex pump
1200            A                     1200.0        2.422222                162.03        1.32          30.60       10.504226    0.392473
                B                     1200.0        2.577778                144.16        1.23          31.16        9.788029    0.371612
                C                     1200.0        2.586111                136.47        1.18          30.70        9.390142    0.352927
1600            A                     1600.0        3.225000                286.86        3.09          30.72       18.442079    0.925123
                B                     1600.0        3.550000                241.41        2.78          31.83       16.591903    0.857005
                C                     1600.0        3.752778                204.21        2.50          31.81       14.920776    0.766355
2300            A                     2300.0        5.088889                533.17        8.59          32.52       35.664547    2.713243
                B                     2300.0        4.972222                526.74        8.51          31.67       35.332397    2.619068
                C                     2300.0        5.341667                440.76        7.61          32.05       31.595716    2.354393
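
To see what these conversions amount to, the first row (pump A at 1200 rpm) can be recomputed by hand. This is a plain-Python sketch using only the standard library. It assumes one revolution counts as 2π radians with radians treated as dimensionless, and the variable names are illustrative.

```python
import math

# First row of the table (pump A at 1200 rpm), in the original units.
shaft_speed_rpm = 1200.0
flow_rate_m3_per_h = 8.72
differential_pressure_kpa = 162.03
shaft_power_kw = 1.32

# m^3/h -> L/s: 1000 litres per cubic metre, 3600 seconds per hour.
flow_rate_l_per_s = flow_rate_m3_per_h * 1000 / 3600

# kW/rpm -> N m: power in watts divided by angular speed in rad/s.
angular_speed_rad_per_s = shaft_speed_rpm * 2 * math.pi / 60
shaft_torque_n_m = shaft_power_kw * 1000 / angular_speed_rad_per_s

# kPa * m^3/h -> kW: 1 kPa * m^3 = 1 kJ, so divide by 3600 s per hour.
fluid_power_kw = differential_pressure_kpa * flow_rate_m3_per_h / 3600

print(round(flow_rate_l_per_s, 6), round(shaft_torque_n_m, 6), round(fluid_power_kw, 6))
```

The three results should match the FlowRate, ShaftTorque and FluidPower entries of the first row to the displayed precision.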

The units are harder to read than they need to be, so let’s change Pint’s default format for displaying units.

[21]:
pint_pandas.PintType.ureg.default_format = "~P"
df_.pint.dequantify()
[21]:
                      ShaftSpeed  FlowRate  DifferentialPressure  ShaftPower  Efficiency  ShaftTorque  FluidPower
unit                         rpm       l/s                   kPa          kW                      N·m          kW
ShaftSpeedIndex pump
1200            A         1200.0  2.422222                162.03        1.32       30.60    10.504226    0.392473
                B         1200.0  2.577778                144.16        1.23       31.16     9.788029    0.371612
                C         1200.0  2.586111                136.47        1.18       30.70     9.390142    0.352927
1600            A         1600.0  3.225000                286.86        3.09       30.72    18.442079    0.925123
                B         1600.0  3.550000                241.41        2.78       31.83    16.591903    0.857005
                C         1600.0  3.752778                204.21        2.50       31.81    14.920776    0.766355
2300            A         2300.0  5.088889                533.17        8.59       32.52    35.664547    2.713243
                B         2300.0  4.972222                526.74        8.51       31.67    35.332397    2.619068
                C         2300.0  5.341667                440.76        7.61       32.05    31.595716    2.354393

The units of the entire table can also be converted at once:

[22]:
df_.pint.to_base_units().pint.dequantify()
[22]:
                      ShaftSpeed  FlowRate  DifferentialPressure  ShaftPower  Efficiency  ShaftTorque   FluidPower
unit                       rad/s      m³/s               kg/m/s²    kg·m²/s³                 kg·m²/s²     kg·m²/s³
ShaftSpeedIndex pump
1200            A     125.663706  0.002422              162030.0      1320.0       30.60    10.504226   392.472667
                B     125.663706  0.002578              144160.0      1230.0       31.16     9.788029   371.612444
                C     125.663706  0.002586              136470.0      1180.0       30.70     9.390142   352.926583
1600            A     167.551608  0.003225              286860.0      3090.0       30.72    18.442079   925.123500
                B     167.551608  0.003550              241410.0      2780.0       31.83    16.591903   857.005500
                C     167.551608  0.003753              204210.0      2500.0       31.81    14.920776   766.354750
2300            A     240.855437  0.005089              533170.0      8590.0       32.52    35.664547  2713.242889
                B     240.855437  0.004972              526740.0      8510.0       31.67    35.332397  2619.068333
                C     240.855437  0.005342              440760.0      7610.0       32.05    31.595716  2354.393000

Plotting

Pint’s matplotlib support allows columns with the same dimensionality to be plotted.

[23]:
pint_pandas.PintType.ureg.setup_matplotlib()
# ax = df_[['ShaftPower', 'FluidPower']].unstack("pump").plot()
[24]:
# ax.yaxis.units

Note that indexes cannot store PintArrays, so they don’t contain unit information.

[25]:
# print(ax.xaxis.units)

Advanced example

This example shows alternative ways to use pint with pandas and other features.

Start with the same imports.

[26]:
import pandas as pd
import pint
import pint_pandas

We’ll use a shorthand for PintArray.

[27]:
PA_ = pint_pandas.PintArray

And set up a unit registry and quantity shorthand.

[28]:
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

Operations between PintArrays that belong to different unit registries will not work. To prevent this issue, we can change the unit registry used when creating new PintArrays.

[29]:
pint_pandas.PintType.ureg = ureg

These are the possible ways to create a PintArray.

Note that pint[unit] must be used for the Series constructor, whereas the PintArray constructor also accepts a plain unit string or a unit object.

[30]:
df = pd.DataFrame({
        "length" : pd.Series([1.,2.], dtype="pint[m]"),
        "width" : PA_([2.,3.], dtype="pint[m]"),
        "distance" : PA_([2.,3.], dtype="m"),
        "height" : PA_([2.,3.], dtype=ureg.m),
        "depth" : PA_.from_1darray_quantity(Q_([2,3],ureg.m)),
    })
df
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/latest/lib/python3.8/site-packages/pint_pandas/pint_array.py:194: RuntimeWarning: pint-pandas does not support magnitudes of <class 'numpy.int64'>. Converting magnitudes to float.
  warnings.warn(
[30]:
   length  width  distance  height  depth
0     1.0    2.0       2.0     2.0    2.0
1     2.0    3.0       3.0     3.0    3.0
[31]:
df.length.values.units
[31]:
meter