Pandas support¶

Warning: pandas support is currently experimental; don’t expect everything to work.

It is convenient to use the Pandas package when dealing with numerical data, so Pint provides PintArray. A PintArray is a Pandas Extension Array, which allows Pandas to recognise the Quantity and store it in Pandas DataFrames and Series.

Installation¶

Pandas support is provided by the pint-pandas package. To install it use either:

python -m pip install pint-pandas

Or:

conda install -c conda-forge pint-pandas

Basic example¶

This example shows the simplest way to use pandas with Pint, along with the underlying objects. It’s slightly fiddly because the data is not read from a file. A more typical use case is given in Reading from csv.

First some imports

[1]:

import pandas as pd
import pint
import pint_pandas

Next, we create a DataFrame with PintArrays as columns.

[2]:

df = pd.DataFrame({
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
})
df

/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)

[2]:

	torque	angular_velocity
0	1.0	1.0
1	2.0	2.0
2	2.0	2.0
3	3.0	3.0

Operations on columns are unit-aware, so they behave as we would intuitively expect.

[3]:

df['power'] = df['torque'] * df['angular_velocity']
df

/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)

[3]:

	torque	angular_velocity	power
0	1.0	1.0	1.0
1	2.0	2.0	4.0
2	2.0	2.0	4.0
3	3.0	3.0	9.0

We can see the columns’ units in the dtypes attribute

[4]:

df.dtypes

[4]:

torque                                       pint[foot * force_pound]
angular_velocity                         pint[revolutions_per_minute]
power               pint[foot * force_pound * revolutions_per_minute]
dtype: object

Each column can be accessed as a Pandas Series

[5]:

df.power

[5]:

0    1.0
1    4.0
2    4.0
3    9.0
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute]

The Series contains a PintArray

[6]:

df.power.values

[6]:

<PintArray>
[1.0, 4.0, 4.0, 9.0]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute]

The PintArray contains a Quantity

[7]:

df.power.values.quantity

[7]:

Magnitude	[1.0 4.0 4.0 9.0]
Units	foot force_pound revolutions_per_minute

Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.

[8]:

df.power.pint.units

[8]:

foot force_pound revolutions_per_minute

[9]:

df.power.pint.to("kW").values

[9]:

<PintArray>
[0.00014198092353610379,  0.0005679236941444151,  0.0005679236941444151,
   0.001277828311824934]
Length: 4, dtype: pint[kilowatt]
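
The conversion above can be checked by hand. The following is a minimal sketch using only the standard library, not the way Pint itself performs the conversion; the constant and function names are illustrative. It relies on the fact that a revolution is 2π radians and that radians are dimensionless, which is why torque multiplied by an angular velocity in rpm can be expressed as a power.

```python
import math

# Conversion factors derived from the exact definitions of the units.
LBF_IN_NEWTON = 4.4482216152605        # pound-force in newtons
FOOT_IN_METRE = 0.3048                 # international foot in metres
RPM_IN_RAD_PER_S = 2 * math.pi / 60    # one revolution is 2*pi radians


def lbf_ft_rpm_to_kw(value: float) -> float:
    """Convert a value in foot * force_pound * revolutions_per_minute to kilowatts."""
    watts = value * LBF_IN_NEWTON * FOOT_IN_METRE * RPM_IN_RAD_PER_S
    return watts / 1000.0


# Magnitudes of the "power" column from the example above.
for power in [1.0, 4.0, 4.0, 9.0]:
    print(lbf_ft_rpm_to_kw(power))
```

The printed values should agree with the PintArray shown above, roughly 0.000142 kW for the first row.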

Reading from csv¶

Reading from files is the far more standard way to use pandas. To facilitate this, DataFrame accessors are provided to make it easy to get to PintArrays.

[10]:

import pandas as pd
import pint
import pint_pandas
import io

Here’s the contents of the csv file.

[11]:

test_data = '''ShaftSpeedIndex,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
pump,,A,B,C,A,B,C,A,B,C
ShaftSpeed,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
FlowRate,m^3 h^-1,8.72,9.28,9.31,11.61,12.78,13.51,18.32,17.90,19.23
DifferentialPressure,kPa,162.03,144.16,136.47,286.86,241.41,204.21,533.17,526.74,440.76
ShaftPower,kW,1.32,1.23,1.18,3.09,2.78,2.50,8.59,8.51,7.61
Efficiency,dimensionless,30.60,31.16,30.70,30.72,31.83,31.81,32.52,31.67,32.05'''

Let’s read that into a DataFrame. Here io.StringIO stands in for a file on disk; normally you would pass a csv file path instead, as shown in the commented line.

[12]:

df = pd.read_csv(io.StringIO(test_data), header=[0, 1], index_col=[0, 1]).T
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1], index_col=[0, 1]).T
df

[12]:

		ShaftSpeed	FlowRate	DifferentialPressure	ShaftPower	Efficiency
		rpm	m^3 h^-1	kPa	kW	dimensionless
ShaftSpeedIndex	pump
1200	A	1200.0	8.72	162.03	1.32	30.60
	B	1200.0	9.28	144.16	1.23	31.16
	C	1200.0	9.31	136.47	1.18	30.70
1600	A	1600.0	11.61	286.86	3.09	30.72
	B	1600.0	12.78	241.41	2.78	31.83
	C	1600.0	13.51	204.21	2.50	31.81
2300	A	2300.0	18.32	533.17	8.59	32.52
	B	2300.0	17.90	526.74	8.51	31.67
	C	2300.0	19.23	440.76	7.61	32.05

Before conversion the columns are plain float64 np.ndarrays, as df.dtypes shows. The quantify method of the DataFrame’s pint accessor then converts them to PintArrays, taking the units from the bottom column level.

[13]:

df.dtypes

[13]:

ShaftSpeed            rpm              float64
FlowRate              m^3 h^-1         float64
DifferentialPressure  kPa              float64
ShaftPower            kW               float64
Efficiency            dimensionless    float64
dtype: object

[14]:

df_ = df.pint.quantify(level=-1)
df_

/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)

[14]:

		ShaftSpeed	FlowRate	DifferentialPressure	ShaftPower	Efficiency
ShaftSpeedIndex	pump
1200	A	1200.0	8.72	162.03	1.32	30.6
	B	1200.0	9.28	144.16	1.23	31.16
	C	1200.0	9.31	136.47	1.18	30.7
1600	A	1600.0	11.61	286.86	3.09	30.72
	B	1600.0	12.78	241.41	2.78	31.83
	C	1600.0	13.51	204.21	2.5	31.81
2300	A	2300.0	18.32	533.17	8.59	32.52
	B	2300.0	17.9	526.74	8.51	31.67
	C	2300.0	19.23	440.76	7.61	32.05

Let’s confirm the units have been parsed correctly

[15]:

df_.dtypes

[15]:

ShaftSpeed                    pint[revolutions_per_minute]
FlowRate                pint[meter ** 3 / planck_constant]
DifferentialPressure                      pint[kilopascal]
ShaftPower                                  pint[kilowatt]
Efficiency                             pint[dimensionless]
dtype: object

Here the h in m^3 h^-1 has been parsed as the Planck constant rather than as hours. Let’s change the unit to hours.

[16]:

df_['FlowRate'] = pint_pandas.PintArray(df_['FlowRate'].values.quantity.m, dtype = "pint[m^3/hr]")
df_.dtypes

[16]:

ShaftSpeed              pint[revolutions_per_minute]
FlowRate                     pint[meter ** 3 / hour]
DifferentialPressure                pint[kilopascal]
ShaftPower                            pint[kilowatt]
Efficiency                       pint[dimensionless]
dtype: object
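
An alternative to rebuilding the column afterwards is to rewrite ambiguous unit strings before they are parsed. The sketch below is an assumption-laden illustration, not part of pint-pandas: UNIT_ALIASES and normalise_unit are hypothetical names, and the alias table only covers the h case seen above.

```python
import re

# Hypothetical alias table: unit tokens that the registry reads differently
# from what the data means (here "h", which was parsed as the Planck constant).
UNIT_ALIASES = {"h": "hour"}

# A unit token is a run of letters or underscores, e.g. "m", "h", "kPa".
_TOKEN = re.compile(r"[A-Za-z_]+")


def normalise_unit(unit: str, aliases=UNIT_ALIASES) -> str:
    """Replace whole unit tokens found in `aliases`; leave everything else unchanged."""
    return _TOKEN.sub(lambda m: aliases.get(m.group(0), m.group(0)), unit)


print(normalise_unit("m^3 h^-1"))        # m^3 hour^-1
print(normalise_unit("dimensionless"))   # dimensionless (the inner "h" is not a whole token)
```

With the DataFrame read above, the unit level could then be renamed before quantifying, for example with df.rename(columns=normalise_unit, level=1).pint.quantify(level=-1). That call chain is a suggestion and has not been checked against every pint-pandas version.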

As before, operations between DataFrame columns are unit-aware

[17]:

df_.ShaftPower / df_.ShaftSpeed

[17]:

ShaftSpeedIndex  pump
1200             A                      0.0011
                 B                    0.001025
                 C       0.0009833333333333332
1600             A       0.0019312499999999998
                 B                   0.0017375
                 C                   0.0015625
2300             A        0.003734782608695652
                 B       0.0036999999999999997
                 C       0.0033086956521739133
dtype: pint[kilowatt / revolutions_per_minute]

[18]:

df_['ShaftTorque'] = df_.ShaftPower / df_.ShaftSpeed
df_['FluidPower'] = df_['FlowRate'] * df_['DifferentialPressure']
df_

/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)

[18]:

		ShaftSpeed	FlowRate	DifferentialPressure	ShaftPower	Efficiency	ShaftTorque	FluidPower
ShaftSpeedIndex	pump
1200	A	1200.0	8.72	162.03	1.32	30.6	0.0011	1412.9016000000001
	B	1200.0	9.28	144.16	1.23	31.16	0.001025	1337.8048
	C	1200.0	9.31	136.47	1.18	30.7	0.0009833333333333332	1270.5357000000001
1600	A	1600.0	11.61	286.86	3.09	30.72	0.0019312499999999998	3330.4446
	B	1600.0	12.78	241.41	2.78	31.83	0.0017375	3085.2198
	C	1600.0	13.51	204.21	2.5	31.81	0.0015625	2758.8771
2300	A	2300.0	18.32	533.17	8.59	32.52	0.003734782608695652	9767.6744
	B	2300.0	17.9	526.74	8.51	31.67	0.0036999999999999997	9428.645999999999
	C	2300.0	19.23	440.76	7.61	32.05	0.0033086956521739133	8475.8148

The DataFrame’s pint.dequantify method then allows us to retrieve the units information as a header row once again.

[19]:

df_.pint.dequantify()

[19]:

		ShaftSpeed	FlowRate	DifferentialPressure	ShaftPower	Efficiency	ShaftTorque	FluidPower
	unit	revolutions_per_minute	meter ** 3 / hour	kilopascal	kilowatt	dimensionless	kilowatt / revolutions_per_minute	kilopascal * meter ** 3 / hour
ShaftSpeedIndex	pump
1200	A	1200.0	8.72	162.03	1.32	30.60	0.001100	1412.9016
	B	1200.0	9.28	144.16	1.23	31.16	0.001025	1337.8048
	C	1200.0	9.31	136.47	1.18	30.70	0.000983	1270.5357
1600	A	1600.0	11.61	286.86	3.09	30.72	0.001931	3330.4446
	B	1600.0	12.78	241.41	2.78	31.83	0.001737	3085.2198
	C	1600.0	13.51	204.21	2.50	31.81	0.001563	2758.8771
2300	A	2300.0	18.32	533.17	8.59	32.52	0.003735	9767.6744
	B	2300.0	17.90	526.74	8.51	31.67	0.003700	9428.6460
	C	2300.0	19.23	440.76	7.61	32.05	0.003309	8475.8148

This enables some powerful workflows. For example, to change the units of individual columns

[20]:

df_['FluidPower'] = df_['FluidPower'].pint.to("kW")
df_['FlowRate'] = df_['FlowRate'].pint.to("L/s")
df_['ShaftTorque'] = df_['ShaftTorque'].pint.to("N m")
df_.pint.dequantify()

[20]:

		ShaftSpeed	FlowRate	DifferentialPressure	ShaftPower	Efficiency	ShaftTorque	FluidPower
	unit	revolutions_per_minute	liter / second	kilopascal	kilowatt	dimensionless	meter * newton	kilowatt
ShaftSpeedIndex	pump
1200	A	1200.0	2.422222	162.03	1.32	30.60	10.504226	0.392473
	B	1200.0	2.577778	144.16	1.23	31.16	9.788029	0.371612
	C	1200.0	2.586111	136.47	1.18	30.70	9.390142	0.352927
1600	A	1600.0	3.225000	286.86	3.09	30.72	18.442079	0.925123
	B	1600.0	3.550000	241.41	2.78	31.83	16.591903	0.857005
	C	1600.0	3.752778	204.21	2.50	31.81	14.920776	0.766355
2300	A	2300.0	5.088889	533.17	8.59	32.52	35.664547	2.713243
	B	2300.0	4.972222	526.74	8.51	31.67	35.332397	2.619068
	C	2300.0	5.341667	440.76	7.61	32.05	31.595716	2.354393

The units are harder to read than they need to be, so let’s change Pint’s default format for displaying units.

[21]:

pint_pandas.PintType.ureg.default_format = "~P"
df_.pint.dequantify()

[21]:

		ShaftSpeed	FlowRate	DifferentialPressure	ShaftPower	Efficiency	ShaftTorque	FluidPower
	unit	revolutions_per_minute	liter / second	kilopascal	kilowatt	dimensionless	meter * newton	kilowatt
ShaftSpeedIndex	pump
1200	A	1200.0	2.422222	162.03	1.32	30.60	10.504226	0.392473
	B	1200.0	2.577778	144.16	1.23	31.16	9.788029	0.371612
	C	1200.0	2.586111	136.47	1.18	30.70	9.390142	0.352927
1600	A	1600.0	3.225000	286.86	3.09	30.72	18.442079	0.925123
	B	1600.0	3.550000	241.41	2.78	31.83	16.591903	0.857005
	C	1600.0	3.752778	204.21	2.50	31.81	14.920776	0.766355
2300	A	2300.0	5.088889	533.17	8.59	32.52	35.664547	2.713243
	B	2300.0	4.972222	526.74	8.51	31.67	35.332397	2.619068
	C	2300.0	5.341667	440.76	7.61	32.05	31.595716	2.354393

We can also convert the entire table’s units

[22]:

df_.pint.to_base_units().pint.dequantify()

[22]:

		ShaftSpeed	FlowRate	DifferentialPressure	ShaftPower	Efficiency	ShaftTorque	FluidPower
	unit	radian / second	meter ** 3 / second	kilogram / meter / second ** 2	kilogram * meter ** 2 / second ** 3	dimensionless	kilogram * meter ** 2 / second ** 2	kilogram * meter ** 2 / second ** 3
ShaftSpeedIndex	pump
1200	A	125.663706	0.002422	162030.0	1320.0	30.60	10.504226	392.472667
	B	125.663706	0.002578	144160.0	1230.0	31.16	9.788029	371.612444
	C	125.663706	0.002586	136470.0	1180.0	30.70	9.390142	352.926583
1600	A	167.551608	0.003225	286860.0	3090.0	30.72	18.442079	925.123500
	B	167.551608	0.003550	241410.0	2780.0	31.83	16.591903	857.005500
	C	167.551608	0.003753	204210.0	2500.0	31.81	14.920776	766.354750
2300	A	240.855437	0.005089	533170.0	8590.0	32.52	35.664547	2713.242889
	B	240.855437	0.004972	526740.0	8510.0	31.67	35.332397	2619.068333
	C	240.855437	0.005342	440760.0	7610.0	32.05	31.595716	2354.393000

Plotting¶

Pint’s matplotlib support allows columns with the same dimensionality to be plotted.

[23]:

pint_pandas.PintType.ureg.setup_matplotlib()
# ax = df_[['ShaftPower', 'FluidPower']].unstack("pump").plot()

[24]:

# ax.yaxis.units

Note that indexes cannot store PintArrays, so they don’t contain unit information

[25]:

# print(ax.xaxis.units)

Advanced example¶

This example shows alternative ways to use pint with pandas and other features.

Start with the same imports.

[26]:

import pandas as pd
import pint
import pint_pandas

We’ll use a shorthand for PintArray

[27]:

PA_ = pint_pandas.PintArray

And set up a unit registry and quantity shorthand.

[28]:

ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

Operations between PintArrays that belong to different unit registries will not work. To prevent this issue, we can set the unit registry that is used when creating new PintArrays.

[29]:

pint_pandas.PintType.ureg = ureg

These are the possible ways to create a PintArray.

Note that pint[unit] must be used for the Series constructor, whereas the PintArray constructor also accepts a plain unit string or a unit object.

[30]:

df = pd.DataFrame({
        "length" : pd.Series([1.,2.], dtype="pint[m]"),
        "width" : PA_([2.,3.], dtype="pint[m]"),
        "distance" : PA_([2.,3.], dtype="m"),
        "height" : PA_([2.,3.], dtype=ureg.m),
        "depth" : PA_.from_1darray_quantity(Q_([2,3],ureg.m)),
    })
df

/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:194: RuntimeWarning: pint-pandas does not support magnitudes of <class 'numpy.int64'>. Converting magnitudes to float.
  warnings.warn(
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)
/home/docs/checkouts/readthedocs.org/user_builds/pint/envs/0.18/lib/python3.8/site-packages/pint_pandas/pint_array.py:648: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.array(qtys, dtype="object", copy=copy)

[30]:

	length	width	distance	height	depth
0	1.0	2.0	2.0	2.0	2.0
1	2.0	3.0	3.0	3.0	3.0

[31]:

df.length.values.units

[31]:

meter
