Usage
=====

For a cleaner use-case, `go here `_.

First we need to load some libraries, including pandas-log
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: ipython3

    import pandas as pd
    import numpy as np
    import pandas_log

Let's take a look at our dataset:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: ipython3

    df = pd.read_csv("pokemon.csv")
    df.head(10)
.. csv-table::
   :header: "", "#", "name", "type_1", "type_2", "total", "hp", "attack", "defense", "sp_atk", "sp_def", "speed", "generation", "legendary"

   0, 1, "Bulbasaur", Grass, Poison, 318, 45, 49, 49, 65, 65, 45, 1, False
   1, 2, "Ivysaur", Grass, Poison, 405, 60, 62, 63, 80, 80, 60, 1, False
   2, 3, "Venusaur", Grass, Poison, 525, 80, 82, 83, 100, 100, 80, 1, False
   3, 3, "VenusaurMega Venusaur", Grass, Poison, 625, 80, 100, 123, 122, 120, 80, 1, False
   4, 4, "Charmander", Fire, NaN, 309, 39, 52, 43, 60, 50, 65, 1, False
   5, 5, "Charmeleon", Fire, NaN, 405, 58, 64, 58, 80, 65, 80, 1, False
   6, 6, "Charizard", Fire, Flying, 534, 78, 84, 78, 109, 85, 100, 1, False
   7, 6, "CharizardMega Charizard X", Fire, Dragon, 634, 78, 130, 111, 130, 85, 100, 1, False
   8, 6, "CharizardMega Charizard Y", Fire, Flying, 634, 78, 104, 78, 159, 115, 100, 1, False
   9, 7, "Squirtle", Water, NaN, 314, 44, 48, 65, 50, 64, 43, 1, False
Let's say we want to find out:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Who is the weakest non-legendary fire pokemon?
-----------------------------------------------

The strategy will probably be something like:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Filter out legendary pokemons using ``.query()``.
2. Keep only fire pokemons using ``.query()``.
3. Drop the ``legendary`` column using ``.drop()``.
4. Keep the weakest pokemon among them using ``.nsmallest()``.
5. Reset the index using ``.reset_index()``.

.. code:: ipython3

    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='fire' or type_2=='fire'")
             .drop("legendary", axis=1)
             .nsmallest(1, "total")
             .reset_index(drop=True)
          )
    res
.. parsed-literal::

    Empty DataFrame
    Columns: [#, name, type_1, type_2, total, hp, attack, defense, sp_atk, sp_def, speed, generation]
    Index: []
OH NOO!!! Our code does not work!! We got no records
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If only there was a way to track those issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fortunately, that's exactly what **pandas-log** is for! It can be used either as a
global function (see the sketch at the end of this page) or as a context manager.
Here is the example with pandas_log's context manager:

.. code:: ipython3

    with pandas_log.enable():
        res = (df.query("legendary==0")
                 .query("type_1=='fire' or type_2=='fire'")
                 .drop("legendary", axis=1)
                 .nsmallest(1, "total")
              )
    res

.. parsed-literal::

    1) query(expr="legendary==0", inplace=False):
       Metadata:
       * Removed 65 rows (8.125%), 735 rows remaining.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 199.4 kB.
       * Output Dataframe size is 188.5 kB.

    2) query(expr="type_1=='fire' or type_2=='fire'", inplace=False):
       Metadata:
       * Removed 735 rows (100.0%), 0 rows remaining.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 188.5 kB.
       * Output Dataframe size is 0 Bytes.

    3) drop(labels="legendary", axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'):
       Metadata:
       * Removed the following columns (legendary) now only have the following columns (attack, sp_def, speed, hp, total, type_2, #, name, type_1, generation, defense, sp_atk).
       * No change in number of rows of input df.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 188.5 kB.
       * Output Dataframe size is 0 Bytes.

    4) nsmallest(n=1, columns="total", keep='first'):
       Metadata:
       * Picked 1 smallest rows by columns (total).
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 0 Bytes.
       * Output Dataframe size is 0 Bytes.
.. parsed-literal::

    Empty DataFrame
    Columns: [#, name, type_1, type_2, total, hp, attack, defense, sp_atk, sp_def, speed, generation]
    Index: []
We can clearly see that in the second step (``.query()``) we filtered out all the rows!! Indeed, we should have written Fire as opposed to fire
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(A case-insensitive way to avoid this class of bug altogether is sketched after the result below.)

.. code:: ipython3

    res = (df.copy()
             .query("type_1=='Fire' or type_2=='Fire'")
             .query("legendary==0")
             .drop("legendary", axis=1)
             .nsmallest(1, "total")
             .reset_index(drop=True)
          )
    res
.. csv-table::
   :header: "", "#", "name", "type_1", "type_2", "total", "hp", "attack", "defense", "sp_atk", "sp_def", "speed", "generation"

   0, 218, "Slugma", Fire, NaN, 250, 40, 40, 40, 70, 40, 20, 2
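As noted above, an alternative to fixing the literal string is to make the type filter
case-insensitive, so that ``fire``, ``Fire`` and ``FIRE`` all match. A minimal sketch in
plain pandas (this is not a pandas-log feature):

.. code:: ipython3

    # Case-insensitive variant of the type filter (sketch, plain pandas).
    # str.lower() normalises both type columns; NaN values stay NaN and compare as False.
    is_fire = (df["type_1"].str.lower() == "fire") | (df["type_2"].str.lower() == "fire")

    res = (df[is_fire]
             .query("legendary==0")
             .drop("legendary", axis=1)
             .nsmallest(1, "total")
             .reset_index(drop=True)
          )
    res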
Voilà, we got Slugma!!!!!!!!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some more advanced usage
-------------------------

One can use the verbose variable, which enables lower-level logs, like whether the dataframe was copied as part of the pipeline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This can explain comparison issues.

.. code:: ipython3

    with pandas_log.enable(verbose=True):
        res = (df.query("legendary==0")
                 .query("type_1=='Fire' or type_2=='Fire'")
                 .drop("legendary", axis=1)
                 .nsmallest(1, "total")
                 .reset_index(drop=True)
              )
    res

.. parsed-literal::

    1) query(expr="legendary==0", inplace=False):
       Metadata:
       * Removed 65 rows (8.125%), 735 rows remaining.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 199.4 kB.
       * Output Dataframe size is 188.5 kB.

    2) query(expr="type_1=='Fire' or type_2=='Fire'", inplace=False):
       Metadata:
       * Removed 679 rows (92.38095238095238%), 56 rows remaining.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 188.5 kB.
       * Output Dataframe size is 14.4 kB.

    3) drop(labels="legendary", axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'):
       Metadata:
       * Removed the following columns (legendary) now only have the following columns (attack, sp_def, speed, hp, total, type_2, #, name, type_1, generation, defense, sp_atk).
       * No change in number of rows of input df.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.4 kB.
       * Output Dataframe size is 14.3 kB.

    X) __getitem__(key="total"):
       Metadata:
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.3 kB.
       * Output Dataframe size is 896 Bytes.

    X) copy(deep=True):
       Metadata:
       * Using default strategy (some metric might not be relevant).
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.3 kB.
       * Output Dataframe size is 14.3 kB.

    X) reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill=''):
       Metadata:
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.3 kB.
       * Output Dataframe size is 14.0 kB.

    X) __getitem__(key="total"):
       Metadata:
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.0 kB.
       * Output Dataframe size is 576 Bytes.

    4) nsmallest(n=1, columns="total", keep='first'):
       Metadata:
       * Picked 1 smallest rows by columns (total).
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.3 kB.
       * Output Dataframe size is 236 Bytes.

    X) copy(deep=True):
       Metadata:
       * Using default strategy (some metric might not be relevant).
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 236 Bytes.
       * Output Dataframe size is 236 Bytes.

    X) reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill=''):
       Metadata:
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 236 Bytes.
       * Output Dataframe size is 356 Bytes.
.. csv-table::
   :header: "", "#", "name", "type_1", "type_2", "total", "hp", "attack", "defense", "sp_atk", "sp_def", "speed", "generation"

   0, 218, "Slugma", Fire, NaN, 250, 40, 40, 40, 70, 40, 20, 2
As we can see, after both the ``drop`` and ``nsmallest`` functions the dataframe was copied.

One can use the silent variable to suppress stdout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: ipython3

    with pandas_log.enable(silent=True):
        res = (df.copy()
                 .query("legendary==0")
                 .query("type_1=='Fire' or type_2=='Fire'")
                 .drop("legendary", axis=1)
                 .nsmallest(1, "total")
                 .reset_index(drop=True)
              )
    res
.. csv-table::
   :header: "", "#", "name", "type_1", "type_2", "total", "hp", "attack", "defense", "sp_atk", "sp_def", "speed", "generation"

   0, 218, "Slugma", Fire, NaN, 250, 40, 40, 40, 70, 40, 20, 2
One can use the full_signature variable to suppress the full signature and log only the arguments that were passed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: ipython3

    with pandas_log.enable(full_signature=False):
        res = (df.query("legendary==0")
                 .query("type_1=='Fire' or type_2=='Fire'")
                 .drop("legendary", axis=1)
                 .nsmallest(1, "total")
                 .reset_index(drop=True)
              )
    res

.. parsed-literal::

    1) query(expr="legendary==0", inplace=False):
       Metadata:
       * Removed 65 rows (8.125%), 735 rows remaining.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 199.4 kB.
       * Output Dataframe size is 188.5 kB.

    2) query(expr="type_1=='Fire' or type_2=='Fire'"):
       Metadata:
       * Removed 679 rows (92.38095238095238%), 56 rows remaining.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 188.5 kB.
       * Output Dataframe size is 14.4 kB.

    3) drop(labels="legendary"):
       Metadata:
       * Removed the following columns (legendary) now only have the following columns (attack, sp_def, speed, hp, total, type_2, #, name, type_1, generation, defense, sp_atk).
       * No change in number of rows of input df.
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.4 kB.
       * Output Dataframe size is 14.3 kB.

    4) nsmallest(n=1, columns="total"):
       Metadata:
       * Picked 1 smallest rows by columns (total).
       Execution Stats:
       * Execution time: Step Took a moment seconds..
       * Input Dataframe size is 14.3 kB.
       * Output Dataframe size is 236 Bytes.
.. csv-table::
   :header: "", "#", "name", "type_1", "type_2", "total", "hp", "attack", "defense", "sp_atk", "sp_def", "speed", "generation"

   0, 218, "Slugma", Fire, NaN, 250, 40, 40, 40, 70, 40, 20, 2
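One can use pandas-log as a global function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As mentioned at the top of this page, pandas-log can also be switched on for a whole
session rather than for a single block. A minimal sketch, assuming the package exposes
``auto_enable()`` / ``auto_disable()`` global toggles (check the API of the version you
have installed):

.. code:: ipython3

    import pandas_log

    # Assumed global toggle: log every subsequent pandas call in this session.
    pandas_log.auto_enable()

    res = (df.query("legendary==0")
             .query("type_1=='Fire' or type_2=='Fire'")
             .nsmallest(1, "total"))

    # Assumed global toggle: stop logging again.
    pandas_log.auto_disable()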