Usage

For a cleaner use case, go here.

First, we need to load some libraries, including pandas-log.
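Here is a minimal setup sketch. The install name `pandas-log` and the import name `pandas_log` are our assumptions. The import is guarded so that the plain-pandas snippets in this section still run when the package isn't installed.

```python
import pandas as pd

# pandas-log is a separate package (assumed install: `pip install pandas-log`).
# Guard the import so plain-pandas code still runs if it is missing.
try:
    import pandas_log
except ImportError:
    pandas_log = None

print("pandas", pd.__version__, "| pandas-log available:", pandas_log is not None)
```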

Let’s take a look at our dataset:

   #  name                       type_1  type_2  total  hp  attack  defense  sp_atk  sp_def  speed  generation  legendary
0  1  Bulbasaur                  Grass   Poison  318    45  49      49       65      65      45     1           False
1  2  Ivysaur                    Grass   Poison  405    60  62      63       80      80      60     1           False
2  3  Venusaur                   Grass   Poison  525    80  82      83       100     100     80     1           False
3  3  VenusaurMega Venusaur      Grass   Poison  625    80  100     123      122     120     80     1           False
4  4  Charmander                 Fire    NaN     309    39  52      43       60      50      65     1           False
5  5  Charmeleon                 Fire    NaN     405    58  64      58       80      65      80     1           False
6  6  Charizard                  Fire    Flying  534    78  84      78       109     85      100    1           False
7  6  CharizardMega Charizard X  Fire    Dragon  634    78  130     111      130     85      100    1           False
8  6  CharizardMega Charizard Y  Fire    Flying  634    78  104     78       159     115     100    1           False
9  7  Squirtle                   Water   NaN     314    44  48      65       50      64      43     1           False
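The full dataset has 800 rows. The first log below removes 65 rows (8.125%) and leaves 735. The sketch below does not need the original CSV file: it rebuilds only the ten rows printed above as a DataFrame, with `None` standing in for the missing `type_2` values shown as NaN.

```python
import pandas as pd

columns = ["#", "name", "type_1", "type_2", "total", "hp", "attack",
           "defense", "sp_atk", "sp_def", "speed", "generation", "legendary"]

# The ten rows printed above; None marks a missing secondary type (NaN).
rows = [
    (1, "Bulbasaur", "Grass", "Poison", 318, 45, 49, 49, 65, 65, 45, 1, False),
    (2, "Ivysaur", "Grass", "Poison", 405, 60, 62, 63, 80, 80, 60, 1, False),
    (3, "Venusaur", "Grass", "Poison", 525, 80, 82, 83, 100, 100, 80, 1, False),
    (3, "VenusaurMega Venusaur", "Grass", "Poison", 625, 80, 100, 123, 122, 120, 80, 1, False),
    (4, "Charmander", "Fire", None, 309, 39, 52, 43, 60, 50, 65, 1, False),
    (5, "Charmeleon", "Fire", None, 405, 58, 64, 58, 80, 65, 80, 1, False),
    (6, "Charizard", "Fire", "Flying", 534, 78, 84, 78, 109, 85, 100, 1, False),
    (6, "CharizardMega Charizard X", "Fire", "Dragon", 634, 78, 130, 111, 130, 85, 100, 1, False),
    (6, "CharizardMega Charizard Y", "Fire", "Flying", 634, 78, 104, 78, 159, 115, 100, 1, False),
    (7, "Squirtle", "Water", None, 314, 44, 48, 65, 50, 64, 43, 1, False),
]

df = pd.DataFrame(rows, columns=columns)
print(df.head(10))
```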

Let's say we want to find out:

Who is the weakest non-legendary Fire Pokémon?

The strategy will probably be something like:

  1. Filter out legendary Pokémon using .query().
  2. Keep only Fire-type Pokémon using .query().
  3. Drop the legendary column using .drop().
  4. Keep the weakest Pokémon among them using .nsmallest().
  5. Reset the index using .reset_index().
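The steps above can be sketched as a single pandas method chain. To keep the sketch self-contained, it uses a few rows and columns from the table above. It keeps the lowercase `'fire'` string that appears in the logged run, so it reproduces the same empty result.

```python
import pandas as pd

# A few rows (and columns) from the table above, to keep the sketch self-contained.
df = pd.DataFrame({
    "#": [1, 4, 5, 6, 7],
    "name": ["Bulbasaur", "Charmander", "Charmeleon", "Charizard", "Squirtle"],
    "type_1": ["Grass", "Fire", "Fire", "Fire", "Water"],
    "type_2": ["Poison", None, None, "Flying", None],
    "total": [318, 309, 405, 534, 314],
    "legendary": [False, False, False, False, False],
})

res = (
    df.query("legendary==0")                      # 1. drop legendary Pokémon
      .query("type_1=='fire' or type_2=='fire'")  # 2. keep Fire types (note: lowercase 'fire')
      .drop("legendary", axis=1)                  # 3. drop the legendary column
      .nsmallest(1, "total")                      # 4. keep the weakest by total
      .reset_index(drop=True)                     # 5. renumber the index from 0
)
print(res)  # empty: the type values are capitalized, so 'fire' never matches
```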
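Running these steps as one method chain, however, returns an empty DataFrame with nothing but the column headers:

# name type_1 type_2 total hp attack defense sp_atk sp_def speed generation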

Fortunately, that's what pandas-log is for! It can be used either as a global function or as a context manager. Here is the example with pandas-log's context manager:
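pandas-log's own entry point isn't reproduced here. To illustrate the general mechanism, the sketch below is a hypothetical stand-in and not pandas-log's implementation (`log_query_rows` is our own name). It is a context manager that temporarily replaces `pd.DataFrame.query` with a wrapper that reports how many rows each call removed, then restores the original method on exit. Its message format is our own. Because the wrapper adds a stack frame, `@variable` references to the caller's local variables may not resolve while it is active.

```python
import contextlib
import pandas as pd

@contextlib.contextmanager
def log_query_rows():
    """Hypothetical stand-in, NOT pandas-log: wrap DataFrame.query to report row counts."""
    original_query = pd.DataFrame.query

    def logged_query(self, expr, **kwargs):
        result = original_query(self, expr, **kwargs)
        if result is None:  # inplace=True returns None; nothing to report
            return result
        removed = len(self) - len(result)
        pct = 100 * removed / len(self) if len(self) else 0.0
        print(f'query(expr="{expr}"): removed {removed} rows ({pct:.1f}%), '
              f"{len(result)} rows remaining.")
        return result

    pd.DataFrame.query = logged_query
    try:
        yield
    finally:
        pd.DataFrame.query = original_query  # always restore the real method

# A few rows from the dataset table, to keep the sketch self-contained.
df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Charmeleon", "Charizard", "Squirtle"],
    "type_1": ["Grass", "Fire", "Fire", "Fire", "Water"],
    "type_2": ["Poison", None, None, "Flying", None],
    "total": [318, 309, 405, 534, 314],
    "legendary": [False, False, False, False, False],
})

with log_query_rows():
    res = (df.query("legendary==0")
             .query("type_1=='fire' or type_2=='fire'"))
# Prints:
# query(expr="legendary==0"): removed 0 rows (0.0%), 5 rows remaining.
# query(expr="type_1=='fire' or type_2=='fire'"): removed 5 rows (100.0%), 0 rows remaining.
```

Patching the class attribute means that every `.query()` call in the chain is logged, including calls on intermediate results. The `finally` clause restores the original method even if a step raises an exception.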

1) query(expr="legendary==0", inplace=False):
    Metadata:
    * Removed 65 rows (8.125%), 735 rows remaining.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 199.4 kB.
    * Output Dataframe size is 188.5 kB.

2) query(expr="type_1=='fire' or type_2=='fire'", inplace=False):
    Metadata:
    * Removed 735 rows (100.0%), 0 rows remaining.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 188.5 kB.
    * Output Dataframe size is 0 Bytes.

3) drop(labels="legendary", axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'):
    Metadata:
    * Removed the following columns (legendary) now only have the following columns (attack, sp_def, speed, hp, total, type_2, #, name, type_1, generation, defense, sp_atk).
    * No change in number of rows of input df.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 0 Bytes.
    * Output Dataframe size is 0 Bytes.

4) nsmallest(n=1, columns="total", keep='first'):
    Metadata:
    * Picked 1 smallest rows by columns (total).
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 0 Bytes.
    * Output Dataframe size is 0 Bytes.
# name type_1 type_2 total hp attack defense sp_atk sp_def speed generation

We can clearly see that the second step (.query()) filters out all the rows! Indeed, we should have written Fire as opposed to fire. With that fixed, we get:

   #    name    type_1  type_2  total  hp  attack  defense  sp_atk  sp_def  speed  generation
0  218  Slugma  Fire    NaN     250    40  40      40       70      40      20     2
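String comparisons inside `.query()` are case-sensitive, and that is exactly what caused the empty result. Besides writing `Fire`, one option is to normalise case before comparing. The sketch below does this with boolean masks and `Series.str.lower()`. This is our suggestion, not something pandas-log does. On the full dataset the weakest match is Slugma, as shown above. On the few sample rows used here it is Charmander.

```python
import pandas as pd

# A few rows from the dataset table, to keep the sketch self-contained.
df = pd.DataFrame({
    "#": [1, 4, 5, 6, 7],
    "name": ["Bulbasaur", "Charmander", "Charmeleon", "Charizard", "Squirtle"],
    "type_1": ["Grass", "Fire", "Fire", "Fire", "Water"],
    "type_2": ["Poison", None, None, "Flying", None],
    "total": [318, 309, 405, 534, 314],
    "legendary": [False, False, False, False, False],
})

# Lower-case both type columns so 'fire', 'Fire' and 'FIRE' all match.
# Missing type_2 values stay missing after .str.lower() and compare as False.
is_fire = df["type_1"].str.lower().eq("fire") | df["type_2"].str.lower().eq("fire")

res = (
    df[~df["legendary"] & is_fire]
      .drop(columns="legendary")
      .nsmallest(1, "total")
      .reset_index(drop=True)
)
print(res)
```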

Some more advanced usage

One can use the verbose option, which enables lower-level logs, such as whether the DataFrame was copied as part of the pipeline.

This can help explain comparison issues.

1) query(expr="legendary==0", inplace=False):
    Metadata:
    * Removed 65 rows (8.125%), 735 rows remaining.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 199.4 kB.
    * Output Dataframe size is 188.5 kB.

2) query(expr="type_1=='Fire' or type_2=='Fire'", inplace=False):
    Metadata:
    * Removed 679 rows (92.38095238095238%), 56 rows remaining.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 188.5 kB.
    * Output Dataframe size is 14.4 kB.

3) drop(labels="legendary", axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'):
    Metadata:
    * Removed the following columns (legendary) now only have the following columns (attack, sp_def, speed, hp, total, type_2, #, name, type_1, generation, defense, sp_atk).
    * No change in number of rows of input df.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.4 kB.
    * Output Dataframe size is 14.3 kB.

X) __getitem__(key="total"):
    Metadata:

    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.3 kB.
    * Output Dataframe size is 896 Bytes.

X) copy(deep=True):
    Metadata:
    * Using default strategy (some metric might not be relevant).
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.3 kB.
    * Output Dataframe size is 14.3 kB.

X) reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill=''):
    Metadata:

    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.3 kB.
    * Output Dataframe size is 14.0 kB.

X) __getitem__(key="total"):
    Metadata:

    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.0 kB.
    * Output Dataframe size is 576 Bytes.

4) nsmallest(n=1, columns="total", keep='first'):
    Metadata:
    * Picked 1 smallest rows by columns (total).
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.3 kB.
    * Output Dataframe size is 236 Bytes.

X) copy(deep=True):
    Metadata:
    * Using default strategy (some metric might not be relevant).
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 236 Bytes.
    * Output Dataframe size is 236 Bytes.

X) reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill=''):
    Metadata:

    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 236 Bytes.
    * Output Dataframe size is 356 Bytes.
   #    name    type_1  type_2  total  hp  attack  defense  sp_atk  sp_def  speed  generation
0  218  Slugma  Fire    NaN     250    40  40      40       70      40      20     2

As we can see, the DataFrame was copied after both the drop and the nsmallest calls.

One can use the silent option to suppress the logs printed to stdout, so only the result is shown:

   #    name    type_1  type_2  total  hp  attack  defense  sp_atk  sp_def  speed  generation
0  218  Slugma  Fire    NaN     250    40  40      40       70      40      20     2
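The silent option belongs to pandas-log itself. Some code prints directly and offers no such switch. For that case, a generic, library-independent approach is the standard library's `contextlib.redirect_stdout`, which sends everything printed inside the block to a buffer instead of the console. The chatty `noisy_step` helper below is hypothetical.

```python
import contextlib
import io

def noisy_step(rows):
    """Hypothetical step that logs to stdout, standing in for any chatty call."""
    print(f"Removed 0 rows, {len(rows)} rows remaining.")
    return rows

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_step([1, 2, 3])  # nothing reaches the console

captured = buffer.getvalue()  # the suppressed text is still available if needed
print(result, repr(captured))
```

Unlike a library flag, this captures all stdout output in the block, not only one library's logs.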

One can use the full_signature option to suppress the full method signature, so each step's header is shorter:

1) query(expr="legendary==0", inplace=False):
    Metadata:
    * Removed 65 rows (8.125%), 735 rows remaining.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 199.4 kB.
    * Output Dataframe size is 188.5 kB.

2) query(expr="type_1=='Fire' or type_2=='Fire'"):
    Metadata:
    * Removed 679 rows (92.38095238095238%), 56 rows remaining.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 188.5 kB.
    * Output Dataframe size is 14.4 kB.

3) drop(labels="legendary"):
    Metadata:
    * Removed the following columns (legendary) now only have the following columns (attack, sp_def, speed, hp, total, type_2, #, name, type_1, generation, defense, sp_atk).
    * No change in number of rows of input df.
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.4 kB.
    * Output Dataframe size is 14.3 kB.

4) nsmallest(n=1, columns="total"):
    Metadata:
    * Picked 1 smallest rows by columns (total).
    Execution Stats:
    * Execution time: Step Took a moment seconds..
    * Input Dataframe size is 14.3 kB.
    * Output Dataframe size is 236 Bytes.
   #    name    type_1  type_2  total  hp  attack  defense  sp_atk  sp_def  speed  generation
0  218  Slugma  Fire    NaN     250    40  40      40       70      40      20     2
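pandas-log's internals aren't shown here, so the following is only a guess at how a "full" versus "short" step header can be produced. It uses the standard library's `inspect.signature` and `BoundArguments.apply_defaults()`. Both `nsmallest` (a stand-in that only mirrors the `nsmallest(n, columns, keep='first')` header in the logs) and `format_call` are hypothetical.

```python
import inspect

def nsmallest(n, columns, keep="first"):
    """Hypothetical stand-in mirroring the nsmallest header in the logs."""

def format_call(func, *args, full_signature=True, **kwargs):
    """Render a call header, with or without parameters left at their defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    if full_signature:
        bound.apply_defaults()  # fill in every parameter the caller did not pass
    params = ", ".join(f"{name}={value!r}" for name, value in bound.arguments.items())
    return f"{func.__name__}({params})"

print(format_call(nsmallest, 1, "total"))                        # nsmallest(n=1, columns='total', keep='first')
print(format_call(nsmallest, 1, "total", full_signature=False))  # nsmallest(n=1, columns='total')
```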