Operations On Dataframe - Part One

So far, we have learned many concepts in Pandas. Now we will look at the various operations that we can perform on dataFrames.
 
This part covers four categories of operations: Binary Operations, Inspection Functions, Retrieving Head and Tail Rows, and Iteration. Each of them is helpful in different kinds of data analysis, so let's study them in depth.
 

Binary Operations

 
Binary means ‘two’: an operation performed between two elements is a binary operation. This includes addition, subtraction, multiplication and division. Since we are working with dataFrames here, these operations act on two dataFrames and combine the elements that share the same row and column labels, for example adding, subtracting or multiplying the elements of two dataFrames.
 
+ , add(), radd()
  • If 2 dataFrames are all numeric and we want to add them element by element, then we use ‘+’.

    SYNTAX
    dataFrame1+dataFrame2
  • For the addition of 2 dataFrames we can also use the method ‘add()’.

    SYNTAX
    dataFrame1.add(dataFrame2)
  • You can also use ‘radd()’. It works like add() with the operands reversed: A.add(B) gives A+B, while A.radd(B) gives B+A. (The order makes no difference for numeric addition, but it matters for subtraction and division.)

    SYNTAX

    dataFrame1.radd(dataFrame2)

    import pandas as pd

    # Two all-numeric dataFrames with identical row and column labels
    dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
    df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
    print("This is df1:")
    print(df1)
    print('\n')

    dict2 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
    df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
    print("This is df2:")
    print(df2)
    print('\n')

    # '+' adds the elements that share the same labels
    df3 = df1 + df2
    print("Using '+', This is df1+df2 :")
    print("This is df3:")
    print(df3)
    print('\n')

    # add() computes df2 + df3
    df4 = df2.add(df3)
    print("Using 'add()', This is df2+df3 :")
    print("This is df4:")
    print(df4)
    print('\n')

    # radd() reverses the operands, so this computes df4 + df3
    df5 = df3.radd(df4)
    print("Using 'radd()', This is df4+df3:")
    print(df5)
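
The example above uses dataFrames with identical labels. As a minimal sketch of what happens otherwise, the snippet below adds two small dataFrames whose row labels only partly overlap. The names left and right and their values are made up for illustration. With ‘+’, any position that is missing from one side becomes NaN, while add() accepts a fill_value that stands in for the missing side.

```python
import pandas as pd

# Hypothetical frames: only row label 'b' is present in both
left = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
right = pd.DataFrame({'A': [10, 20]}, index=['b', 'c'])

# '+' aligns on labels; positions present on one side only become NaN
plain_sum = left + right

# add() with fill_value=0 treats a missing entry as 0 before adding
filled_sum = left.add(right, fill_value=0)

print(plain_sum)
print(filled_sum)
```

The result covers the union of the row labels, which is why it has three rows even though each input has two.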
 
 
- , sub(), rsub()
  • If you want to subtract one dataFrame from another, then you can use ‘-’ or the method ‘sub()’.

    SYNTAX
    dataFrame1-dataFrame2

    SYNTAX
    dataFrame1.sub(dataFrame2)
  • As with radd(), ‘rsub()’ reverses the operands: if you want A-B, use A.sub(B), but if you want B-A, use A.rsub(B). The call below therefore returns dataFrame2-dataFrame1.

    SYNTAX
    dataFrame1.rsub(dataFrame2)

  • For B-A, you can also use,

    SYNTAX
    dataFrame2-dataFrame1

    import pandas as pd

    dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
    df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
    print("This is df1:")
    print(df1)
    print('\n')

    dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}
    df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
    print("This is df2:")
    print(df2)
    print('\n')

    # sub() computes df1 - df2
    df3 = df1.sub(df2)
    print("Using 'sub()', This is df1-df2 :")
    print(df3)
    print('\n')

    # rsub() reverses the operands, so this computes df2 - df1
    df4 = df1.rsub(df2)
    print("Using 'rsub()', This is df2-df1 :")
    print(df4)
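
As a quick check of the operand order, this sketch confirms that rsub() matches the ‘-’ operator with the operands swapped. The frames a and b are small made-up examples, not data from the article.

```python
import pandas as pd

# Hypothetical frames with matching labels
a = pd.DataFrame({'A': [85, 73], 'B': [60, 80]})
b = pd.DataFrame({'A': [8, 7], 'B': [6, 8]})

forward = a.sub(b)    # a - b
reverse = a.rsub(b)   # b - a

# Each comparison checks one of the equivalences described in the text
print(forward.equals(a - b))
print(reverse.equals(b - a))
print(reverse.equals(-forward))
```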
 
 
* , mul(), rmul()
 
If you want to multiply 2 dataFrames element by element, then you can use ‘*’ or the method ‘mul()’.

SYNTAX
dataFrame1*dataFrame2

SYNTAX
dataFrame1.mul(dataFrame2)

‘rmul()’ works the same way as radd(): the call below returns dataFrame2*dataFrame1.
 
SYNTAX
dataFrame1.rmul(dataFrame2)

  import pandas as pd

  dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
  df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
  print("This is df1:")
  print(df1)
  print('\n')

  dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}
  df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
  print("This is df2:")
  print(df2)
  print('\n')

  # '*' multiplies the elements that share the same labels
  df3 = df1 * df2
  print("Using '*', This is df1*df2 :")
  print("This is df3:")
  print(df3)
  print('\n')

  # mul() computes df2 * df3
  df4 = df2.mul(df3)
  print("Using 'mul()', This is df2*df3 :")
  print(df4)
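
Note that ‘*’ and mul() multiply matching elements; they do not perform matrix multiplication. For contrast only, here is a minimal sketch on two made-up 2x2 frames, m1 and m2, that also calls DataFrame.dot(), a method this article does not otherwise cover.

```python
import pandas as pd

# Hypothetical 2x2 frames with default integer labels
m1 = pd.DataFrame([[1, 2], [3, 4]])
m2 = pd.DataFrame([[5, 6], [7, 8]])

elementwise = m1 * m2       # same as m1.mul(m2)
reflected = m1.rmul(m2)     # m2 * m1, equal here because multiplication is commutative
matrix = m1.dot(m2)         # matrix product, shown only for contrast

print(elementwise)
print(reflected)
print(matrix)
```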
 
 
/ , div(), rdiv()
  • If you want to divide one dataFrame by another, then you can use ‘/’ or the method ‘div()’.

    SYNTAX
    dataFrame1/dataFrame2

    SYNTAX
    dataFrame1.div(dataFrame2)
  • As mentioned above, if you want A/B, use A.div(B), but if you want B/A, use A.rdiv(B). The call below therefore returns dataFrame2/dataFrame1.

    SYNTAX
    dataFrame1.rdiv(dataFrame2)
  • For B/A, you can also use,

    SYNTAX
    dataFrame2/dataFrame1

    import pandas as pd

    dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
    df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
    print("This is df1:")
    print(df1)
    print('\n')

    dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}
    df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
    print("This is df2:")
    print(df2)
    print('\n')

    # '/' divides the elements that share the same labels
    df3 = df1 / df2
    print("Using '/', This is df1/df2 :")
    print("This is df3:")
    print(df3)
    print('\n')

    # div() computes df3 / df2
    df4 = df3.div(df2)
    print("Using 'div()', This is df3/df2 :")
    print(df4)
    print('\n')

    # rdiv() reverses the operands, so this computes df2 / df1
    df5 = df1.rdiv(df2)
    print("Using 'rdiv()', This is df2/df1 :")
    print(df5)
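
For integer and float columns, division produces floating-point results and does not raise an error on a zero divisor. In this sketch, with made-up frames num and den, a non-zero value divided by zero comes out as inf and zero divided by zero comes out as NaN. The same frames also show the operand order of rdiv().

```python
import pandas as pd

# Hypothetical frames; column 'A' of den contains zeros on purpose
num = pd.DataFrame({'A': [10, 0], 'B': [9, 4]})
den = pd.DataFrame({'A': [0, 0], 'B': [3, 8]})

quotient = num.div(den)      # num / den
reflected = num.rdiv(den)    # den / num

print(quotient)
print(reflected)
```

If zero divisors are possible in your data, it may be worth checking the result for inf and NaN before using it further.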
 
 

Inspection Functions

 
As the name suggests, these functions are used to inspect, or examine, a dataFrame. They gather information about a dataFrame or give a detailed description of it.
 
They are of 2 kinds,
  1. info()
  2. describe()
Let us understand them briefly.
 
info()
 
If you want to gather information about a particular dataFrame, like how many rows and columns it has, what the data type of each column is, how much memory it uses, etc., then use the method ‘info()’.
 
The info() method gives you an output in 7 parts,
  1. Type - The class of the object, which is DataFrame here
  2. No. of rows - The number of entries, along with the first and last row labels
  3. No. of columns - The total number of columns
  4. Description of all the columns - One line per column with its name
  5. Data Type - The data type of each column, followed by a count of columns per data type
  6. Memory Usage
  7. Non-Null Count - The number of non-null values in each column

    import pandas as pd

    # Named 'data' rather than 'dict' so that the built-in dict type is not shadowed
    data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

    df = pd.DataFrame(data, index=['0', '1', '2'])
    print(df)
    print("\n")
    # info() prints its report itself and returns None, so it is not wrapped in print()
    df.info()
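
Because info() prints its report rather than returning it, capturing the text takes an extra step. A minimal sketch, assuming a made-up frame df_missing with one missing value: the buf parameter sends the report to an in-memory buffer, and the non-null count for column B comes out one lower than the number of rows.

```python
import io
import pandas as pd

# Hypothetical frame with one missing value in column 'B'
df_missing = pd.DataFrame({'A': [85, 73, 98], 'B': [60.0, None, 58.0]})

# info() writes to stdout by default; buf redirects the report to a text buffer
buffer = io.StringIO()
df_missing.info(buf=buffer)
report = buffer.getvalue()

print(report)
```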
 
 
describe()
 
If you want a statistical description of a particular dataFrame, like the mean, standard deviation, count of non-NA values, etc., then use the method ‘describe()’.
For numeric columns, the describe() method gives you an output in 8 parts,
  1. Count of non-NA values in each column
  2. Mean of each column
  3. Standard Deviation of each column
  4. Minimum value in each column
  5. 25% - the 25th percentile (first quartile) of each column
  6. 50% - the 50th percentile (median) of each column
  7. 75% - the 75th percentile (third quartile) of each column
  8. Maximum value in each column

    import pandas as pd

    # Named 'data' rather than 'dict' so that the built-in dict type is not shadowed
    data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

    df = pd.DataFrame(data, index=['0', '1', '2'])
    print(df)
    print("\n")
    # describe() returns a new dataFrame with one row per statistic
    print(df.describe())
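
Since describe() returns a dataFrame, individual statistics can be read out of it by label. The sketch below uses a made-up frame named scores to show the row labels of the result and to confirm that the 50% row is the median.

```python
import pandas as pd

# Hypothetical frame with two numeric columns
scores = pd.DataFrame({'A': [85, 73, 98], 'B': [60, 80, 58]})
stats = scores.describe()

# The row labels of the result name the eight statistics
print(list(stats.index))

# The 50% row is the median of each column
print(stats.loc['50%', 'A'], scores['A'].median())

# A single statistic can be looked up by row label and column name
print(stats.loc['mean', 'B'])
```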
 
 
Retrieve Head and Tail Rows
  • If you want to display the top 5 rows of a dataFrame, then use ‘head()’.
  • If you want to display the bottom 5 rows of a dataFrame, then use ‘tail()’.
  • If you want to display the top 7 rows of a dataFrame, then use ‘head(7)’. Likewise, ‘tail(7)’ displays the bottom 7 rows.
  • The default number of rows for both head() and tail() is 5.

    SYNTAX
    dataFrame.head()
    dataFrame.tail()
    dataFrame.head(7)

    import pandas as pd

    # 12 rows, so head() and tail() each return only part of the dataFrame
    data = {'A': [85, 73, 98, 59, 27, 78, 99, 36, 58, 24, 25, 32],
            'B': [60, 80, 58, 78, 52, 54, 89, 63, 54, 87, 52, 65],
            'C': [90, 60, 74, 69, 98, 74, 23, 65, 45, 78, 98, 98],
            'D': [55, 27, 92, 56, 78, 88, 78, 89, 23, 45, 54, 34],
            'E': [91, 12, 98, 63, 98, 97, 45, 96, 91, 32, 65, 76]}

    df = pd.DataFrame(data, index=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11'])
    print(df)
    print("\n")

    print("Using head():", "\n", df.head())
    print("\n")
    print("Using tail():", "\n", df.tail())
    print("\n")
    print("Top 7 rows:", "\n", df.head(7))
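
A few edge cases of the row count are worth seeing in code. This is a minimal sketch on a made-up 12-row frame named numbers; it prints only the number of rows returned so the behaviour is easy to compare.

```python
import pandas as pd

# Hypothetical frame with 12 rows and default integer labels 0..11
numbers = pd.DataFrame({'A': range(12), 'B': range(100, 112)})

print(len(numbers.head()))      # default of 5 rows
print(len(numbers.tail(3)))     # last 3 rows
print(len(numbers.head(50)))    # asking for more rows than exist returns every row
print(len(numbers.head(-2)))    # a negative value returns all rows except the last 2

# The labels show which rows tail(3) picked
print(numbers.tail(3).index.tolist())
```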
 
 
Iteration
 
Sometimes you may want to look at each row or each column of a dataFrame separately. In these scenarios, we use iteration.
  • If you want to go through the rows one at a time, or see the items of every row separately, then use ‘iterrows()’.
  • iterrows() iterates over the dataFrame row-wise.
  • Each horizontal subset it yields is a pair of the form (row_index, row_series), where row_series holds the column names and values of that row.

    import pandas as pd

    # Named 'data' rather than 'dict' so that the built-in dict type is not shadowed
    data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

    df = pd.DataFrame(data, index=['0', '1', '2'])
    print(df)
    print("\n")

    # Each iteration yields the row label and a Series of that row's values
    for row, row_series in df.iterrows():
        print("Row Index:", row)
        print("Columns Names and Values:", "\n", row_series, "\n")
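
As a minimal sketch of putting the (row_index, row_series) pairs to use, the snippet below totals each row of a made-up frame named marks. The loop is there to illustrate iterrows(); for a plain row total, a built-in method such as sum(axis=1) gives the same numbers without an explicit loop.

```python
import pandas as pd

# Hypothetical frame with string row labels
marks = pd.DataFrame({'A': [85, 73, 98], 'B': [60, 80, 58]}, index=['0', '1', '2'])

row_totals = {}
for row_label, row_series in marks.iterrows():
    # row_series is a Series indexed by column name
    row_totals[row_label] = int(row_series['A'] + row_series['B'])

print(row_totals)

# The same totals without an explicit loop
print(marks.sum(axis=1).to_dict())
```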
 
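  • If you want to go through the columns one at a time, or see the items of every column separately, then use ‘items()’. Older versions of pandas also offered this method under the name ‘iteritems()’, which was removed in pandas 2.0.
  • items() iterates over the dataFrame column-wise.
  • Each vertical subset it yields is a pair of the form (column_name, column_series), where column_series holds the row labels and values of that column.

    import pandas as pd

    # Named 'data' rather than 'dict' so that the built-in dict type is not shadowed
    data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

    df = pd.DataFrame(data, index=['0', '1', '2'])
    print(df)
    print("\n")

    # Each iteration yields the column name and a Series of that column's values
    for col, col_series in df.items():
        print("Column Index:", col)
        # enumerate() numbers the values by row position, starting at 0
        for i, val in enumerate(col_series):
            print("At Row", i, ":", val)
        print("\n")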
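
A minimal sketch of a typical use, assuming a made-up frame named marks: each (column name, Series) pair is used to record the largest value in that column. It calls items(), since iteritems() is no longer available in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical frame with string row labels
marks = pd.DataFrame({'A': [85, 73, 98], 'B': [60, 80, 58]}, index=['0', '1', '2'])

column_max = {}
for col_name, col_series in marks.items():
    # col_series is a Series indexed by row label
    column_max[col_name] = int(col_series.max())

print(column_max)
```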
 
 

SUMMARY

 
In this article, we covered a few operations on dataFrames: Binary Operations, Inspection Functions, Retrieving Head and Tail Rows, and Iteration. That completes Part 1 of Operations on DataFrame.
My next article will be Part 2 of the same topic, where we will continue with more operations on dataFrames: Combining DataFrames and Aggregation Functions.
 
Feedback or queries related to this article are most welcome.
 
Thanks for reading!!

