Python is a versatile programming language that can significantly enhance your ability to work with Excel files. Using Python with Excel can greatly improve data analysis and manipulation.
In this tutorial, we will use Python with Excel to read from and write in spreadsheets using the libraries.
Most Used Libraries
Here are some common methods and libraries to work with Excel files in Python.
- Pandas: A powerful and most common library for data manipulation and analysis. It provides extensive functionality to read and write Excel files.
- OpenPyXL: This library can read and write earlier Excel files like 2010 xlsx/xlsm/xltx/xltm files.
- Pywin32: This library interacts with Excel through the COM interface to automate Excel tasks directly.
- Xlwings: You can use Python to interact with Excel macros using this library.
Prerequisites
There are some prerequisites to follow for using Python with Excel.
- Install Python on your system
- Use the pip command to install the required libraries
pip install pandas pip install pandas openpyxl pip install pywin32
- Need familiarity with basic Python programming
Using Python Pandas to Perform Basic Operations
Let’s show you how you can read and write Excel files using the pandas library.
Reading Data from an Excel File
You can read any Excel file by using the .read property of the panda library. In this example, we’ll read data from the “Sales” sheet of the “Using_Python_ with_ Excel.xlsx file” workbook.
import pandas as pd #Read an Excel file file_path = 'Using_Python_ with_ Excel.xlsx' df = pd.read_excel(file_path, sheet_name='Sales') #Display the DataFrame print(df.head())
Output:
Date Salesperson Region Product Units Sold Unit Price 2024-01-05 John Doe North Smartphone 15 400 2024-01-06 Jane Smith South Laptop 10 800 2024-01-07 John Doe West Refrigerator 5 500 2024-01-08 Sarah Johnson East Tablet 25 200 2024-01-09 Mark Wilson North Washing Machine 8 600
This reads the data from the Excel file into a DataFrame and prints the first five rows using .head() property.
Modify Data to Write in the Excel File
Now let’s modify the Price of Product, instead of directly overwriting the file, we will modify the DataFrame first. Then write the updated DataFrame back to the same Excel file. We will use the .to_excel() method to write the new price.
import pandas as pd #Load an Excel file file_path = 'Using_Python_ with_ Excel.xlsx' # Modify the price of Product df.loc[df['Product'] == 'Smartphone', 'Unit Price'] = 400 # Write the modified data back to the same Excel file df.to_excel(file_path, sheet_name='Sales', index=False) # Display the updated data print(df)
Output:
Date Salesperson Region Product Units Sold Unit Price 0 2024-01-05 John Doe North Smartphone 15 400 1 2024-01-06 Jane Smith South Laptop 10 800 2 2024-01-07 John Doe West Refrigerator 5 500 3 2024-01-08 Sarah Johnson East Tablet 25 200 4 2024-01-09 Mark Wilson North Washing Machine 8 600 5 2024-01-10 Jane Smith West Headphones 30 50 6 2024-01-11 Sarah Johnson East Air Conditioner 4 700 7 2024-01-12 John Doe South Television 11 450 8 2024-02-13 Mark Wilson West Microwave 23 150
Using OpenPyXL to Write to Another Sheet Without Overwriting
When using pandas to write to an Excel file, it will overwrite the entire file by default, including all existing sheets, unless you specify that you want to keep the other sheets. To write data to a new sheet without deleting the existing ones, you will need to use the ExcelWriter class with the openpyxl engine and the option mode=’a’ (append).
import pandas as pd from openpyxl import load_workbook #Load an Excel file file_path = 'Using_Python_ with_ Excel.xlsx' # Create a DataFrame data = { 'Customer Name': ['John Doe', 'Jane Smith', 'Sarah Johnson'], 'Age': [29, 31, 34], } df = pd.DataFrame(data) #Write the data to a new sheet without deleting the existing sales sheet with pd.ExcelWriter(file_path, engine='openpyxl', mode='a') as writer: df.to_excel(writer, sheet_name='Customer Info', index=False) #Display the DataFrame print(df)
Output:
Customer Name Age 0 John Doe 29 1 Jane Smith 31 2 Sarah Johnson 34
Note: If you don’t use the mode=’a’ option, pandas will overwrite the entire Excel file, removing all existing sheets except the one you’re currently writing to.
Automating Excel with Pywin32
If you want to automate Excel, you can use the Pywin32 library which allows you to control Excel directly from Python. You can use this library to automatically update monthly reports, apply consistent formatting, and calculate summaries.
import win32com.client as win32 # Start an instance of Excel excel = win32.Dispatch('Excel.Application') # Make Excel visible excel.Visible = True # Create a new workbook workbook = excel.Workbooks.Add() # Access the active sheet sheet = workbook.ActiveSheet # Write data in the sheet sheet.Cells(1, 1).Value = 'Products' sheet.Cells(1, 2).Value = 'Quantity' sheet.Cells(1, 3).Value = 'Unit Price' # Save the workbook workbook.SaveAs('automated_excel_file.xlsx') # Close and Quit Excel workbook.Close() excel.Application.Quit()
It creates an Excel workbook with the data. You also can insert new records into a growing Excel sheet without manually opening the file.
Conclusion
Using Python with Excel can streamline many data analysis tasks, from simple data manipulation to complex analysis and automation. You can create, read, modify, and automate Excel tasks using Python externally. In some cases, you won’t need to open an Excel file to perform calculations. Depending on your needs, you can choose the appropriate library and approach.
Get FREE Advanced Excel Exercises with Solutions!