Create a dictionary of DataFrames using Pandas library in Python | Explained

When working with large data sets in Python, organizing the data into a dictionary of DataFrames is often helpful. It gives you keyed access to several DataFrames at once, which makes data analysis tasks more efficient and streamlined.

In this article, we will learn how to create a dictionary of DataFrames in Python. From initializing the dictionary to performing operations on the DataFrames within it, this guide covers how to take full advantage of this data structure. So let’s look in depth at how a dictionary of DataFrames fits into a Python data analysis workflow.

What is a Dictionary?

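A Python dictionary is a collection of key-value pairs. Since Python 3.7, dictionaries preserve insertion order, but values are looked up by key rather than by position. Each key must be unique and hashable, which in practice usually means immutable. Python dictionaries are created using curly braces ({}), with a colon (:) separating each key from its value and commas separating the pairs. The syntax for creating a dictionary is as follows: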

Syntax

my_dict = {key1: value1, key2: value2, key3: value3}
  • “my_dict” is the name of the Python dictionary variable.
  • “key1”, “key2” and “key3” are the keys of the dictionary, which can be of any immutable data type such as strings, integers, or tuples.
  • “value1”, “value2” and “value3” are the values associated with each key. The values can be of any data type, such as strings, integers, lists, or other dictionaries.

Example

my_dict = {"name": 'Wade Gilbert', "designation":  'Project Manager', "employeeid": 1456}

In the example above, “name”, “designation” and “employeeid” are the keys, while 'Wade Gilbert', 'Project Manager', and 1456 are their respective values.

Here are some distinctive features of Python dictionaries that are worth keeping in mind:

  • A dictionary is mutable, which means we can add, remove, or modify its key-value pairs.
  • We can access the value associated with a key using square bracket notation, e.g., my_dict["name"] returns 'Wade Gilbert'.
  • We can iterate over the keys, values, or key-value pairs of a dictionary using the built-in methods keys(), values(), and items().
  • Dictionaries are a core built-in data structure in Python that makes keyed lookups simple, which suits tasks such as counting occurrences of items, representing graph nodes and edges, or storing configuration settings.
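As a quick illustration of the features listed above, the following sketch reuses the my_dict example. The added “department” key and the changed employee ID are made up purely for demonstration.

```python
my_dict = {"name": "Wade Gilbert", "designation": "Project Manager", "employeeid": 1456}

# Access a value by its key
print(my_dict["name"])

# Dictionaries are mutable: add, modify, and remove entries
my_dict["department"] = "Engineering"  # hypothetical new key
my_dict["employeeid"] = 1500           # hypothetical new value
del my_dict["designation"]

# Iterate over the key-value pairs
for key, value in my_dict.items():
    print(key, value)

# keys() reflects the current contents in insertion order
keys = list(my_dict.keys())
```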

What is a DataFrame?

A DataFrame in Python is a two-dimensional labeled data structure commonly used for data analysis and manipulation. It is part of the Pandas library, a popular open-source data analysis and manipulation library for Python.

Syntax

import pandas as pd
df = pd.DataFrame(data, columns=[column1, column2, ...])
  • Import pandas under the alias pd.
  • Call the pd.DataFrame() constructor to create a DataFrame from the given data and optional arguments such as columns.

Example

import pandas as pd
data = {'Name': ['John', 'Mary', 'Paul', 'Jane'],
        'Age': [24, 32, 19, 27],
        'Gender': ['Male', 'Female', 'Male', 'Female']}
df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])
print(df)

Output

   Name  Age  Gender
0  John   24    Male
1  Mary   32  Female
2  Paul   19    Male
3  Jane   27  Female
  • Line#1: The import statement imports the Pandas library and assigns it the alias pd for convenience.
  • Line#2-4: The data dictionary contains the data that will be used to create the DataFrame.
  • Line#5: The DataFrame is created from data. The columns parameter is optional and lets you specify which columns to include and in what order.
  • Line#6: The print(df) statement outputs the DataFrame to the console.
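To make the role of the columns parameter more concrete, here is a small sketch that reuses the same data dictionary but asks for only two of its keys, in a different order. The variable name df_subset is chosen just for this illustration.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Paul', 'Jane'],
        'Age': [24, 32, 19, 27],
        'Gender': ['Male', 'Female', 'Male', 'Female']}

# Keep only two of the three keys, in a different order
df_subset = pd.DataFrame(data, columns=['Age', 'Name'])
print(df_subset)

column_names = list(df_subset.columns)
```

When the data is a dictionary, columns acts as a selection and ordering of its keys, so 'Gender' is left out here.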

Dictionary of DataFrames

A dictionary of DataFrames in Python is a collection of DataFrames organized and accessible by keys. It lets users store and manipulate multiple DataFrames under meaningful names, making it a popular choice for data analysis and manipulation tasks.

Creating a Dictionary of DataFrames

To create a dictionary of DataFrames in Python, you can follow these steps:

  1. Initialize an empty dictionary: Start by creating an empty dictionary using the {} notation or the dict() function.
  2. Create DataFrames: Next, create the DataFrames that you want to store in the dictionary using the pd.DataFrame() constructor from the Pandas library.
  3. Add the DataFrames to the dictionary: Add each DataFrame to the dictionary as a key-value pair, where the key is a string representing the name of the DataFrame, and the value is the DataFrame itself.
  4. Name the DataFrames: It’s good practice to give each DataFrame a descriptive name that reflects its contents.

Here’s an example that shows how to create a dictionary of DataFrames in Python:

import pandas as pd
# initialize an empty dictionary
df_dict = {}
# create data frames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})
# add the data frames to the dictionary
df_dict['df1'] = df1
df_dict['df2'] = df2
# name the data frames
df1.name = 'Employee Data'
df2.name = 'Manager Data'
print(df1)
print(df2)
  • Line#1-6: We first import the pandas library and initialize an empty dictionary called df_dict. Then we create two DataFrames (df1 and df2) using the pd.DataFrame() constructor.
  • Line#8, 9: Next, we add the DataFrames to the dictionary as key-value pairs, where the key is a string naming the DataFrame (‘df1’ and ‘df2’) and the value is the DataFrame itself.
  • Line#11, 12: Finally, to name the DataFrames, we set a name attribute on each one to a descriptive string that reflects its contents. Note that name is an ordinary Python attribute rather than a documented pandas feature, so it is not carried over to new DataFrames returned by pandas operations; the dictionary key is the more reliable label.
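The same steps can also be written more compactly with a dictionary comprehension. The sketch below is one possible variant, not the only way to do it: raw_data and the keys 'employees' and 'managers' are hypothetical names, and the label is stored in DataFrame.attrs, which the pandas documentation marks as experimental.

```python
import pandas as pd

# Hypothetical raw data, keyed by a descriptive name
raw_data = {
    'employees': {'Name': ['Alice', 'Bob'], 'Age': [25, 30]},
    'managers': {'Name': ['Charlie', 'David'], 'Age': [35, 40]},
}

# Build the dictionary of DataFrames in one pass
df_dict = {key: pd.DataFrame(values) for key, values in raw_data.items()}

# Optionally record a label in DataFrame.attrs (experimental in pandas)
df_dict['employees'].attrs['label'] = 'Employee Data'

print(list(df_dict.keys()))
print(df_dict['managers'])
```

Using descriptive keys here means the dictionary itself carries the names, so a separate naming step becomes optional.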

Once you have created the dictionary of DataFrames, you can use it to perform various data manipulation tasks, such as filtering, grouping, or merging, as well as accessing and modifying individual DataFrames.

Output

    Name  Age
0  Alice   25
1    Bob   30
      Name  Age
0  Charlie   35
1    David   40

Accessing & Modifying DataFrames in a Dictionary

To access and modify DataFrames in a dictionary of DataFrames in Python, you can use the keys to access the specific DataFrame you want to work with. Once you have access to the DataFrame, you can use any of the built-in methods and functions of pandas to manipulate the data.

Here are some examples of how to access and modify DataFrames in a dictionary of DataFrames:

import pandas as pd
# create a dictionary of data frames
df_dict = {'df1': pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]}),
           'df2': pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})}
# print the data in df1
print(df_dict['df1'])
# modify the data in df2
df_dict['df2']['Age'] = [40, 45]
# add a new column to df1
df_dict['df1']['Salary'] = [50000, 60000]
# delete a column from df2
del df_dict['df2']['Name']
  • Line#1-6: We first create a dictionary with two DataFrames, df1 and df2. To access and print the data in df1, we use the key ‘df1’ to look up the DataFrame and pass it to print().
  • Line#8: To modify the data in df2, we use the key ‘df2’ to access the DataFrame and then assign a new list of values to the ‘Age’ column.
  • Line#10: To add a new column to df1, we use the key ‘df1’ to access the DataFrame and then add a new column called ‘Salary’ by assigning a list of values.
  • Line#12: To delete a column from df2, we use the del keyword to remove the ‘Name’ column from the DataFrame.
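Because the DataFrames live in a dictionary, the same change can be applied to all of them in a loop. The following is a minimal sketch under assumed requirements: the ‘Source’ column and the one-year age increment are invented for illustration.

```python
import pandas as pd

df_dict = {'df1': pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]}),
           'df2': pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})}

# Apply the same modification to every DataFrame in the dictionary
for key, df in df_dict.items():
    df['Source'] = key          # hypothetical column recording the dictionary key
    df['Age'] = df['Age'] + 1   # hypothetical update: everyone is one year older

# Collect a per-DataFrame summary with a dictionary comprehension
mean_ages = {key: df['Age'].mean() for key, df in df_dict.items()}
print(mean_ages)
```

The loop variable df refers to the same object stored in the dictionary, so the column assignments change the stored DataFrames directly.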

Performing Operations on DataFrames in a Dictionary

Performing operations on data frames in a dictionary of data frames in Python is similar to performing operations on individual data frames. You can use the keys of the dictionary to access the specific data frame you want to work with and then use the built-in methods and functions of pandas to perform various data manipulation tasks.

Here are some examples of how to perform operations on data frames in a dictionary of data frames:

import pandas as pd
# create a dictionary of data frames
df_dict = {'df1': pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]}),
           'df2': pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})}
# concatenate the two data frames
df_concat = pd.concat(df_dict.values(), ignore_index=True)
# group the data by age and calculate the mean salary
df_salary = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Salary': [50000, 60000, 70000, 80000]})
df_grouped = df_concat.merge(df_salary).groupby('Age')['Salary'].mean()
# filter the data by age and name
df_filtered = df_concat[(df_concat['Age'] > 30) & (df_concat['Name'].str.contains('a'))]
  • Line#3: We create a dictionary with two DataFrames, df1 and df2.
  • Line#6: To concatenate the two DataFrames into a single DataFrame, we call pd.concat() and pass in the dictionary values using the values() method. Setting the ignore_index parameter to True gives the concatenated DataFrame a fresh index starting from 0.
  • Line#8, 9: To group the concatenated data by age and calculate the mean salary for each age group, we first create a separate DataFrame called df_salary that contains the salary data for each person. We then merge the salary data with the concatenated DataFrame using the merge() method, which joins on the shared ‘Name’ column by default, and group the data by age using the groupby() method. Finally, we calculate the mean salary for each age group using the mean() method.
  • Line#11: To filter the concatenated data by age and name, we use boolean indexing to keep only the rows where the age is greater than 30 and the name contains the letter ‘a’.
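As a variation on the concatenation step, pd.concat() also accepts the dictionary itself, in which case the keys become the outer level of the resulting index. This sketch assumes you want to keep track of which DataFrame each row came from; the level names 'source' and 'row' are arbitrary choices.

```python
import pandas as pd

df_dict = {'df1': pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]}),
           'df2': pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})}

# Passing the dictionary makes its keys the outer level of a MultiIndex
df_labeled = pd.concat(df_dict, names=['source', 'row'])
print(df_labeled)

# Rows that came from one DataFrame can be recovered by its key
df2_rows = df_labeled.loc['df2']
print(df2_rows)
```

Compared with ignore_index=True, this keeps the origin of every row at the cost of a two-level index.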

Benefits of using a dictionary of data frames

Using a dictionary of data frames in Python has several benefits, including:

  1. Efficient organization and access to multiple data frames: With a dictionary of data frames, users can store and organize multiple data frames using keys, making it easy to access and manipulate them in a structured and organized way.
  2. Memory-friendly storage: A dictionary stores references to DataFrames rather than copies, so collecting existing DataFrames in a dictionary does not duplicate their data in memory, which helps when working with large datasets.
  3. Flexibility in data manipulation: Using a dictionary of data frames allows users to apply various data manipulation techniques on each data frame, such as filtering, grouping, or merging, without affecting the other data frames in the dictionary.
  4. Ease of data analysis and reporting: A dictionary of data frames can simplify data analysis and reporting, making it easier to create visualizations and reports from multiple data frames.
  5. Code readability and maintainability: Using a dictionary of data frames can help improve the readability and maintainability of code. It provides a clear and organized structure for working with multiple data frames.
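The point about memory can be checked directly. The sketch below, with illustrative names such as 'people' and 'snapshot', shows that a dictionary entry refers to the same DataFrame object rather than a copy, and that copy() is needed when an independent DataFrame is wanted.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df_dict = {'people': df}

# The dictionary holds a reference to the same object, not a copy
same_object = df_dict['people'] is df

# A change made through the dictionary is visible through the original name
df_dict['people']['Salary'] = [50000, 60000]
has_salary = 'Salary' in df.columns

# Use .copy() when an independent DataFrame is needed
df_dict['snapshot'] = df.copy()
df_dict['snapshot']['Age'] = [0, 0]
print(df['Age'].tolist())  # the original ages are unchanged
```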

The Ending Lines

In conclusion, creating a dictionary of DataFrames in Python is a practical technique for organizing and manipulating multiple DataFrames. By using keys to access and modify individual DataFrames, you can work on each one separately or perform operations across all of them.

We discussed the basics of a dictionary of DataFrames and provided code examples for creating one in Python. We also covered how to access and modify DataFrames in a dictionary and perform operations on DataFrames in a dictionary.

To recap, some key points to remember when working with a dictionary of DataFrames include:

  • Creating DataFrames with pd.DataFrame() and assigning them to keys, either one at a time or in a dictionary literal.
  • Accessing and modifying DataFrames in a dictionary using the keys of the dictionary.
  • Performing operations on DataFrames in a dictionary using standard pandas syntax.

Please enjoy this expertly crafted piece of writing contributed by the talented Shittu Olumide.
