Pandas


Pandas is an open-source Python library for data manipulation and analysis. It provides data structures and tools for working with labeled tabular data and time series.

The central data structure in Pandas is the DataFrame, a two-dimensional table with labeled rows and columns. Each column of a DataFrame is a Series, a one-dimensional labeled array. Pandas provides functions and methods for reading data from files, cleaning and transforming it, and computing summary statistics.
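To make the cleaning, transforming, and summarizing steps concrete, here is a minimal sketch that builds a small DataFrame in memory. The column names (Product, Price, Quantity) and all values are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data; names and values are invented for illustration.
df = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Price": [50.0, 120.0, None, 200.0],  # None becomes a missing value (NaN)
    "Quantity": [10, 3, 7, 1],
})

# Shape is a (rows, columns) tuple
print(df.shape)

# Cleaning: drop rows where Price is missing
clean = df.dropna(subset=["Price"]).copy()

# Transforming: derive a new column from existing ones
clean["Total"] = clean["Price"] * clean["Quantity"]

# Statistical analysis: count, mean, std, min, quartiles, max
print(clean["Total"].describe())
```

Dropping rows is only one way to handle missing values. Depending on the data, filling them with fillna may be a better fit.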

Pandas is widely used in the data science community for tasks such as preparing data for analysis, visualization, and machine learning. It is a powerful and flexible tool for working with structured data in Python.

Here is an example of how to use Pandas to read data from a CSV file and perform some basic data manipulation:

import pandas as pd

# Read the data from a CSV file (assumes data.csv exists and has a 'Price' column)
df = pd.read_csv('data.csv')

# Print the first 5 rows of the DataFrame
print(df.head())

# Select a specific column
price_column = df['Price']

# Calculate the mean of a column
mean_price = price_column.mean()
print(mean_price)

# Filter the DataFrame based on a condition
high_prices = df[df['Price'] > 100]
print(high_prices)

In this example, we use Pandas to read a CSV file into a DataFrame. The head method returns the first 5 rows by default, and the mean method computes the average of the Price column. Finally, boolean indexing keeps only the rows where the condition df['Price'] > 100 is true. The file name data.csv and the column name Price are placeholders; substitute the names from your own data.
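The example above depends on a data.csv file being present. As a sketch for trying the same steps without one, the version below reads CSV text from an in-memory string through io.StringIO, and splits the boolean indexing step in two to show the intermediate mask. The product names and prices are invented sample data.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Product,Price
Widget,80
Gadget,150
Gizmo,120
"""

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

# Mean of the Price column
mean_price = df["Price"].mean()
print(mean_price)

# The comparison yields a boolean Series, one True/False per row
mask = df["Price"] > 100
print(mask)

# Indexing with the mask keeps only the rows where it is True
high_prices = df[mask]
print(high_prices)
```

Storing the mask in its own variable is optional, but it makes it easier to inspect or combine conditions, for example with & and | between parenthesized comparisons.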