Hands-on Tutorial on Python Data Processing Library Pandas

Pandas is a Python package for data processing. It is intended to be a high-level building block for practical data analysis in Python. This article is an introductory tutorial to it. Pandas provides fast, flexible and expressive data structures with the goal of making work with “relational” or “labeled” data simple and intuitive.

Pandas

Pandas is suitable for many kinds of data:

Table data with heterogeneous columns
Ordered and unordered time series data
Matrix data with row and column labels
Any other form of observational or statistical data set

At the time of writing, the latest version of pandas is v0.22.0.

DataFrame

The following code creates a 4×4 matrix through the NumPy interface and uses it to create a DataFrame:

# data_structure.py
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(16).reshape(4, 4))
print("df1:\n{}\n".format(df1))

By default, the index and the column names are the integers from 0 to N-1.
You can also specify the column names and the index when creating the DataFrame, through the columns and index parameters, for example with the column names column1, column2, column3 and column4.
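As a minimal sketch of specifying column names and an index at creation time, the snippet below reuses the 4×4 matrix and the column names column1 to column4 from the text. The index labels "a" to "d" and the variable name df2 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Build a 4x4 matrix holding the integers 0..15.
matrix = np.arange(16).reshape(4, 4)

# Pass explicit column names and row labels instead of the 0..N-1 defaults.
# The row labels "a".."d" are hypothetical; any four unique labels would do.
df2 = pd.DataFrame(
    matrix,
    columns=["column1", "column2", "column3", "column4"],
    index=["a", "b", "c", "d"],
)

print("df2:\n{}\n".format(df2))
```

The number of column names and index labels has to match the shape of the data, so a 4×4 matrix needs exactly four of each.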

Conclusion

Pandas is a Python package for data processing.

It is a very commonly used basic library when programming machine learning in Python.
We recommend reading the first pandas introductory tutorial before exploring this one.

Ignoring Invalid Values

The pandas.DataFrame.dropna function discards invalid values.

By default, the original data structure is not changed; the function returns a new one.
If you want to change the data directly, pass inplace=True when you call the function.
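The behavior described above can be sketched as follows. The sample values and variable names are made up for illustration; only dropna and its axis and inplace parameters come from pandas itself.

```python
import numpy as np
import pandas as pd

# A small frame with two missing values (sample data invented for this sketch).
df = pd.DataFrame([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, np.nan]])

# Default: drop every row that contains at least one NaN; df itself is untouched.
rows_kept = df.dropna()

# axis=1 drops columns that contain NaN instead of rows.
cols_kept = df.dropna(axis=1)

# inplace=True modifies the object directly and returns None.
df_copy = df.copy()
df_copy.dropna(inplace=True)

print("rows_kept:\n{}\n".format(rows_kept))
print("cols_kept:\n{}\n".format(cols_kept))
```

Working on a copy before using inplace=True keeps the original data available for comparison.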

Core data structure of Pandas

Series: a one-dimensional array of a homogeneous type, with labels

Replace Invalid Value

You can also fill in invalid values, replacing them with valid ones, through a function such as pandas.DataFrame.fillna.

For ease of operation, before filling you can rename the rows and columns through a method such as pandas.DataFrame.rename.
Example, from process_na.py: rename(index={0: 'index1', ...})
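A minimal sketch of renaming and then filling might look like this. Apart from 'index1', which appears in the text, every label here (index2, col1, col2), the fill value 0.0 and the sample data are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with two missing values (invented for this sketch).
df = pd.DataFrame([[1.0, np.nan],
                   [np.nan, 4.0]])

# rename maps old labels to new ones; it returns a new DataFrame by default.
# 'index2', 'col1' and 'col2' are hypothetical labels.
renamed = df.rename(
    index={0: "index1", 1: "index2"},
    columns={0: "col1", 1: "col2"},
)

# fillna replaces every NaN with the given value, here 0.0.
filled = renamed.fillna(0.0)

print("filled:\n{}\n".format(filled))
```

Which fill value is appropriate depends on the data; a constant is only the simplest choice.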

Series

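A Series is a one-dimensional data structure.

You can generate one directly from an array, optionally passing an index, like this: series1 = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["C", "D", "E", "F", "G", "A", "B"])
When a Series is printed, the index is output in the first column and the data in the second column.
If no index is specified, a default one is created; for a Series with four elements, the index attribute is RangeIndex(start=0, stop=4, step=1).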
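The two ways of creating a Series can be compared side by side in a short sketch. The variable names default_series and labeled_series are made up for this example; the values and labels are the ones used in the text.

```python
import pandas as pd

# Without an explicit index, pandas assigns the positions 0..N-1.
default_series = pd.Series([1, 2, 3, 4])
print(default_series.index)   # RangeIndex(start=0, stop=4, step=1)
print(default_series.values)  # the underlying data as a NumPy array

# With an explicit index, each value is addressed by its label.
labeled_series = pd.Series(
    [1, 2, 3, 4, 5, 6, 7],
    index=["C", "D", "E", "F", "G", "A", "B"],
)
print(labeled_series["E"])    # value stored under the label "E"
```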

Handle Invalid Value

The real world is not perfect, and the data we read often contains invalid values. If you do not deal with these invalid values, they can cause great disruption to the program.

There are two main ways to treat invalid values: ignore them directly or replace them with valid ones.
The following code creates a data structure that contains invalid values and uses the pandas.isna function to confirm which values are invalid:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, np.nan, 3], [4, 8, 9], [12, 15, 16]])
print("df:\n{}\n".format(df))
print("isna:\n{}\n".format(pd.isna(df)))

Processing Strings

Data processing often involves strings, so pandas also provides string manipulation.

The str field of a Series contains a series of functions to process strings.
In the first set of data, we deliberately set some strings containing spaces:
# process_string.py
s1 = pd.Series([' 1', '2 ', ' 3 ', '4', '5'])
print("s1.str.lstrip():\n{}\n".format(s1.str.lstrip()))
print("s1.str.isdigit():\n{}\n".format(s1.str.isdigit()))
The second set of data contains song titles such as All Along the Watchtower. Converted to lowercase, the titles read:
stairway to heaven
eruption
freebird
comfortably numb
all along the watchtower
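The string functions discussed in this section can be tried out with a short sketch. The exact placement of the spaces in s1 and the capitalization of the titles in s2 are assumptions; the str accessor methods strip, isdigit and lower are part of pandas.

```python
import pandas as pd

# Numeric strings, some padded with spaces (padding chosen for illustration).
s1 = pd.Series([" 1", "2 ", " 3 ", "4", "5"])

# strip removes whitespace on both sides of every element.
stripped = s1.str.strip()

# isdigit is False for any element that still contains a space.
digits_raw = s1.str.isdigit()
digits_clean = stripped.str.isdigit()

# Song titles; lower converts every element to lowercase.
s2 = pd.Series(["Stairway to Heaven", "Eruption", "Freebird",
                "Comfortably Numb", "All Along the Watchtower"])
lowered = s2.str.lower()

print("digits_raw:\n{}\n".format(digits_raw))
print("digits_clean:\n{}\n".format(digits_clean))
print("lowered:\n{}\n".format(lowered))
```

Each str method is applied element by element and returns a new Series, so the calls can be chained.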

Conclusion

In this article, we covered the most basic operations in data processing using pandas. We hope the tutorial was clear; if you have any questions, please leave a comment, and we will get back to you as soon as possible.

Index object and data access

The pandas Index object contains metadata describing an axis.

When a Series or DataFrame is created, the array or sequence of labels is converted to an Index. You can get the Index objects for the columns and rows of a DataFrame through its columns and index attributes.
DataFrame provides the following two operators to access the data:
loc: access data through row and column index labels
iloc: access data through row and column subscripts (integer positions)
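The difference between label-based and position-based access can be illustrated with a small sketch. The frame below, including its row labels "a" to "d", is an assumption built for this example.

```python
import numpy as np
import pandas as pd

# A 4x4 frame with explicit labels (labels invented for this sketch).
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    columns=["column1", "column2", "column3", "column4"],
    index=["a", "b", "c", "d"],
)

# The Index objects describing each axis.
print(df.columns)
print(df.index)

# loc uses labels: row "b", column "column3".
by_label = df.loc["b", "column3"]

# iloc uses integer positions: second row, third column (both zero-based).
by_position = df.iloc[1, 2]

print(by_label, by_position)
```

Both expressions address the same cell, once by name and once by position.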

Reading Excel files

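Reading Excel files requires the xlrd library to be installed. Note that xlrd 2.0 and later read only the legacy .xls format; recent pandas versions rely on openpyxl for .xlsx files.

The package metadata for xlrd reads:
Summary: Library for developers to extract data from Microsoft Excel ™ spreadsheet files.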
Homepage: http://www.python-excel.org/
Author: John Machin
License: BSD
Location: /Library/Frameworks/Python.framework/Versions/3.6.6/lib/python3.8
Requires:

Read CSV file

The read_csv function supports a large number of parameters to adjust how the file is read, as shown in the following table:

Parameter          Description
path               File path
sep or delimiter   Field separator
verbose            Output various parsing information
encoding           File encoding
squeeze            If the parsed data contains only one column, return a Series
thousands          Thousands separator

Other parameters include header, index_col, skiprows and skip_footer, among others.
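A few of these parameters can be tried without a file on disk by reading from an in-memory string. This is only a sketch: the CSV content, the column names and the choice of ";" as separator are invented for illustration, and only parameters available in current pandas releases are used.

```python
import io

import pandas as pd

# In-memory CSV text standing in for a file (sample data invented here).
csv_text = (
    "id;city;population\n"
    "1;Springfield;1,234,567\n"
    "2;Shelbyville;89,012\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),  # a path to a real file works the same way
    sep=";",                # field separator
    header=0,               # the first line holds the column names
    index_col="id",         # use the "id" column as the row index
    thousands=",",          # parse "1,234,567" as the integer 1234567
)

print("df:\n{}\n".format(df))
```

Passing a file path instead of the StringIO object is the usual case; the in-memory buffer just keeps the example self-contained.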

Source

Hands-on Tutorial on Python Data Processing Library Pandas – Part 1

Written by:

MoAI Staff
