Pandas is a Python package used for data processing. It is intended to be a high-level building block for practical data analysis in Python, and this article is an introductory tutorial to it. Pandas provides fast, flexible and expressive data structures with the goal of making work with "relational" or "labeled" data simple and intuitive.

Pandas

Pandas is suitable for:

  • Tabular data with heterogeneous columns
  • Ordered and unordered time series data
  • Matrix data with row and column labels
  • Any other form of observational/statistical data set

At the time of writing, the latest version of pandas was v0.22.0.

DataFrame

A DataFrame can be created from a 4×4 matrix built through the NumPy interface.

  • By default, the index and the column names are the integers from 0 to N-1.
  • Example (data_structure.py): df1 = pd.DataFrame(np.arange(16).reshape(4, 4)); print("df1:\n{}\n".format(df1))
  • You can also specify the column names and the index when creating the DataFrame, through the columns parameter (for example column1, column2, column3, column4) and the index parameter.
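
The two ways of creating a DataFrame described above can be sketched as follows. This is a minimal illustration rather than the original script: the column names follow the text, while the index labels "a" to "d" are placeholders assumed for this example.

```python
import numpy as np
import pandas as pd

# 4x4 matrix holding the integers 0..15, with default integer labels.
df1 = pd.DataFrame(np.arange(16).reshape(4, 4))
print("df1:\n{}\n".format(df1))

# Same data with explicit labels. The index labels "a".."d" are
# placeholders chosen for this sketch, not taken from the article.
df2 = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    columns=["column1", "column2", "column3", "column4"],
    index=["a", "b", "c", "d"],
)
print("df2:\n{}\n".format(df2))
```

The number of index labels has to match the number of rows, so a 4×4 matrix needs exactly four of them.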

Conclusion

Pandas is a Python package used for data processing.

  • It is one of the most commonly used foundational libraries when programming machine learning applications in Python.
  • We recommend reading the first pandas introductory tutorial before exploring this one.

Ignoring Invalid Values

Use the pandas.DataFrame.dropna function to discard invalid values.

  • By default, the original data structure is not changed; a new one is returned.
  • If you want to change the data in place, pass inplace=True when you call the function.
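
As a small sketch of this behaviour, the example below drops rows containing NaN, first as a new object and then in place. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: two of the three rows contain an invalid value.
df = pd.DataFrame([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, np.nan]])

# Default call: returns a new DataFrame without the rows that contain NaN;
# df itself is left unchanged.
dropped = df.dropna()
print("dropped:\n{}\n".format(dropped))

# With inplace=True the object is modified directly and the call returns None.
df_copy = df.copy()
result = df_copy.dropna(inplace=True)
print("df_copy:\n{}\n".format(df_copy))
```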

Core data structures of Pandas

Series (1-dimensional): a labeled array of homogeneous type

Replacing Invalid Values

You can also replace invalid values with valid ones by using the fillna function.

  • For ease of operation, you can rename the row and column labels with the rename method before filling.
  • Example (process_na.py): df.rename(index={0: 'index1'}) renames the row labeled 0 to 'index1'; further labels can be mapped in the same dictionary.
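
A minimal sketch of renaming and then filling is shown below. The data, the label names other than 'index1', and the replacement values are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Illustrative 2x2 data with one invalid value in each column.
df = pd.DataFrame([[1.0, np.nan],
                   [np.nan, 4.0]])

# rename returns a relabeled copy; labels missing from the mapping are kept.
renamed = df.rename(index={0: "index1", 1: "index2"},
                    columns={0: "col1", 1: "col2"})

# fillna with a dict replaces NaN column by column with the given value.
filled = renamed.fillna(value={"col1": 0.0, "col2": -1.0})
print("filled:\n{}\n".format(filled))
```

Renaming first makes the fillna mapping easier to read, because the keys are descriptive column names rather than integer positions.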

Series

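A Series is a one-dimensional data structure.

  • It can be generated directly from an array, like this: series1 = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["C", "D", "E", "F", "G", "A", "B"])
  • When a Series is printed, the index appears in the first column and the data in the second column.
  • If no index is specified, a default one is created; for a four-element Series it is RangeIndex(start=0, stop=4, step=1).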
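
The points above can be tried out with the short sketch below. The second Series, series2, is a hypothetical four-element example added to show the default index.

```python
import pandas as pd

# Series with explicit string labels, as in the text.
series1 = pd.Series([1, 2, 3, 4, 5, 6, 7],
                    index=["C", "D", "E", "F", "G", "A", "B"])
print("series1:\n{}\n".format(series1))
print("series1.values: {}".format(series1.values))
print("series1.index: {}\n".format(series1.index))

# Hypothetical Series without an explicit index: pandas creates a RangeIndex.
series2 = pd.Series([1, 2, 3, 4])
print("series2.index: {}".format(series2.index))
```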

Handling Invalid Values

Real-world data is not perfect and often contains invalid values. If you do not deal with these invalid values, they can cause great disruption to the program.

  • There are two main ways to treat invalid values: ignore them directly or replace them with valid ones.
  • First create a data structure that contains invalid values, then use the pandas.isna function to confirm which values are invalid.
  • Example: df = pd.DataFrame([1, np.nan, 3, 4, 8, 9, 12, 15, 16])
  • print("df:\n{}\n".format(df)); print("isna:\n{}\n".format(pd.isna(df)))
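
To make the check concrete, here is a small sketch that builds a two-dimensional DataFrame with invalid values and inspects it with pandas.isna. The values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with two invalid (NaN) entries.
df = pd.DataFrame([[1.0, np.nan, 3.0],
                   [np.nan, 5.0, 6.0]])

# pd.isna returns a DataFrame of booleans with the same shape:
# True where the value is invalid, False otherwise.
mask = pd.isna(df)
print("df:\n{}\n".format(df))
print("isna:\n{}\n".format(mask))

# Summing the boolean mask counts the invalid values per column.
na_per_column = mask.sum()
print("invalid values per column:\n{}\n".format(na_per_column))
```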

Processing Strings

Data processing often involves strings, and pandas provides tools for string manipulation.

  • The str field of a Series contains a set of functions for processing strings.
  • In the first set of data, we deliberately include some strings containing spaces.
  • Example (process_string.py): s1 = pd.Series([' 1', '2 ', ' 3 ', '4', '5']); print("s1.str.lstrip():\n{}\n".format(s1.str.lstrip()))
  • s1.str.isdigit() reports, for each element, whether the string consists only of digits.
  • Sample strings (song titles): All Along the Watchtower, stairway to heaven, eruption, freebird, comfortably numb, all along the watchtower
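
The string functions mentioned above can be sketched as follows. The exact placement of the spaces in s1 and the choice of titles in s2 are assumptions made for this example.

```python
import pandas as pd

# Digits with leading and/or trailing spaces (placement is illustrative).
s1 = pd.Series([" 1", "2 ", " 3 ", "4", "5"])

# strip removes whitespace on both sides; lstrip/rstrip handle one side only.
stripped = s1.str.strip()
print("s1.str.strip():\n{}\n".format(stripped))

# isdigit is False for any element that still contains a space.
is_digit_raw = s1.str.isdigit()
is_digit_clean = stripped.str.isdigit()
print("s1.str.isdigit():\n{}\n".format(is_digit_raw))

# Second set of data: song titles converted to lower case.
s2 = pd.Series(["Stairway to Heaven", "All Along the Watchtower"])
lowered = s2.str.lower()
print("s2.str.lower():\n{}\n".format(lowered))
```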

Conclusion

In this article, we covered the most basic operations in data processing with pandas. If you have any questions, please leave a comment and we will get back to you as soon as possible.

Index object and data access

The Pandas Index object contains metadata describing the axis.

  • When creating a Series or DataFrame, the array or sequence of labels is converted to an Index. You can get the Index objects of a DataFrame's columns and rows through its columns and index attributes.
  • DataFrame provides the following two operators to access the data:
  • loc: accesses data through row and column index labels
  • iloc: accesses data through row and column subscripts (integer positions)
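
The difference between label-based and position-based access can be seen in the sketch below. The DataFrame and its labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame with string labels on both axes.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    columns=["column1", "column2", "column3", "column4"],
    index=["a", "b", "c", "d"],
)

# The Index objects that describe the two axes.
print("columns: {}".format(df.columns))
print("index: {}\n".format(df.index))

# loc uses labels, iloc uses integer positions; both select the same cell here.
by_label = df.loc["b", "column3"]
by_position = df.iloc[1, 2]

# Label slices include both end points; positional slices exclude the end.
label_slice = df.loc["a":"b", "column1":"column2"]
position_slice = df.iloc[0:2, 0:2]
print("label_slice:\n{}\n".format(label_slice))
```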

Reading Excel files

Reading Excel files depends on the xlrd library, which must be installed first. Its package information is as follows:

  • Library for developers to extract data from Microsoft Excel ™ spreadsheet files.
  • Homepage: http://www.python-excel.org/
  • Author: John Machin
  • License: BSD
  • Location: /Library/Frameworks/Python.framework/Versions/3.6.6/lib/python3.8
  • Requires:

Reading CSV files

A large number of parameters are supported to adjust how the file is read, including the following:

  • path: file path
  • sep or delimiter: field separator
  • header, index_col, skiprows, skip_footer, etc.
  • verbose: output various kinds of parsing information
  • encoding: file encoding
  • squeeze: if the parsed data contains only one column, a Series is returned
  • thousands: thousands separator
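
As a hedged sketch of how a few of these parameters combine, the example below parses CSV text held in memory with pandas.read_csv instead of a file on disk. The sample data and column names are invented. Only parameters that exist in both older and current pandas releases are used, which is why skip_footer, verbose and squeeze are left out.

```python
import io

import pandas as pd

# Invented sample: a comment line, a header line, and two data rows,
# using ";" as field separator and "," as thousands separator.
csv_text = """exported sample data
id;name;amount
1;alpha;1,234
2;beta;5,678,900
"""

df = pd.read_csv(
    io.StringIO(csv_text),  # a file path could be passed here instead
    sep=";",                # field separator
    skiprows=1,             # skip the first line of the input
    header=0,               # first remaining line holds the column names
    index_col="id",         # use the "id" column as the row index
    thousands=",",          # strip "," when parsing numbers
)
print("df:\n{}\n".format(df))
```

When reading from a real file, the encoding parameter can be passed in the same way to state the file encoding.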
