pandas - Python Data Analysis Library

https://pandas.pydata.org/pandas-docs/stable/indexing.html

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
  1. A fast and efficient DataFrame object for data manipulation with integrated indexing;
  2. Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
  3. Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
  4. Flexible reshaping and pivoting of data sets;
  5. Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
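
Item 3 in the list above (label-based alignment and missing-data handling) can be illustrated with a minimal sketch. The two Series and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two Series whose indexes only partly overlap (illustrative values).
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Addition aligns on labels, not positions; a label present in only
# one operand yields NaN in the result.
total = s1 + s2

# The missing values can then be dropped, or avoided with fill_value.
dropped = total.dropna()
filled = s1.add(s2, fill_value=0)
```

Here `total` carries NaN for 'a' and 'd', while `filled` treats the absent side as 0.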

Pandas data structures

Series: a column (one-dimensional)

# General form: s = pd.Series(data, index=index)
#   data: a NumPy ndarray
#   index: a list of labels
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
#data from dict
d = {'b' : 1, 'a' : 0, 'c' : 2}
pd.Series(d)

DataFrame: a table (two-dimensional)

# General form: df = pd.DataFrame(data, index=index, columns=columns)
#   data: a NumPy ndarray
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))
#data from a list of dicts
data2 = [{'a': 1, 'b': 2, 'c': 3}, {'a': 5, 'b': 10, 'c': 20}]
df=pd.DataFrame(data2)
df

Tutorials

http://pandas.pydata.org/pandas-docs/stable/tutorials.html
https://github.com/jvns/pandas-cookbook

Reading data from a csv file

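# Reading with the defaults mis-parses this file, which is semicolon-separated
broken_df = pd.read_csv('../data/bikes.csv')
# State the separator, encoding, and date handling explicitly
fixed_df = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')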
# Look at the first 3 rows
broken_df[:3]
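
The difference between the two calls can be reproduced without the file. The sketch below uses a small inline string as a hypothetical stand-in for bikes.csv; the 'Counter 2' column and all values are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for bikes.csv: semicolon-separated, day-first dates.
raw = (
    "Date;Berri 1;Counter 2\n"
    "01/02/2012;35;16\n"
    "02/02/2012;83;43\n"
    "13/02/2012;135;61\n"
)

# With the default sep=',' each line is read as one single column.
broken = pd.read_csv(io.StringIO(raw))

# Naming the separator and the date layout gives a date-indexed frame.
fixed = pd.read_csv(
    io.StringIO(raw),
    sep=';',
    parse_dates=['Date'],
    dayfirst=True,
    index_col='Date',
)
```

`broken` ends up with a single column, while `fixed` has a DatetimeIndex in which 01/02/2012 is read as 1 February.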

Select

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0] # first row of data frame (Aleshia Tomkiewicz) - Note a Series data type output.
data.iloc[1] # second row of data frame (Evan Zigomalas)
data.iloc[-1] # last row of data frame (Mi Richan)
# Columns:
data.iloc[:,0] # first column of data frame (first_name)
data.iloc[:,1] # second column of data frame (last_name)
data.iloc[:,-1] # last column of data frame (id)
# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5] # first five rows of dataframe
data.iloc[:, 0:2] # first two columns of data frame with all rows
data.iloc[[0,3,6,24], [0,5,6]] # 1st, 4th, 7th, 25th row + 1st 6th 7th columns.
data.iloc[0:5, 5:8] # first 5 rows and 6th, 7th, 8th columns of data frame (county -> phone1).

# Select rows with first name Antonio, and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']

# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]
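
One point worth checking on a small frame: iloc slices are half-open, while loc label slices include both endpoints. A minimal sketch, assuming a made-up three-row contact table whose column names echo the ones above:

```python
import pandas as pd

# Hypothetical contact table (all values invented for illustration).
data = pd.DataFrame({
    'first_name': ['Ann', 'Bob', 'Antonio'],
    'last_name': ['Lee', 'Ray', 'Example'],
    'city': ['Leeds', 'York', 'Bath'],
    'email': ['ann@hotmail.com', 'bob@example.org', 'antonio@example.org'],
    'id': [1, 2, 3],
})

# iloc is positional and half-open: rows 0-1, columns 0-1.
by_position = data.iloc[0:2, 0:2]

# loc is label-based and inclusive: rows 0, 1 and 2, 'city' through 'email'.
by_label = data.loc[0:2, 'city':'email']

# A boolean mask picks rows; the label slice picks columns.
antonio = data.loc[data['first_name'] == 'Antonio', 'city':'email']
hotmail = data.loc[data['email'].str.endswith('hotmail.com')]
```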

Panel: a three-dimensional table (rarely used outside special needs… skipped)

Plotting a column

fixed_df['Berri 1'].plot()
fixed_df.plot(figsize=(15, 10))

Selecting data (columns, rows)

complaints[['Complaint Type', 'Borough']]
complaints[['Complaint Type', 'Borough']][:10]

.value_counts()

complaint_counts = complaints['Complaint Type'].value_counts()
complaint_counts[:10].plot(kind='bar')
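
A minimal sketch of what value_counts() returns, using a tiny made-up complaints table in place of the real data set (the rows are assumptions for illustration):

```python
import pandas as pd

# Hypothetical miniature of the complaints data.
complaints = pd.DataFrame({
    'Complaint Type': ['Noise', 'Heating', 'Noise', 'Parking', 'Noise', 'Heating'],
    'Borough': ['BRONX', 'QUEENS', 'BRONX', 'BROOKLYN', 'QUEENS', 'BRONX'],
})

# value_counts() returns a Series indexed by the distinct values,
# sorted from most to least frequent.
complaint_counts = complaints['Complaint Type'].value_counts()

# The most frequent entries come first, so head(n) gives the top n.
top_two = complaint_counts.head(2)
```

Because the result is already sorted, slicing off the first entries before plotting gives a "top N" bar chart.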

pandas.date_range
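
The notes give no example here, so the following is a minimal sketch of typical date_range usage; the dates and frequencies are arbitrary choices.

```python
import pandas as pd

# Five consecutive days starting from a given date.
days = pd.date_range(start='2012-01-01', periods=5, freq='D')

# A date range is commonly used as the index of a time series.
ts = pd.Series(range(5), index=days)

# Date strings can slice a DatetimeIndex; both endpoints are included.
first_three = ts.loc['2012-01-01':'2012-01-03']

# A start/end pair with a step of seven days.
every_7_days = pd.date_range(start='2012-01-01', end='2012-01-31', freq='7D')
```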

pandas.DataFrame.loc
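
A minimal sketch of label-based access and assignment with loc, on a made-up frame with string row labels:

```python
import pandas as pd

# Illustrative frame with string row labels.
df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6]},
    index=['x', 'y', 'z'],
)

row = df.loc['y']                # one row, returned as a Series
cell = df.loc['y', 'B']          # one scalar value
block = df.loc['x':'y', ['A']]   # label slice, both endpoints included

# loc also supports assignment through a boolean mask.
df.loc[df['A'] > 1, 'B'] = 0
```

After the last line, column B is 0 wherever A is greater than 1.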

columns

newdf = df[df.columns[2:4]]

DataFrame.dropna()
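
A minimal sketch of the common dropna() variants on a made-up frame; the `axis` and `subset` arguments shown are the ones most often needed.

```python
import numpy as np
import pandas as pd

# Illustrative frame with some missing values.
df = pd.DataFrame({
    'A': [1.0, np.nan, 3.0],
    'B': [4.0, 5.0, np.nan],
    'C': [7.0, 8.0, 9.0],
})

rows_complete = df.dropna()            # drop rows containing any NaN
cols_complete = df.dropna(axis=1)      # drop columns containing any NaN
a_present = df.dropna(subset=['A'])    # only require column A to be present
```

dropna() returns a new frame; `df` itself is left unchanged.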

To do one-hot encoding in pandas, use get_dummies()
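
A minimal sketch of get_dummies() on a made-up frame with one categorical column:

```python
import pandas as pd

# Illustrative frame: 'color' is categorical, 'size' is numeric.
df = pd.DataFrame({'color': ['red', 'green', 'red'], 'size': [1, 2, 3]})

# Each distinct value of 'color' becomes its own indicator column,
# named <column>_<value>; other columns pass through unchanged.
encoded = pd.get_dummies(df, columns=['color'])
```

The dtype of the indicator columns depends on the pandas version (boolean in recent releases, small integers in older ones), so cast explicitly if a specific dtype is required.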
