Case Study : Data Visualization

Case Study : Data Visualization

2020, Nov 22    

In every project, the first step is to load data.
To read the CSV files we are the pandas function read_csv() :

Model: Read a CSV file with pandas

import pandas

df = pandas.read_csv('data.csv') 

View: Dropping unnecessary columns in a DataFrame

drop_columns = ['col1','col2']

df.drop(drop_columns, inplace=True, axis=1)

df = df.set_index('Identifier')

df.set_index('Identifier', inplace=True)

View: Changing the index of a DataFrame

df.get_dtype_counts()

regex = r'^(\d{4})'

extr = df['Date of Publication'].str.extract(r'^(\d{4})', expand=False)

df['Date of Publication'] = pd.to_numeric(extr)

View: Using .str() methods to clean columns

np.where(condition,
then, else)

pub = df['Place of Publication']

london = pub.str.contains('London')

london[:5]

View: Using the DataFrame.applymap() function to clean the entire dataset, element-wise

university_towns = []

with open('Datasets/university_towns.txt') as file:
    for line in file:
        if '[edit]' in line:
            # Remember this `state` until the next is found
            state = line
        else:
            # Otherwise, we have a city; keep `state` as last-seen
            university_towns.append((state, line))

university_towns[:5]

towns_df = pd.DataFrame(university_towns, columns=['State', 'RegionName'])

towns_df.head()
 
def get_citystate(item):
    if '(' in item:
        return item[:item.find('(')]
    elif '[' in item:
        return item[:item.find('[')]
    else:
        return item

towns_df = towns_df.applymap(get_citystate)

towns_df.head()

View: Renaming columns to a more recognizable set of labels & Skipping unnecessary rows in a CSV file

olympics_df = pd.read_csv('Datasets/olympics.csv', header=1)

new_names =  {'Unnamed: 0': 'Country',
              '? Summer': 'Summer Olympics',
              '01 !': 'Gold',
              '02 !': 'Silver',
              '03 !': 'Bronze',
              '? Winter': 'Winter Olympics',
              '01 !.1': 'Gold.1',
              '02 !.1': 'Silver.1',
              '03 !.1': 'Bronze.1',
              '? Games': '# Games',
              '01 !.2': 'Gold.2',
              '02 !.2': 'Silver.2',
              '03 !.2': 'Bronze.2'}

olympics_df.rename(columns=new_names, inplace=True)

Final thoughts

The demo project now has 97% test coverage, all thanks to the Clean Architecture’s “dependency rule” and segregation of the app on multiple layers.