Business Analytics | Solving Business Case Study Assignments


Introduction

Transport and logistics, food and shopping, payments, daily needs, business, news and entertainment: Gojek, an Indonesian company, does it all through a mobile app, contributing more than $7 billion to the economy. It has 900K registered merchants, more than 190M app downloads, and more than 2M drivers delivering more than 180K orders within a span of 120 minutes. It's a behemoth! Using business analytics, we will be solving a case study. Below is the list of the 20+ services it provides:

Source: https://www.gojek.io/blog/food-debarkation-tensoba
  • Transport and Logistics
    • Go-ride – Your two-wheeler taxi, the indigenous Ojek
    • Go-car – Comfort on wheels. Sit back. Sleep. Snore.
    • Go-send – Send or get packages delivered within hours.
    • Go-box – Moving out? We'll do the weights.
    • Go-bluebird – Ride exclusive with the Bluebird.
    • Go-transit – Your commute assistant, with or without Gojek
  • Food & Shopping
    • Go-mall – Shop from an online marketplace
    • Go-mart – Home delivery from nearby stores
    • Go-med – Buy medicines, vitamins, etc. from licensed pharmacies.
  • Payments
    • Go-pay – Drop the wallet and go cashless
    • Go-bills – Pay bills, quickly and simply
    • Paylater – Order now, pay later.
    • Go-pulsa – Data or talk time, top-up on the go.
    • Go-sure – Insure the things you value.
    • Go-give – Donate for what matters, touch lives.
    • Go-investasi – Invest smart, save better.
  • Daily needs
    • GoFitness allows users to access workouts such as yoga, pilates, pound fit, barre, muay thai, and Zumba.
  • Business
    • Go-biz – A merchant #SuperApp to run and grow a business.
  • News & Entertainment
    • Go-tix – Book your show, skip the queue.
    • Go-play – App for movies and series.
    • Go-games – Gaming tips, trends, etc.
    • Go-news – Top news from top aggregators.

Data generated through these services is huge, and GO teams have engineering solutions to tackle day-to-day data engineering issues. The Central Analytics and Science Team (CAST) enables multiple products within the Gojek ecosystem to efficiently use the abundance of data involved in the working of the app. The team has analysts, data scientists, data engineers, business analysts, and decision scientists working on developing in-house deep analytics solutions and other ML techniques.

Source: https://www.gojek.io/blog/working-in-the-central-analytics-and-science-team

The analysts' role is focused on solving day-to-day business problems, having good business knowledge, creating impact, deriving insights, RCAs (root cause analyses), and keeping top management informed on micro as well as macro metrics and on product decisions to address business problems.

Learning Objectives

  • RCA on growth drivers and headwinds faced by the organization.
    • Using Pandas for EDA, slicing, and dicing.
  • Marketing budget optimization
    • Profit as the north star metric (L0 metric)
    • Using the PuLP solver to solve the LP.
    • Writing LP problems using PuLP with clear and crisp instructions.
  • Linear regression and cross-validation
    • A simple regression exercise using the steps provided in the questionnaire.

This article was published as a part of the Data Science Blogathon.

Problem Statement

Part I

GOJEK directors have asked BI analysts to look at the data to understand what happened during Q1 2016 and what they should do to maximize the revenue for Q2 2016.

  • Given the data in Problem A, what are the main problems that we need to focus on?
  • Given the data in Table B, how will you maximize the profit if we only have a budget of IDR 40,000,000,000?
  • Present your findings and concrete solutions for a management meeting.

Part II

  • Problem: Using multiple linear regression, predict the total_cbv.
  • Create 1 model for each service.
  • Forecast period = 2016-03-30, 2016-03-31, and 2016-04-01
  • Train period = the rest. List of predictors to use:
    • Day of month
    • Month
    • Day of week
    • Weekend/weekday flag (weekend = Saturday & Sunday)
  • Pre-processing (do it in this order):
    • Remove GO-TIX
    • Keep only `Cancelled` order_status
    • Ensure the complete combinations (cartesian product) of date and service are present
    • Impute missing values with 0
    • Create is_weekend flag predictor (1 if Saturday/Sunday, 0 on other days)
    • One-hot encode month and day of week predictors
    • Standardize all predictors into z-scores using the mean and standard deviation from train-period data only
  • Evaluation metric: MAPE. Validation: 3-fold scheme, where each validation fold has the same length as the forecast period (see the sketch after this list).
  • Question 1 – After all the pre-processing steps, what is the value of all the predictors for service = GO-FOOD, date = 2016-02-28?
  • Question 2 – Show the first 6 rows of the one-hot encoded variables (month and day of the week)
  • Question 3 – Print the first 6 rows of the data after pre-processing for service = GO-KILAT. Sort ascending by date
  • Question 4 – Compute the forecast-period MAPE for each service. Display in ascending order based on the MAPE
  • Question 5 – Create graphs to show the performance of each validation fold. One graph per service. x = date, y = total_cbv. Color: black = actual total_cbv, other colors = the fold predictions (there should be 3 other colors). Only show the validation period. For example, if rows 11, 12, and 13 were used for validation, then don't show the other rows in the graphs. Clearly show the month and date on the x-axis
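To pin down the evaluation scheme: MAPE = mean(|actual - predicted| / actual) * 100, and one plausible reading of the 3-fold scheme (an assumption, since the brief does not spell out fold placement) is three consecutive 3-day validation windows at the end of the train period. A minimal sketch:

import pandas as pd

# Assumed layout: the train period ends the day before the forecast period,
# and each validation fold is a 3-day window walking back from that date.
forecast_start = pd.Timestamp("2016-03-30")
fold_len = 3  # each fold is as long as the 3-day forecast period

folds = []
for k in range(1, 4):
    fold_end = forecast_start - pd.Timedelta(days=1 + (k - 1) * fold_len)
    fold_start = fold_end - pd.Timedelta(days=fold_len - 1)
    folds.append((fold_start.date(), fold_end.date()))

for i, (s, e) in enumerate(reversed(folds), 1):
    print(f"fold {i}: validate on {s} .. {e}")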

Part III

Our GO-FOOD service in Surabaya performed very well last month: it had 20% more completed orders last month than the month before. The manager of GO-FOOD in Surabaya needs to see what is happening in order to consistently maintain this success from next month onwards.

  • What quantitative methods would you use to evaluate the sudden growth? How would you evaluate the customers' behavior?

Dataset

The Solution to Part One

Before beginning to solve, start researching the blogs and whitepapers present on the company website (links are added below). Company archives provide useful resources that act as guides and help you understand what the company stands for and what the company expects from this role. Questions one and three can be considered open-ended problems. Question two is a simple exercise on regression, not necessarily focusing on the best model; the focus is on the processes involved in building a model.

RCA on Growth Drivers and Headwinds Faced by the Organization

Import data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os


sales_df = pd.read_csv('https://raw.githubusercontent.com/chrisdmell/Project_DataScience/working_branch/09_gojek/sales_data_all.csv')

print("Shape of the df")
display(sales_df.shape)

print("HEAD")
display(sales_df.head())

print("NULL CHECK")
display(sales_df.isnull().any().sum())

print("NULL CHECK")
display(sales_df.isnull().sum())

print("df INFO")
display(sales_df.info())

print("DESCRIBE")
display(sales_df.describe())

Create a pandas datetime column from the object format; pandas datetime is a simple format for working with and manipulating dates. Derive the month column from the datetime, filter out month 4 (April), and rename the months as Jan, Feb, Mar.

## convert date to pandas datetime
# convert order_status to string
##

time_to_pandas_time = ["date"]

for cols in time_to_pandas_time:
  sales_df[cols] = pd.to_datetime(sales_df[cols])

sales_df.dtypes

sales_df['Month'] = sales_df['date'].dt.month
sales_df.head()

sales_df['Month'].drop_duplicates()

sales_df[sales_df['Month'] != 4]

Q1_2016_df = sales_df[sales_df['Month'] != 4]

Q1_2016_df['Month'] = np.where(Q1_2016_df['Month'] == 1, "Jan", np.where(Q1_2016_df['Month'] == 2, "Feb", np.where(Q1_2016_df['Month'] == 3, "Mar", "Apr")))

print(Q1_2016_df.head(1))

display(Q1_2016_df.order_status.unique())

display(Q1_2016_df.service.unique())

At the group level, overall revenue has grown by 14%. This is a positive outcome. Let's break this down by the various services and identify the services that are performing well.
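The original post uses revenue_total below without showing how it is built. Mirroring the cancelled-rides block later in this section, a plausible reconstruction (an assumption, not the author's original cell) is a per-month percentage share of completed-order GMV by service:

completed_df = Q1_2016_df[Q1_2016_df["order_status"] == "Completed"]
cbv_pivot = completed_df.pivot_table(index='service', columns=['Month'], values="total_cbv", aggfunc="sum")
cbv_pivot = cbv_pivot[["Jan", "Feb", "Mar"]]

# each cell becomes the service's % share of that month's total completed GMV
revenue_total = pd.DataFrame()
for cols in cbv_pivot.columns:
  revenue_total[cols] = (cbv_pivot[cols] / cbv_pivot[cols].sum()) * 100

revenue_total.reset_index(inplace=True)  # keeps 'service' as a regular column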

revenue_total.sort_values(["Jan"], ascending=[False],inplace=True)

revenue_total.head()

revenue_total['cummul1'] = revenue_total["Jan"].cumsum()
revenue_total['cummul2'] = revenue_total["Feb"].cumsum()
revenue_total['cummul3'] = revenue_total["Mar"].cumsum()

top_95_revenue = revenue_total[revenue_total["cummul3"]<=95 ] 

display(top_95_revenue)
ninety_five_perc_gmv = list(top_95_revenue.service.unique())
print(ninety_five_perc_gmv)

top_95_revenue_plot = top_95_revenue[["Jan", "Feb", "Mar"]]
top_95_revenue_plot.index = top_95_revenue.service
top_95_revenue_plot.T.plot.line(figsize=(5,3))

## the share of revenue has changed, but has the overall revenue changed for these top 4 services?
  • For all three months, Ride, Food, Shop, and Send contribute more than 90% of the net revenue share. (In Jan, Ride contributed 51% of net revenue.)
  • Hence, following the 80:20 rule for the latest month, we can restrict this analysis to the top 3 services, namely Ride, Food, and Send.
  • Out of the 11 available services, only 3 contribute more than 90% of revenue. This is a cause for concern, and there is immense opportunity for the rest of the services to grow.

Completed Rides

## NET - completed rides
Q1_2016_df_pivot_cbv_4 = Q1_2016_df[Q1_2016_df["order_status"] == "Completed"]
Q1_2016_df_pivot_cbv_4 = Q1_2016_df_pivot_cbv_4[Q1_2016_df_pivot_cbv_4.service.isin(ninety_five_perc_gmv)]

Q1_2016_df_pivot_cbv = Q1_2016_df_pivot_cbv_4.pivot_table(index='service', columns=['Month'], values="total_cbv", aggfunc="sum")
# display(Q1_2016_df_pivot_cbv.head())
Q1_2016_df_pivot_cbv = Q1_2016_df_pivot_cbv[["Jan", "Feb", "Mar"]]

# rescale to billions for readability
for cols in Q1_2016_df_pivot_cbv.columns:
  Q1_2016_df_pivot_cbv[cols] = (Q1_2016_df_pivot_cbv[cols]/1000000000)

display(Q1_2016_df_pivot_cbv)

display(Q1_2016_df_pivot_cbv.T.plot())

## We see that Go-Shop's revenue has decreased while the others' revenue is roughly constant.

Q1_2016_df_pivot_cbv_4 = Q1_2016_df_pivot_cbv
Q1_2016_df_pivot_cbv_4.reset_index(inplace = True)

Q1_2016_df_pivot_cbv_4["Feb_jan_growth"] = (Q1_2016_df_pivot_cbv_4.Feb / Q1_2016_df_pivot_cbv_4.Jan - 1)*100
Q1_2016_df_pivot_cbv_4["Mar_Feb_growth"] = (Q1_2016_df_pivot_cbv_4.Mar / Q1_2016_df_pivot_cbv_4.Feb - 1)*100

display(Q1_2016_df_pivot_cbv_4)
  • Ride, the revenue-driving engine, has grown by 19% (Jan to March), compared to Send, which has grown by 25%.
  • Food has degrown by 7%; given that food delivery as a business is growing around the globe, this is a major cause for concern.

Cancelled Rides (Lost Opportunity)

Q1_2016_df_pivot_cbv = Q1_2016_df[Q1_2016_df["order_status"] != "Completed"]
Q1_2016_df_pivot_cbv = Q1_2016_df_pivot_cbv.pivot_table(index='service', columns=['Month'], values="total_cbv", aggfunc="sum")
Q1_2016_df_pivot_cbv = Q1_2016_df_pivot_cbv[["Jan", "Feb", "Mar"]]

revenue_total = pd.DataFrame()

for cols in Q1_2016_df_pivot_cbv.columns:
  revenue_total[cols] = (Q1_2016_df_pivot_cbv[cols]/Q1_2016_df_pivot_cbv[cols].sum())*100

revenue_total.reset_index(inplace = True)
display(revenue_total.head())

overall_cbv = Q1_2016_df_pivot_cbv.sum()
print(overall_cbv)
overall_cbv.plot()
plt.show()


overall_cbv = Q1_2016_df_pivot_cbv.sum()
overall_cbv_df = pd.DataFrame(data = overall_cbv).T
display(overall_cbv_df)

overall_cbv_df["Feb_jan_growth"] = (overall_cbv_df.Feb / overall_cbv_df.Jan - 1)*100
overall_cbv_df["Mar_Feb_growth"] = (overall_cbv_df.Mar / overall_cbv_df.Feb - 1)*100

display(overall_cbv_df)

revenue_total.sort_values(["Jan"], ascending=[False], inplace=True)

revenue_total.head()

revenue_total['cummul1'] = revenue_total["Jan"].cumsum()
revenue_total['cummul2'] = revenue_total["Feb"].cumsum()
revenue_total['cummul3'] = revenue_total["Mar"].cumsum()

top_95_revenue = revenue_total[revenue_total["cummul3"] <= 95]

display(top_95_revenue)
ninety_five_perc_gmv = list(top_95_revenue.service.unique())
print(ninety_five_perc_gmv)
  • Lost revenue has grown by 6%.
  • Directors can increase their efforts to reduce this to less than 5%.

Analysis of Orders


Q1_2016_df_can_com = Q1_2016_df[Q1_2016_df.order_status.isin(["Cancelled", "Completed"])]
Q1_2016_df_can_com = Q1_2016_df_can_com[Q1_2016_df_can_com.service.isin(ninety_five_perc_gmv)]


Q1_2016_df_pivot = Q1_2016_df_can_com.pivot_table(index='service', columns=['order_status','Month' ], values="num_orders", aggfunc="sum")
Q1_2016_df_pivot.fillna(0, inplace = True)

multi_tuples = [
    ('Cancelled', 'Jan'),
    ('Cancelled', 'Feb'),
    ('Cancelled', 'Mar'),
    ('Completed', 'Jan'),
    ('Completed', 'Feb'),
    ('Completed', 'Mar')]

multi_cols = pd.MultiIndex.from_tuples(multi_tuples, names=['Experiment', 'Lead Time'])

Q1_2016_df_pivot = pd.DataFrame(Q1_2016_df_pivot, columns=multi_cols)


display(Q1_2016_df_pivot.columns)
display(Q1_2016_df_pivot.head(3))

Q1_2016_df_pivot.columns = ['_'.join(col) for col in Q1_2016_df_pivot.columns.values]

display(Q1_2016_df_pivot)

Q1_2016_df_pivot["jan_total"] = Q1_2016_df_pivot.Cancelled_Jan + Q1_2016_df_pivot.Completed_Jan
Q1_2016_df_pivot["feb_total"] = Q1_2016_df_pivot.Cancelled_Feb + Q1_2016_df_pivot.Completed_Feb
Q1_2016_df_pivot["mar_total"] = Q1_2016_df_pivot.Cancelled_Mar + Q1_2016_df_pivot.Completed_Mar

Q1_2016_df_pivot["Cancelled_Jan_ratio"] = Q1_2016_df_pivot.Cancelled_Jan/Q1_2016_df_pivot.jan_total
Q1_2016_df_pivot["Cancelled_Feb_ratio"] = Q1_2016_df_pivot.Cancelled_Feb/Q1_2016_df_pivot.feb_total
Q1_2016_df_pivot["Cancelled_Mar_ratio"] = Q1_2016_df_pivot.Cancelled_Mar/Q1_2016_df_pivot.mar_total
Q1_2016_df_pivot["Completed_Jan_ratio"] = Q1_2016_df_pivot.Completed_Jan/Q1_2016_df_pivot.jan_total
Q1_2016_df_pivot["Completed_Feb_ratio"] = Q1_2016_df_pivot.Completed_Feb/Q1_2016_df_pivot.feb_total
Q1_2016_df_pivot["Completed_Mar_ratio"] = Q1_2016_df_pivot.Completed_Mar/Q1_2016_df_pivot.mar_total

Q1_2016_df_pivot_1 = Q1_2016_df_pivot[["Cancelled_Jan_ratio"
,"Cancelled_Feb_ratio"
,"Cancelled_Mar_ratio"
,"Completed_Jan_ratio"
,"Completed_Feb_ratio"
,"Completed_Mar_ratio"]]

Q1_2016_df_pivot_1
  • In March, Food, Ride, and Send had 17%, 15%, and 13% of total orders cancelled, respectively.
  • Food has increased its order completion rate from 69% in January to 83% in March. This is a significant improvement.
## column-wise cancellation share: check whether it has increased
perc_of_cols_orders = pd.DataFrame()

for cols in Q1_2016_df_pivot.columns:
  perc_of_cols_orders[cols] = (Q1_2016_df_pivot[cols]/Q1_2016_df_pivot[cols].sum())*100

perc_of_cols_orders

# perc_of_cols_cbv is the analogous share table computed on total_cbv
perc_of_cols_cbv.T.plot(kind='bar', stacked=True)
perc_of_cols_orders.T.plot(kind='bar', stacked=True)
  • In March, of all the rides cancelled, Ride had a 72% share of orders, followed by Food (17%) and Send (6%).

Summary of Findings and Recommendations for Business Analytics

"
  • Ride –
    • The top contributor to revenue.
    • Cancellations (GMV) in March have grown by 42%.
    • Reduce cancellations through product interventions and new product features.
  • Food –
    • Cancelled orders have increased, but due to cost optimization, the GMV loss has been successfully arrested.
    • Improve net revenue by reducing costs and cancellations.
    • Drive higher customer acquisition.
  • Send –
    • Cancelled GMV and orders have both taken a hit and are a major cause for concern.
    • Ensure a good ride completion experience, thus increasing retention and powering revenue growth through retention.

Maximize Profits by Optimizing Budget Spends

The business team has a budget of 40 billion for Q2, and it has set growth targets for each service. For each service, the cost of 100 incremental rides and the maximum growth target in Q2 are given below. For Go-Box, getting 100 more bookings costs 40M, and the maximum growth target in Q2 is 7%.
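To make the table concrete, the arithmetic behind every figure in this section is: incremental orders = Q1 completed orders * growth %, and cash burn = incremental orders / 100 * cost per 100 bookings. A small illustration with the Go-Box numbers quoted below (assumed inputs, for intuition only):

q1_orders     = 63_000        # Go-Box completed orders in Q1 (from the merged table below)
cost_per_100  = 40_000_000    # IDR 40M per 100 incremental bookings
growth_target = 0.07          # 7% maximum Q2 growth target

incremental_orders = q1_orders * growth_target
cash_burn = incremental_orders / 100 * cost_per_100
print(f"{incremental_orders:,.0f} extra orders -> IDR {cash_burn:,.0f}")  # 4,410 extra orders -> IDR 1,764,000,000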


Import the budget data and reuse the sales data from the above analysis.

budget_df = pd.read_csv('https://raw.githubusercontent.com/chrisdmell/Project_DataScience/working_branch/09_gojek/optimization_budge.csv')

print("Shape of the df")
display(budget_df.shape)

print("HEAD")
display(budget_df.head())

print("NULL CHECK")
display(budget_df.isnull().any().sum())

print("NULL CHECK")
display(budget_df.isnull().sum())

print("df INFO")
display(budget_df.info())

print("DESCRIBE")
display(budget_df.describe())


## convert date to pandas datetime
# convert order_status to string
##

time_to_pandas_time = ["date"]

for cols in time_to_pandas_time:
  sales_df[cols] = pd.to_datetime(sales_df[cols])

sales_df.dtypes

sales_df['Month'] = sales_df['date'].dt.month 
sales_df.head()

sales_df['Month'].drop_duplicates()

sales_df_q1 = sales_df[sales_df['Month'] !=4]
## Assumptions
sales_df_q1 = sales_df_q1[sales_df_q1["order_status"] == "Completed"]

# Q1_2016_df_pivot = Q1_2016_df.pivot_table(index='service', columns=['order_status','Month' ], values="num_orders", aggfunc="sum")

sales_df_q1_pivot = sales_df_q1.pivot_table(index='service', columns=['order_status'], values="total_cbv", aggfunc="sum")
sales_df_q1_pivot_orders = sales_df_q1.pivot_table(index='service', columns=['order_status'], values="num_orders", aggfunc="sum")

sales_df_q1_pivot.reset_index(inplace = True)
sales_df_q1_pivot.columns = ["Service","Q1_revenue_completed"]
sales_df_q1_pivot

sales_df_q1_pivot_orders.reset_index(inplace = True)
sales_df_q1_pivot_orders.columns = ["Service","Q1_order_completed"]

optimization_Df = pd.merge(
    sales_df_q1_pivot,
    budget_df,
    how="left",
    on="Service",

)

optimization_Df = pd.merge(
    optimization_Df,
    sales_df_q1_pivot_orders,
    how="left",
    on="Service",

)

optimization_Df.columns = ["Service", "Q1_revenue_completed", "Cost_per_100_inc_booking", "max_q2_growth_rate", "Q1_order_completed"]
optimization_Df.head(5)
  • For Box, Q1 revenue is 23B, the cost of 100 incremental rides is 40M, its maximum expected growth rate is 7%, and 63K total rides were completed @ 370K per order.

Is it possible to achieve the maximum growth rate for all the services with an available budget of 40B?

## If every service's max growth is to be achieved, what is the budget needed? And what is the deficit?
optimization_Df["max_q2_growth_rate_upd"]   = optimization_Df['max_q2_growth_rate'].str.extract('(\d+)').astype(int)                 ## extract the int from the string
optimization_Df["max_growth_q2_cbv"]        = (optimization_Df.Q1_order_completed *(1+ optimization_Df.max_q2_growth_rate_upd/100))  ## Q2 max orders based on Q1 orders
optimization_Df["abs_inc_orders"]           = optimization_Df.max_growth_q2_cbv - optimization_Df.Q1_order_completed                 ## total increase in orders
optimization_Df["cost_of_max_inc_q2_order"] = optimization_Df.abs_inc_orders * optimization_Df.Cost_per_100_inc_booking /100         ## total cost to get maximum growth for each service

display(optimization_Df)

display(budget_df[budget_df["Service"] == "Budget:"].reset_index())
budget_max = budget_df[budget_df["Service"] == "Budget:"].reset_index()
budget_max = budget_max.iloc[:, 2:3].values[0][0]
print("Budget shortfall")
display(budget_max - optimization_Df.cost_of_max_inc_q2_order.sum())

## Therefore the maximum growth for every service cannot be achieved

The answer is No. An extra budget of 247B (247,244,617,204) is required to achieve the growth targets for all services.

Is it possible to achieve at least 10% of the maximum growth rate for all the services with an available budget of 40B?

## Then what is the budget needed, and what will be the extra budget at hand?
optimization_Df["min_10_max_growth_q2_cbv"]        = (optimization_Df.Q1_order_completed *(1+ optimization_Df.max_q2_growth_rate_upd/1000))  ## orders if at least 10% of the max growth is achieved
optimization_Df["min_10_abs_inc_orders"]           = optimization_Df.min_10_max_growth_q2_cbv - optimization_Df.Q1_order_completed           ## increase in orders needed for 10% of the max growth
optimization_Df["min_10_cost_of_max_inc_q2_order"] = optimization_Df.min_10_abs_inc_orders * optimization_Df.Cost_per_100_inc_booking /100   ## cost associated with that increase in orders

display(budget_max - optimization_Df.min_10_cost_of_max_inc_q2_order.sum())  ## total budget remaining

display((budget_max - optimization_Df.min_10_cost_of_max_inc_q2_order.sum())/budget_max)  ## budget utilization percentage


optimization_Df["perc_min_10_max_growth_q2_cbv"] = (optimization_Df.max_q2_growth_rate_upd/1000)  ## 10% of max: 7 becomes a percent (divide by 100), then take 10% of it, hence divide by 1000
optimization_Df["perc_max_growth_q2_cbv"]        = (optimization_Df.max_q2_growth_rate_upd/100)   ## max growth to be achieved

optimization_Df["q1_aov"] = optimization_Df.Q1_revenue_completed/optimization_Df.Q1_order_completed  ## Q1 average order value
optimization_Df["order_profitability"] = 0.1  ## assumption that 10% of each order's value is profit

optimization_Df["a_orders_Q2"]      = (optimization_Df.Q1_order_completed *(1+ optimization_Df.perc_min_10_max_growth_q2_cbv))  ## based on 10% growth, total new orders for Q2

optimization_Df["a_abs_inc_orders"] = optimization_Df.a_orders_Q2 - optimization_Df.Q1_order_completed

optimization_Df["a_Q2_costs"] = optimization_Df.Cost_per_100_inc_booking * optimization_Df.a_abs_inc_orders/100

## There is scope for improvement here; this can be adjusted based on revenue or ranking from Q1
display(budget_max - optimization_Df.a_Q2_costs.sum())

optimization_Df

The answer is Yes. This can be achieved with only 28% of the available 40B budget. But underutilization of the available budget is never an option, and no business leader would use only 28% of the available budget.

So, the maximum growth across all services can't be achieved, and achieving just 10% of the maximum growth rate will leave the budget underutilized. Hence the need here is to optimize spending such that:

  • The overall cash burn doesn't cross 40B.
  • The overall growth rate in Q2 across services is equal to or below the maximum growth rate.
  • These are called constraints in linear optimization.
  • The objective is to maximize profits.

Assumptions used here:

  • Every service has a profit margin of 10%.
  • AOV (revenue/orders) will remain the same as in Q1.
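Under these assumptions, the problem coded up below can be written compactly as a binary-choice LP. Letting x_{s,g} = 1 when service s is assigned growth level g (a sketch of the formulation matching the PuLP code, where G_s is the set of allowed levels from 1% up to that service's maximum):

\[
\max_{x}\ \sum_{s}\sum_{g \in G_s} \text{profit}_{s,g}\, x_{s,g}
\quad \text{s.t.} \quad
\sum_{g \in G_s} x_{s,g} = 1 \ \ \forall s, \qquad
\sum_{s}\sum_{g \in G_s} \text{cost}_{s,g}\, x_{s,g} \le 40\text{B}, \qquad
x_{s,g} \in \{0,1\}
\]

The per-row minimum-growth requirement (the chosen level must be at least 10% of the service's maximum) is enforced as an additional constraint on the selected x_{s,g}.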

Pre-optimization data pipeline:

## Data prep for PuLP optimization
perc_all_df = pd.DataFrame(data = list(range(1, optimization_Df.max_q2_growth_rate_upd.max()+1)), columns = ["growth_perc"])
## create a list of all percentage growths, from 1 to the max growth expected; this builds the simulation grid for optimization
display(perc_all_df.head(1))

optimization_Df_2 = optimization_Df.merge(perc_all_df, how = "cross")  ## cross join with the optimization DF

## Filter, keeping all percentages up to the maximum for each service
## The minimum percentage kept is 1
optimization_Df_2["filter_flag"] = np.where(optimization_Df_2.max_q2_growth_rate_upd >= (optimization_Df_2.growth_perc), 1, 0)
optimization_Df_2["abs_profit"] = (optimization_Df_2.q1_aov)*(optimization_Df_2.order_profitability)
optimization_Df_3 = optimization_Df_2[optimization_Df_2["filter_flag"] == 1]


display(optimization_Df_3.head(1))
display(optimization_Df_3.columns)

## Filter the columns needed
optimization_Df_4 = optimization_Df_3[[
    'Service',                       ## services offered
    'Cost_per_100_inc_booking',      ## cost of additional 100 orders
    'Q1_order_completed',            ## to calculate q2 growth based on q1 orders
    'perc_min_10_max_growth_q2_cbv', ## minimum growth percent needed
    'perc_max_growth_q2_cbv',        ## max growth percent allowed
    'abs_profit',                    ## profit per order
    'growth_perc'                    ## to simulate growth percent across services
    ]]

display(optimization_Df_4.head(2))


optimization_Df_4["orders_Q2"]        = (optimization_Df_4.Q1_order_completed *(1+ optimization_Df_4.growth_perc/100))  ## based on growth, total new orders for Q2
optimization_Df_4["abs_inc_orders"]   = optimization_Df_4.orders_Q2 - optimization_Df_4.Q1_order_completed
optimization_Df_4["profit_Q2_cbv"]    = optimization_Df_4.orders_Q2 * optimization_Df_4.abs_profit
optimization_Df_4["growth_perc"]      = optimization_Df_4.growth_perc/100
optimization_Df_4["Q2_costs"]         = optimization_Df_4.Cost_per_100_inc_booking * optimization_Df_4.abs_inc_orders/100

display(optimization_Df_4.head())

optimization_Df_5 = optimization_Df_4[[
    'Service',                       ## services offered
    'Q2_costs',                      ## total cost for the growth expected
    'perc_min_10_max_growth_q2_cbv', ## minimum growth percent needed
    'perc_max_growth_q2_cbv',        ## max growth percent allowed
    'profit_Q2_cbv',                 ## total profit at the assumed order_profitability rate
    'growth_perc'                    ## to simulate growth percent across services
    ]]

optimization_Df_5


display(optimization_Df_5.head(10))
display(optimization_Df_5.shape)

Understanding the Optimization Dataset

  • Service – Go product.
  • 10% of max growth is the minimum growth that each service should achieve. So Box should achieve at least 0.7% growth.
  • The max growth decided by business leaders for Box is 7%.
  • For Box, 1% to 7% is the range of growth. 1% is more than 0.7%, and 7% is the maximum. The optimizer will choose the best growth rate based on the constraints.
    • This is a decision variable. The algorithm will pick one among the 7.
  • For 1% (incremental) growth, the cash burn is 255M.
  • If incremental growth is 1%, then the overall profit (organic + inorganic) is 2.4B. A quick sanity check of these two numbers follows.
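A back-of-the-envelope check of the two Box figures quoted above (assumed inputs: roughly 63K completed Q1 orders, IDR 370K AOV, and the 10% profit assumption):

q1_orders, aov, margin = 63_000, 370_000, 0.10
cost_per_100 = 40_000_000

cash_burn_1pct = (q1_orders * 0.01) / 100 * cost_per_100   # ~IDR 252M, close to the 255M quoted
profit_1pct    = q1_orders * 1.01 * aov * margin           # ~IDR 2.35B, close to the 2.4B quoted
print(f"cash burn: {cash_burn_1pct:,.0f}   profit: {profit_1pct:,.0f}")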
## Binary-choice optimization for our case.
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum, LpStatus, value

prob = LpProblem("growth_maximize", LpMaximize)    ## Initialize the optimization problem - a maximization problem

optimization_Df_5.reset_index(inplace = True, drop = True)
markdowns = list(optimization_Df_5['growth_perc'].unique())   ## list of all growth percentages
cost_v    = list(optimization_Df_5['Q2_costs'])               ## list of incremental costs to achieve each growth %

perc_min_10_max_growth_q2_cbv = list(optimization_Df_5['perc_min_10_max_growth_q2_cbv'])
growth_perc                   = list(optimization_Df_5['growth_perc'])

## lp variables
low    = LpVariable.dicts("l_", perc_min_10_max_growth_q2_cbv, lowBound = 0, cat = "Continuous")
growth = LpVariable.dicts("g_", growth_perc, lowBound = 0, cat = "Continuous")
delta  = LpVariable.dicts("d", markdowns, 0, 1, LpBinary)
x      = LpVariable.dicts("x", range(0, len(optimization_Df_5)), 0, 1, LpBinary)

## objective function - maximize profit, column name - profit_Q2_cbv
## Assign a value for each of the rows -
## every row in the table gets assigned x_0, x_1, x_2, etc.
## This is later used to filter the optimal growth percent
prob += lpSum(x[i] * optimization_Df_5.loc[i, 'profit_Q2_cbv'] for i in range(0, len(optimization_Df_5)))

## one unique growth percentage for each service
## Constraint one
for i in optimization_Df_5['Service'].unique():
  prob += lpSum([x[idx] for idx in optimization_Df_5[optimization_Df_5['Service'] == i].index]) == 1

## Don't cross the total budget
## Constraint two
prob += (lpSum(x[i] * optimization_Df_5.loc[i, 'Q2_costs'] for i in range(0, len(optimization_Df_5))) - budget_max) <= 0

## constraint to say the minimum growth should be achieved
for i in range(0, len(optimization_Df_5)):
  prob += lpSum(x[i] * optimization_Df_5.loc[i, 'growth_perc']) >= lpSum(x[i] * optimization_Df_5.loc[i, 'perc_min_10_max_growth_q2_cbv'])


prob.writeLP('markdown_problem')     ## Write the problem to an LP file
prob.solve()                         ## Solve the problem
display(LpStatus[prob.status])       ## Problem status - "Optimal" if the problem is solved successfully
display(value(prob.objective))       ## Objective: the maximized profit with the available budget - 98731060158.842 @ 10% profit per order
print(prob)
print(growth)

Understanding How to Write an LP Problem is Key to Solving It

  • Initialize the problem
    • prob = LpProblem("growth_maximize", LpMaximize)
    • growth_maximize is the name of the problem.
    • LpMaximize lets the solver know that it is a maximization problem.
  • Create a variable for the decision function
    • growth = LpVariable.dicts("g_", growth_perc, lowBound = 0, cat = "Continuous")
    • For PuLP, pulp dicts need to be created.
    • g_ is the prefix for the variable.
    • growth_perc is the name of the list.
    • lowBound is the minimum growth percent; it can start from 0.
    • The variable is continuous.
    • There are 60 unique growth percentages from 1% (minimum) to 60% (maximum). (Food has a 60% maximum growth rate.)
    • Variables – 0 <= x_0 <= 1 Integer for row 0 to 0 <= x_279 <= 1 Integer for row 279.
  • Add the objective function to the problem
    • prob += lpSum(x[i] * optimization_Df_5.loc[i, 'profit_Q2_cbv'] for i in range(0, len(optimization_Df_5)))
    • An equation is created by PuLP -> 2423147615.954*x_0 + 2447139176.5080004*x_1 + 225916468.96*x_3 + .... + 8576395.965000002*x_279. There are 280 rows in the dataset, so a variable is created for each profit value.
  • Add constraints:
    • One – one growth percentage for each service
      • for i in optimization_Df_5['Service'].unique(): prob += lpSum([x[idx] for idx in optimization_Df_5[optimization_Df_5['Service'] == i].index]) == 1
      • For each service, only pick one growth percent.
      • For Box, out of 1 to 7, pick just one.
      • The equation for Box – _C1: x_0 + x_1 + x_2 + x_3 + x_4 + x_5 + x_6 = 1
      • The equation for GLAM – _C2: x_10 + x_11 + x_12 + x_13 + x_14 + x_15 + x_16 + x_7 + x_8 + x_9 = 1
      • As there are 11 services, 11 constraints are created, one for each service.
    • Two – don't cross the total budget of 40B
      • prob += (lpSum(x[i] * optimization_Df_5.loc[i, 'Q2_costs'] for i in range(0, len(optimization_Df_5))) - budget_max) <= 0
      • The sum of all costs minus the total budget should be less than or equal to zero.
      • Equation _C12: 255040000 x_0 + 510080000 x_1 + .... + 16604 x_279 <= 0
      • _C12 is the only constraint here because there is one total budget of 40B, and there is no constraint on how much each service can spend.
    • Three – constraint to say the minimum growth should be achieved
      • for i in range(0, len(optimization_Df_5)): prob += lpSum(x[i] * optimization_Df_5.loc[i, 'growth_perc']) >= lpSum(x[i] * optimization_Df_5.loc[i, 'perc_min_10_max_growth_q2_cbv'])
      • For each row, a minimum growth percent constraint equation is created. There are 280 rows, so 280 constraints are created.
      • _C13: 0.003 x_0 >= 0 for row 0 through _C292: 0.315 x_279 >= 0 for row 279.
    • "Optimal" is the desired output.
      • display(LpStatus[prob.status])
    • 98731060158.842 is the maximized profit.
      • display(value(prob.objective))
var_name = []
var_values = []
for variable in prob.variables():
    if 'x' in variable.name:
        var_name.append(variable.name)
        var_values.append(variable.varValue)

results = pd.DataFrame()

results['variable_name'] = var_name
results['variable_values'] = var_values
results['variable_name_1'] = results['variable_name'].apply(lambda x: x.split('_')[0])
results['variable_name_2'] = results['variable_name'].apply(lambda x: x.split('_')[1])
results['variable_name_2'] = results['variable_name_2'].astype(int)
results.sort_values(by='variable_name_2', inplace=True)
results.drop(columns=['variable_name_1', 'variable_name_2'], inplace=True)
results.reset_index(inplace=True)
results.drop(columns="index", axis=1, inplace=True)

# results.head()


optimization_Df_5['variable_name'] = results['variable_name'].copy()
optimization_Df_5['variable_values'] = results['variable_values'].copy()
optimization_Df_5['variable_values'] = optimization_Df_5['variable_values'].astype(int)  # optimization_Df_6.head()

## keep the chosen rows and sanity-check that every service met its minimum growth constraint
optimization_Df_10 = optimization_Df_5[optimization_Df_5['variable_values'] == 1].reset_index()

optimization_Df_10["flag"] = np.where(optimization_Df_10.growth_perc >= optimization_Df_10.perc_min_10_max_growth_q2_cbv, 1, 0)

display(optimization_Df_10)

display(budget_max - optimization_Df_10.Q2_costs.sum())
display(optimization_Df_10.Q2_costs.sum())
  • The optimized growth rate for each service is shown in the chart above. For Box it is 1%, for Clean it is 1%, for Food it is 17%, etc.
  • The total cash burn is 39,999,532,404.0
  • Underutilized budget – 467,596.0
  • Maximized profit – 98,731,060,158.0

The Solution to Part Two

sales_df = pd.read_csv('https://raw.githubusercontent.com/chrisdmell/Project_DataScience/working_branch/09_gojek/model_analytics__data.csv')

time_to_pandas_time = ["date"]

for cols in time_to_pandas_time:
  sales_df[cols] = pd.to_datetime(sales_df[cols])


sales_df['Month'] = sales_df['date'].dt.month

Q1_2016_df = sales_df[sales_df['Month'] != 900]  # no-op filter; keeps all months, including the April forecast days

Q1_2016_df['Month'] = np.where(Q1_2016_df['Month'] == 1, "Jan", np.where(Q1_2016_df['Month'] == 2, "Feb", np.where(Q1_2016_df['Month'] == 3, "Mar", "Apr")))

Q1_2016_df['test_control'] = np.where(Q1_2016_df['date'] < "2016-03-30", "train", "test")  # the forecast period starts on 2016-03-30

display(Q1_2016_df.head(5))

display(Q1_2016_df.order_status.unique())

display(Q1_2016_df.service.unique())

display(Q1_2016_df.date.max())
  • Import the dataset
  • Convert date to pandas datetime
  • Derive the month column
  • Derive the train and test columns
display(Q1_2016_df.head())
display(Q1_2016_df.date.max())

Q1_2016_df_2 = Q1_2016_df[Q1_2016_df["date"] <= "2016-04-01"]
display(Q1_2016_df_2.date.max())

Q1_2016_df_2 = Q1_2016_df_2[Q1_2016_df_2["order_status"] == "Cancelled"]

Q1_2016_df_date_unique = Q1_2016_df_2[["date"]].drop_duplicates()
Q1_2016_df_date_service = Q1_2016_df_2[["service"]].drop_duplicates()

Q1_2016_df_CJ = Q1_2016_df_date_unique.merge(Q1_2016_df_date_service, how = "cross")  ## cross join of all dates with all services

display(Q1_2016_df_date_unique.head())
display(Q1_2016_df_date_unique.shape)
display(Q1_2016_df_date_unique.max())
display(Q1_2016_df_date_unique.min())

display(Q1_2016_df_2.shape)
Q1_2016_df_3 = Q1_2016_df_CJ.merge(Q1_2016_df_2, on=['date','service'], how='left', suffixes=('_x', '_y'))

display(Q1_2016_df_3.head())
display(Q1_2016_df_3.shape)
display(Q1_2016_df_CJ.shape)

Q1_2016_df_3["total_cbv"].fillna(0, inplace = True)
print("Null check ", Q1_2016_df_3.isnull().values.any())

nan_rows = Q1_2016_df_3[Q1_2016_df_3['total_cbv'].isnull()]
nan_rows

display(Q1_2016_df_3[Q1_2016_df_3.isnull().any(axis=1)])

Q1_2016_df_3["dayofweek"] = Q1_2016_df_3["date"].dt.dayofweek
Q1_2016_df_3["dayofmonth"] = Q1_2016_df_3["date"].dt.day

Q1_2016_df_3["Is_Weekend"] = Q1_2016_df_3["date"].dt.day_name().isin(['Saturday', 'Sunday'])

Q1_2016_df_3.head()
  • Filter for only cancelled orders.
  • For all services, cross join with dates from Jan 01 to Apr 01, so that predictions for all days are available.
  • Replace NULLs with 0.
  • Derive the day of the month.
  • Derive the day of the week.
  • Create a binary weekend/weekday column.
"
Q1_2016_df_4 = Q1_2016_df_3[Q1_2016_df_3["service"] != "GO-TIX"]

Q1_2016_df_5 = pd.get_dummies(Q1_2016_df_4, columns=["Month","dayofweek"])

display(Q1_2016_df_5.head())


import numpy as np
import pandas as pd
# from sklearn.datasets import load_boston
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from numpy import mean
from numpy import std
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_predict


Q1_2016_df_5.columns


all_columns = ['date', 'service', 'num_orders', 'order_status', 'total_cbv',
                'test_control', 'dayofmonth', 'Is_Weekend', 'Month_Apr', 'Month_Feb',
                'Month_Jan', 'Month_Mar', 'dayofweek_0', 'dayofweek_1', 'dayofweek_2',
                'dayofweek_3', 'dayofweek_4', 'dayofweek_5', 'dayofweek_6']

                
model_variables = [ 'dayofmonth', 'Is_Weekend', 'Month_Apr', 'Month_Feb',
                  'Month_Jan', 'Month_Mar', 'dayofweek_0', 'dayofweek_1', 'dayofweek_2',
                  'dayofweek_3', 'dayofweek_4', 'dayofweek_5', 'dayofweek_6']

target_Variable = ["total_cbv"]

all_columns = ['service', 
                'test_control', 'dayofmonth', 'Is_Weekend', 'Month_Apr', 'Month_Feb',
                'Month_Jan', 'Month_Mar', 'dayofweek_0', 'dayofweek_1', 'dayofweek_2',
                'dayofweek_3', 'dayofweek_4', 'dayofweek_5', 'dayofweek_6']
  • Filter out GO-TIX
  • One-hot encode Month and day of the week
  • Import all the required libraries
  • Create lists of columns: all columns, model variables, and the target variable

model_1 = Q1_2016_df_5[Q1_2016_df_5["service"] == "GO-FOOD"]

test  = model_1[model_1["test_control"] != "train"]
train = model_1[model_1["test_control"] == "train"]

X = train[model_variables]
y = train[target_Variable]

train_predict = model_1[model_1["test_control"] == "train"]
x_ = X[model_variables]

sc = StandardScaler()
X_train = sc.fit_transform(X)
X_test = sc.transform(x_)
  • Filter data for one service – GO-FOOD
  • Create train and test dataframes
  • Create X with the model variables, and y with the target column.
  • Use StandardScaler for the z-score transformation.
# define a custom function which returns a single metric score
# (1 - MAPE in %, so that a higher value is better for make_scorer)
def NMAPE(y_true, y_pred):
    return 1 - np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# make a scorer from the custom function
nmape_scorer = make_scorer(NMAPE)

# prepare the cross-validation procedure
cv = KFold(n_splits=3, random_state=1, shuffle=True)
# create the model
model = LinearRegression()
# evaluate the model (note: the unscaled X is used here, not the standardized X_train)
scores = cross_val_score(model, X, y, scoring=nmape_scorer, cv=cv, n_jobs=-1)
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

y_pred = cross_val_predict(model, X, y, cv=cv)
  • cross_val_score doesn't have MAPE as a built-in scorer, so define MAPE.
  • Create a CV instance.
  • Create an LR instance.
  • Use cross_val_score to get the average MAPE scores across the CV folds for GO-FOOD.
  • This code can be looped for each service; convert it to a function:

def go_model(Q1_2016_df_5, go_service, model_variables, target_Variable):
  """
  Q1_2016_df_5    - base data
  go_service      - service to model, e.g. "GO-FOOD"
  model_variables - predictor columns
  target_Variable - target column (total_cbv)
  """
  model_1 = Q1_2016_df_5[Q1_2016_df_5["service"] == go_service]

  test  = model_1[model_1["test_control"] != "train"]
  train = model_1[model_1["test_control"] == "train"]

  X = train[model_variables]
  y = train[target_Variable]

  train_predict = model_1[model_1["test_control"] == "train"]
  x_ = X[model_variables]

  X_train = sc.fit_transform(X)
  X_test = sc.transform(x_)

  # prepare the cross-validation procedure
  cv = KFold(n_splits=3, random_state=1, shuffle=True)
  # create the model
  model = LinearRegression()
  # evaluate the model
  scores = cross_val_score(model, X, y, scoring=nmape_scorer, cv=cv, n_jobs=-1)
  # report performance
  print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

  y_pred = cross_val_predict(model, X, y, cv=cv)

  return y_pred, mean(scores), std(scores)

a,b,c = go_model(Q1_2016_df_5, "GO-FOOD",model_variables,target_Variable)

b
  • Modeling steps converted to a function:
    • Q1_2016_df_5 – base data
    • go_service – go-tix, go-send, etc.
    • model_variables – variables used to train the model
    • target_Variable – the target variable (total_cbv).
  • For each service, the function can be run to get the average forecast MAPE across all 11 services, as sketched below.
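A minimal sketch of Question 4 (assumed wiring, reusing go_model and the NMAPE scorer defined above): run every service through the function and rank the services by their average fold score.

mape_rows = []
for svc in Q1_2016_df_5["service"].unique():
    _, mean_score, std_score = go_model(Q1_2016_df_5, svc, model_variables, target_Variable)
    mape_rows.append({"service": svc, "mean_score": mean_score, "std_score": std_score})

mape_df = pd.DataFrame(mape_rows).sort_values("mean_score", ascending=True)
display(mape_df)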

The Solution to Part Three

Question 3 is an open-ended question, and readers are encouraged to solve it on their own. Some of the hypotheses are:

  • As this is specific to one particular area and geography, it is safe to assume that the app more or less remained the same, and product interventions may have played only a minor role. If there was a product intervention, it was specific to this particular area.
  • Good-quality/well-known restaurants and food chains were onboarded, and users now have a lot of good options to order from, or can order from familiar restaurants.
  • Delivery speed was significantly improved by onboarding a higher number of delivery agents.
  • Delivery agents were re-trained to effectively reduce cancellations.
  • The team worked with restaurant partners to handle peak-time chaos in a better manner. A toy sketch of one quantitative check follows.
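One quantitative starting point for the first question (purely hypothetical: customer-level fields such as customer_id are not part of this case's dataset) is to decompose the 20% month-over-month growth into new vs. returning customers, separating acquisition-driven growth from retention-driven growth:

import pandas as pd

# orders_df is a hypothetical user-level table: customer_id, date, order_status
def mom_decomposition(orders_df, prev_month="2016-02", this_month="2016-03"):
    done = orders_df[orders_df["order_status"] == "Completed"].copy()
    done["month"] = done["date"].dt.to_period("M").astype(str)
    prev_users = set(done.loc[done["month"] == prev_month, "customer_id"])
    cur = done[done["month"] == this_month]
    returning = cur["customer_id"].isin(prev_users)
    return {"returning_orders": int(returning.sum()),
            "new_customer_orders": int((~returning).sum())}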

Useful Resources and References

  • Working In The 'Central Analytics and Science Team'
  • How We Estimate Food Debarkation Time With 'Tensoba'
  • Business Case Study Assignments For Entry-Level Data Analysts
  • Solving Business Case Study Assignments For Data Scientists
  • Using Data To Appreciate Our Customers
  • Under the Hood of Gojek's Automated Forecasting Tool
  • Experimentation at Gojek
  • GO-JEK's Impact for Indonesia
  • GO-FAST: The Data Behind Ramadan
  • PuLP optimization
  • Linear programming using PuLP
  • Marketing campaign optimization
  • How to optimize something using Python

Conclusion

Case studies, when done right, following the steps given above, can have a positive impact on the business. Recruiters aren't looking for answers but for the approach to those answers, the structure followed, the reasoning used, and business and practical knowledge applied through business analytics. This article provides an easy-to-follow framework for data analysts, using a real business case study as an example.

Key Takeaways:

  • There are two approaches to answering this case study: bottom-up and top-down. Here, the bottom-up approach has been taken because of unfamiliarity with the data and the unavailability of business context.
  • Slicing and dicing the sales numbers across dimensions and identifying trends and patterns across services is the best approach to identify the challenges to growth.
  • Be crisp and to the point while providing recommendations.
  • Let the data tell a story instead of just proving data points. E.g.: The top three services contribute more than 90% of revenue. While at the group level growth is on the positive side, various services face challenges with ride completion, driver cancellations, etc. For Food, reducing cancellations by Y% will drive revenues in Q2 higher by X%, etc.
  • Optimization using PuLP is intimidating when there are more than 3 constraints. Writing down the LP problem on a piece of paper and then coding it out will definitely make the task easier.

Good luck! Here's my LinkedIn profile if you want to connect with me or want to help improve the article. Feel free to ping me on Topmate/Mentro; you can drop me a message with your query. I'll be happy to be connected. Check out my other articles on data science and analytics here.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

