Machine Learning


This is part of the final project for IST707 Applied Machine Learning, taught by Professor Yang Yang at Syracuse University. It applies core ML techniques to a real data analytics problem.

 

Online Shoppers’ Intention Prediction:

Are they browsing or purchasing?

Motivation:

Because of COVID-19, many longer-term changes in consumer behavior are still taking shape, which gives companies an opportunity to help shape the “Next Normal”.

 
 
  • Identify the attributes that contribute most to revenue growth, and build a model to predict shoppers’ intention.

  • The dataset comes from the UCI Machine Learning Repository (2018). It contains 18 columns and more than 12,000 rows.

    “Revenue” is the dependent variable.


DATA INFO

 

DATA INSIGHT


 

Conversion Rate

Color represents whether a conversion occurred, that is, whether the consumer made a purchase. Red indicates no purchase, while the other color indicates an actual purchase. 85% of sessions did not convert into sales, which means most online consumers are browsing rather than buying.
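The conversion rate in the chart is simply the share of sessions whose “Revenue” label is true. A minimal sketch, assuming the labels are available as a list of booleans; the toy list below is invented for illustration and is not the project’s data.

```python
def conversion_rate(revenue_flags):
    """Return the fraction of sessions that ended in a purchase."""
    if not revenue_flags:
        raise ValueError("need at least one session")
    return sum(bool(flag) for flag in revenue_flags) / len(revenue_flags)


# Hypothetical sample: 17 browsing sessions and 3 purchasing sessions.
toy_revenue = [False] * 17 + [True] * 3
rate = conversion_rate(toy_revenue)
print(f"converted: {rate:.0%}, not converted: {1 - rate:.0%}")
```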

 

Page-wise vs. Conversion

Product-related pages account for the majority of visits and of the time spent on the website. They also contribute the most to revenue generation.
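One way to arrive at the page-wise shares is to total the visit counts and durations per page category across sessions. The sketch below assumes the UCI column names Administrative, Informational, ProductRelated and their `_Duration` counterparts; the two sessions are invented for illustration.

```python
PAGE_TYPES = ("Administrative", "Informational", "ProductRelated")


def page_type_shares(sessions, suffix=""):
    """Share of page visits (suffix="") or time (suffix="_Duration") per page type."""
    totals = {p: sum(s[p + suffix] for s in sessions) for p in PAGE_TYPES}
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no page activity recorded")
    return {p: totals[p] / grand_total for p in PAGE_TYPES}


# Hypothetical sessions, not taken from the dataset.
toy_sessions = [
    {"Administrative": 2, "Administrative_Duration": 40.0,
     "Informational": 0, "Informational_Duration": 0.0,
     "ProductRelated": 18, "ProductRelated_Duration": 620.0},
    {"Administrative": 0, "Administrative_Duration": 0.0,
     "Informational": 1, "Informational_Duration": 30.0,
     "ProductRelated": 9, "ProductRelated_Duration": 310.0},
]
visit_share = page_type_shares(toy_sessions)
time_share = page_type_shares(toy_sessions, suffix="_Duration")
print(visit_share, time_share)
```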

Traffic by Month

Looking at traffic by month, 28% of visitors come in May, and most of them are returning visitors. November, by contrast, attracts the most new visitors, possibly because special discounts and holidays cluster around November and December.
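A monthly share such as the 28% for May, together with the returning/new split inside each month, can be computed by grouping sessions on the Month and VisitorType columns. This is a minimal sketch under that assumption; the ten sessions below are invented and only illustrate the grouping.

```python
from collections import Counter


def monthly_traffic(sessions):
    """Return (share of sessions per month, visitor-type counts per month)."""
    month_counts = Counter(s["Month"] for s in sessions)
    total = sum(month_counts.values())
    shares = {month: count / total for month, count in month_counts.items()}
    by_type = {}
    for s in sessions:
        by_type.setdefault(s["Month"], Counter())[s["VisitorType"]] += 1
    return shares, by_type


# Hypothetical sessions, not taken from the dataset.
toy_sessions = (
    [{"Month": "May", "VisitorType": "Returning_Visitor"}] * 3
    + [{"Month": "May", "VisitorType": "New_Visitor"}]
    + [{"Month": "Nov", "VisitorType": "Returning_Visitor"}]
    + [{"Month": "Nov", "VisitorType": "New_Visitor"}] * 3
    + [{"Month": "Mar", "VisitorType": "Returning_Visitor"}] * 2
)
shares, by_type = monthly_traffic(toy_sessions)
print(shares)
print({month: dict(counts) for month, counts in by_type.items()})
```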

 

Loyal customers matter

Other charts show the distribution of weekend versus weekday buyers and the overall distribution of visitor types. This supports actionable strategies: for example, ad campaigns should reach our audience during weekdays, and returning visitors should be followed up on social media or by email.

Correlation

High Bounce and Exit Rates are associated with lower Revenue
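The correlation between a numeric column such as BounceRates and the binary Revenue label can be measured with the Pearson coefficient on Revenue coded as 1/0, which is the point-biserial correlation. A minimal sketch; the six sessions are invented, and a negative coefficient shows association only, not causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 target this equals the point-biserial r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical sessions: BounceRates values and Revenue coded as 1 (purchase) or 0.
bounce_rates = [0.00, 0.01, 0.02, 0.05, 0.10, 0.20]
revenue = [1, 1, 0, 0, 0, 0]
r = pearson(bounce_rates, revenue)
print(f"r(BounceRates, Revenue) = {r:.2f}")
```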

ML MODELS


  • Training dataset: 70% of the total

  • Testing dataset: 30% of the total
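Because only a minority of sessions convert, a 70/30 split is usually stratified so that both parts keep the same share of purchases. The page does not say whether the project stratified its split, so treat this as an assumption. A standard-library sketch with a fixed seed (`stratified_split` is a hypothetical helper name):

```python
import random


def stratified_split(labels, test_size=0.3, seed=42):
    """Return (train_idx, test_idx), preserving each class's proportion."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels: 80 non-purchasing and 20 purchasing sessions.
labels = [False] * 80 + [True] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

With scikit-learn, `train_test_split(X, y, test_size=0.3, stratify=y)` does the same job.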

The default decision tree model achieves an accuracy of 0.86, rising to 0.89 after tuning. In the feature-importance ranking, PageValues stays at the top, followed by BounceRates, ProductRelated_Duration, and ExitRates.
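A minimal scikit-learn sketch of comparing a default decision tree with a grid-searched one and ranking features by importance. It uses a synthetic, roughly 85/15 imbalanced dataset from `make_classification` as a stand-in for the shopper data, and the parameter grid is hypothetical because the project’s tuning settings are not stated. Its accuracies will therefore not reproduce the 0.86 and 0.89 figures above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with about 85% negative labels (assumption).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Baseline: decision tree with default hyperparameters.
default_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
default_acc = accuracy_score(y_test, default_tree.predict(X_test))

# Hypothetical search grid, tuned with 5-fold cross-validation on the training set.
param_grid = {"max_depth": [3, 5, 8, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)
tuned_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))

# Rank features by impurity-based importance (generic names feature_0, feature_1, ...).
importances = sorted(enumerate(search.best_estimator_.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
print(f"default: {default_acc:.2f}, tuned: {tuned_acc:.2f}")
print("top features:", [f"feature_{i}" for i, _ in importances[:4]])
```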

The random forest classifier achieves an accuracy of 0.89 with default settings and 0.90 after tuning. With random forest models, e-commerce vendors can predict customers’ preferences from past consumption behavior.

Random forest outperforms the decision tree on precision, F1 score, and overall accuracy. Notably, the default random forest already matches the accuracy of the tuned decision tree (0.89). Although random forest is more time-consuming to train, it still delivers better performance.
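The decision tree versus random forest comparison on precision, F1 score, and accuracy can be sketched with scikit-learn as follows. As before, the data is a synthetic imbalanced stand-in (an assumption, not the project’s data), and `n_estimators=200` is an arbitrary choice. Precision and F1 are computed for the positive (purchase) class, which is more informative than accuracy when only a minority of sessions convert.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with about 85% negative labels (assumption).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model on the same split and score it on the held-out 30%.
scores = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    scores[name] = {
        "precision": precision_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "accuracy": accuracy_score(y_test, pred),
    }

for name, metrics in scores.items():
    print(name, {metric: round(value, 2) for metric, value in metrics.items()})
```

The extra cost of the forest comes from fitting many trees on bootstrap samples, which is the time/performance trade-off noted above.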

Conclusion & Recommendation

 

Actionable strategy:

• Page value is high when a session generates revenue: improve content readability

• High bounce and exit rates are associated with no revenue: improve the user experience by optimizing UI interactions and avoiding pop-ups

• Engage loyal customers

• Increase time spent on the website each month by introducing monthly themed offers and products

Site metric:

• Use “Page Value” to identify which pages work best in marketing campaigns.

• Use the decision tree model to identify the key KPIs that drive revenue.

• Identify which product pages perform best at driving conversion.

• Further identify consumers’ touchpoints by adding “tags” to web pages.

Reflection

 

Imbalanced:

The dataset is highly imbalanced: about 85% of sessions are negative (no revenue) and only about 15% are positive.
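With roughly 85% negative sessions, a model that always predicts “no purchase” already scores about 0.85 accuracy, so the reported 0.86 to 0.90 accuracies should be read against that baseline. A common mitigation, not necessarily used in this project, is to weight classes inversely to their frequency. The sketch below computes the majority-class baseline and the n_samples / (n_classes × class_count) weights that scikit-learn’s `class_weight="balanced"` option uses; the labels are invented.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels with the 85/15 split described above.
labels = [False] * 85 + [True] * 15
baseline = majority_baseline(labels)
weights = balanced_class_weights(labels)
print(f"baseline accuracy: {baseline:.2f}", weights)
```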

Columns:

Some columns, such as Region, contain only coded category values, which did not provide insightful information.

Diversity:

Other factors that might affect revenue, such as ad clicks, social media posts, and site engagement, are not captured in the dataset.
