Data Scraping using Python & Stata

360,00 €
2 Days
Online via Teams
Stata

Overview

Accessing and retrieving online data is increasingly vital for researchers and analysts. This two-day, interactive online seminar shows how to use Python from within Stata to scrape online data and structure it for analysis.

Participants will learn how to identify, extract, and convert online data (e.g., HTML tables or embedded content) into formats compatible with Stata (.txt, .csv). The course covers the basics of Python and HTML parsing, progressing to hands-on coding sessions. No prior coding experience is required, though a basic understanding of Stata or Python is beneficial.
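
As a rough illustration of the workflow described above (not taken from the course materials), the sketch below parses one HTML table with Python's standard library and writes it to a .csv file that Stata could then read, for example with `import delimited`. The HTML fragment, its made-up values, the class name `TableExtractor`, the helper names, and the file name `scraped_table.csv` are all assumptions introduced for this example.

```python
import csv
from html.parser import HTMLParser

# Hypothetical page fragment with made-up values; a real scraper would
# first download the page text from a website.
SAMPLE_HTML = """
<table>
  <tr><th>country</th><th>year</th><th>value</th></tr>
  <tr><td>A</td><td>2020</td><td>10</td></tr>
  <tr><td>B</td><td>2020</td><td>20</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read, if any
        self._cell = None  # text pieces of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table cells found in `html` as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def save_rows_as_csv(rows, path):
    """Write the rows to a comma-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


rows = html_table_to_rows(SAMPLE_HTML)

if __name__ == "__main__":
    save_rows_as_csv(rows, "scraped_table.csv")
```

The standard-library parser keeps the sketch free of extra installs. Third-party parsing libraries are a common alternative for messier pages, but nothing in this outline says which tools the course itself uses.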

Course Aims & Objectives
  • Introduce data scraping techniques using Python embedded within Stata.

  • Provide foundational knowledge of Python programming and HTML structure.

  • Equip participants with skills to identify and extract online data for quantitative analysis.

Key Skills Acquired

By the end of the course, students will understand:

  • The Python basics needed to write and run scraping scripts.

  • How web pages are structured and the fundamentals of HTML.

  • How to identify and extract online data, such as HTML tables or embedded content.

  • How to save extracted data in formats compatible with Stata (.txt, .csv).

  • How to write efficient Python code for web scraping.

Learning Outcomes
  • Technical Fluency: Gain practical experience with Python inside Stata, focusing on scripting for web scraping.

  • Data Acquisition Skills: Learn to extract useful, structured data from unstructured web pages.

  • Problem-Solving: Develop the ability to troubleshoot typical data scraping challenges and adapt code to new data sources.

  • Application-Oriented Learning: Build transferable skills applicable to academic, policy, and private-sector research projects.
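
To make the troubleshooting outcome more concrete, here is a minimal sketch of two clean-up steps that scraped tables often need before analysis: turning messy number strings into numeric values, and padding rows that have fewer cells than the header. The function names, the list of missing-value markers, and the assumption that numbers use commas as thousands separators are illustrative choices, not part of the course description.

```python
# Assumed placeholders that a page might use for missing values.
MISSING_MARKERS = {"", "n/a", "na", "-", "..", "."}


def clean_number(text):
    """Return a float, or None if the cell is missing or not numeric.

    Assumes English-style formatting: ',' separates thousands and '.'
    marks decimals. Pages using a decimal comma need different rules.
    """
    # Non-breaking spaces are common in HTML and survive a plain strip().
    cleaned = text.replace("\xa0", " ").strip()
    if cleaned.lower() in MISSING_MARKERS:
        return None
    cleaned = cleaned.replace(",", "").replace(" ", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def pad_rows(rows):
    """Pad short rows with empty strings so each has the header's width."""
    width = len(rows[0]) if rows else 0
    return [row + [""] * (width - len(row)) for row in rows]
```

Returning `None` rather than raising an error lets a script carry on through a long table and leaves the decision about missing values to the analysis stage.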

Course Structure

Format: Two-day online seminar
Daily Sessions: 10:00–12:00 & 14:00–16:00 (BST)
Q&A: 1-hour concluding session on Day 2

Total contact time: 8 hours of instruction + 1 hour Q&A

Agenda

Day 1:

Lecture 1: Python Basics
Lecture 2: Web Structure and HTML Fundamentals

Day 2:

Lecture 3: Extracting and Saving Data
Lecture 4: Writing Efficient Python Code for Web Scraping

Prerequisites

No specific readings are required. A basic knowledge of Stata and Python is useful but not essential.

Course Timetable

Subject to minor changes

Day        Morning Session            Afternoon Session (including Tutorial)
Day One    10am-12pm (London time)    2pm-4pm (London time)
Day Two    10am-12pm (London time)    2pm-4pm (London time)

Delivered By

Student Testimonials

Giovanni's delivery is fantastic; he makes great connections between new and prior knowledge and focuses on the key strengths and limitations of the discussed methods. Excellent course design that builds on the Introductory Machine Learning course and knowledge acquired in the PhD Econometrics sequence of courses. This is all nicely supplemented by detailed Stata code with explanations and sample datasets.

Georgi Boichev

A student on our

Excellent course and great explanations of ML techniques and applications from Giovanni! I learned so much, including the coding and applications plus the fundamentals of ML.

Marco Delprado

A student on our

The 'Advanced Machine Learning (AML)' experience was excellent for trying to gain more experience in Statistics by linking Python and Stata.

I'm not a Statistician! However, Giovanni managed to link the 'Fundamentals of Machine Learning (FML)' to 'Advanced Machine Learning' in his usual excellent way. When starting the AML, I was pleased that the FML was a tremendous help and allowed me to use my mathematical knowledge for Physics and Science. I'm looking forward to Giovanni's next course (using large datasets) and his book.

Linking my knowledge of mathematics (from Science and Engineering) to Statistics. I do hope it is leading towards becoming better at 'Medical Statistics' that require very large datasets...and a big thank you to Giovanni!

William Ware

A student on our

Very well organized, very useful and relevant content, looking forward to joining future events!

Hebatallah Nashaat

A student on our

As always great service and real good courses. In addition, thanks to Professor Cerulli for making himself understood in the best way.

David Pineda

A student on our

The delivery of this course was exceptionally well done. It really helped me to appreciate the concepts as well as the practical applications in Stata. If you are new to this topic, this will provide a good introduction to complex issues.

Very easy to communicate, all emails contained all the information necessary. I think that the course was very well structured and organized. The tutor provided a number of codes that were extremely helpful for understanding. Overall, very useful and easy to follow!

I highly appreciated Professor Giovanni Cerulli's course. The class notes are very clear and well prepared, with extensive coverage of the course subjects. They are also quite objective, focusing on the most important content. Professor Giovanni Cerulli's lectures are very didactic, which greatly helps the easy assimilation of the corresponding knowledge. Furthermore, the course materials are quite comprehensive: they encompass not only the class notes but also the referenced papers, as well as data and Stata programs to estimate the models in this software. All in all, I greatly recommend this course, as it really speeds up acquaintance with the underlying theory and its applications in a very short period of time.

I found the Stata Summer School 2021 very useful and interesting. The course was perfectly structured and organised, with a good progression during the week. The instructors presented the topics covered in an easy and understandable way. There was room for questions and answers when needed. Materials shared for the course were tidy and informative, and I am sure I will use them frequently. This course was arranged online, which in my opinion worked very well. I believe the course delivered as promised and according to information found online when I signed up for the course. Easy to purchase/sign up for the course. User friendly. Quick and timely response.

Very efficient in terms of communication and delivery. Provides a very comprehensive applied knowledge of Stata. I would definitely recommend others to buy from them.

I went to the University of Cambridge in the UK for a summer school with Timberlake; it was excellent.

It was a great course and I thoroughly enjoyed it. Many of my fellow participants were eager to share their ideas. I think the course could help many people at a similar stage in their careers!