IT Log

Record various IT issues and difficulties.

Python-Based VIP Movies Web Crawler


Table of Contents:

First and foremost:

Common types of movie resources that are often parsed

I. Direct parsing, then Save As to local storage (brute-force download)

Perfect for beginners, extremely comfortable.

II. Direct Packet Capture Movies

A simpler method: packet capture

III. Unencrypted m3u8 format movies

Step 1: Right-click and open the inspection panel; find the m3u8 file in the network section.

Step 2: Download each small ts file through the m3u8 file and complete the merging process.

IV. AES encrypted m3u8 files

Based on the third type, this one adds an encryption format.

First: download the playlist (the m3u8 file) for each episode.

Second: extract the AES key and each ts URL from the playlist using regular expressions.

Last: use the obtained key to decrypt each ts file, download with multithreading, and merge the files.

First and foremost

The textual description in this article is only meant to organize ideas and assist in understanding; the real essence lies in the code. Please take a serious look at the code.

Common types of video resources that are often parsed:

Applicable audience:

1. Pure beginners, just wanting to watch movies for free

2. Beginners; at least know how to use requests and basic syntax

3. Slightly more advanced beginners: familiar with os, requests, time, basic re, basic concurrent, and basic AES (why all basics? Because that’s my level…)

4. Experts (anyone beyond the first three is considered an expert)

I. Direct parsing, then Save As to local storage (brute-force download)

Perfect for beginners, extremely comfortable.

The simplest method without a doubt; pure enjoyment without any prerequisites!

Parsing address: https://jx.iiiv.vip/?url=

This website can parse and play videos from Tencent Video, iQiyi, Youku, and Bilibili for non-premium members.

For Tencent Video, simply right-click twice and choose Save As to store the video locally as an mp4 file (not html).

Other video platforms do not allow this right-click download. Note: packet capture does not work directly on this website.

II. Direct Packet Capture Movies

This is simpler than (I): open the developer tools, select “Media,” then parse and play the video (note the order). The file list will refresh; double-click the file to open the video page, where it can be downloaded directly. This method is used less often for full-length movies, but the download speed is acceptable.

For example, Bilibili videos cannot be downloaded directly, but packet capture solves that.

Parsing URL: 90 Occasionally – QQ Group: 1265608 (iiiv.vip) (Very user-friendly)


III. Unencrypted m3u8 Format Movies

Modules used: requests, os, time, concurrent (optional)

I have recently been watching Luo Xiaohei’s War Diary (updated after seven years), but it turns out to be available only to iQiyi premium members. Following the principle of “why pay when you can get it for free,” I found a website that can parse iQiyi premium videos:

Parsing URL: Guan Lu Dēng (gualudeng.com)

Recommended website: it has a lot of content. (It worked fine when I last used it, but half a year later you need to follow their WeChat official account to use it smoothly… By the way, most resources now require following an account, and the stated reasons are usually malicious web crawlers or interface misuse; I am not sure whether those are the real reasons or whether it is about revenue.)

Select the OK interface (if that doesn’t work now, just adapt and find alternatives yourself)


At this point, you can watch the movie. Anyone familiar with the m3u8 format knows that, off the major platforms, playback tends to buffer a lot, which ruins the viewing experience. So we download the movie instead. Here comes the main part.

Regarding the m3u8 format: essentially, a long video (a few hours) is split into thousands of small fragments with the .ts extension, each of which can be played like a regular MP4 file. When you watch, the fragments are played one after another. For more details, you can check CSDN or Baidu; I won’t elaborate here (to be honest, I haven’t delved deep either, hehe).

So our approach is:

0. Retrieve the m3u8 file from the parsing website (or via PHP)
1. Extract all the URLs, discarding the lines that start with ‘#’, which are not segment URLs
2. Iterate through the URLs and download each video fragment
3. Combine all fragments into a single movie file
4. Delete the downloaded individual fragments
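As a rough illustration of step 1, here is a minimal sketch that pulls the segment URLs out of a playlist's text. The function name `segment_urls`, the sample playlist, and the example.com addresses are all made up for this example; the only assumption about the format is that lines starting with ‘#’ are tags and everything else is a segment entry, possibly relative to the playlist's own URL.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, base_url):
    """Return the segment URLs listed in an m3u8 playlist, in order."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments, not segments.
        if not line or line.startswith("#"):
            continue
        # Relative entries are resolved against the playlist's own URL;
        # absolute entries pass through unchanged.
        urls.append(urljoin(base_url, line))
    return urls


# A tiny hypothetical playlist, just to exercise the function.
sample = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
https://cdn.example.com/v/seg1.ts
#EXT-X-ENDLIST
"""

for url in segment_urls(sample, "https://example.com/v/index.m3u8"):
    print(url)
```

Plain string methods are enough here, which is why no regular expressions are needed for the unencrypted case.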

Blood, sweat, and tears—this is what we’ve learned step by step.

Step 1: Right-click to open the developer tools, then find the m3u8 file in the ‘Network’ section

Open the dev tools, select XHR, and play the video (the order matters); the request list will then fill with many files.


The file name is usually index.m3u8. If not, find the file with the .m3u8 extension and double-click it. This downloads a text file to your local machine, about two thousand lines or so…

This file contains many https URLs, each pointing to a video segment in .ts format.

Step 2: Write Python code

Once the idea is clear, it’s time to write the code. Here we go (not too long):

The headers are there to get around anti-crawling measures, though they have no effect here. The program starts a ThreadPoolExecutor, so a movie can be downloaded in three minutes. Without it, the download would take hours, as seen in the previous example.

In addition, because web crawling is error-prone, both the download function and the subsequent deletion of ts fragments use try-except to catch exceptions. A thread pool is also used for retrying, handling any fragments that failed to download the first time.

The time module is used to report the time taken to download the movie, though it can be omitted if desired.
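The pieces described above (headers, a thread pool, try-except, a retry pass, merging, cleanup, timing) could fit together roughly like this. It is a sketch under assumptions, not the original program: `fetch_with_requests`, `download_one`, `download_and_merge`, the header value, the worker count, and the number of retry rounds are all invented for illustration. The fetcher is passed in as a parameter so the logic can be tried without touching the network.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_requests(url):
    """Default fetcher; needs the third-party requests package."""
    import requests  # imported lazily so the rest runs without it

    headers = {"User-Agent": "Mozilla/5.0"}  # placeholder header value
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.content


def download_one(job, fetch):
    """Save one segment; return its index on failure, None on success."""
    index, url, path = job
    try:
        data = fetch(url)
        with open(path, "wb") as f:
            f.write(data)
        return None
    except Exception:
        return index


def download_and_merge(urls, out_path, fetch=fetch_with_requests,
                       workers=32, rounds=3):
    """Download all segments concurrently, merge them, return seconds taken."""
    start = time.time()
    workdir = tempfile.mkdtemp(prefix="ts_")
    jobs = [(i, u, os.path.join(workdir, f"{i:05d}.ts"))
            for i, u in enumerate(urls)]
    pending = jobs
    for _ in range(rounds):  # the first round plus retry rounds
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda j: download_one(j, fetch), pending))
        failed = {i for i in results if i is not None}
        pending = [j for j in pending if j[0] in failed]
    if pending:
        raise RuntimeError(f"{len(pending)} segments still failing")
    # Concatenate the fragments in playlist order.
    with open(out_path, "wb") as out:
        for _, _, path in jobs:
            with open(path, "rb") as f:
                out.write(f.read())
    # Delete the individual fragments; ignore files that are already gone.
    for _, _, path in jobs:
        try:
            os.remove(path)
        except OSError:
            pass
    try:
        os.rmdir(workdir)
    except OSError:
        pass
    return time.time() - start
```

With the default fetcher, a call would look like `download_and_merge(urls, "movie.ts")`, where `urls` is the list taken from the playlist. The merged file is the raw concatenation of the fragments, so it is still transport-stream data whatever extension it is given.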

In XHR, find m3u8. Usually, you can see thousands of URLs in the preview, as shown in the above figure.

IV. AES-Encrypted m3u8 Files

Modules used: requests, re, AES, os, time, concurrent (optional)

Relatively more advanced crawling techniques (just a little bit more advanced)

In (III), we learned how to download m3u8 video files. However, not all m3u8 files are that straightforward. Some websites encrypt their files (I’ve only encountered AES encryption; I’ve heard of other modes but haven’t seen them). Here, we’ll focus on AES-encrypted videos.

1. First, we need to determine whether encryption is applied.

2. Second, if encryption is in place, we need a key. Let’s obtain it.

3. Use the key to decrypt and retrieve the ts files.

Basically, it’s just adding a decryption module to what we did in (III).

We’ll take Bilibili premium-member content (the most challenging one by far, with its arrogant attitude) and episodes 37-40 of “Luo Xiaohei Zhanji” as examples.

First, we need to download the playlist for each episode (the m3u8 file).

Use the method from (III) to do so.

Select the Bilibili premium member interface (if it has changed, adapt accordingly).

Since this involves multiple episodes, we extract the key from the downloaded m3u8 file in code. This takes a little regular expression (re) work; copying the key by hand and pasting it into the code is also possible, but for a series with several episodes that becomes quite tedious. Regular expressions are also needed to extract each ts file’s URL, unless you’re as skilled as I am at string manipulation with functions like startswith.
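To make the regular-expression step concrete, here is a small sketch that checks a playlist for an `#EXT-X-KEY` tag and pulls out its attributes, along with the ts URLs. The helper names `find_key_info` and `find_ts_urls` and the sample playlist are hypothetical; the tag layout (`METHOD`, `URI`, optional `IV`) follows the usual HLS playlist convention, and a real file may differ in details.

```python
import re

# One #EXT-X-KEY line, e.g.  #EXT-X-KEY:METHOD=AES-128,URI="...",IV=0x...
KEY_TAG = re.compile(r"^#EXT-X-KEY:(.*)$", re.MULTILINE)
# NAME=value pairs, where the value is either quoted or runs to the next comma.
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def find_key_info(playlist_text):
    """Return the attributes of the first #EXT-X-KEY tag, or None."""
    match = KEY_TAG.search(playlist_text)
    if match is None:
        return None  # no key tag: the playlist is not encrypted
    pairs = ATTR.findall(match.group(1).strip())
    attrs = {name: value.strip('"') for name, value in pairs}
    if attrs.get("METHOD", "NONE") == "NONE":
        return None
    return attrs


def find_ts_urls(playlist_text):
    """Return every non-empty line that does not start with '#'."""
    return re.findall(r"^(?!#)\S+", playlist_text, flags=re.MULTILINE)


# A hypothetical encrypted playlist for demonstration.
sample = """#EXTM3U
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/v/key.key",IV=0x0123456789abcdef0123456789abcdef
#EXTINF:10.0,
https://example.com/v/seg0.ts
#EXTINF:10.0,
https://example.com/v/seg1.ts
#EXT-X-ENDLIST
"""

print(find_key_info(sample))
print(find_ts_urls(sample))
```

A `None` result from `find_key_info` answers the first question (is the playlist encrypted at all); otherwise the `URI` attribute says where the key lives.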

Once the key is obtained, it must be converted to bytes, for example by encoding it as utf-8; otherwise the program will throw an error, since the AES cipher expects a bytes key rather than a str.
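A hedged sketch of the decryption step follows. It assumes the stream uses AES-128 in CBC mode with a 16-byte key, which is the common case for m3u8 playlists, and it assumes the third-party pycryptodome package provides the `AES` module; neither is confirmed by the text above. The helper names are invented for illustration. When the playlist gives no IV, the usual convention is to use the segment's sequence number as a 16-byte big-endian value.

```python
def key_to_bytes(key):
    """Accept the key as bytes or str and return 16 raw bytes."""
    if isinstance(key, str):
        key = key.encode("utf-8")  # the encoding step described above
    if len(key) != 16:
        raise ValueError(f"AES-128 needs a 16-byte key, got {len(key)}")
    return key


def iv_for_segment(iv_attr, sequence_number):
    """Use the playlist's IV attribute if present, else the sequence number."""
    if iv_attr:
        has_prefix = iv_attr.lower().startswith("0x")
        hex_digits = iv_attr[2:] if has_prefix else iv_attr
        return bytes.fromhex(hex_digits.zfill(32))
    return sequence_number.to_bytes(16, "big")


def decrypt_segment(data, key, iv):
    """Decrypt one ts segment (assumes pycryptodome is installed)."""
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import unpad

    cipher = AES.new(key_to_bytes(key), AES.MODE_CBC, iv)
    # Segments are padded to a whole number of 16-byte blocks.
    return unpad(cipher.decrypt(data), AES.block_size)


print(key_to_bytes("0123456789abcdef"))
print(iv_for_segment(None, 5).hex())
```

If the key is downloaded as raw bytes in the first place (for example through the `content` attribute of a requests response), the encoding step is unnecessary and `key_to_bytes` only checks the length.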

