Table of Contents:
First and foremost
Common types of video resources that are often parsed
I. Direct parsing, then "Save as" to local storage (brute-force download)
Perfect for beginners, extremely comfortable.
II. Videos that can be captured directly (packet capture)
A simpler packet-capture method
III. Unencrypted m3u8 format movies
Step 1: Right-click to open the developer tools and find the m3u8 file in the Network panel.
Step 2: Download each small ts file listed in the m3u8 file and merge them.
IV. AES-encrypted m3u8 files
Builds on the third type, with encryption added.
First: download the playlist (the m3u8 file) for each episode.
Second: extract the AES key and each ts URL from the playlist using regular expressions.
Last: decrypt each ts file with the key, download with multithreading, and merge the files.
First and foremost
The textual description in this article is only meant to organize ideas and assist in understanding; the real essence lies in the code. Please take a serious look at the code.
Common types of video resources that are often parsed:
Applicable audience:
1. Pure beginners, just wanting to watch movies for free
2. Beginners; at least know how to use requests and basic syntax
3. Slightly more advanced beginners: familiar with os, requests, time, basic re, basic concurrent, basic AES (why all basics? Because that's my level…)
4. Experts (anyone beyond the first three is considered an expert)
I. Direct parsing, then "Save as" to local storage (brute-force download)
Perfect for beginners, extremely comfortable.
Without a doubt the simplest method; pure enjoyment with no prerequisites!
Parsing address: https://jx.iiiv.vip/?url=
This website can parse and play videos from Tencent Video, iQiyi, Youku, and Bilibili (non-premium content).
For Tencent Video, right-click twice on the player and choose "Save as" to store the video locally as an mp4 file (not html).
Other video platforms do not allow a direct right-click download. Note: packet capture does not work directly on this website.
II. Videos that can be captured directly (packet capture)
This is even simpler than (I): open the developer tools, select the "Media" filter, then parse and play the video (note the order). The file list will refresh; double-click the media file to open it in its own video page, where it can be downloaded directly. This method is less commonly used for full-length movies, but the download speed is acceptable.
For example, Bilibili videos cannot be downloaded directly, and packet capture solves that.
Parsing URL: 90 Occasionally – QQ Group: 1265608 (iiiv.vip) (Very user-friendly)
III. Unencrypted m3u8 Format Movies
Modules used: requests, os, time, concurrent (optional)
I have recently been watching Luo Xiaohei's War Diary (updated after seven years), but it turns out to be available only to iQiyi premium members. Following the principle of "why pay when you can get it for free," I found a website that can parse iQiyi premium videos:
Parsing URL Guan Lu Dēng (gualudeng.com)
Recommended website: it has a lot of content. (When I last used it, it worked fine, but half a year later you need to follow their WeChat official account to use it smoothly. Most resource sites now require following an account; the stated reason is usually malicious crawlers or interface abuse, though I'm not sure whether the real concern is revenue.)
Select the OK interface (if that no longer works, adapt and find an alternative yourself).
At this point, you can watch the movie. Anyone familiar with the m3u8 format knows that outside the major platforms it tends to buffer a lot, which ruins the viewing experience. So we choose to download the movie instead. Here comes the main part.
Regarding the m3u8 format: essentially, a long video (a few hours) is split into thousands of small fragments with the .ts extension, each of which can be played like a regular MP4 file. The m3u8 file itself is a playlist that lists those fragments in order, and the player fetches and renders them one after another. For more details, you can check CSDN or Baidu; I won't elaborate here (to be honest, I haven't delved deep either, hehe).
So our approach is:
0. Retrieve the m3u8 file from the parsing website (it may be saved under a .php name, such as the play.php used below)
1. Extract all URLs, discarding the lines that start with '#', which are playlist tags rather than URLs
2. Download each video fragment in turn
3. Combine all fragments into a single movie file
4. Delete the downloaded individual fragments
Blood, sweat, and tears—this is what we’ve learned step by step.
Step 1: Right-click to open the developer tools, then find the m3u8 file in the "Network" panel
Open the developer tools, select the XHR filter, and then play the video (the order matters); the request list will fill up with many files.
The file name is usually index.m3u8. If not, find the file with the .m3u8 extension and double-click it. This downloads a text file to your local machine.
It runs to about two thousand lines or so…
The file contains many https URLs, each pointing to a video segment in .ts format.
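To make the file layout concrete, here is a minimal sketch of what such a playlist roughly looks like and how the segment lines can be separated from the tag lines. The sample playlist text, the example.com URLs, and the helper name `extract_segment_urls` are all made up for illustration; the `urljoin` step is added on the assumption that some playlists list relative paths rather than full https URLs.

```python
from urllib.parse import urljoin

# Hypothetical playlist content, shortened to three segments.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
https://example.com/video/seg0.ts
#EXTINF:10.0,
https://example.com/video/seg1.ts
#EXTINF:4.5,
seg2.ts
#EXT-X-ENDLIST
"""


def extract_segment_urls(playlist_text, base_url=""):
    """Return segment URLs in playlist order, skipping '#' tag lines."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Absolute URLs pass through unchanged; relative ones are resolved
        # against the (assumed) URL the playlist was downloaded from.
        urls.append(urljoin(base_url, line))
    return urls


segments = extract_segment_urls(
    SAMPLE_PLAYLIST, "https://example.com/video/index.m3u8"
)
print(len(segments), segments[-1])
```

The order of the returned list matters: the fragments are later merged in exactly this order.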
Step 2: Write Python code
Once the idea is clear, it’s time to write the code. Here we go (not too long):
```python
# Import modules
import os
import time
from concurrent.futures import ThreadPoolExecutor

import requests

start = time.time()

# Read the saved playlist and keep only the segment URLs
# (lines starting with '#' are playlist tags, not URLs)
m3u8 = []
with open('play.php', 'r') as file:
    for line in file:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        m3u8.append(line)
print(f'Total number of target files: {len(m3u8)}')


# Download one ts file, retrying up to 10 times
def download_ts(i):
    for _ in range(10):
        try:
            resp = requests.get(m3u8[i], timeout=15).content
            with open(f'{i}.ts', 'wb') as file:
                file.write(resp)
            print(f'Video segment {i} has been downloaded!')
            break
        except Exception:
            print(f'Download of ts file {i} failed. Retrying...')


# First pass: download every segment with a thread pool
# (leaving the with-block waits for all tasks to finish)
with ThreadPoolExecutor(100) as t:
    for i in range(len(m3u8)):
        t.submit(download_ts, i)

# Second pass: re-download any segment that is missing or empty
with ThreadPoolExecutor(50) as f:
    for i in range(len(m3u8)):
        if not os.path.exists(f'{i}.ts') or os.path.getsize(f'{i}.ts') == 0:
            f.submit(download_ts, i)
print('All ts files have been downloaded.')

# Merge the ts files in playlist order
print('Starting to merge ts files...')
with open('Luoxiaohēi_Zhanji.mp4', 'wb') as file:
    for i in range(len(m3u8)):
        with open(f'{i}.ts', 'rb') as f:
            file.write(f.read())
print('Ts files have been merged.')

# Delete the ts files (they were written to the working directory)
print('Starting to delete ts files...')
for i in range(len(m3u8)):
    try:
        os.remove(f'{i}.ts')
    except FileNotFoundError:
        continue
print('Ts files have been deleted.')
print('Movie downloaded perfectly!')

end = time.time()
print(f'Total download time: {end - start}s')
```
headers are a countermeasure against anti-crawling checks; they are not needed here, so the script sends none. The program uses a ThreadPoolExecutor, so a movie can be downloaded in about three minutes. Without it, the download would take hours.
In addition, because network requests can fail at any time, both the download function and the later deletion of the ts fragments use try-except to catch exceptions. A second thread pool pass then re-downloads any fragment that failed the first time.
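The "second pass" idea can be isolated into a small helper. This is only a sketch: the function name `find_incomplete_segments` is hypothetical, and it assumes the fragments are stored as `0.ts`, `1.ts`, … in a single directory, as in the script above. A fragment counts as incomplete when its file is missing or has zero bytes.

```python
import os
import tempfile


def find_incomplete_segments(count, directory="."):
    """Return the indexes whose '{i}.ts' file is missing or has zero bytes."""
    incomplete = []
    for i in range(count):
        path = os.path.join(directory, f"{i}.ts")
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            incomplete.append(i)
    return incomplete


# Small demonstration in a temporary directory:
# segment 0 is complete, segment 1 is empty, segment 2 was never created.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "0.ts"), "wb") as f:
        f.write(b"\x47" * 188)
    open(os.path.join(tmp, "1.ts"), "wb").close()
    missing = find_incomplete_segments(3, tmp)

print(missing)  # [1, 2]
```

Only the returned indexes need to be submitted to the thread pool again. Note that a non-empty file can still be truncated, so this check catches failed downloads, not partially written ones.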
The time module is used to report how long the download took; it can be omitted if desired.
In XHR, find the m3u8 request. The preview usually shows thousands of URLs.
IV. AES-encrypted m3u8 files
Modules used: requests, re, AES, os, time, concurrent (optional)
A relatively more advanced crawling technique (just a little bit more advanced).
In (III), we learned how to download m3u8 video files. However, not all m3u8 files are that straightforward. Some websites encrypt their fragments (I've only encountered AES encryption; I've heard of other modes but haven't seen them). Here, we'll focus on AES-encrypted videos.
1. First, determine whether encryption is applied.
2. Second, since encryption is in place, we need a key. Let's obtain it.
3. Use the key to decrypt and retrieve the ts files.
#Basically, it's just adding a decryption step to what we did in (III)
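Steps 1 and 2 can be sketched as a single check on the playlist text: an encrypted playlist carries an `#EXT-X-KEY` tag that names the method and the URL of the key. The sample playlist, the example.com key URL, and the helper name `find_key_info` below are made up for illustration, and the pattern assumes the tag is written with METHOD first and URI directly after it; a real playlist may order or extend the attributes differently.

```python
import re

# Hypothetical playlist header; the key URL is made up.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/video/key.key"
#EXTINF:10.0,
https://example.com/video/seg0.ts
"""

# Assumes METHOD comes first and URI (if present) follows immediately.
KEY_LINE = re.compile(
    r'#EXT-X-KEY:METHOD=(?P<method>[^,\s]+)(?:,URI="(?P<uri>[^"]*)")?'
)


def find_key_info(playlist_text):
    """Return (method, key_uri) from the first #EXT-X-KEY tag, or None."""
    match = KEY_LINE.search(playlist_text)
    if match is None or match.group("method") == "NONE":
        return None  # no key tag, or encryption explicitly disabled
    return match.group("method"), match.group("uri")


print(find_key_info(SAMPLE))
print(find_key_info("#EXTM3U\n#EXTINF:10.0,\nseg0.ts\n"))
```

If the result is `None`, the method from (III) applies unchanged; otherwise the key URL is what needs to be fetched before decrypting.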
We'll take Bilibili premium content (the most challenging one by far, with its arrogant attitude) and "Luo Xiaohei Zhanji" episodes 37-40 as examples.
First, we need to download the playlist for each episode (the m3u8 file).
Use the method from (III) to do so.
On the parsing site, select the Bilibili premium interface (if the interface changes, adjust accordingly).
```python
# Import modules
import os
import re
import time
from concurrent.futures import ThreadPoolExecutor

import requests
from Crypto.Cipher import AES  # AES decryption module

start = time.time()

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73'
}

# Matches the key URL in the playlist
key_pattern = re.compile(r'URI="(?P<url>.*?)"', re.S)


# Download one ts file, decrypt it, and save it; retry up to 10 times
def download_ts(i):
    for _ in range(10):
        try:
            response = requests.get(m3u8[i], headers=headers, timeout=15).content
            # The key is reused as the IV; a playlist that declares its own
            # IV in #EXT-X-KEY needs that value instead
            cryptor = AES.new(key, AES.MODE_CBC, key)
            with open(f'{i}.ts', 'wb') as file:
                file.write(cryptor.decrypt(response))
            print(f'Ts segment {i} has been downloaded successfully!')
            break
        except Exception:
            print(f'Ts file {i} download failed, retrying!')


# One pass per episode: reads 'index (7).m3u8' through 'index (10).m3u8'
for ep in range(4):
    with open(f'index ({ep + 7}).m3u8', 'r', encoding='utf-8') as file:
        playlist = file.read()

    # Key URL: keep the last URI="..." match found in the playlist
    for match in key_pattern.finditer(playlist):
        keyurl = match.group('url')

    # Segment URLs: every non-empty line that is not a '#' tag
    m3u8 = []
    for line in playlist.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        m3u8.append(line)
    print(f'Total target files: {len(m3u8)}')

    respn = requests.get(keyurl, headers=headers)
    key = respn.text.encode('utf-8')  # AES.new needs bytes, not str

    # First pass: download and decrypt every segment
    with ThreadPoolExecutor(100) as t:
        for i in range(len(m3u8)):
            t.submit(download_ts, i)

    # Second pass: retry segments that are missing or empty
    with ThreadPoolExecutor(50) as f:
        for i in range(len(m3u8)):
            if not os.path.exists(f'{i}.ts') or os.path.getsize(f'{i}.ts') == 0:
                f.submit(download_ts, i)
    print('Ts files download completed')

    # Merge the ts files in playlist order
    print('Ts files merging started...')
    with open(f'vip movie experiment{ep}.mp4', 'wb') as file:
        for i in range(len(m3u8)):
            with open(f'{i}.ts', 'rb') as f:
                file.write(f.read())
    print('Ts files merged successfully!')

    # Delete the ts files (they were written to the working directory)
    print('Starting to delete ts files...')
    for i in range(len(m3u8)):
        try:
            os.remove(f'{i}.ts')
        except FileNotFoundError:
            continue
        print(f'Ts file {i} deleted.')

print(f'Total download time: {time.time() - start}s')
```
Like this.
Since this involves multiple episodes, we extract the key URL from each downloaded m3u8 file. This requires a little regular expression (re) usage; copying the key URL by hand and pasting it into the code also works, but for a series with several episodes that quickly becomes tedious. Regular expressions are also the usual way to extract each ts file's URL. Unless you're as skilled as I am at string manipulation with methods like startswith, regular expressions remain necessary.
Once the key is obtained, it must be passed to AES.new as bytes rather than as a str; otherwise the program throws an error. Encoding the response text as utf-8 does this when the key is plain ASCII text; taking the raw response bytes (respn.content in requests) skips the encoding step altogether.
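As a small sketch of that conversion, the helper below turns a key into bytes and checks its length before it is handed to the cipher. The name `to_key_bytes` and the sample key are hypothetical; the only facts relied on are that AES keys are 16, 24, or 32 bytes long and that, as far as I know, the cipher constructor rejects a plain str.

```python
def to_key_bytes(raw_key):
    """Return the key as bytes, checking that the length is valid for AES."""
    if isinstance(raw_key, str):
        raw_key = raw_key.encode("utf-8")
    if len(raw_key) not in (16, 24, 32):
        raise ValueError(f"unexpected AES key length: {len(raw_key)} bytes")
    return raw_key


key = to_key_bytes("0123456789abcdef")  # made-up 16-character key
print(type(key).__name__, len(key))
```

A length error at this point usually means the key URL returned something other than the key, such as an HTML error page, which is easier to diagnose here than inside a failed decryption.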