Using explicit waits with a lambda expression:
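```python
#! python
# -*- coding: utf-8 -*-
# @Time : 2024/10/10 66:66
# @Author : John Gu
# @File : wait_demo.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://passport.bilibili.com/login')  # same login page as the next example

# Poll every 0.5 s for up to 30 s; until() returns whatever the lambda returns (the element)
sms_btn = WebDriverWait(driver, 30, 0.5).until(
    lambda dv: dv.find_element(By.XPATH, '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]')
)
sms_btn.click()
```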
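Conceptually, until() calls the function you pass (here the lambda) with the driver, returns the first truthy result, and tries again after poll_frequency seconds when the result is falsy or a "not found" exception is raised. The pure-Python sketch below models that loop so it can run without a browser. It is a simplified model, not Selenium's source: poll_until, NotFound and FakeDriver are hypothetical names, and the sketch raises the built-in TimeoutError where Selenium raises its own TimeoutException.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for selenium's NoSuchElementException."""


def poll_until(driver, condition, timeout=30.0, poll_frequency=0.5, ignored=(NotFound,)):
    """Simplified model of WebDriverWait(driver, timeout, poll_frequency).until(condition)."""
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = condition(driver)
            if value:  # the first truthy result ends the wait and is returned
                return value
        except ignored:  # "element not there yet": swallow it and poll again
            pass
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


class FakeDriver:
    """Hypothetical driver whose element only 'appears' on the Nth lookup."""

    def __init__(self, appear_after=3):
        self.calls = 0
        self.appear_after = appear_after

    def find_element(self, by, value):
        self.calls += 1
        if self.calls < self.appear_after:
            raise NotFound(value)
        return f"<element {value}>"


if __name__ == "__main__":
    drv = FakeDriver()
    el = poll_until(drv, lambda dv: dv.find_element("xpath", "//button"),
                    timeout=2, poll_frequency=0.01)
    print(el, "found after", drv.calls, "lookups")
```

The important design point is that the lambda's return value doubles as the wait's result, which is why the original example can assign it straight to sms_btn.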
If the logic is complex, you can use a custom function:
(For example, some login pages show an image CAPTCHA whose src attribute is not set right away but only after a delay. The following approach handles that case):
```python
#! python
# -*- coding: utf-8 -*-
# @Time : 2024/10/10 6:66
# @Author : GuHanZhe
# @File : wait_demo.py
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://passport.bilibili.com/login')


def func(dv):
    # until() calls this every 0.5 s while it returns a falsy value (here None);
    # the first truthy return value is handed back by until() and assigned to sms_btn.
    print("Checking condition...")
    # A NoSuchElementException raised here is ignored by WebDriverWait by default
    tag = dv.find_element(
        By.XPATH, '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]'
    )
    img_src = tag.get_attribute("src")  # empty until the CAPTCHA image is filled in
    if img_src:
        return tag
    return None


sms_btn = WebDriverWait(driver, 30, 0.5).until(func)
sms_btn.click()
time.sleep(2.5)
driver.close()
```
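The same idea can be packaged as a reusable condition factory in the style of expected_conditions: a function that takes a locator and returns a predicate for until(). The sketch below is hypothetical (attribute_is_set is not part of Selenium); its predicate returns the element once the named attribute is non-empty and False otherwise. FakeDriver and FakeElement exist only so the snippet runs without a browser.

```python
def attribute_is_set(locator, name):
    """Hypothetical condition: wait until the element's `name` attribute is non-empty."""
    def _predicate(driver):
        # NoSuchElementException from find_element is ignored by WebDriverWait by default
        element = driver.find_element(*locator)
        return element if element.get_attribute(name) else False
    return _predicate


class FakeElement:
    """Hypothetical element whose src is only filled in on the third read."""

    def __init__(self):
        self.reads = 0

    def get_attribute(self, name):
        self.reads += 1
        if name == "src" and self.reads >= 3:
            return "https://example.com/captcha.png"
        return None


class FakeDriver:
    def __init__(self):
        self.element = FakeElement()

    def find_element(self, by, value):
        return self.element


if __name__ == "__main__":
    drv = FakeDriver()
    cond = attribute_is_set(("xpath", "//img"), "src")
    # Call the predicate by hand, as until() would on each poll
    print([bool(cond(drv)) for _ in range(3)])  # [False, False, True]
```

With a real driver, the factory would be used as WebDriverWait(driver, 30, 0.5).until(attribute_is_set((By.XPATH, '...'), 'src')), keeping the polling logic out of the calling code.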
```python
#! python
# -*- coding: utf-8 -*-
# @Time : 2024/10/10 6:66
# @Author : GuHanZhe
# @File : baidu_demo.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
url = 'https://www.baidu.com/'
driver.get(url)

# Second argument: wait at most 20 seconds.
# Third argument: check every 0.5 seconds whether the node exists.
# EC.presence_of_element_located is the condition to wait for: the node must be
# present in the DOM. Its argument is a locator tuple, here the link whose text is 'hao123'.
# If the link is found, execution continues; if it is still missing after 20 seconds,
# until() raises a TimeoutException.
WebDriverWait(driver, 20, 0.5).until(
    EC.presence_of_element_located((By.LINK_TEXT, 'hao123'))
)

content = driver.find_element(By.LINK_TEXT, 'hao123').get_attribute('href')
print(content)
```
```python
#! python
# -*- coding: utf-8 -*-
# @Time : 2024/10/10 6:66
# @Author : GuHanZhe
# @File : qzone_login_demo.py
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://qzone.qq.com/')

# Explicit wait: block until the login iframe is present before continuing.
# 'Not Found' becomes the message of the TimeoutException raised after 5 s.
locator = (By.XPATH, '//div[@class="login_wrap"]/iframe')
WebDriverWait(driver=driver, timeout=5, poll_frequency=0.3,
              ignored_exceptions=(NoSuchElementException,)).until(
    EC.presence_of_element_located(locator), message='Not Found')

# Switch into the login iframe
fr = driver.find_element(By.XPATH, '//div[@class="login_wrap"]/iframe')
driver.switch_to.frame(fr)

driver.find_element(By.XPATH, '//*[@id="switcher_plogin"]').click()
time.sleep(1)
driver.find_element(By.XPATH, '//*[@id="u"]').send_keys('QQ Number')  # your QQ number
time.sleep(1)
driver.find_element(By.XPATH, '//*[@id="p"]').send_keys('Password')  # your password
time.sleep(1)
driver.find_element(By.ID, 'login_button').click()
time.sleep(2)
driver.quit()
```
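The example waits for the iframe and then looks it up a second time to switch into it. expected_conditions also provides frame_to_be_available_and_switch_to_it, which does both inside the wait; with Selenium the call would be WebDriverWait(driver, 5).until(EC.frame_to_be_available_and_switch_to_it(locator)). The sketch below is a simplified model of that predicate, assuming the behavior "switch if the frame can be found, otherwise report False so the wait keeps polling"; NoFrame, frame_available_and_switch and FakeDriver are hypothetical stand-ins so the code runs without a browser.

```python
class NoFrame(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException / NoSuchFrameException."""


def frame_available_and_switch(locator):
    """Simplified model of EC.frame_to_be_available_and_switch_to_it(locator)."""
    def _predicate(driver):
        try:
            frame = driver.find_element(*locator)
            driver.switch_to.frame(frame)
            return True   # truthy: the wait ends and we are already inside the frame
        except NoFrame:
            return False  # falsy: until() polls again
    return _predicate


class _SwitchTo:
    def __init__(self):
        self.current = None

    def frame(self, frame):
        self.current = frame


class FakeDriver:
    """Hypothetical driver whose login iframe exists from the second lookup on."""

    def __init__(self):
        self.lookups = 0
        self.switch_to = _SwitchTo()

    def find_element(self, by, value):
        self.lookups += 1
        if self.lookups < 2:
            raise NoFrame(value)
        return f"<iframe {value}>"


if __name__ == "__main__":
    drv = FakeDriver()
    cond = frame_available_and_switch(("xpath", '//div[@class="login_wrap"]/iframe'))
    print(cond(drv), cond(drv), drv.switch_to.current)
```

Folding the switch into the condition removes the second find_element call and the small window in which the frame could change between the wait and the switch.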
expected_conditions (selenium.webdriver.support.expected_conditions, usually imported as EC) is a Selenium module that provides a set of ready-made conditions for checking page state. Passing one of these conditions to WebDriverWait's until() method lets you wait for exactly the state you need. Commonly used conditions:
| Waiting condition | Meaning |
|---|---|
| title_is, title_contains | Check the page title: whether it equals, or contains, the given string. |
| presence_of_element_located, presence_of_all_elements_located | Check whether elements are present in the DOM. Both take a locator tuple, e.g. (By.ID, 'kw'). The first passes as soon as one matching element is present and returns it; the second returns the list of all matching elements and passes as soon as that list is non-empty. |
| visibility_of_element_located, invisibility_of_element_located, visibility_of | Check element visibility. The first two take locator tuples; visibility_of takes a WebElement. visibility_of_element_located and visibility_of wait for the element to become visible; invisibility_of_element_located waits for it to be invisible or absent. |
| text_to_be_present_in_element, text_to_be_present_in_element_value | Check whether the element's text (first) or its value attribute (second) contains the given string. |
| frame_to_be_available_and_switch_to_it | Check whether a frame can be switched to, and switch into it if so. The parameter can be a locator tuple or a value passed directly to switch_to.frame(), such as a frame name/ID or a WebElement. |
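Each condition in the table is a factory: called with its arguments, it returns a function that until() invokes with the driver on every poll. As an illustration, here are simplified models of title_contains and text_to_be_present_in_element. They are sketches, not Selenium's source (the real versions also handle cases such as stale elements), and FakeDriver/FakeElement are hypothetical.

```python
def title_contains(title):
    """Simplified model of EC.title_contains: True once the page title contains `title`."""
    return lambda driver: title in driver.title


def text_to_be_present_in_element(locator, text):
    """Simplified model of EC.text_to_be_present_in_element."""
    def _predicate(driver):
        return text in driver.find_element(*locator).text
    return _predicate


class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical driver with a fixed title and a map of locator value -> element text."""

    def __init__(self, title, texts):
        self.title = title
        self.texts = texts

    def find_element(self, by, value):
        return FakeElement(self.texts[value])


if __name__ == "__main__":
    drv = FakeDriver("Welcome to Example", {"nav": "News hao123 Maps"})
    print(title_contains("Example")(drv))
    print(text_to_be_present_in_element(("id", "nav"), "hao123")(drv))
```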
For the full list of waiting conditions and their parameters, refer to the official Selenium documentation.

(2) Implicit Wait (implicitly_wait(xx))
Practice 1: Getting Specific Element Attributes on Baidu’s Homepage

```python
#! python
# -*- coding: utf-8 -*-
# @Time : 2024/10/10 6:66
# @Author : GuHanZhe
# @File : bd_login_demo.py
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = 'https://www.baidu.com/'
# Set a maximum waiting time of 10 seconds for all element lookups
driver.implicitly_wait(10)
driver.get(url)
```
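An implicit wait is set once per driver and then applies to every find_element call: the lookup keeps retrying for up to the given number of seconds before raising NoSuchElementException. In real Selenium the retrying happens inside the browser driver, not in Python, so the pure-Python model below only imitates the observable behavior; ImplicitWaitDriver and NoSuchElement are hypothetical names.

```python
import time


class NoSuchElement(Exception):
    """Hypothetical stand-in for selenium's NoSuchElementException."""


class ImplicitWaitDriver:
    """Toy model of a driver with an implicit wait (not Selenium's implementation)."""

    def __init__(self, appear_at, poll=0.01):
        self.appear_at = appear_at  # locator value -> seconds after "page load"
        self.loaded = time.monotonic()
        self.poll = poll
        self.timeout = 0.0          # the WebDriver implicit wait also defaults to 0

    def implicitly_wait(self, seconds):
        # Set once; applies to every later find_element call on this driver
        self.timeout = seconds

    def _lookup(self, value):
        delay = self.appear_at.get(value)
        if delay is None or time.monotonic() - self.loaded < delay:
            raise NoSuchElement(value)
        return f"<element {value}>"

    def find_element(self, by, value):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                return self._lookup(value)
            except NoSuchElement:
                if time.monotonic() >= deadline:
                    raise  # implicit wait exhausted: surface the error
                time.sleep(self.poll)


if __name__ == "__main__":
    drv = ImplicitWaitDriver({"hao123": 0.05})
    drv.implicitly_wait(10)  # analogous to driver.implicitly_wait(10)
    print(drv.find_element("link text", "hao123"))
```

Because the implicit timeout silently applies to every lookup, the Selenium documentation advises against mixing implicit and explicit waits, which can lead to unpredictable wait times.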
Practical Example 1: Automatically Scrolling Down the Taobao Page to Get Specific Element Attributes

```python
# -*- coding: utf-8 -*-
# @Time : 2024/10/10
# @Author : Gu Han Zhe
# @File : bd_login_demo.py
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def wait_for_element(driver, xpath, max_attempts=30, interval=0.5):
    """
    Manually implemented explicit wait: poll until the target element exists and is displayed.
    :param driver: WebDriver instance
    :param xpath: XPath of the target element
    :param max_attempts: maximum number of attempts
    :param interval: delay between attempts (seconds)
    :return: the element if found, otherwise None
    """
    for attempt in range(max_attempts):
        try:
            element = driver.find_element(By.XPATH, xpath)
            if element.is_displayed():
                print(f"[INFO] Element found: {xpath} (attempt {attempt + 1})")
                return element
        except NoSuchElementException:
            pass  # not in the DOM yet; try again after the interval
        time.sleep(interval)
    print(f"[ERROR] Element not found: {xpath} (maximum attempts: {max_attempts})")
    return None


def main():
    driver = webdriver.Chrome()
    driver.get('https://www.taobao.com')
    target_xpath = '/html/body/div[12]/div/div/h3'
    # Up to 30 attempts, 1 s apart: roughly a 30-second timeout
    element = wait_for_element(driver, target_xpath, max_attempts=30, interval=1)
    if element:
        print(f"Target element text: {element.text}")
    else:
        print("Target element not found; operation terminated.")
    driver.quit()


if __name__ == "__main__":
    main()
```
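The title mentions scrolling down the page, but wait_for_element only polls in place. If the target sits in a lazily loaded part of the page, one option is to scroll a little on each attempt. The hypothetical variant scroll_until_found below does that with driver.execute_script; the 600-pixel step is an arbitrary assumption, "xpath" is the string value of By.XPATH, and the import fallback exists only so the sketch can be exercised with a fake driver when Selenium is not installed.

```python
import time

try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:  # lets the sketch run with a fake driver when Selenium is absent
    class NoSuchElementException(Exception):
        pass


def scroll_until_found(driver, xpath, step=600, max_attempts=30, interval=0.5):
    """Hypothetical helper: scroll down `step` pixels per attempt until the element is displayed."""
    for _ in range(max_attempts):
        try:
            element = driver.find_element("xpath", xpath)  # "xpath" == By.XPATH
            if element.is_displayed():
                return element
        except NoSuchElementException:
            pass
        # Scroll further down so lazily loaded content gets a chance to render
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(interval)
    return None
```

It keeps the same contract as wait_for_element (the element or None), so main() could call it in place of wait_for_element without other changes.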