IT Log

Record various IT issues and difficulties.

AI Tool wiseflow: The Ultimate Web Crawler, Harvesting Industry Insights at Lightning Speed


Chief Intelligence Officer (Wiseflow)

Official Statement:

We don’t lack information; what we need is to filter out the noise from the vast amount of data so that valuable information can be revealed.

Many of you have probably heard about the open-source tool Wiseflow lately. Today, let’s walk through an actual installation and document the issues encountered along the way.

First, I would sum up Wiseflow’s functions in two phrases: information mining and information filtering. Feel free to skip the introduction and go straight to the usage guide.

In the usage guide, I have provided detailed solutions for issues such as Docker download timeout, GitHub download timeout, and building images from Dockerfile. Feel free to refer to them!

Usage Guide

1. Clone the Code Repository

2. Run with Docker (Recommended)

3. Issues Encountered During Installation

3.1. Issue when pulling Docker images from within China, for example: Error response from daemon: Get https://registry-1.docker.io/v2/library/hello-world/manifest

3.1.1. Cause

3.1.2. Solution

3.2. wiseflow deployment error: failed to solve: python:3.10-slim: pulling from host docker.m.daocloud.io failed with status code [manifests 3.10-slim]: 401 Unauthorized

3.2.1. Cause

3.2.2. Solution

3.3. wiseflow deployment error: ERROR [core] https://github.com/pocketbase/pocketbase/releases/download/v0.22.13/pocketbase_0.22.13_linux_amd64.zip

3.3.1. Cause

3.3.2. Solution


Overview

In an era of information overload, the challenge is not a scarcity of information but extracting what is valuable from vast amounts of data. The Chief Intelligence Officer (Wiseflow) is a nimble information extraction tool that automatically extracts information from sources such as websites, WeChat official accounts, and social platforms according to user-defined focus points, classifies it with tags, and uploads it to a database.

Wiseflow combines statistical learning with large language models (LLMs) to adapt to over 90% of news pages, and uses an asynchronous task architecture for efficient processing. It relies on LLMs for information extraction and tag classification; even a 9B-parameter LLM handles these tasks well. Wiseflow also integrates with downstream projects such as Awada, a team knowledge assistant built on the WeChat ecosystem. Awada helps build team-specific knowledge repositories and provides Q&A, document search, and writing assistance.

Strengths

The strength of Wiseflow lies in its efficient information processing capabilities, not only filtering irrelevant information but also organizing key points, thus saving users a significant amount of time. Below are some core advantages of Wiseflow:

The value of Wiseflow lies in providing a complete solution for information collection, processing, and application.

It not only helps users extract valuable data from massive information but also simplifies information management through automated tag classification and database uploads.

For enterprises and teams dealing with large volumes of information, Wiseflow is an indispensable tool.

Usage Guide

Installation: Follow the steps in the GitHub README directly; it is well documented. The repository is at https://github.com/TeamWiseFlow/wiseflow.

Below are the deployment steps I used, along with the issues that are not covered on GitHub:

1. Clone code repository
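
This step comes down to a single `git clone`. The following is a minimal sketch, assuming `git` is installed; the repository URL is the one linked in this article, while the `clone_wiseflow` helper name and the shallow-clone flag are illustrative assumptions.

```shell
# Repository linked in this article.
REPO_URL="https://github.com/TeamWiseFlow/wiseflow.git"

# Hypothetical helper: clone the repository into ./wiseflow.
clone_wiseflow() {
    # --depth 1 fetches only the latest commit, which helps on slow links.
    git clone --depth 1 "$REPO_URL" wiseflow
}
```

Running `clone_wiseflow && cd wiseflow` fetches the code and enters the project directory.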

🌹 Thumbs up is a good habit 🌹

2. Run with Docker (recommended)
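
Starting the stack is a matter of running Docker Compose from the repository root. This is a minimal sketch, assuming the Compose v2 plugin (`docker compose`; older installations use `docker-compose` instead); the `start_wiseflow` helper name is hypothetical.

```shell
# Hypothetical helper: start the stack defined by the repository's Compose file.
# $1: path to the cloned repository (defaults to ./wiseflow).
start_wiseflow() {
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "${1:-wiseflow}" && docker compose up )
}
```

Call `start_wiseflow` from the directory that contains the clone, or pass the path explicitly, for example `start_wiseflow ~/src/wiseflow`.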

Users in mainland China should set up working network access beforehand, or configure a Docker Hub registry mirror.
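
A registry mirror is normally set through the `registry-mirrors` key in Docker's `daemon.json`. The sketch below writes a minimal file containing only that key; the `write_mirror_config` helper is hypothetical, and the mirror you pass in is your own choice. Note that it overwrites the target file, so merge by hand if you already have a `daemon.json`.

```shell
# Hypothetical helper: write a minimal daemon.json that points Docker at a
# registry mirror. This OVERWRITES the target file.
# $1: target file (on Linux normally /etc/docker/daemon.json)
# $2: mirror URL
write_mirror_config() {
    printf '{\n  "registry-mirrors": ["%s"]\n}\n' "$2" > "$1"
}
```

For example, run `write_mirror_config /etc/docker/daemon.json https://docker.m.daocloud.io` as root (that mirror host is the one that appears in the error in section 3.2), then restart the Docker daemon, on systemd hosts with `sudo systemctl restart docker`.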

Notes:

At this point, keep the container running; do not close it. Open http://127.0.0.1:8090/_/ in your browser and follow the prompts to create an admin account (you must use an email address). Then enter that admin email and password in the .env file, and restart the container.

If you wish to change the time zone and language of the container, please run the image using commands similar to the following:
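
A minimal sketch of such a command, assuming the settings are passed as environment variables with `docker run -e`. The values and the `run_with_locale` helper name are illustrative assumptions, and whether they take effect depends on the image having the matching time zone data and locale installed.

```shell
# Example values; replace with your own time zone and locale.
CONTAINER_TZ="Asia/Shanghai"
CONTAINER_LANG="zh_CN.UTF-8"

# Hypothetical helper: run image $1 with time zone and locale variables set.
run_with_locale() {
    docker run \
        -e TZ="$CONTAINER_TZ" \
        -e LANG="$CONTAINER_LANG" \
        -e LC_CTYPE="$CONTAINER_LANG" \
        "$1"
}
```

Call it with the name of your built image, for example `run_with_locale your_image`.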

3. Issues encountered during installation

3.1. Issue when pulling Docker images from within China, for example: Error response from daemon: Get https://registry-1.docker.io/v2/library/hello-world/manifest. This indicates a timeout.

3.1.1. Cause of the issue:

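The timeout is mainly caused by IP or DNS resolution problems for registry-1.docker.io.

3.1.2. Solutions:

Look up the registry's IP address through a DNS server that resolves it:

dig @114.114.114.114 registry-1.docker.io

Then add one of the returned addresses, together with the hostname, to the hosts file:

vim /etc/hosts

(Windows has an equivalent hosts file, C:\Windows\System32\drivers\etc\hosts, and the same method applies.)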
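
The lookup and the hosts entry can be scripted as follows. This is a sketch under assumptions: the `resolve_ip` and `hosts_line` helper names are hypothetical, and the resolver address is the one used above.

```shell
REGISTRY_HOST="registry-1.docker.io"

# Print the first IPv4 address that resolver $1 returns for the registry.
resolve_ip() {
    # +short prints only the answers; the grep drops any CNAME lines.
    dig +short "@$1" "$REGISTRY_HOST" A \
        | grep -E '^[0-9]+(\.[0-9]+){3}$' \
        | head -n 1
}

# Format one hosts-file line: "<ip> <hostname>".
hosts_line() {
    printf '%s %s\n' "$1" "$REGISTRY_HOST"
}
```

To append the entry, run `hosts_line "$(resolve_ip 114.114.114.114)" | sudo tee -a /etc/hosts`. The registry's addresses can change over time, so remove the entry again if pulls start failing later.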


3.2 Deployment of wiseflow failed: failed to solve: python:3.10-slim: pulling from host docker.m.daocloud.io failed with status code [manifests 3.10-slim]: 401 Unauthorized issue
3.2.1. Cause:

Wiseflow is deployed with Docker Compose, which builds the image from a Dockerfile. However, pulling a large base image such as python:3.10-slim often fails on an unreliable connection or mirror.

As a result, the build fails!

3.2.2. Solution:

First, pull the python:3.10-slim base image in advance: docker pull python:3.10-slim

Then rerun the Docker build; with the base image already available locally, the build can skip the download that failed.
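
The two steps can be chained so that the rebuild only runs once the pull has succeeded. A minimal sketch, assuming the Compose v2 plugin and that it is run from the wiseflow repository root; the `prefetch_and_build` helper name is hypothetical.

```shell
BASE_IMAGE="python:3.10-slim"

# Hypothetical helper: pre-pull the base image, then rebuild and start the stack.
# Run it from the wiseflow repository root.
prefetch_and_build() {
    # && stops here if the pull fails, so a broken download is not masked.
    docker pull "$BASE_IMAGE" && docker compose up --build
}
```

If the pull itself keeps failing, fix the network or mirror configuration first and then call `prefetch_and_build` again.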


3.3 Deployment of wiseflow encountered an error: ERROR [core] https://github.com/pocketbase/pocketbase/releases/download/v0.22.13/pocketbase_0.22.13_linux_amd64.zip issue
3.3.1. Root Cause:

When wiseflow builds the Docker image from its Dockerfile, the step that downloads pocketbase_0.22.13_linux_amd64.zip from GitHub times out!

The error message looks like this: failed to solve: failed to load cache key: stream error: stream ID 1; PROTOCOL_ERROR; received from peer

3.3.2. Solution:

Edit the Dockerfile and modify it as shown in the following figure:
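
One possible workaround, which is an assumption and not necessarily the exact change shown in the figure, is to download the archive on the host, where retries or a proxy are easier to apply, and let the build use the local copy. The sketch below only covers the download; the `fetch_pocketbase` helper name is hypothetical, and the URL is the one from the error message above.

```shell
PB_VERSION="0.22.13"
PB_ZIP="pocketbase_${PB_VERSION}_linux_amd64.zip"
PB_URL="https://github.com/pocketbase/pocketbase/releases/download/v${PB_VERSION}/${PB_ZIP}"

# Hypothetical helper: download the PocketBase archive into directory $1.
fetch_pocketbase() {
    # -f fails on HTTP errors, -L follows GitHub's redirect,
    # --retry repeats the transfer on transient failures.
    curl -fL --retry 5 --retry-delay 3 -o "$1/$PB_ZIP" "$PB_URL"
}
```

With the archive saved inside the build context, for example via `fetch_pocketbase .`, the Dockerfile's download step can be replaced by a `COPY` of the local file. The exact lines to change depend on the Dockerfile in your checkout.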


By now, you should be able to run Wiseflow locally and improve your work efficiency!

If you have any questions, feel free to leave a comment. Discussion is the shortcut to solving problems!!

🌹 If helpful, give a thumb up and save it, thank you everyone!!! 🌹



