Python Core Technology and Practice, Study Notes (13): Implementing Concurrency with Futures and Multithreading

This content is licensed under CC 4.0 BY-SA.

Tags: #concurrency #multithreading #parallelism #thread #asyncio

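This note examines concurrent programming in Python. It compares the characteristics and typical use cases of Python's two concurrency approaches, threading and asyncio, explains how to implement multithreading and multiprocessing, and shows how the Futures module is used in concurrent programming.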

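13. Implementing Concurrency with Multithreading

13.1 Two ways to do concurrency in Python: threading and asyncio
- Concurrency and parallelism
- Concurrency vs. parallelism
13.2 Concurrent programming with threading (Futures)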

13.1 Two ways to do concurrency in Python: threading and asyncio

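threading: the operating system knows everything about every thread and switches between threads on its own whenever it sees fit (preemptive multitasking).
asyncio: the event loop can only switch away from a task when that task notifies it that it may be switched, which in practice happens at an await (cooperative multitasking).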

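In other words, the difference is whether a switch requires an explicit notice that switching is allowed. This determines the strengths and weaknesses of each approach: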

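threading code is easy to write: the programmer does not have to handle switching at all. The price is that a thread can be switched out in the middle of a statement (for example, between reading and writing a shared variable), so race conditions occur easily. A race condition means the program's result depends on the order in which the threads happen to execute, much like the three anomalies that concurrent transactions can cause in a database: lost updates, non-repeatable reads and dirty reads. asyncio is the opposite: you have to mark every switch point yourself, which is more work, but because switches only happen at those points, race conditions are much less likely.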
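A minimal sketch of the cooperative switching described above, assuming Python 3.7+ for asyncio.run(). The coroutine names and the shared log list are illustrative only. Each task runs uninterrupted until it reaches an await, so the two tasks interleave exactly at the await points:

```python
import asyncio

async def worker(name, steps, log):
    for i in range(steps):
        # Code between two awaits runs without being interrupted by other tasks
        log.append(f"{name}{i}")
        # await is the explicit point where this task lets the event loop switch
        await asyncio.sleep(0)

async def main():
    log = []
    # gather schedules worker A first, then worker B, on the same event loop
    await asyncio.gather(worker("A", 2, log), worker("B", 2, log))
    return log

if __name__ == "__main__":
    print(asyncio.run(main()))  # ['A0', 'B0', 'A1', 'B1']
```

Because switches can only happen at await, the append between two awaits can never be interrupted by the other task; with threads there is no such guarantee.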

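How does Python cope with this? CPython has a Global Interpreter Lock (GIL, to be covered in a later note), which allows only one thread to execute Python bytecode at any moment. When a thread blocks, for example on I/O, it releases the GIL so that another thread can continue; the interpreter also forces a switch periodically (every 5 ms by default, see sys.getswitchinterval()). Note that the GIL protects the interpreter's own internal state, not your program's data: a thread can still be switched out between the bytecodes of a statement such as x += 1, so shared mutable data still needs a lock such as threading.Lock.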
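To see why the GIL alone does not protect user data, here is a small sketch (function names are illustrative) in which several threads increment a shared counter. The unlocked version can lose updates because counter["value"] += 1 is a read-modify-write; whether lost updates actually show up depends on the Python version and on timing (recent CPython versions switch threads less often inside such a loop, so the unlocked result may well come out correct). The locked version is always correct:

```python
import threading

def add_many(counter, n, lock=None):
    for _ in range(n):
        if lock is None:
            # Read-modify-write: not atomic, a thread switch can happen in between
            counter["value"] += 1
        else:
            # Only one thread at a time may perform the read-modify-write
            with lock:
                counter["value"] += 1

def run(n_threads=4, n=100_000, use_lock=True):
    counter = {"value": 0}
    lock = threading.Lock() if use_lock else None
    threads = [threading.Thread(target=add_many, args=(counter, n, lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

if __name__ == "__main__":
    print("without lock:", run(use_lock=False))  # may be below 400000, varies by run and version
    print("with lock:   ", run(use_lock=True))   # always 400000
```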

Concurrency and parallelism

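Concurrency: only one operation (thread/task) runs at any given instant, but the threads/tasks switch back and forth until all of them are done. At the macro level it looks as if several tasks run at the same time; at the micro level they do not.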

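Parallelism: that is, multiprocessing, where several operations really do run at the same instant. For example, on a 6-core processor Python can start 6 processes that run simultaneously to speed the program up.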

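Parallelism exists only on multiprocessor systems, whereas concurrency exists on both single-processor and multiprocessor systems. Concurrency is possible on a single processor because it is only an illusion of parallelism: parallelism requires the program to actually perform several operations at once, while concurrency only requires it to appear to do so (one operation per small time slice, switching rapidly between them).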

Concurrency vs. parallelism

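Concurrency is typically used where I/O is frequent. I/O needs very little CPU time: most of the data transfer is handled by DMA (direct memory access), and when the transfer finishes the device notifies the CPU with an interrupt. While one task waits for its I/O, the CPU is free to run another.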

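Parallelism is used more for CPU-heavy workloads, such as the parallel computation in MapReduce, where multiple machines and processors are usually used to speed things up.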
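The claim that threads help with I/O-bound work can be checked without a network. This sketch assumes time.sleep() as a stand-in for an I/O wait (like blocking socket I/O, it releases the GIL while waiting); fake_io and the delays are made up for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    # time.sleep stands in for waiting on I/O; the GIL is released while sleeping
    time.sleep(delay)
    return delay

def run_sequential(delays):
    start = time.perf_counter()
    results = [fake_io(d) for d in delays]  # each wait starts after the previous one ends
    return results, time.perf_counter() - start

def run_threaded(delays, workers=5):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(fake_io, delays))  # the waits overlap
    return results, time.perf_counter() - start

if __name__ == "__main__":
    delays = [0.2] * 5
    print("sequential: %.2fs" % run_sequential(delays)[1])  # about 5 * 0.2 = 1.0s
    print("threaded:   %.2fs" % run_threaded(delays)[1])    # about 0.2s
```

For a CPU-bound function (for example a pure-Python loop) the threaded version would not be faster under the GIL, which is where process-based parallelism comes in.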

13.2 Concurrent programming with threading (Futures)

13.2.1 Single-threaded vs. multithreaded performance

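The following simplified task, downloading a few web pages and printing the size of their content, is used to compare the performance of a single-threaded and a multithreaded version.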

Single-threaded version (exception handling omitted):

import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    
def download_all(sites):
    for site in sites:
        download_one(site)

def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()

# Output
Read 129886 from https://en.wikipedia.org/wiki/Portal:Arts
Read 184343 from https://en.wikipedia.org/wiki/Portal:History
Read 224118 from https://en.wikipedia.org/wiki/Portal:Society
Read 107637 from https://en.wikipedia.org/wiki/Portal:Biography
Read 151021 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 157811 from https://en.wikipedia.org/wiki/Portal:Technology
Read 167923 from https://en.wikipedia.org/wiki/Portal:Geography
Read 93347 from https://en.wikipedia.org/wiki/Portal:Science
Read 321352 from https://en.wikipedia.org/wiki/Computer_science
Read 391905 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 321417 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 468461 from https://en.wikipedia.org/wiki/PHP
Read 180298 from https://en.wikipedia.org/wiki/Node.js
Read 56765 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 324039 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 2.464231112999869 seconds

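The program works as follows: it iterates over the list of sites and downloads the current one; only after that download has finished does it move on to the next site, and so on until the end of the list.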

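The single-threaded version is simple and clear but inefficient, because most of its running time is spent waiting on I/O: each download can only start after the previous one has completed. In a real production environment the number of sites to download is at least in the tens of thousands, so this approach is clearly not workable.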

Multithreaded version:

import concurrent.futures
import requests
import threading
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))


def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)

def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

# Output
Read 151021 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 129886 from https://en.wikipedia.org/wiki/Portal:Arts
Read 107637 from https://en.wikipedia.org/wiki/Portal:Biography
Read 224118 from https://en.wikipedia.org/wiki/Portal:Society
Read 184343 from https://en.wikipedia.org/wiki/Portal:History
Read 167923 from https://en.wikipedia.org/wiki/Portal:Geography
Read 157811 from https://en.wikipedia.org/wiki/Portal:Technology
Read 91533 from https://en.wikipedia.org/wiki/Portal:Science
Read 321352 from https://en.wikipedia.org/wiki/Computer_science
Read 391905 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 180298 from https://en.wikipedia.org/wiki/Node.js
Read 56765 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 468461 from https://en.wikipedia.org/wiki/PHP
Read 321417 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 324039 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 0.19936635800002023 seconds

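The total time drops to about 0.2 s, more than 10 times faster than the single-threaded run (2.46 s). Compared with the single-threaded version, the only real difference is these lines in download_all():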

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)

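- This creates a thread pool with a total of 5 threads available for the tasks.
- executor.map() works like Python's built-in map(): it calls download_one() on every element of sites, but does so concurrently.
- requests.get(), used inside download_one(), creates a new session for each call, so concurrent calls share no state; it can therefore be used safely from multiple threads without causing race conditions.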
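One caveat about executor.map(): it submits all calls right away, but an exception raised by download_one() is only re-raised when the corresponding result is read from the iterator that map() returns. download_all() never reads that iterator, so a failed download would pass silently. A small sketch with a hypothetical maybe_fail() task shows the difference:

```python
from concurrent.futures import ThreadPoolExecutor

def maybe_fail(x):
    # Hypothetical task: fails for one specific input
    if x == 3:
        raise ValueError(f"bad input: {x}")
    return x * 10

def run_without_consuming(items):
    with ThreadPoolExecutor(max_workers=2) as executor:
        executor.map(maybe_fail, items)  # results are never read, so the ValueError is lost
    return "finished silently"

def run_consuming(items):
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Iterating over the results re-raises the first exception, in input order
        return list(executor.map(maybe_fail, items))

if __name__ == "__main__":
    print(run_without_consuming([1, 2, 3]))
    try:
        run_consuming([1, 2, 3])
    except ValueError as exc:
        print("caught:", exc)
```

Consuming the results (for example with list() or a for loop) is therefore the simplest way to make errors visible when using map().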

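When creating a thread pool, choose the number of threads with care. More threads is not always better: creating, maintaining and destroying threads all have a cost, and too many threads can actually slow things down. In practice you usually need to benchmark your own workload to find the best number.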
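A rough way to look for a good thread count is to time the same workload with different values of max_workers. The sketch below again simulates I/O with time.sleep(); the task count, sleep time and worker counts are arbitrary assumptions, and with real sites, bandwidth and server-side rate limits also matter. If max_workers is omitted, ThreadPoolExecutor defaults to min(32, os.cpu_count() + 4) threads (Python 3.8+; newer versions count only the CPUs usable by the process).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(_):
    time.sleep(0.05)  # stands in for one network round trip

def elapsed_with(workers, n_tasks=20):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(fake_io, range(n_tasks)))  # wait for every task
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 5, 10, 20, 50):
        print(f"{workers:>3} workers: {elapsed_with(workers):.2f}s")
```

With 20 tasks, going beyond 20 workers cannot help, because the extra threads have nothing to do.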

Parallel version

To run the downloads in parallel instead, only the following change to the code is needed:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
=>
with concurrent.futures.ProcessPoolExecutor() as executor:

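That is, replace ThreadPoolExecutor(), which creates a thread pool, with ProcessPoolExecutor(), which creates a process pool. The max_workers argument is omitted here: it then defaults to the number of CPUs on the machine (os.cpu_count()).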

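Note that for I/O-heavy tasks, multiprocessing does not improve efficiency, because most of the time is still spent waiting for I/O, and starting processes and passing data between them costs more than using threads. Since the number of processes is also limited by the number of CPUs, multiprocessing is often even slower than multithreading for such tasks.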

13.2.2 What is the Futures module?

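In Python, futures live in both concurrent.futures and asyncio (concurrent.futures.Future and asyncio.Future), and both represent an operation whose result will only be available later. A Future wraps an operation that is waiting to run or still running (the executor keeps pending work in a queue). Its state can be queried at any time, which is what makes concurrent execution manageable, and its result or exception can be retrieved once the operation has finished.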

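Our job is to schedule futures for execution. For example, with the Executor class in concurrent.futures, calling executor.submit(func) schedules func to run and immediately returns the Future instance created for it, which we can query later.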
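A minimal sketch of what submit() returns, using a made-up slow_square() task: the Future comes back immediately, done() can be polled without blocking, and result() waits for the value.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    time.sleep(0.2)  # simulate a slow operation
    return x * x

def demo():
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(slow_square, 7)  # returns at once with a Future
        done_before = future.done()  # almost certainly False: the task is still sleeping
        value = future.result()      # blocks until slow_square has finished
        done_after = future.done()   # True once the result is available
    return done_before, value, done_after

if __name__ == "__main__":
    # done_before is usually False; value is 49; done_after is True
    print(demo())
```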

Commonly used Future methods and related functions:

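- done(): returns True if the corresponding call has completed (or was cancelled) and False otherwise; it does not block.
- add_done_callback(fn): when the future finishes (or is cancelled), fn is called with the future as its only argument.
- result(): blocks until the future finishes, then returns its result; if the call raised an exception, result() re-raises it.
- as_completed(fs): a module-level function rather than a Future method. Given an iterable fs of futures, it returns an iterator that yields each future as soon as it completes.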
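The sketch below exercises add_done_callback() and as_completed() on a trivial square() task (illustrative only). Since a callback may run in a worker thread, or immediately in the submitting thread if the future has already finished, the shared list is protected by a lock:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(x):
    return x * x

def collect_with_callbacks(nums):
    results = []
    lock = threading.Lock()  # callbacks can run in several threads

    def on_done(fut):
        # add_done_callback passes the finished Future as the only argument
        with lock:
            results.append(fut.result())

    with ThreadPoolExecutor(max_workers=3) as executor:
        for n in nums:
            executor.submit(square, n).add_done_callback(on_done)
    # Callbacks fire in completion order, so sort for a stable view
    return sorted(results)

def collect_as_completed(nums):
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(square, n) for n in nums]
        # as_completed yields each Future as soon as it finishes (completion order)
        return sorted(f.result() for f in as_completed(futures))

if __name__ == "__main__":
    print(collect_with_callbacks([1, 2, 3, 4]))  # [1, 4, 9, 16]
    print(collect_as_completed([1, 2, 3, 4]))    # [1, 4, 9, 16]
```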

The example above can also be written as follows:

import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            future = executor.submit(download_one, site)
            to_do.append(future)
            
        for future in concurrent.futures.as_completed(to_do):
            future.result()


def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

# Output
Read 129886 from https://en.wikipedia.org/wiki/Portal:Arts
Read 107634 from https://en.wikipedia.org/wiki/Portal:Biography
Read 224118 from https://en.wikipedia.org/wiki/Portal:Society
Read 158984 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 184343 from https://en.wikipedia.org/wiki/Portal:History
Read 157949 from https://en.wikipedia.org/wiki/Portal:Technology
Read 167923 from https://en.wikipedia.org/wiki/Portal:Geography
Read 94228 from https://en.wikipedia.org/wiki/Portal:Science
Read 391905 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 321352 from https://en.wikipedia.org/wiki/Computer_science
Read 180298 from https://en.wikipedia.org/wiki/Node.js
Read 321417 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 468421 from https://en.wikipedia.org/wiki/PHP
Read 56765 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 324039 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 0.21698231499976828 seconds

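In the code above, executor.submit() first schedules the download of each site and returns a future, which is appended to the list to_do. as_completed() then yields each future as soon as it finishes, and future.result() retrieves its result. Note that the futures do not necessarily finish in the order in which they were submitted: the actual order depends on the system's scheduling and on how long each task takes, and as_completed() reports them in completion order.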
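To make the ordering difference concrete, this sketch (task and delays are illustrative) runs the same tasks through executor.map(), which always returns results in input order, and through submit() plus as_completed(), which yields futures in the order they finish:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def sleep_and_return(delay):
    time.sleep(delay)
    return delay

def compare_orders(delays):
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        # map: results come back in the same order as the inputs
        map_order = list(executor.map(sleep_and_return, delays))
        # as_completed: futures come back in the order they finish
        futures = [executor.submit(sleep_and_return, d) for d in delays]
        completed_order = [f.result() for f in as_completed(futures)]
    return map_order, completed_order

if __name__ == "__main__":
    map_order, completed_order = compare_orders([0.3, 0.1, 0.2])
    print(map_order)        # [0.3, 0.1, 0.2]
    print(completed_order)  # usually [0.1, 0.2, 0.3]
```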

Exception handling

The code above can raise the following exceptions:

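- requests.get() can raise ConnectionError, Timeout and others; every exception that requests raises explicitly inherits from requests.exceptions.RequestException. (HTTPError is raised by Response.raise_for_status(), not by requests.get() itself.)
- executor.map(download_one, sites) can raise concurrent.futures.TimeoutError while its results are being iterated, if a timeout was passed and a result is not ready in time; iterating also re-raises any exception raised by download_one().
- result() can raise concurrent.futures.TimeoutError (when called with a timeout) and CancelledError, and re-raises any exception raised by the call itself.
- as_completed() can raise concurrent.futures.TimeoutError when called with a timeout and not all futures complete in time.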
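A sketch of handling these exceptions per task, assuming a hypothetical fetch() that stands in for download_one() and raises the built-in ConnectionError for some inputs (with real requests code you would catch requests.exceptions.RequestException instead, and pass a timeout to requests.get()). Each failure is recorded instead of aborting the whole batch, and a timeout on as_completed() is handled as well:

```python
import concurrent.futures

def fetch(item):
    # Hypothetical stand-in for download_one(): fails for some inputs
    if item % 3 == 0:
        raise ConnectionError(f"could not fetch {item}")
    return f"content-{item}"

def fetch_all(items, timeout=10):
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_item = {executor.submit(fetch, it): it for it in items}
        try:
            for future in concurrent.futures.as_completed(future_to_item, timeout=timeout):
                item = future_to_item[future]
                try:
                    # result() re-raises whatever fetch() raised in the worker thread
                    results[item] = future.result()
                except Exception as exc:
                    errors[item] = exc
        except concurrent.futures.TimeoutError:
            # as_completed gave up: some futures were unfinished after `timeout` seconds
            for future, item in future_to_item.items():
                if not future.done():
                    errors[item] = TimeoutError("not finished in time")
    return results, errors

if __name__ == "__main__":
    ok, failed = fetch_all(range(1, 7))
    print(ok)
    print(failed)
```

Even after as_completed() times out, leaving the with block still waits for the unfinished tasks, because the executor's shutdown waits for running work by default.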