Concurrent Programming with Futures
追忆似水年华 · 1007 views · Published 2019-07-05

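Concurrency versus parallelism

  Concurrency

  CPython's memory management is not thread-safe. To avoid the race conditions this would cause, the interpreter uses a global interpreter lock (GIL), which lets only one thread execute Python bytecode at any given moment. When a thread blocks on an I/O operation, however, it releases the GIL so that another thread can carry on. In Python, then, concurrency does not mean that several operations (threads or tasks) run at the same instant. Only one thread or task executes at a time, and the interpreter switches between them.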
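  The effect of releasing the GIL during a blocking call can be seen in a small sketch. It assumes that time.sleep() is an acceptable stand-in for a blocking I/O call (it also releases the GIL while it waits); the names fake_io and run_threads are made up for this illustration.

```python
import threading
import time


def fake_io(delay):
    # time.sleep() releases the GIL while it waits, much like a blocking read.
    time.sleep(delay)


def run_threads(n_threads, delay):
    """Start n_threads threads that each block for `delay` seconds; return elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_threads(4, 0.2)
    # The four waits overlap, so the total is close to 0.2 s rather than 4 * 0.2 s.
    print('4 blocking calls took {:.2f} seconds'.format(elapsed))
```

  Only one thread runs Python bytecode at a time, but the waiting itself happens outside the interpreter, which is why the waits can overlap.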

  

  Parallelism

  Multiple processes execute at literally the same moment, typically on separate CPU cores.
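  As a rough sketch of parallelism, the example below spreads a CPU-bound toy task over worker processes with ProcessPoolExecutor. The prime-counting task and the names count_primes and count_primes_parallel are assumptions chosen for illustration; they are not part of the download example used later in this article.

```python
import concurrent.futures
import math


def count_primes(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def count_primes_parallel(limits):
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so the calls can run on different cores at the same time.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes, limits))


if __name__ == '__main__':
    print(count_primes_parallel([10, 100, 1000]))
```

  The if __name__ == '__main__' guard matters here: on platforms that start workers by spawning a fresh interpreter, the main module is imported again in every worker process.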

  

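Concurrent programming with Futures

  Single-threaded versus multithreaded performance

  Suppose the task is to download the content of several websites and print how much was read from each. A single-threaded implementation looks like this: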

import time

import requests


def download_one(url):
    # Fetch one page and report how many bytes were read.
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))


def download_all(sites):
    # Download the sites one after another.
    for site in sites:
        download_one(site)


def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


if __name__ == '__main__':
    main()

# Output
Read 129196 from https://en.wikipedia.org/wiki/Portal:Arts
Read 183867 from https://en.wikipedia.org/wiki/Portal:History
Read 224161 from https://en.wikipedia.org/wiki/Portal:Society
Read 114387 from https://en.wikipedia.org/wiki/Portal:Biography
Read 152871 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 156339 from https://en.wikipedia.org/wiki/Portal:Technology
Read 162872 from https://en.wikipedia.org/wiki/Portal:Geography
Read 91504 from https://en.wikipedia.org/wiki/Portal:Science
Read 323262 from https://en.wikipedia.org/wiki/Computer_science
Read 391073 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 319710 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 470754 from https://en.wikipedia.org/wiki/PHP
Read 180774 from https://en.wikipedia.org/wiki/Node.js
Read 56799 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 325451 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 67.349395015 seconds

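  The flow of this code is: iterate over the list of sites, download the current site, wait for that download to finish, then do the same for the next site, and so on until the list is exhausted.

  Next, the multithreaded version: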

import concurrent.futures
import time

import requests


def download_one(url):
    try:
        resp = requests.get(url)
        print('Read {} from {}'.format(len(resp.content), url))
    except Exception as ex:
        print(ex)


def download_all(sites):
    # Run download_one on every site using a pool of 5 worker threads.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        results = executor.map(download_one, sites)
    # with concurrent.futures.ProcessPoolExecutor() as executor:
    #     results = executor.map(download_one, sites)


def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


if __name__ == '__main__':
    main()

# Output
Read 114387 from https://en.wikipedia.org/wiki/Portal:Biography
Read 129196 from https://en.wikipedia.org/wiki/Portal:Arts
Read 183867 from https://en.wikipedia.org/wiki/Portal:History
Read 152871 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 224161 from https://en.wikipedia.org/wiki/Portal:Society
Read 156339 from https://en.wikipedia.org/wiki/Portal:Technology
Read 91504 from https://en.wikipedia.org/wiki/Portal:Science
Read 391073 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 162872 from https://en.wikipedia.org/wiki/Portal:Geography
Read 323262 from https://en.wikipedia.org/wiki/Computer_science
Read 56799 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 319710 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 325451 from https://en.wikipedia.org/wiki/Go_(programming_language)
Read 180774 from https://en.wikipedia.org/wiki/Node.js
Read 470754 from https://en.wikipedia.org/wiki/PHP
Download 15 sites in 10.022916933 seconds

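  This version is more than six times faster (about 67.3 seconds down to about 10.0 seconds). ThreadPoolExecutor creates a thread pool, max_workers=5 gives it five threads, and executor.map(download_one, sites) calls download_one concurrently on each element of sites. requests.get() is thread-safe, so it can be used safely in a multithreaded environment. The number of threads is up to you, but too many threads increase system overhead. Measure against your actual workload to find the best number.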
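  One way to look for a reasonable pool size is to time the same workload with several values of max_workers. The sketch below is only an assumption-laden illustration: fake_download is a hypothetical stand-in that sleeps for 0.1 seconds instead of calling requests.get(), and the example.com URLs are placeholders, so the numbers it prints say nothing about real network behaviour.

```python
import concurrent.futures
import time


def fake_download(url):
    # Hypothetical stand-in for requests.get(): wait a fixed 0.1 s, no network.
    time.sleep(0.1)
    return len(url)


def time_pool(urls, max_workers):
    """Run fake_download over urls with the given pool size; return elapsed seconds."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Consume the iterator so an exception raised in a worker would surface here.
        list(executor.map(fake_download, urls))
    return time.perf_counter() - start


if __name__ == '__main__':
    urls = ['https://example.com/page/{}'.format(i) for i in range(10)]
    for workers in (1, 2, 5, 10):
        elapsed = time_pool(urls, workers)
        print('max_workers={:>2}: {:.2f} seconds'.format(workers, elapsed))
```

  With a real download function the curve usually flattens at some point, because bandwidth, the remote server, or thread overhead becomes the limit rather than the number of workers.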

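  The same code can also be run in parallel. In download_all(), change:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
=>
with concurrent.futures.ProcessPoolExecutor() as executor:

  For an I/O-bound workload like this one, the parallel version is no faster than the concurrent one.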

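What exactly are Futures?

   In Python, the Future class is found in both concurrent.futures and asyncio, and in both it represents a deferred operation. A Future wraps a pending operation that has been placed in a queue. Its state can be queried at any time, and its result or exception can be retrieved once the operation has finished.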
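   A minimal sketch of querying a Future's state and retrieving its result or exception, using only concurrent.futures. The functions slow_square, always_fails and inspect_futures are hypothetical names introduced for this illustration.

```python
import concurrent.futures
import time


def slow_square(x):
    time.sleep(0.2)
    return x * x


def always_fails():
    raise ValueError('simulated failure')


def inspect_futures():
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        ok = executor.submit(slow_square, 7)
        bad = executor.submit(always_fails)
        # done() never blocks; it only reports whether the call has finished.
        done_immediately = ok.done()
        # result() blocks until the return value is available.
        value = ok.result()
        # exception() also blocks, and returns the exception instead of raising it.
        error = bad.exception()
    return done_immediately, value, error


if __name__ == '__main__':
    done_immediately, value, error = inspect_futures()
    print(done_immediately, value, repr(error))
```

   Calling bad.result() instead of bad.exception() would re-raise the ValueError in the calling thread, which is how errors from worker threads normally reach the caller.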

import concurrent.futures
import time

import requests


def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    return f'download {len(resp.content)} ok'


# def over(arg):
#     print(arg)
#     print('over')


def download_all(sites):
    # Futures do not necessarily complete in the order they appear in the list.
    # Which one finishes first depends on system scheduling and on how long each one takes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            # executor.submit returns a Future instance
            future = executor.submit(download_one, site)
            to_do.append(future)
            # future.add_done_callback(over)

        # Print each result as its future completes
        for future in concurrent.futures.as_completed(to_do):
            print(future.result())


def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


if __name__ == '__main__':
    main()

# Output
Read 129886 from https://en.wikipedia.org/wiki/Portal:Arts
Read 107634 from https://en.wikipedia.org/wiki/Portal:Biography
Read 224118 from https://en.wikipedia.org/wiki/Portal:Society
Read 158984 from https://en.wikipedia.org/wiki/Portal:Mathematics
Read 184343 from https://en.wikipedia.org/wiki/Portal:History
Read 157949 from https://en.wikipedia.org/wiki/Portal:Technology
Read 167923 from https://en.wikipedia.org/wiki/Portal:Geography
Read 94228 from https://en.wikipedia.org/wiki/Portal:Science
Read 391905 from https://en.wikipedia.org/wiki/Python_(programming_language)
Read 321352 from https://en.wikipedia.org/wiki/Computer_science
Read 180298 from https://en.wikipedia.org/wiki/Node.js
Read 321417 from https://en.wikipedia.org/wiki/Java_(programming_language)
Read 468421 from https://en.wikipedia.org/wiki/PHP
Read 56765 from https://en.wikipedia.org/wiki/The_C_Programming_Language
Read 324039 from https://en.wikipedia.org/wiki/Go_(programming_language)
Download 15 sites in 0.21698231499976828 seconds

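  The futures in the list do not necessarily complete in the order in which they appear in the list. Which one finishes first depends on system scheduling and on how long each future takes to run.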
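  The difference between completion order and submission order can be sketched without any network access. The example assumes time.sleep() as a stand-in for work of different durations; wait_and_return, completion_order and submission_order are hypothetical helper names.

```python
import concurrent.futures
import time


def wait_and_return(delay):
    time.sleep(delay)
    return delay


def completion_order(delays):
    """Return the results in the order in which the futures finish."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(wait_and_return, d) for d in delays]
        return [f.result() for f in concurrent.futures.as_completed(futures)]


def submission_order(delays):
    """Return the results in input order, which executor.map() preserves."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        return list(executor.map(wait_and_return, delays))


if __name__ == '__main__':
    delays = [0.3, 0.1, 0.2]
    print(completion_order(delays))
    print(submission_order(delays))
```

  On a lightly loaded machine the first list will normally come back shortest delay first, while the second always matches the order of the inputs. If the order of results matters, executor.map() is the simpler choice; if results should be handled as soon as they are ready, as_completed() fits better.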

  Concurrency is usually the right tool for I/O-heavy workloads, while parallelism suits CPU-heavy workloads.

