Scraping web page data by element position with jsoup in Java

  1. Import the required jar, jsoup-1.6.1.jar. GitHub link: https://github.com/zhangliqingyun/jarlist/blob/master/jsoup/jsoup-1.6.1.jar
  2. Get the document object for the page you want to scrape from its URL:

        TestJsoup t = new TestJsoup();

        Document doc = t.getDocument("http://www.weather.com.cn/html/weather/101280101.shtml");

     // Fetch the page at the given URL and return the parsed document (null if the request fails)
     public Document getDocument(String url) {
         try {
             return Jsoup.connect(url).get();
         } catch (IOException e) {
             e.printStackTrace();
         }
         return null;
     }

  3. Work out where the data you need sits in the original page and which selectors to use to filter it out:

  /*
     Sample markup from the page. Glossary for the Chinese text:
     7日(今天) = 7th (today), 中雨 = moderate rain, 北风 = north wind, 3-4级 = force 3-4.

     <li class="sky skyid lv3 on">
       <h1>7日(今天)</h1>
       <big class="png40 d08"></big>
       <big class="png40 n08"></big>
       <p title="中雨" class="wea">中雨</p>
       <p class="tem">
         <span>22</span>/<i>17℃</i>
       </p>
       <p class="win">
         <em>
           <span title="北风" class="N"></span>
           <span title="北风" class="N"></span>
         </em>
         <i>3-4级</i>
       </p>
       <div class="slid"></div>
     </li>
  */
 

  4. Filter the resulting document object with selectors:

         // Select the target <li> block; [class=...] matches only when the
         // class attribute is exactly this string
         Elements elements1 = doc.select("[class=sky skyid lv3 on]");

         // Date (today)
         Elements elements2 = elements1.select("h1");
         String today = elements2.get(0).text();
         System.out.println("Date: " + today);

         // Weather description (rain or not)
         Elements elements4 = elements1.select("[class=wea]");
         String rain = elements4.get(0).text();
         System.out.println("Weather (rain or not): " + rain);

         // High temperature: the first <span> inside the block
         Elements elements5 = elements1.select("span");
         String highTemperature = elements5.get(0).text() + "°C";
         System.out.println("High temperature: " + highTemperature);

         // Low temperature: the first <i> inside the block
         Elements elements7 = elements1.select("i");
         String lowTemperature = elements7.get(0).text();
         System.out.println("Low temperature: " + lowTemperature);

         // Wind force: the second <i> inside the block
         String wind = elements7.get(1).text();
         System.out.println("Wind force: " + wind);
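The values scraped in step 4 are plain strings such as `22` and `17℃`, so comparing or averaging them needs a conversion step first. The sketch below is one possible way to do that using only the standard library. `WeatherText` and `parseTemperature` are hypothetical names that do not appear in the original code, and the sketch assumes each string contains a single integer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the original post): converts scraped
// temperature text into an int.
final class WeatherText {

    // An optional minus sign followed by one or more digits.
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    // Returns the first integer found in the text.
    // Assumption: the text holds one number, optionally followed by a unit
    // such as the degree-Celsius sign (U+2103).
    static int parseTemperature(String text) {
        Matcher matcher = INTEGER.matcher(text);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No number found in: " + text);
        }
        return Integer.parseInt(matcher.group());
    }

    public static void main(String[] args) {
        // "17\u2103" is the text "17" followed by the degree-Celsius sign.
        int high = parseTemperature("22");
        int low = parseTemperature("17\u2103");
        System.out.println("Temperature range: " + (high - low));
    }
}
```

The wind string (`3-4级`) is a range rather than a single number, so it would need its own handling and is left out of this sketch.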

  5. Scrape stock price data from a stock page:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class gupiaodata {

    // Fetch the page at the given URL and return the parsed document (null if the request fails)
    public Document getDocument(String url) {
        try {
            return Jsoup.connect(url).get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    /*
     Sample markup from the page (excerpt). Glossary for the Chinese text:
     时代出版 = the stock's name, 已收盘 = market closed.

     <div class="stock-info" data-spm="2">
        <div class="stock-bets">
            <h1>
                 <a class="bets-name" href="/stock/sh600551.html">
                     时代出版 (<span>600551</span>)
                 </a>
                 <span class="state f-up">已收盘 2017-11-07  14:59:59
                 </span>
           </h1>
           <div class="price s-up ">
                 <strong  class="_close">13.12</strong>
                 <span>+0.01</span>
                 <span>+0.08%</span>
           </div>
           <div class="bets-content">

     */

    public static void main(String[] args) {
        gupiaodata gupiao = new gupiaodata();
        Document document = gupiao.getDocument("https://gupiao.baidu.com/stock/sh600551.html?from=aladingpc");
        if (document == null) {
            // getDocument returns null when the request fails
            return;
        }
        Elements element1 = document.select("[class=stock-bets]");

        // Stock name (the link text, which also contains the code in parentheses)
        Elements element2 = element1.select("[class=bets-name]");
        String name = element2.get(0).text();
        System.out.println("Stock name: " + name);

        // Stock code
        Elements element3 = element2.select("span");
        String code = element3.get(0).text();
        System.out.println("Stock code: " + code);

        // Closing price
        Elements element4 = element1.select("[class=_close]");
        String price = element4.get(0).text();
        System.out.println("Stock price: " + price);

        // Price change: spans 0 and 1 are the stock code and the status line,
        // so the change and its percentage are spans 2 and 3
        Elements element5 = element1.select("span");
        String floatprice = element5.get(2).text();
        String floatpercent = element5.get(3).text();
        System.out.println("Price change: " + floatprice + "    Change percentage: " + floatpercent);
    }

}
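The price, change, and percentage extracted above are still text (`13.12`, `+0.01`, `+0.08%`). If they are going to be used in calculations, a conversion along the lines of the sketch below might help. `StockText`, `parseDecimal`, and `parsePercent` are hypothetical names that are not part of the original code, and the sketch assumes the page always formats these values as a signed decimal with an optional trailing `%`.

```java
import java.math.BigDecimal;

// Hypothetical helper (not part of the original post): converts scraped
// price text into BigDecimal values.
final class StockText {

    // Parses a decimal with an optional leading sign, such as "13.12" or "+0.01".
    static BigDecimal parseDecimal(String text) {
        return new BigDecimal(text.trim());
    }

    // Parses a percentage such as "+0.08%" and returns the number in front
    // of the percent sign (0.08 here, not 0.0008).
    static BigDecimal parsePercent(String text) {
        String trimmed = text.trim();
        if (!trimmed.endsWith("%")) {
            throw new IllegalArgumentException("Not a percentage: " + text);
        }
        return new BigDecimal(trimmed.substring(0, trimmed.length() - 1));
    }

    public static void main(String[] args) {
        BigDecimal close = parseDecimal("13.12");
        BigDecimal change = parseDecimal("+0.01");
        // Previous close = current close minus the change.
        System.out.println("Previous close: " + close.subtract(change));
        System.out.println("Change percent: " + parsePercent("+0.08%"));
    }
}
```

`BigDecimal` is used here rather than `double` so that prices keep their exact decimal representation.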

Reposted from: https://blog.csdn.net/ZHANGLIZENG/article/details/87854902
