HaXml で html をパース

Haskell Haxml モジュール サンプルコード - x76789の研究メモ を参考にして、http://qb6.2ch.net/_403/madakana.cgi を見て自分が規制されているかどうか表示するコードを書いた。
Parsec もそうだけど、コンビネータを使ったパーサって使い方が面白い。ある意味直感的。


regulated.hs

module Main where
import Network.HTTP
import Text.XML.HaXml
import Text.XML.HaXml.Pretty (content)
import Data.List (isSuffixOf)
import Data.Maybe (fromMaybe)

get :: String -> IO String
get url = simpleHTTP (getRequest url) >>= getResponseBody

chomp :: String -> String
chomp str | "\n" `isSuffixOf` str = init str
          | otherwise = str

regulated :: String -> Maybe [String]
regulated html = 
  let 
    Document _ _ (Elem _ _ con) _ = htmlParse "madakana.cgi" html
    filter = deep $ path [tag "font", attrval ("color", AttValue [Left "red"]), children, tag "b", children, txt]
    list = map (chomp . show . content) $ concatMap filter con
  in if length list <= 1 then Nothing else Just $ tail list

main = get url >>= mapM_ putStrLn . fromMaybe ["not regulated!"] . regulated
  where
    url = "http://qb6.2ch.net/_403/madakana.cgi"


実行結果

% ghc --make regulated.hs
[1 of 1] Compiling Main             ( regulated.hs, regulated.o )
Linking regulated ...
% ./regulated
_BBS_news_xxxxxxxxxx

追記

path を使うより /> を使ったほうが、より直感的に見やすいかな。

--- regulated.hs
+++ regulated.hs
@@ -17,7 +17,7 @@
 regulated html = 
   let 
     Document _ _ (Elem _ _ con) _ = htmlParse "madakana.cgi" html
-    filter = deep $ path [tag "font", attrval ("color", AttValue [Left "red"]), children, tag "b", children, txt]
+    filter = deep $ tag "font" `with` attrval ("color", AttValue [Left "red"]) /> tag "b" /> txt
     list = map (chomp . show . content) $ concatMap filter con
   in if length list <= 1 then Nothing else Just $ tail list