HaXml で html をパース
Haskell Haxml モジュール サンプルコード - x76789の研究メモ を参考にして、http://qb6.2ch.net/_403/madakana.cgi を見て自分が規制されているかどうか表示するコードを書いた。
Parsec もそうだけど、コンビネータを使ったパーサって使い方が面白い。ある意味直感的。
regulated.hs
module Main where import Network.HTTP import Text.XML.HaXml import Text.XML.HaXml.Pretty (content) import Data.List (isSuffixOf) import Data.Maybe (fromMaybe) get :: String -> IO String get url = simpleHTTP (getRequest url) >>= getResponseBody chomp :: String -> String chomp str | "\n" `isSuffixOf` str = init str | otherwise = str regulated :: String -> Maybe [String] regulated html = let Document _ _ (Elem _ _ con) _ = htmlParse "madakana.cgi" html filter = deep $ path [tag "font", attrval ("color", AttValue [Left "red"]), children, tag "b", children, txt] list = map (chomp . show . content) $ concatMap filter con in if length list <= 1 then Nothing else Just $ tail list main = get url >>= mapM_ putStrLn . fromMaybe ["not regulated!"] . regulated where url = "http://qb6.2ch.net/_403/madakana.cgi"
実行結果
% ghc --make regulated.hs [1 of 1] Compiling Main ( regulated.hs, regulated.o ) Linking regulated ... % ./regulated _BBS_news_xxxxxxxxxx
追記
path を使うより /> を使ったほうが、より直感的に見やすいかな。
--- regulated.hs +++ regulated.hs @@ -17,7 +17,7 @@ regulated html = let Document _ _ (Elem _ _ con) _ = htmlParse "madakana.cgi" html - filter = deep $ path [tag "font", attrval ("color", AttValue [Left "red"]), children, tag "b", children, txt] + filter = deep $ tag "font" `with` attrval ("color", AttValue [Left "red"]) /> tag "b" /> txt list = map (chomp . show . content) $ concatMap filter con in if length list <= 1 then Nothing else Just $ tail list