Can anyone point me to a good HTML-to-text converter?
I'm sending out pages and the incoming message is HTML,
but I want to send it out as plain text, with all the embedded
HTML stripped out. I've been looking on the web, but I can't seem
to find anything that I can pipe the message through as a command.
Thanks,
James
If this is a one-time thing, open it in lynx or some other browser and just copy and paste the text.
What exactly are you trying to accomplish? Is this for an alert message?
On 10/23/07, James Wade <jkwade at futurefrontiers.com> wrote:
--
Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St Suite 1337
Troy, OH 45373
Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
I pipe my alerts to a Perl script.
Below is the portion that strips the HTML; the main message is in $body:

# right below here I strip out the HTML
$body =~ s{ <!          # comments begin with a `<!'
                        # followed by 0 or more comments;
    (.*?)               # this is actually to eat up comments in non
                        # random places
    (                   # not supposed to have any white space here
                        # just a quick start;
      --                # each comment starts with a `--'
        .*?             # and includes all text up to and including
      --                # the *next* occurrence of `--'
        \s*             # and may have trailing white space
                        # (albeit not leading white space XXX)
    )+                  # repetire ad libitum XXX should be * not +
    (.*?)               # trailing non comment text
    >                   # up to a `>'
}{
    if ($1 || $3) {     # this silliness for embedded comments in tags
        "<!$1 $3>";
    }
}gesx;                  # mutate into nada, nothing, and niente

$body =~ s{ <           # opening angle bracket
    (?:                 # Non-backreffing grouping paren
        [^>'"] *        # 0 or more things that are neither > nor ' nor "
          |             # or else
        ".*?"           # a section between double quotes (stingy match)
          |             # or else
        '.*?'           # a section between single quotes (stingy match)
    ) +                 # repetire ad libitum
                        # hm.... are null tags <> legal? XXX
    >                   # closing angle bracket
}{}gsx;                 # mutate into nada, nothing, and niente

# note: the %entity lookup table is not defined in this excerpt
$body =~ s{ (
        &               # an entity starts with an ampersand
        (
            \x23\d+     # and is either a pound (#) and numbers
              |         # or else
            \w+         # has alphanumunders up to a semi
        )
        ;?              # a semi terminates AS DOES ANYTHING ELSE (XXX)
    )
} {
    $entity{$2}         # if it's a known entity use that
        ||              # but otherwise
        $1              # leave what we'd found; NO WARNINGS (XXX)
}gex;                   # execute replacement -- that's code not a string
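
The excerpt above relies on an %entity table that is not shown. As a rough sketch only, the three substitutions could be wrapped in a single function with a small table of its own. The name strip_html, the five-entry %entity table, and the simplified comment regex are assumptions made for illustration and are not part of the posted script; numeric entities are converted with chr here, which the posted code does not do.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny entity table (name => character).
# The posted excerpt uses %entity without showing its definition.
my %entity = (
    lt   => '<',
    gt   => '>',
    amp  => '&',
    quot => '"',
    nbsp => ' ',
);

# strip_html is a made-up helper name used only for this sketch.
sub strip_html {
    my ($body) = @_;

    # 1. Drop <!-- comments --> (simpler than the regex in the post).
    $body =~ s{<!--.*?-->}{}gs;

    # 2. Drop tags, allowing '>' inside quoted attribute values.
    $body =~ s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;

    # 3. Translate entities: numeric ones via chr, known names via
    #    %entity, and anything unknown is left exactly as found.
    $body =~ s{(&(\x23\d+|\w+);?)}{
        my ($whole, $name) = ($1, $2);
        $name =~ /^\x23(\d+)$/    ? chr($1)
      : exists $entity{$name}    ? $entity{$name}
      :                            $whole
    }ge;

    return $body;
}
```

To use this as a pipe filter, one option is to slurp standard input and print the result, for example with a final line such as: print strip_html(do { local $/; scalar <STDIN> });
For a complete entity table, a CPAN module such as HTML::Entities may be a better fit than a hand-written hash.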
From: James Wade [mailto:jkwade at futurefrontiers.com]
Sent: Tuesday, October 23, 2007 12:34 PM
To: hobbit at hswn.dk
Subject: [hobbit] Paging -- HTML to Text
participants (3)
- jkwade@futurefrontiers.com
- josh@imaginenetworksllc.com
- sclark@nyroc.rr.com