Emacs. Transform an HTML page to an Org file

To convert an HTML page (for example, one viewed in eww) to Org mode, the easy way is to use pandoc. The option --extract-media names a directory where pandoc stores the images it extracts from the page.

pandoc -f html -t org -o output.org --extract-media=images  https://torres.epv.uniovi.es/centon/visualizacion-congestion-puertos.html 

The original HTML page: captura-puertos.png

And the resulting Org file: captura-puertos-org.png

We can wrap this command in an Elisp function:

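   (defun etm-eww-html-to-org (&optional url)
     "Convert URL, or the page shown in the current eww buffer, to Org text.

Images are included; pandoc stores them in the directory
images."
     (interactive nil eww-mode)
     (let ((url (or url (plist-get (bound-and-true-p eww-data) :url)))
           (dirimages "images"))
       ;; Check the preconditions before creating the output buffer.
       (unless (executable-find "pandoc")
         (error "The program pandoc was not found"))
       (unless url
         (error "No URL to convert"))
       (switch-to-buffer (generate-new-buffer "*eww2org*"))
       (message "Transforming %s" url)
       ;; Quote the arguments so the shell does not interpret
       ;; characters such as & or ? in the URL.
       (shell-command (concat "pandoc -f html -t org --extract-media="
                              (shell-quote-argument dirimages) " "
                              (shell-quote-argument url))
                      (current-buffer))
       (org-mode)))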
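
To call the function without typing its name each time, it can be bound to a key in eww buffers. The following is only a sketch: the key C-c o is an arbitrary choice made for this example, not part of the original setup, and any other free key would do.

```elisp
;; Sketch: bind the converter to a key in eww buffers.
;; "C-c o" is a hypothetical choice; pick any key you prefer.
(require 'eww)
(define-key eww-mode-map (kbd "C-c o") #'etm-eww-html-to-org)

;; The function also accepts an explicit URL, so it can be called
;; from Lisp without an eww buffer, for example:
;; (etm-eww-html-to-org
;;  "https://torres.epv.uniovi.es/centon/visualizacion-congestion-puertos.html")
```

With this binding, pressing C-c o while reading a page in eww opens a new *eww2org* buffer containing the Org version of that page.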