2007-04-05

Word DOC to HTML

It seems that whenever I try to find some simple piece of shareware or freeware ... especially freeware, on the internet using a search engine, I get a list of pages that invariably don't have what I want in their list of dodgy, overpriced software. I aso get for my trouble, at least 50% of the page splattered in eye wrenching advertisements that turn me off even casually browsing the descriptions of the software. Most of these pages will try to launch a pop up advertisment, and all of them will be running at least one set of tracking javascript. I loathe the authors of these pages.

There are some nice exceptions, SourceForge being a favourite. TUCOWS used to be excellent but now has joined the stinking pile of ad riddled sites.

A great idea is to get an ad blocker for your browser like AdBlock for FireFox or Safari Block for Safari.

I was looking for freeware to convert a folder full of MS Word documents to HTML. A simple task that I could have easily coded in VBScript (yes, it is useful for something). If you are any sort of HTML coder you will know that Microsoft products tend to generate the most ugly and bloated HTML code possible that will probably be least compatible with browsers available from companies other than Microsoft.

I spent 30 minutes looking for software, and found myself unwilling to download anything because if the web site looks dodgy then the software on it is probably dodgy. Phht. So then I start the search for VBScript to do it. I mean it should have been done a dozen times before and why reinvent the wheel, even if it is an easy wheel to invent.

Well I spent a while browsing, getting distracted by various other scripts and in the end just gave up and wrote it myself. Sure, this is easy, short, but since I really didn't find it anywhere else I am going to share my work in the hopes of saving some other poor slob the drudgery of creating it.

Option Explicit
'==========================================================================
' Word2HTML
'
' COMMENT:
' Converts all of the DOC files to HTML files in the specified folder and subfolders.
' DOC files are not destroyed or altered (unless MS Word messes with things)
'
' RESOURCES:
' http://www.microsoft.com/technet/scriptcenter/resources/qanda/hsgarch.mspx
' http://www.activexperts.com/activmonitor/windowsmanagement/scripts/msoffice/word/
' http://www.wizardwrx.com/CommonTypelibs/O2002_MSWORD.CSV
' http://www.helenfeddema.com/CodeSamples.htm
' http://www.pcsupportadvisor.com/Windows_scripting_host_page2.htm
'==========================================================================

Const SourceFolder = "Q:\PROJECTS\DocFolder"

' Alter the SourceFolder to point to your folder containing the DOC files
' *** Do not change the code below

' NOTE: You can Save As any of the below formats
' Const wdFormatDocument=0
' Const wdFormatDOSText=4
' Const wdFormatDOSTextLineBreaks=5
' Const wdFormatEncodedText=7
Const wdFormatHTML=8
' Const wdFormatRTF=6
' Const wdFormatTemplate=1
' Const wdFormatText=2
' Const wdFormatTextLineBreaks=3
' Const wdFormatUnicodeText=7
Dim thisFormat
myFormat = wdFormatHTML
myFormatExt = ".html"


Sub DocSaveAs (thisDocName, thisFormat)
Dim sDocFile, fso, wdo, wdocs, wdoc, sFolder, sOutFile
WScript.Echo thisDocName
sDocFile = thisDocName
Set fso = CreateObject("Scripting.FileSystemObject")
Set wdo = CreateObject("Word.Application")
sDocFile = fso.GetAbsolutePathName(sDocFile)
sFolder = fso.GetParentFolderName(sDocFile)
If Len(sFolder) > 0 And Mid(sFolder,Len(sFolder),1) <> "\" Then sFolder = sFolder + "\"
sOutFile = sFolder + fso.GetBaseName(sDocFile) + myFormatExt
Set wdoc = wdo.Documents.Open(sDocFile)
wdoc.SaveAs sOutFile, thisFormat
wdo.quit
Set wdo = Nothing
Set fso = Nothing
End Sub


Dim strComputer, objWMIService
strComputer = "." ' can be altered to a network computer name eg: sph-simonf
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")

Sub ProcessFolderHTML (thisFolder)
Dim colFolder, objFolder, strFolderName, colFiles, objFile
If Mid(thisFolder,Len(thisFolder),1) = "\" Then thisFolder = Mid(thisFolder,1,Len(thisFolder)-1)
Set colFiles = objWMIService.ExecQuery ("ASSOCIATORS OF {Win32_Directory.Name='" & _
thisFolder & "'} Where ResultClass = CIM_DataFile")
If IsNull(colFiles) Then Exit Sub
For Each objFile in colFiles
' PROCESS FILE
If objFile.Extension = "doc" Then DocSaveAs objFile.Name, myFormat
Next
' RECURSE SUB-FOLDERS
Set colFolder = objWMIService.ExecQuery ("Associators of {Win32_Directory.Name='" & _
thisFolder & "'} Where AssocClass = Win32_Subdirectory ResultRole = PartComponent")
For Each objFolder in colFolder
ProcessFolderHTML objFolder.Name
Next
End Sub


WScript.Echo "Word2HTML: Processing folder '" & SourceFolder & "'"
ProcessFolderHTML SourceFolder
WScript.Echo "Word2HTML: Conversion complete"

No comments: