Python & MS Word: Convert .doc to .docx?
I found several questions that were similar to mine, but none of the answers came close to what I need.
Specifications: I’m working with Python 3 and do not have MS Word. My development machine runs OS X and the cloud machine runs Linux (Ubuntu).
I’m using python-docx to extract values from a .doc file that is sent to me nightly. However, python-docx only works with .docx files, so I need to convert the file to that extension first.
So, I’ve got a .doc file that I need to convert to .docx. This script might have to run in the cloud, so I can’t install any kind of Office or Office-like software. Can this be done?
3 Answers
You could use unoconv (Universal Office Converter). It converts between any document formats supported by LibreOffice/OpenOffice.
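As a rough sketch of how this could be called from Python 3, the snippet below shells out to unoconv with its -f (output format) option. The function names are made up for illustration. It assumes unoconv is on the PATH and that it writes the converted file next to the input, which is its usual default. Note that unoconv drives a LibreOffice/OpenOffice installation behind the scenes, so that still has to be present on the machine.

```python
import shutil
import subprocess
from pathlib import Path


def build_unoconv_command(doc_path):
    """Return the argument list for converting one file to .docx (hypothetical helper)."""
    return ["unoconv", "-f", "docx", str(doc_path)]


def expected_output_path(doc_path):
    """Assumption: unoconv writes the result beside the input with the new extension."""
    return Path(doc_path).with_suffix(".docx")


def convert_doc_to_docx(doc_path):
    """Run unoconv and return the path where the .docx is expected to appear."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv was not found on PATH")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_unoconv_command(doc_path), check=True)
    return expected_output_path(doc_path)
```

Calling convert_doc_to_docx("nightly.doc") would then give a path that can be handed to python-docx, provided the conversion succeeded.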
Convert doc to txt via commandline
We’re looking for a program that lets us convert a doc or docx document to a txt file. We’re working on Linux and want to start a website that converts user-uploaded doc files. We don’t want to use OpenOffice/LibreOffice because we have had bad experiences with it. Pandoc can’t handle doc files :/
Does anyone have an idea?
3 Answers
You will have to use two different command-line tools, depending on whether you are working with the .doc or the .docx format.
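For .doc use catdoc, which prints the extracted text to standard output, so redirect it to a file (for example, catdoc foo.doc > foo.txt).
For .docx use docx2txt (for example, docx2txt foo.docx).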
The latter will produce a file called foo.txt in the same directory as the original.
I’m not sure which Linux distribution you are using, but both catdoc and docx2txt are available from the Ubuntu repositories (for example, sudo apt-get install catdoc docx2txt), or with Homebrew on Mac (brew install catdoc docx2txt).
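For a website backend, the two-tool approach above could be wrapped roughly like this in Python. This is only a sketch: the function names are invented here, and it assumes both tools are installed and on the PATH, that catdoc prints the text to standard output, and that docx2txt writes a .txt file next to the input as described above.

```python
import subprocess
from pathlib import Path


def converter_command(path):
    """Pick the command-line tool by file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".doc":
        return ["catdoc", str(path)]
    if suffix == ".docx":
        return ["docx2txt", str(path)]
    raise ValueError(f"unsupported file type: {suffix!r}")


def word_file_to_text(path):
    """Return the plain text of a .doc or .docx file."""
    path = Path(path)
    command = converter_command(path)
    if command[0] == "catdoc":
        # catdoc writes the text to stdout, so capture it directly.
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        return result.stdout
    # Assumption: docx2txt creates foo.txt beside foo.docx.
    subprocess.run(command, check=True)
    return path.with_suffix(".txt").read_text()
```

Passing the arguments as a list rather than a shell string matters for user uploads, because file names are then never interpreted by a shell.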
Here is a Perl project which claims to do it. I have also done a lot of this by hand, using XSLT on the document.xml. The docx file itself is just a zip file; you can unzip it and inspect the elements. I will say that this is not hard to do for specific files, but it is very hard to do in the general case, because of the lack of documentation for how Word internally stores things and the variance in internal representation.
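To illustrate the "it's just a zip file" point, here is a minimal Python sketch that opens the archive, reads word/document.xml, and joins the text runs of each paragraph. It uses the standard library instead of XSLT, the function name is made up, and it only covers simple documents: it ignores headers, footnotes, tables layout and so on, which is exactly where the general case gets hard.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Extract paragraph text from a .docx given a path or a file-like object."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Each w:t element holds one run of text; join the runs per paragraph.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

This needs no external tools at all, but it only applies to .docx; the older binary .doc format is not a zip archive.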
For .doc files you can use antiword; it’s available through Homebrew and from the Ubuntu repositories.
Convert doc to doc linux