Convert doc to doc linux

Python & MS Word: Convert .doc to .docx?

I found several questions that were similar to mine, but none of the answers came close to what I need.

Specifications: I’m working with Python 3 and do not have MS Word. My development machine runs OS X and the cloud machine runs Linux (Ubuntu).

I’m using python-docx to extract values from a .doc file that is sent to me nightly. However, python-docx only works with .docx files, so I need to convert the file to that extension first.

So, I’ve got a .doc file that I need to convert to .docx. This script might have to run in the cloud, so I can’t install any kind of Office or Office-like software. Can this be done?

3 Answers

You could use unoconv (Universal Office Converter). It converts between any document formats supported by LibreOffice/OpenOffice.
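A minimal sketch of the unoconv route, assuming unoconv is already installed together with the LibreOffice installation it drives (unoconv is a front end to LibreOffice, so it does not remove the need for an office suite on the machine). The function name `doc_to_docx` is made up for illustration.

```shell
# Convert a legacy .doc file to .docx with unoconv.
# doc_to_docx is a hypothetical helper name, not part of unoconv itself.
doc_to_docx() {
    input=$1
    if ! command -v unoconv >/dev/null 2>&1; then
        echo "unoconv not found" >&2
        return 1
    fi
    # -f selects the output format; the result is written next to the input file
    unoconv -f docx "$input"
}
```

Calling `doc_to_docx nightly.doc` should leave a `nightly.docx` beside the original, which python-docx can then open.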


Convert doc to txt via commandline

We’re looking for a program that converts a doc or docx document to a txt file. We’re working with Linux and want to start a website that converts user-uploaded doc files. We don’t want to use OpenOffice/LibreOffice because we have had bad experiences with it. Pandoc can’t handle doc files :/

Does anyone have an idea?

3 Answers

You will have to use two different command-line tools, depending on whether you are working with the .doc or the .docx format.

For .doc use catdoc:

For .docx use docx2txt:

The latter will produce a file called foo.txt in the same directory as the original.
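The two cases above can be sketched as one small wrapper that picks the tool from the file extension. `to_txt` is a hypothetical name; catdoc and docx2txt are the tools named in the answer and are assumed to be installed.

```shell
# to_txt is a hypothetical wrapper; catdoc and docx2txt are the real tools.
to_txt() {
    case $1 in
        *.docx) docx2txt "$1" ;;                  # writes foo.txt next to foo.docx
        *.doc)  catdoc "$1" > "${1%.doc}.txt" ;;  # catdoc prints to stdout, so redirect
        *)      echo "unsupported file: $1" >&2; return 1 ;;
    esac
}
```

For example, `to_txt foo.doc` and `to_txt foo.docx` should both end up with a `foo.txt` in the same directory as the input.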

I’m not sure which Linux distribution you are using, but both catdoc and docx2txt are available from the Ubuntu repositories, for example:

Or with Homebrew on Mac:
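As a sketch of the two install routes mentioned above, the helper below only prints the command for whichever package manager it finds, so nothing is installed by accident. `suggest_install` is a hypothetical name, and the package names are taken from the answer (both tools packaged as `catdoc` and `docx2txt`).

```shell
# Print (do not run) an install command for the package manager that is present.
# suggest_install is a hypothetical helper name.
suggest_install() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install catdoc docx2txt"
    elif command -v brew >/dev/null 2>&1; then
        echo "brew install catdoc docx2txt"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}
```

Copy the printed line into a terminal to perform the installation.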

Here is a Perl project that claims to do it. I have also done a lot of this by hand, using XSLT on document.xml. A .docx file is just a zip file; you can unzip it and inspect the elements. This is not hard to do for specific files, but it is very hard in the general case because of the lack of documentation for how Word stores things internally and the variance in internal representation.

For .doc files you can use antiword; it is available from Homebrew and the Ubuntu repositories.
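To illustrate the "it is just a zip file" point, here is a rough sketch that pulls the main document part out of the archive and deletes the markup. It is not the XSLT approach from the answer, only a crude stand-in: paragraphs run together and XML entities are left undecoded. Both function names are hypothetical, and `unzip` is assumed to be installed.

```shell
# strip_xml_tags is a hypothetical helper: it deletes every <...> tag read from stdin.
strip_xml_tags() {
    sed -e 's/<[^>]*>//g'
}

# docx_text is also hypothetical. unzip -p writes the named archive member
# to stdout instead of extracting it to disk.
docx_text() {
    unzip -p "$1" word/document.xml | strip_xml_tags
}
```

Running `unzip -l foo.docx` first is a quick way to see which parts a given file actually contains.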


site design / logo © 2020 Stack Exchange Inc; user contributions licensed under cc by-sa. rev 2020.9.18.37632




Is there any GNU/Linux command line utility that converts .doc(x) files to .pdf? [closed]


I am surely the 100th user to ask this, but after searching through similar topics here and on other websites I still cannot find what I need. I would like a simple command-line tool for GNU/Linux that converts .doc(x) files to .pdf, with output that looks the same as the original. LibreOffice is not a good choice for this because it does not convert well in some cases. I found a website, http://www.freepdfconvert.com/, which does the job very well, but I cannot upload sensitive files there since it is a big risk. I am not saying they would do anything bad with them, but that is how it is. If I cannot find a good tool, I may have to write one myself.

2 Answers

Unfortunately, there are no Linux-based converters that guarantee a 1-to-1 conversion from Word (doc/docx) to PDF. This is because Word, a Microsoft product, uses a proprietary format that changes slightly with every release. Because it was not traditionally a publicly documented format and Microsoft does not port Word/Office to Linux (nor ever will), you must rely on reverse-engineered third-party tools for the older format (doc) and on third-party developers' interpretation of the Office Open XML format (docx).

We found the best open source solution is LibreOffice (which was forked from OpenOffice.org, which itself was called Star Office before it was open sourced). It is much more actively developed than AbiWord, which another answer suggested.

The usage from the command line is simple and well documented with plenty of examples:

On newer versions you can also use libreoffice instead of soffice.
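A minimal sketch of the headless conversion described above, assuming a LibreOffice installation that provides either the `soffice` or the `libreoffice` launcher. The wrapper name `office_to_pdf` and the optional output-directory argument are illustrative choices.

```shell
# office_to_pdf is a hypothetical wrapper name.
office_to_pdf() {
    input=$1
    outdir=${2:-.}
    # Prefer soffice; fall back to the libreoffice launcher if that is what is installed
    if command -v soffice >/dev/null 2>&1; then
        bin=soffice
    elif command -v libreoffice >/dev/null 2>&1; then
        bin=libreoffice
    else
        echo "LibreOffice not found" >&2
        return 1
    fi
    # --headless: no GUI; --convert-to: target format; --outdir: where the PDF is written
    "$bin" --headless --convert-to pdf --outdir "$outdir" "$input"
}
```

For instance, `office_to_pdf report.docx out` should write `out/report.pdf`.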

There is also Pandoc.

Pandoc, mainly known for its Markdown-capable processing goodness (for outputting HTML, LaTeX, PDF, EPUB and what-not) in recent months has gained a rather well-working capability to process DOCX input files.
(NOTE: Pandoc only works for DOCX, not for DOC files.)

For its PDF output to work, it requires a working LaTeX installation (with at least one of pdflatex, lualatex and xelatex included). In this case the following simple command should work:

Note, however, that the output layout and font styles will not look at all like what you would get by exporting the DOCX from Word to PDF. The output uses the styles of a default LaTeX document.
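A minimal sketch of that command, wrapped in a function so the output name can be derived from the input. `docx_to_pdf_pandoc` is a hypothetical name; pandoc and a LaTeX installation are assumed to be present.

```shell
# docx_to_pdf_pandoc is a hypothetical wrapper name.
docx_to_pdf_pandoc() {
    input=$1
    # Default output name: same as the input, with .docx replaced by .pdf
    output=${2:-${input%.docx}.pdf}
    # pandoc infers both formats from the file extensions and calls LaTeX for the PDF
    pandoc -o "$output" "$input"
}
```

With pandoc 2.0 or later, adding `--pdf-engine=xelatex` (or `lualatex`) to the pandoc call selects a LaTeX engine other than the default pdflatex.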

You can influence the output style of the LaTeX-generated PDF by using a custom template file, but this is a feature more for Pandoc/LaTeX experts than for beginners.
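The template workflow can be sketched in two steps: dump pandoc's built-in LaTeX template to a file, edit it, then pass it back with `--template`. The function names and the default file name `custom.latex` are illustrative assumptions; `-D` and `--template` are pandoc's own options.

```shell
# Dump pandoc's default LaTeX template so it can be edited by hand.
# export_default_template is a hypothetical helper name.
export_default_template() {
    pandoc -D latex > "${1:-custom.latex}"
}

# Render a DOCX to PDF with the edited template instead of the default one.
# docx_to_pdf_with_template is also hypothetical: $1 = input, $2 = template file.
docx_to_pdf_with_template() {
    pandoc --template="$2" -o "${1%.docx}.pdf" "$1"
}
```

Typical use would be `export_default_template my.latex`, some edits to fonts or margins in `my.latex`, then `docx_to_pdf_with_template report.docx my.latex`.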


How to convert Word (doc) to PDF in linux?

I have a set of files in .doc format, that need to be converted to .pdf format. I am using Ubuntu linux.

10 Answers

Then navigate to System > Administration > Printing and create a new printer, set it as a PDF file printer, and name it "pdf".

Now you’ll find your .pdf file in

If the tetex-extra package is not available with your distribution, try texlive-base plus texlive-latex-base:

/PDF path to somewhere else ? – hd. Jan 13 ’13 at 5:05

Printing to PDF loses a lot of the document metadata (title, authorship, the headings tree that is used for navigation, and so on).

Install unoconv, convert with: unoconv -fpdf file1.doc file2.doc…
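For a whole set of files, the same unoconv call can be looped over a directory. This is only a sketch: `convert_all_docs` is a hypothetical name, and unoconv (with LibreOffice behind it) is assumed to be installed.

```shell
# convert_all_docs is a hypothetical helper: converts every .doc in a directory to PDF.
convert_all_docs() {
    dir=${1:-.}
    for f in "$dir"/*.doc; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # Keep going if one file fails, but report it on stderr
        unoconv -f pdf "$f" || echo "failed: $f" >&2
    done
}
```

Looping file by file is slower than passing all names to a single unoconv call, but one broken document does not hide which file failed.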

If you’re running X then you can do it through Open Office. Since you’re about to object to doing it manually, remember there are some nice macro scripts in Open Office so you can automate it. You can do something similar with AbiWord (abiword --to=pdf).

If you’ve not got X then there is antiword, but that just extracts the text — doesn’t do any formatting or graphics. There’s also wvWare which I’ve used to bulk extract images from doc files, but I’ve never tried using it to convert doc files to pdfs.

Oh and .docx files may well need something different, but since they’re just zipped xml files it shouldn’t be too difficult to do something useful with them. For bulk extracting images you just unzip them and copy the images directory, but I’ve never needed to convert them in Linux.
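The bulk image extraction mentioned above can be sketched with unzip alone. In .docx files embedded images normally sit under `word/media/` inside the archive; the function name and the default target directory `images` are illustrative assumptions.

```shell
# extract_docx_images is a hypothetical helper name.
extract_docx_images() {
    docx=$1
    dest=${2:-images}
    mkdir -p "$dest"
    # -j drops the word/media/ prefix, -d sets the target directory;
    # the pattern is quoted so unzip, not the shell, expands it
    unzip -j "$docx" 'word/media/*' -d "$dest"
}
```

unzip exits with a non-zero status when the archive contains no matching files, which is a convenient signal that a document has no embedded images.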


PDF to word conversion software?

Is there any free software available on Ubuntu that can convert a pdf file to a .doc file?

5 Answers

OpenOffice (or alternatively its LibreOffice fork) has a PDF import plugin and .doc export functionality, though both suffer from conversion issues AFAIK. By this I mean that the conversion fidelity isn’t always 100%.

Abiword also works in a similar way, if OpenOffice doesn’t work on your system.

Download AbiWord from the Ubuntu Software Center, or install it by typing the following command in a terminal:

Then perform the conversion:
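A sketch of both steps, assuming the Ubuntu package is named `abiword` and that the installed AbiWord build is able to import PDF files. `pdf_to_doc` is a hypothetical wrapper name; `--to` is AbiWord's command-line conversion option.

```shell
# Install step (Ubuntu/Debian), run once by hand:
#   sudo apt-get install abiword

# pdf_to_doc is a hypothetical wrapper name.
pdf_to_doc() {
    # --to selects the target format by extension; the output is written
    # next to the input with the new extension
    abiword --to=doc "$1"
}
```

For example, `pdf_to_doc paper.pdf` should produce `paper.doc` in the same directory.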

I’ve had great success with PDF to Word online. This is not a desktop application, but a service, that works better than other things I’ve used.

Install AbiWord from the Ubuntu Software Center.

Open the PDF file with it.

Use Save As... to save the PDF in Word .doc format.

I prefer converting PDF files first to HTML using pdftohtml included in the poppler-utils package, for example by means of a Nautilus Script merely consisting of this command:

Then I open the resulting HTML file in LibreOffice Writer, and (after a little editing) Save As any other document format I like.

Note: Adding the -i parameter to the command above produces an HTML file without images.
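The exact command from the answer is not preserved here, so the following is only a guess at its shape: a small wrapper around pdftohtml from poppler-utils. The `-noframes` flag (one HTML file instead of a frameset) and the `noimages` switch are assumptions added for illustration; `-i` is the option named in the note above.

```shell
# pdf_to_html is a hypothetical wrapper name.
pdf_to_html() {
    pdf=$1
    if [ "${2:-}" = "noimages" ]; then
        # -i: ignore images, as described in the note above
        pdftohtml -i -noframes "$pdf"
    else
        pdftohtml -noframes "$pdf"
    fi
}
```

The resulting HTML file can then be opened in LibreOffice Writer and saved in another format, as the answer describes.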