Difference between revisions of "Computer Productivity Hacks"

From TeleCafeWiki
(→‎Text Extraction: Moved content to Data Analysis.)
  
 
== Text Extraction ==
 
=== PDF Conversion ===
+
Moved content to [[Data Analysis]].
* [http://pdftotext.org/ pdftotext.org]
 
pdftotext.org is the best online service for easily extracting text from your PDF files. Conversion from PDF to TXT is really fast thanks to our in-browser conversion architecture. Your PDF files are never uploaded to the Internet, so even private PDF files are safe to convert with this service. The conversion is done locally in your browser – you can even convert when you are offline! There is no need for any registration or sign-up, and the service will always be free to use.
 
 
 
* [https://github.com/coolwanglu/pdf2htmlEX pdf2htmlEX]
 
: Convert PDF to HTML without losing text or format.
 
 
 
* [http://stackoverflow.com/questions/6187250/pdf-text-extraction PDF TEXT Extraction]
 
: Lists several options.
 
 
 
* [http://capture2text.sourceforge.net/ Capture2Text]
 
: Capture2Text enables users to do the following:
 
# Optical Character Recognition (OCR)
 
# Speech Recognition
 
 
 
* [https://sourceforge.net/projects/doctotext/ SILVERCODERS DocToText]
 
: Extracts plain text from documents in all popular formats.
 
 
 
* [https://sourceforge.net/projects/detexter/ Detexter]
 
: Detexter is an app designed to extract text from PDF files.
 
 
 
=== Data Scrape ===
 
* [https://docs.google.com/a/evolvnet.com/document/d/18Q2THQvYCG2_n6nKVsZRHlaPG9iJ9NvLezOOQbEuAJs/edit?hl=en Almost Scraping: Web Scraping for Non-Programmers]
 
: Tools and tips compiled by journalists from PBS and Omaha World-Herald.
 
 
 
* [http://blog.scrapinghub.com/2014/01/18/open-source-at-scrapinghub/ Open Source at Scrapinghub]
 
: Scrapinghub's list of open source scraping projects.
 
 
 
* [http://sitestalker.net/ Sitestalker]
 
: Monitor website links with ease. Sitestalker supervises websites and notifies you when your desired content hits the web. Stop wasting your time constantly refreshing websites.
 
: Sitestalker is great for:
 
:: Finding jobs
 
:: Searching for an apartment
 
:: Getting the best bargains
 
:: Clipping
 
 
 
* [http://www.notprovided.eu/six-tools-web-scraping-use-data-journalism-creating-insightful-content/ Six tools for web scraping – To use for data journalism & creating insightful content]
 
: Tools for gathering data from public sources.
 
 
 
=== Text Search ===
 
* [http://geekdadaji.com/ geekDadaji - A SEARCH INITIATIVE]
 
: Makes tools to search text content, including:
 
# [https://sourceforge.net/projects/falcontextsearch/ FALCON - Text Search Java Project]: JSON based text search Java Project
 
# [https://sourceforge.net/projects/hawksearch/ HAWK - PDF Text Search Java Project]: Taking initiative for Document Text Search
 
 
 
* [http://www.foolabs.com/xpdf/home.html Xpdf: A PDF Viewer for X]
 
: Xpdf is an open source viewer for Portable Document Format (PDF) files.
 
: Windows installer: [http://www.compgeom.com/~piyush/scripts/scripts.html Short Programs/Scripts] (Look for the xpdf3.exe / poppler.exe links in left sidebar.)
 
  
 
== Educate Yourself ==
 

Revision as of 13:42, 14 May 2015

Command Line

Mounting shared drives and connecting to remote resources is something you can easily do from the Windows GUI. With our quick guide to the command prompt, however, you can more easily automate large tasks.
A quick crash course in using the command line. It is intended to be completed in a day or two and is not meant to teach advanced shell usage.

WGET

Using PowerShell (among other useful options listed at this resource):
WinWGet is a GUI (Graphical User Interface) for Wget. It is free and keeps track of your downloads, letting you add, clone, edit, and delete jobs.
GNU Wget is a free network utility to retrieve files from the World Wide Web using HTTP and FTP, the two most widely used Internet protocols. It works non-interactively, thus enabling work in the background, after having logged off.
Say you want to back up your blog or create a local copy of an entire directory of a web site for archiving or reading later. The command: wget -m http://website.tld
Tips for mirroring specific directories, updating only changed files, etc.
Download files using curl or wget. This add-on generates curl/wget commands that emulate the request as though it's coming from your browser, allowing you to download protected files directly to a separate machine (e.g. a server).
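
For scripting the wget -m mirror command shown above, a minimal Python sketch follows. It assumes wget is installed and on the PATH. The function names build_mirror_command and mirror are hypothetical, chosen only for this illustration. The extra flags -np (--no-parent) and -k (--convert-links) are standard wget options but are an addition here, not part of the original command.

```python
import shutil
import subprocess


def build_mirror_command(url, no_parent=True, convert_links=True):
    """Return the wget argument list for mirroring `url` (hypothetical helper)."""
    cmd = ["wget", "-m"]  # -m / --mirror: recursive download with timestamping
    if no_parent:
        cmd.append("-np")  # --no-parent: never climb above the starting directory
    if convert_links:
        cmd.append("-k")  # --convert-links: rewrite links for offline reading
    cmd.append(url)
    return cmd


def mirror(url):
    """Run the mirror if wget is installed; return the command either way."""
    cmd = build_mirror_command(url)
    if shutil.which("wget") is None:
        # Assumption: wget may be missing, which is common on a stock Windows install.
        print("wget not found on PATH; would have run:", " ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)
    return cmd
```

Calling mirror("http://website.tld") downloads into the current working directory, just as the bare command would.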

Windows Command Line

Cmdlets are the heart-and-soul of Windows PowerShell, Microsoft's latest command shell/scripting language.
Robocopy: A robust file copy command for the Windows command line.
Each command is linked to more info about the particular command.
The xcopy command is a Command Prompt command used to copy one or more files and/or folders from one location to another.
This question is answered in numerous ways. Many of the answers point to useful utilities to accomplish find-and-replace.

Generate File List

Example: C:\Users\me\Downloads\MyFolder> dir /b > filelist.txt
(Everything after the prompt, dir /b > filelist.txt, is the command to run once you've navigated into the directory whose file names you want to list.)
This tutorial contains several working answers for using Windows PowerShell to list files and folders.
Works with cmd.exe, but doesn't seem to work with Windows PowerShell.
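
Where neither cmd.exe nor PowerShell is convenient, the same file list can be produced with a short script. This is a sketch only: the function name write_file_list is hypothetical, and unlike a plain redirect it deliberately leaves the list file itself out of the listing.

```python
import os


def write_file_list(folder, output_name="filelist.txt"):
    """Write the bare names found in `folder` to `output_name`, one per line."""
    names = sorted(os.listdir(folder))
    # Skip the list file itself in case it is left over from an earlier run.
    names = [name for name in names if name != output_name]
    output_path = os.path.join(folder, output_name)
    with open(output_path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + "\n")
    return names
```

For example, write_file_list(r"C:\Users\me\Downloads\MyFolder") writes filelist.txt into that folder and also returns the names as a list.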

Rename Files & Folders

To rename files in bulk more efficiently, you can either learn a scripting tool (sed, awk, Perl) or switch to something simpler: a spreadsheet.

Bulk Rename Files

Bottom line: use Bulk Rename Utility (Windows only).

Rename List of Files With Batch File

It's easy to rename a group of files by running a batch file (file.bat), configured something like this:

rename "Old-File_001.jpg" "New-File_001.jpg"
rename "Old-File_002.jpg" "New-File_002.jpg"
rename "Old-File_003.jpg" "New-File_003.jpg"

Using Windows or DOS, create your batch file in the format shown above and save it in the directory that contains the files named in the left ("Old-File") column. Then double-click file.bat to rename them.

See this answer: http://stackoverflow.com/a/3808074
For help with using Regular Expressions in a text file, see this answer: Regular expresion in text editor (sic)
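
When the old and new names already sit in two spreadsheet columns, the rename lines can be generated rather than typed by hand. The sketch below assumes the names contain no double quotes and that each new name is a bare file name without a path. The function names build_rename_lines and write_batch_file are hypothetical.

```python
def build_rename_lines(pairs):
    """Return one 'rename "old" "new"' line per (old, new) pair."""
    lines = []
    for old_name, new_name in pairs:
        if '"' in old_name or '"' in new_name:
            # A double quote would break the quoting used in the batch file.
            raise ValueError("file names must not contain double quotes")
        lines.append(f'rename "{old_name}" "{new_name}"')
    return lines


def write_batch_file(path, pairs):
    """Write the rename lines to `path` using Windows (CRLF) line endings."""
    with open(path, "w", newline="\r\n") as handle:
        for line in build_rename_lines(pairs):
            handle.write(line + "\n")
```

Calling write_batch_file("file.bat", [("Old-File_001.jpg", "New-File_001.jpg")]) produces a batch file in the format shown above.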

Rename Multiple Folders / Directories With Batch File

There are a couple of pretty good answers to this question, including ones that explain how the regular expression written into the .bat file actually works.

File Transfer

File Transfer Clients for Windows and Mac

Lists several clients recommended for interoperating with OpenSSH from Mac OS machines. Note that Mac OS X includes OpenSSH by default.
SSH Clients for Windows, Mac, and Unix
Libre FTP, SFTP, WebDAV, S3 & OpenStack Swift browser for Mac and Windows.

Folder & File Compression

Example: for /d %%X in (*) do "c:\Program Files\7-Zip\7z.exe" a "%%X.zip" "%%X\"
To compress a folder without using any particular compression software.
Getting screenshots from a Mac.
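
The 7-Zip loop above creates one archive per subfolder. Where 7-Zip is not installed, a comparable result can be sketched with Python's standard library. The function name zip_each_subfolder is hypothetical, and the archives it writes are ordinary .zip files produced by shutil rather than by 7z.exe.

```python
import os
import shutil


def zip_each_subfolder(parent):
    """Create <name>.zip in `parent` for every immediate subfolder of `parent`."""
    created = []
    # Take a snapshot of the entries first so new .zip files are not revisited.
    entries = sorted(os.scandir(parent), key=lambda entry: entry.name)
    for entry in entries:
        if not entry.is_dir():
            continue
        base = os.path.join(os.path.abspath(parent), entry.name)
        # make_archive appends ".zip"; base_dir keeps the folder name
        # as the top-level entry inside the archive.
        archive = shutil.make_archive(base, "zip", root_dir=parent, base_dir=entry.name)
        created.append(archive)
    return created
```

Calling zip_each_subfolder(".") from the directory that holds the folders mirrors running the batch loop there.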

Print Screen / Screen Grabs / Screen Shots

In Microsoft Windows, pressing Prt Scr will capture the entire screen, while pressing the Alt key in combination with Prt Scr will capture the currently selected window.
Using Ubuntu Linux
Using Mac OS
Using Windows

Regular Expressions

Moved content to Data Analysis.

Text Editors

Text Extraction

Moved content to Data Analysis.

Educate Yourself

Learn to Code

The Learn to Code movement has picked up momentum worldwide, and that is a good thing, as even basic programming skills can have a major impact. If you can teach yourself how to write code, you gain a competitive edge over your peers and learn to think more algorithmically, which helps you tackle problems more efficiently.
Lena Groeger: When I started learning to code, I was amazed by how much was out there: introductory videos, explanatory blog posts, tips and tricks and step-by-step guides. If you're a journalist who wants to make a news app or a student interested in learning to code, you have plenty of paths to choose from.

Multiple Codes

Learn to code while building a project. Free online courses include: HTML & CSS; jQuery; JavaScript; PHP; Python; Ruby; Web Projects; APIs

R

Learn R & Become a Data Analyst

D3.js

d3Vienno features a series of video tutorials, each about 10-12 minutes long, on using D3.js.
D3 Tools
A Chrome-specific bookmarklet that extracts SVG nodes and accompanying styles from an HTML document and downloads them as an SVG file, which you could then open and edit in Adobe Illustrator, for instance. Because SVGs are resolution independent, it’s great for when you want to use web technologies to create documents that are meant to be printed (like, maybe on newsprint). It was created with d3.js in mind, but it should work fine no matter how you choose to generate your SVG.

Data Analysis Tools

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
Take control of your R code. RStudio is the premier integrated development environment for R. It is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or over the web with RStudio Server. Download RStudio (for Windows, Mac, or Linux).
OpenRefine is a standalone open source desktop application for data cleanup and transformation to other formats, the activity known as data wrangling. It is similar to spreadsheet applications (and can work with spreadsheet file formats); however, it behaves more like a database.
Wrangler allows interactive transformation of messy, real-world data into the data tables analysis tools expect. Export data for use in Excel, R, Tableau, Protovis, ...
HTSQL is designed for data analysts and other accidental programmers who have complex business inquiries to solve and need a productive tool to write and share database queries. HTSQL is free and open source software.
Jigsaw is a visual analytics system to help analysts and researchers better explore, analyze, and make sense of document collections.

Investigative Reporting; Journalism; General Research

Useful IRE Resources:
- NICAR Net Tour - The National Institute for Computer-Assisted Reporting (NICAR) is an IRE program.
- Analysis jobs by NICAR - Names the media organizations NICAR has worked for, and lists kinds of analysis jobs NICAR has done.

Maintenance

Dual Boot

Boot Disks

Rufus is a utility that helps format and create bootable USB flash drives, such as USB keys/pendrives, memory sticks, etc.

Network Issues

The post reviews various "fixes" found all over the web and identifies which one actually worked for the post's author.
Path MTU Discovery (PMTUD) in Windows just doesn’t seem to figure out the MTU for a given path, so Windows uses the default. For the most part this doesn’t affect anyone. But failure of PMTUD will result in some websites not loading correctly, trouble connecting to normally reliable online services, and general Internet weirdness.

See Also

References