Difference between revisions of "Computer Productivity Hacks"

From TeleCafeWiki
(→‎Data Scrape: Almost Scraping: Web Scraping for Non-Programmers)
(→‎Data Scrape: Open Source at Scrapinghub)
* [https://docs.google.com/a/evolvnet.com/document/d/18Q2THQvYCG2_n6nKVsZRHlaPG9iJ9NvLezOOQbEuAJs/edit?hl=en Almost Scraping: Web Scraping for Non-Programmers]
: Tools and tips compiled by journalists from PBS and Omaha World-Herald.
* [http://blog.scrapinghub.com/2014/01/18/open-source-at-scrapinghub/ Open Source at Scrapinghub]
: Scrapinghub's list of open source scraping projects.
* [http://www.notprovided.eu/six-tools-web-scraping-use-data-journalism-creating-insightful-content/ Six tools for web scraping – To use for data journalism & creating insightful content]

Revision as of 23:25, 8 March 2014

Command Line

Mounting shared drives and connecting to remote resources is something you can easily do from the Windows GUI. With our quick guide to the command prompt, however, you can more easily automate large tasks.
A quick crash course in using the command line. It is intended to be completed in a day or two and is not meant to teach advanced shell usage.

WGET

GNU Wget is a free network utility to retrieve files from the World Wide Web using HTTP and FTP, the two most widely used Internet protocols. It works non-interactively, thus enabling work in the background, after having logged off.
Say you want to back up your blog or create a local copy of an entire directory of a web site for archiving or reading later. The command: wget -m http://website.tld (the -m option turns on mirroring, that is, recursive download with time-stamping).
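The mirroring command above can also be driven from a script. The following is a minimal Python sketch, not part of the original page: the function names build_mirror_command and mirror_site are hypothetical, and the extra options (--convert-links, --page-requisites, --no-parent) are standard wget flags added here on the assumption that the copy is meant for offline reading.

```python
import shutil
import subprocess


def build_mirror_command(url, stay_below_start=True, offline_friendly=True):
    """Return the wget argument list for mirroring `url` (hypothetical helper)."""
    cmd = ["wget", "--mirror"]  # --mirror is the long form of -m
    if offline_friendly:
        # Rewrite links for local viewing and fetch images/CSS needed by each page.
        cmd += ["--convert-links", "--page-requisites"]
    if stay_below_start:
        # Do not climb above the starting directory while recursing.
        cmd.append("--no-parent")
    cmd.append(url)
    return cmd


def mirror_site(url):
    """Run wget if it is installed; returns wget's exit code."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget was not found on PATH")
    return subprocess.run(build_mirror_command(url), check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call mirror_site(...) to actually download.
    print(" ".join(build_mirror_command("http://website.tld")))
```

Passing stay_below_start=False and offline_friendly=False reproduces the plain wget -m command from the text.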

Windows Command Line

Cmdlets are the heart-and-soul of Windows PowerShell, Microsoft's latest command shell/scripting language.
Robocopy: A robust file copy command for the Windows command line.
Each command is linked to more info about the particular command.
The xcopy command is a Command Prompt command used to copy one or more files and/or folders from one location to another location.

Generate File List

Example: C:\Users\me\Downloads\MyFolder> dir /b > filelist.txt
(The command itself is dir /b > filelist.txt; the text before it is the prompt, showing the directory you've navigated into and from which you want to generate the list of file names.)
This tutorial contains several working answers for using Windows PowerShell to list files and folders.
Works with cmd.exe, but doesn't seem to work with Windows PowerShell.
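For a version that behaves the same in any shell, the listing can be scripted. This is a rough Python sketch of what dir /b > filelist.txt produces, under the assumption that a sorted, one-name-per-line list is wanted; the function name write_file_list is hypothetical, and unlike the cmd.exe command this sketch leaves the output file itself out of the list.

```python
import os


def write_file_list(folder, output_name="filelist.txt"):
    """Write the bare names of everything in `folder` to `output_name` inside it."""
    # Bare names only (files and subfolders), like dir /b; skip the list file itself.
    names = sorted(name for name in os.listdir(folder) if name != output_name)
    out_path = os.path.join(folder, output_name)
    with open(out_path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + "\n")
    return names


if __name__ == "__main__":
    # Lists the current directory, mirroring the example in the text.
    for name in write_file_list("."):
        print(name)
```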

Folder & File Compression

Example (for use in a batch file; at an interactive prompt, write %X instead of %%X): for /d %%X in (*) do "c:\Program Files\7-Zip\7z.exe" a "%%X.zip" "%%X\"
How to compress a folder without using any particular compression software.
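The same one-archive-per-folder idea can be sketched without 7-Zip by using Python's standard library. This is an illustrative sketch, not the method from the linked pages: the function name zip_each_subfolder is hypothetical, and it assumes each immediate subfolder should become a .zip next to it, with the folder name kept as the top-level entry inside the archive.

```python
import os
import shutil


def zip_each_subfolder(parent):
    """Create <name>.zip beside every immediate subfolder of `parent`."""
    parent = os.path.abspath(parent)
    archives = []
    for entry in sorted(os.listdir(parent)):
        full = os.path.join(parent, entry)
        if os.path.isdir(full):
            # base_name has no extension; make_archive appends ".zip" itself.
            # root_dir/base_dir keep the folder name as the archive's top level.
            archives.append(
                shutil.make_archive(full, "zip", root_dir=parent, base_dir=entry)
            )
    return archives


if __name__ == "__main__":
    for path in zip_each_subfolder("."):
        print(path)
```

Loose files in the parent folder are ignored, matching the for /d loop, which iterates over directories only.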

Text Extraction

Capture2Text enables users to do the following:
  1. Optical Character Recognition (OCR)
  2. Speech Recognition
Lists several options.
Extracts plain text from documents in all popular formats.
Detexter is an app designed to extract text from PDF files.

Data Scrape

Tools and tips compiled by journalists from PBS and Omaha World-Herald.
Scrapinghub's list of open source scraping projects.
Tools for gathering data from public sources.

Text Search

Makes tools to search text content, including:
  1. FALCON - Text Search Java Project: JSON based text search Java Project
  2. HAWK - PDF Text Search Java Project: Taking initiative for Document Text Search
Xpdf is an open source viewer for Portable Document Format (PDF) files.
Windows installer: Short Programs/Scripts (Look for the xpdf3.exe / poppler.exe links in left sidebar.)

PDF Conversion

Convert PDF to HTML without losing text or format.

Data Analysis Tools

HTSQL is designed for data analysts and other accidental programmers who have complex business inquiries to solve and need a productive tool to write and share database queries. HTSQL is free and open source software.
Jigsaw is a visual analytics system to help analysts and researchers better explore, analyze, and make sense of collections of text documents.

Maintenance

Boot Disks

Rufus is a utility that helps format and create bootable USB flash drives, such as USB keys/pendrives, memory sticks, etc.

Network Issues

This post reviews various "fixes" found all over the web and notes which "fix" actually worked for the post's author.
Path MTU Discovery (PMTUD) in Windows sometimes fails to determine the MTU for a given path, so Windows falls back to the default. For the most part this doesn’t affect anyone. But a PMTUD failure can result in some websites not loading correctly, trouble connecting to normally reliable online services, and general Internet weirdness.
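A common way to estimate a path's MTU by hand, which is an assumption here and not necessarily the fix the linked post describes, is to send pings with the Don't Fragment flag set (ping -f -l <size> <host> on Windows) and find the largest payload that gets through; for IPv4, the MTU is that payload plus 28 bytes (20 for the IP header, 8 for the ICMP header). The Python sketch below shows only the search and the arithmetic. The probe function is a stand-in: simulated_probe fakes a path with a 1400-byte MTU, and all function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def largest_unfragmented_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    `probe` is any callable returning True when a Don't Fragment ping of that
    payload size succeeds. 1472 is the largest payload on a 1500-byte MTU link.
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def payload_to_mtu(payload):
    """Convert an ICMP echo payload size to the corresponding IPv4 MTU."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES


def simulated_probe(size, true_mtu=1400):
    """Stand-in for a real ping: succeeds when the packet fits a 1400-byte MTU."""
    return payload_to_mtu(size) <= true_mtu


if __name__ == "__main__":
    payload = largest_unfragmented_payload(simulated_probe)
    print(payload, payload_to_mtu(payload))
```

To use it against a real host, replace simulated_probe with a function that runs the ping command and reports whether a reply came back unfragmented.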

Google Drive

Select the text you want to strike through and press Alt+Shift+5 (Option+Shift+5 on Mac).
Press Ctrl+? to see other such keyboard shortcuts.

See Also