Style Guide
Alex Osborne edited this page Jul 4, 2018 · 2 revisions
    This wiki section serves as the coding and interaction/UI-process style guide for the Heritrix project and related open source web archiving projects.
Guidance may occasionally be contradictory while topics are being discussed; conflicting alternatives should each be well-described (with supporting reasoning and precedent) to assist progress towards a consensus/canonical expression.
Worthwhile references:
- Sun's Code Conventions for the Java Programming Language
- Geosoft's Java Programming Style Guidelines
 
In the absence of guidance to the contrary below, recommendations in the above sources may always be followed.
(Some guidance is pulled from the Heritrix 1.X Developer Manual.)
- Use spaces, not tabs. Tabs should not appear in project source code.
- Indent 4 spaces per level.
- Place the opening bracket for a code block on the same line as the declaration/test expecting a block.
- Use brackets even when a branch/block is only a single line of code (to provide an additional visual cue, and for robustness if other lines are added later).
- Prefer longs over ints anywhere a large count of artifacts or a large file size or range is possible.
- Prefer 'protected' over 'private' unless a consideration of potential subclass use suggests direct access will be dangerous.
- (Deviation from Sun recommendations) It is permissible to declare local variables as close to first use as practical (as opposed to at the start of the enclosing block).
- (Deviation from some recommendations) Early and multiple returns from methods are encouraged to minimize indentation levels and to handle simple or error cases quickly.
- All classes and public methods should have Javadoc comments. See Sun's style guide for Javadoc for tips on good Javadoc comments.
- Avoid broad catches (of all Exception or all Throwable) where possible. (Testing code and other all-or-nothing situations excepted.)
- Preserve toString()
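
Several of the conventions above can be seen together in a short sketch. The class below is hypothetical and not taken from the Heritrix codebase; `SizeTally`, its fields, and its methods are names invented purely for illustration. It shows 4-space indentation, same-line opening brackets, brackets on single-line branches, a `long` for a potentially large total, `protected` fields, a local declared near first use, early returns, Javadoc, and a narrow catch.

```java
/**
 * Hypothetical helper that totals byte counts parsed from text fields.
 * Illustrative only; not part of Heritrix.
 */
class SizeTally {

    /** Running total of bytes; a long, since totals can exceed int range. */
    protected long totalBytes = 0L;

    /** Number of fields that were rejected as unusable. */
    protected long rejectedCount = 0L;

    /**
     * Adds one size field to the running total.
     *
     * @param sizeField decimal byte count as text; may be null
     * @return true if the field was counted, false if it was rejected
     */
    public boolean add(String sizeField) {
        if (sizeField == null) {
            rejectedCount++;
            return false; // early return for the simple failure case
        }
        long size; // declared close to first use, not at the top of the method
        try {
            size = Long.parseLong(sizeField.trim());
        } catch (NumberFormatException e) { // narrow catch, not Exception
            rejectedCount++;
            return false;
        }
        if (size < 0) {
            rejectedCount++;
            return false;
        }
        totalBytes += size;
        return true;
    }

    /**
     * Returns the total number of bytes counted so far.
     *
     * @return the running total in bytes
     */
    public long getTotalBytes() {
        return totalBytes;
    }

    /**
     * Returns the number of fields that were rejected.
     *
     * @return the rejection count
     */
    public long getRejectedCount() {
        return rejectedCount;
    }
}
```

For instance, adding `"3000000000"` and then `" 42 "` gives a total of 3000000042, a value that would not fit in an `int`.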
 
- Usable in Lynx
- Reloadable
- Use exclamation points and ALL-CAPS sparingly. (Most warnings, errors, or other failure reports don't need such emphasis.)
 
