Is there a way to keep a specific page out of sitemap.xml, e.g. with a short bit of code in the controller?
There's definitely room for improvement. According to a comment in enumerate_pages (https://github.com/odoo/odoo/blob/8.0/addons/website/models/website.py#L370), every template that's marked as a page gets returned. All returned pages are then used to build the sitemap (https://github.com/odoo/odoo/blob/8.0/addons/website/controllers/main.py#L89).
So the only way to keep pages out of the sitemap right now is to extend the "website" model in place (https://www.odoo.com/documentation/8.0/reference/orm.html#inheritance-and-extension): add a property that specifies what should happen with each page, and override or extend "enumerate_pages" to honor it.
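The filtering step that such an override would perform can be sketched in plain Python. This is only an illustration, not Odoo code: filter_sitemap_pages and hidden_locs are hypothetical names, and the sample dicts merely imitate the entries the sitemap template reads (each has a 'loc' key). In a real module, the same loop would wrap the result of the inherited enumerate_pages.

```python
def filter_sitemap_pages(pages, hidden_locs):
    """Yield only the page dicts whose 'loc' is not listed in hidden_locs.

    Hypothetical helper: 'pages' stands in for whatever enumerate_pages yields.
    """
    hidden = set(hidden_locs)
    for page in pages:
        if page.get('loc') not in hidden:
            yield page


# Sample data imitating sitemap entries; the URLs are illustrative assumptions.
pages = [
    {'loc': '/page/homepage'},
    {'loc': '/crm/contactus'},
    {'loc': '/page/aboutus', 'lastmod': '2015-03-01'},
]

visible = list(filter_sitemap_pages(pages, ['/crm/contactus']))
print([p['loc'] for p in visible])
```

Keeping the filter as a generator matches how the pages are consumed one by one when the sitemap is built, so no full list has to be held in memory.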
P.S.: I hope markdown is the correct way to add links to a comment
edit: Is there really no way of using some kind of markup in comments/answers?!
edit2: I created a quick and dirty solution for this here: https://github.com/TechnikumWienAcademy/Odoo-Sitemap-Module
Once your sitemap is refreshed, you will see database entries for every URL in website_sitemap_extended. Change sitemap_visible to "FALSE" to prevent listing.
Supplement to the question
(It's a shame that you cannot really take part here without karma. Unfortunately, I still know too little about Odoo to answer questions. I would actually like to extend the question with a comment, so I'll do it here...)
I am also looking for a way to adjust sitemap.xml.
First I go to Settings > Technical > User Interface > Views. Filtering by "sitemap" shows the templates. I assume changes can be made there.
The problem is: I have tried many things, but no matter what I do, the output does not change. Does a cache have to be cleared somewhere?
I suspect something like this could help to avoid outputting all pages:
<?xml version="1.0"?>
<t t-name="website.sitemap_locs">
    <t t-foreach="locs" t-as="page">
        <url t-if="page.get('lastmod', False)">
            <loc><t t-esc="url_root"/><t t-esc="page['loc']"/></loc>
            <lastmod t-esc="page['lastmod']"/>
            <t t-if="page.get('priority', False)">
                <priority t-esc="page['priority']"/>
            </t>
            <t t-if="page.get('changefreq', False)">
                <changefreq t-esc="page['changefreq']"/>
            </t>
        </url>
    </t>
</t>
so that only pages that have already been edited are output. Surely there is an even better way. The ideal would be a setting in the metadata of the page itself.
After all, pages like these should never be listed:
/crm/contactus (a second contact page?)
/shop/product/inactive-7 (inactive products end in a 403)
/partners/country/amerikanisch-samoa-12 (unused countries)
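One way to express such exclusions is with shell-style URL patterns. The sketch below uses Python's standard fnmatch module. The names EXCLUDE_PATTERNS and is_listed are hypothetical, and the "inactive-*" pattern is only an assumption based on the example slug above, not a rule Odoo provides.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion list built from the example URLs above.
EXCLUDE_PATTERNS = [
    '/crm/contactus',
    '/shop/product/inactive-*',   # assumption: inactive products share this slug prefix
    '/partners/country/amerikanisch-samoa-12',
]


def is_listed(loc, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL path should stay in the sitemap."""
    return not any(fnmatchcase(loc, pattern) for pattern in patterns)


for loc in ['/page/homepage', '/crm/contactus', '/shop/product/inactive-7']:
    print(loc, is_listed(loc))
```

fnmatchcase is used instead of fnmatch so that matching stays case-sensitive on every platform.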
To delete the cached version of /sitemap.xml, activate technical features for your user in the backend, then go to Settings > Technical > Database Structure > Attachments and search for "sitemap" in the list view. Delete the /sitemap.xml record and reload the URL.